Flink - Watermark Generation
Reference: Flink - Generating Timestamps / Watermarks
Watermarks are only consumed when event-time windows are involved, so it is enough to call assignTimestampsAndWatermarks right before the window operator; watermarks do not necessarily have to be emitted by the source.
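As a minimal sketch of that typical placement (the tuple data, the 1-second out-of-orderness bound and the 10-second window are just assumptions for illustration; the API shown is the pre-Flink-1.11 AssignerWithPeriodicWatermarks style discussed throughout this post):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

env.fromElements(Tuple2.of("a", 1000L), Tuple2.of("a", 2000L), Tuple2.of("b", 3000L))
    // extract timestamps / generate watermarks right before the window operator
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<Tuple2<String, Long>>(Time.seconds(1)) {
            @Override
            public long extractTimestamp(Tuple2<String, Long> element) {
                return element.f1; // f1 holds the event time in this sketch
            }
        })
    .keyBy(0)
    .timeWindow(Time.seconds(10))
    .sum(1)
    .print();

env.execute("watermark placement sketch");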
1. First, the source itself can emit watermarks
Let's look at how the Kafka source implements this.
protected AbstractFetcher(
        SourceContext<T> sourceContext,
        List<KafkaTopicPartition> assignedPartitions,
        SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic, // set via assignTimestampsAndWatermarks when creating the KafkaConsumer
        SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
        ProcessingTimeService processingTimeProvider,
        long autoWatermarkInterval, // env.getConfig().setAutoWatermarkInterval()
        ClassLoader userCodeClassLoader,
        boolean useMetrics) throws Exception
{
    // determine the timestamp/watermark mode
    if (watermarksPeriodic == null) {
        if (watermarksPunctuated == null) {
            // simple case, no watermarks involved
            timestampWatermarkMode = NO_TIMESTAMPS_WATERMARKS;
        } else {
            timestampWatermarkMode = PUNCTUATED_WATERMARKS;
        }
    } else {
        if (watermarksPunctuated == null) {
            timestampWatermarkMode = PERIODIC_WATERMARKS;
        } else {
            throw new IllegalArgumentException("Cannot have both periodic and punctuated watermarks");
        }
    }

    // create our partition state according to the timestamp/watermark mode
    this.allPartitions = initializePartitions(
            assignedPartitions,
            timestampWatermarkMode,
            watermarksPeriodic, watermarksPunctuated,
            userCodeClassLoader);

    // if we have periodic watermarks, kick off the interval scheduler
    if (timestampWatermarkMode == PERIODIC_WATERMARKS) { // watermarks are emitted periodically
        KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[] parts =
                (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[]) allPartitions;

        PeriodicWatermarkEmitter periodicEmitter =
                new PeriodicWatermarkEmitter(parts, sourceContext, processingTimeProvider, autoWatermarkInterval);
        periodicEmitter.start();
    }
}
FlinkKafkaConsumerBase
public FlinkKafkaConsumerBase<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> assigner) {
    checkNotNull(assigner);

    if (this.punctuatedWatermarkAssigner != null) {
        throw new IllegalStateException("A punctuated watermark emitter has already been set.");
    }
    try {
        ClosureCleaner.clean(assigner, true);
        this.periodicWatermarkAssigner = new SerializedValue<>(assigner);
        return this;
    } catch (Exception e) {
        throw new IllegalArgumentException("The given assigner is not serializable", e);
    }
}
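As a hedged usage sketch (the topic name, broker address, MyEvent, MyEventSchema and MyPeriodicAssigner are hypothetical, and the consumer class depends on your Kafka connector version), the assigner is attached to the consumer before it is handed to addSource:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
props.setProperty("group.id", "demo-group");

FlinkKafkaConsumer010<MyEvent> consumer =
        new FlinkKafkaConsumer010<>("my-topic", new MyEventSchema(), props); // MyEventSchema: a DeserializationSchema<MyEvent>, hypothetical

// register the periodic assigner on the source itself, so timestamps/watermarks are tracked per Kafka partition
consumer.assignTimestampsAndWatermarks(new MyPeriodicAssigner());

DataStream<MyEvent> stream = env.addSource(consumer);

Assigning the watermarks on the consumer (rather than on the DataStream afterwards) lets the fetcher keep a watermark per Kafka partition and take the minimum across partitions, as initializePartitions and PeriodicWatermarkEmitter below show.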
The core methods of AssignerWithPeriodicWatermarks (and its parent interface TimestampAssigner) define how to extract the timestamp and how to generate the watermark:
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {
    Watermark getCurrentWatermark();
}

public interface TimestampAssigner<T> extends Function {
    long extractTimestamp(T element, long previousElementTimestamp);
}
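A minimal sketch of a periodic assigner in the style of BoundedOutOfOrdernessTimestampExtractor (the MyEvent type, its getEventTime() accessor, and the 3-second bound are assumptions for illustration):

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class BoundedLatenessAssigner implements AssignerWithPeriodicWatermarks<MyEvent> {

    private static final long MAX_OUT_OF_ORDERNESS = 3000L; // 3 seconds, assumed bound

    private long currentMaxTimestamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS; // avoid underflow below

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        long ts = element.getEventTime(); // hypothetical accessor for the event-time field
        currentMaxTimestamp = Math.max(currentMaxTimestamp, ts);
        return ts;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // called periodically (autoWatermarkInterval); the watermark trails the
        // largest timestamp seen so far by the fixed out-of-orderness bound
        return new Watermark(currentMaxTimestamp - MAX_OUT_OF_ORDERNESS);
    }
}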
If assignTimestampsAndWatermarks is not called when the KafkaConsumer is created, no watermarks are produced at all.
As the constructor above shows, there are two watermark modes:
PERIODIC_WATERMARKS: watermarks are emitted at a fixed interval;
PUNCTUATED_WATERMARKS: watermarks are triggered by the elements themselves, e.g. a field on the element or a special marker element signals a watermark, which gives the developer direct control (a sketch follows below).
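For the punctuated case, a hedged sketch (again MyEvent, its isWatermarkMarker() flag and getEventTime() accessor are assumptions) returns a non-null Watermark only for marker elements:

import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class MarkerTriggeredAssigner implements AssignerWithPunctuatedWatermarks<MyEvent> {

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        return element.getEventTime(); // hypothetical accessor
    }

    @Override
    public Watermark checkAndGetNextWatermark(MyEvent lastElement, long extractedTimestamp) {
        // called for every element, right after extractTimestamp;
        // returning null means "no watermark this time"
        return lastElement.isWatermarkMarker() ? new Watermark(extractedTimestamp) : null;
    }
}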
initializePartitions
case PERIODIC_WATERMARKS: {
    @SuppressWarnings("unchecked")
    KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[] partitions =
            (KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[])
                    new KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[assignedPartitions.size()];

    int pos = 0;
    for (KafkaTopicPartition partition : assignedPartitions) {
        KPH kafkaHandle = createKafkaPartitionHandle(partition);

        AssignerWithPeriodicWatermarks<T> assignerInstance =
                watermarksPeriodic.deserializeValue(userCodeClassLoader);

        partitions[pos++] = new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
                partition, kafkaHandle, assignerInstance);
    }
    return partitions;
}
KafkaTopicPartitionStateWithPeriodicWatermarks
The key methods in this class:
public long getTimestampForRecord(T record, long kafkaEventTimestamp) {
    return timestampsAndWatermarks.extractTimestamp(record, kafkaEventTimestamp);
}

public long getCurrentWatermarkTimestamp() {
    Watermark wm = timestampsAndWatermarks.getCurrentWatermark();
    if (wm != null) {
        partitionWatermark = Math.max(partitionWatermark, wm.getTimestamp());
    }
    return partitionWatermark;
}
As you can see, these simply delegate to the AssignerWithPeriodicWatermarks you supplied.
PeriodicWatermarkEmitter
private static class PeriodicWatermarkEmitter implements ProcessingTimeCallback {

    public void start() {
        timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); // start the timer; fires periodically
    }

    @Override
    public void onProcessingTime(long timestamp) throws Exception { // timer callback
        long minAcrossAll = Long.MAX_VALUE;
        for (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?> state : allPartitions) { // for every partition
            // we access the current watermark for the periodic assigners under the state
            // lock, to prevent concurrent modification to any internal variables
            final long curr;
            //noinspection SynchronizationOnLocalVariableOrMethodParameter
            synchronized (state) {
                curr = state.getCurrentWatermarkTimestamp(); // current watermark of this partition
            }
            minAcrossAll = Math.min(minAcrossAll, curr); // take the minimum across partitions as the watermark
        }

        // emit next watermark, if there is one
        if (minAcrossAll > lastWatermarkTimestamp) {
            lastWatermarkTimestamp = minAcrossAll;
            emitter.emitWatermark(new Watermark(minAcrossAll)); // emit
        }

        // schedule the next watermark
        timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); // re-register the timer
    }
}
2. A DataStream can also emit watermarks periodically
Under the hood this simply adds a chained TimestampsAndPeriodicWatermarksOperator.
DataStream
/**
* Assigns timestamps to the elements in the data stream and periodically creates
* watermarks to signal event time progress.
*
* <p>This method creates watermarks periodically (for example every second), based
* on the watermarks indicated by the given watermark generator. Even when no new elements
* in the stream arrive, the given watermark generator will be periodically checked for
* new watermarks. The interval in which watermarks are generated is defined in
* {@link ExecutionConfig#setAutoWatermarkInterval(long)}.
*
* <p>Use this method for the common cases, where some characteristic over all elements
* should generate the watermarks, or where watermarks are simply trailing behind the
* wall clock time by a certain amount.
*
* <p>For the second case and when the watermarks are required to lag behind the maximum
* timestamp seen so far in the elements of the stream by a fixed amount of time, and this
* amount is known in advance, use the
* {@link BoundedOutOfOrdernessTimestampExtractor}.
*
* <p>For cases where watermarks should be created in an irregular fashion, for example
* based on certain markers that some element carry, use the
* {@link AssignerWithPunctuatedWatermarks}.
*
* @param timestampAndWatermarkAssigner The implementation of the timestamp assigner and
* watermark generator.
* @return The stream after the transformation, with assigned timestamps and watermarks.
*
* @see AssignerWithPeriodicWatermarks
* @see AssignerWithPunctuatedWatermarks
* @see #assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
*/
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
        AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) {

    // match parallelism to input, otherwise dop=1 sources could lead to some strange
    // behaviour: the watermark will creep along very slowly because the elements
    // from the source go to each extraction operator round robin.
    final int inputParallelism = getTransformation().getParallelism();
    final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner);

    TimestampsAndPeriodicWatermarksOperator<T> operator =
            new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner);

    return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
            .setParallelism(inputParallelism);
}
TimestampsAndPeriodicWatermarksOperator
public class TimestampsAndPeriodicWatermarksOperator<T>
        extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
        implements OneInputStreamOperator<T, T>, Triggerable {

    private transient long watermarkInterval;
    private transient long currentWatermark;

    public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
        super(assigner); // AbstractUdfStreamOperator(F userFunction)
        this.chainingStrategy = ChainingStrategy.ALWAYS; // always chained
    }

    @Override
    public void open() throws Exception {
        super.open();

        currentWatermark = Long.MIN_VALUE;
        watermarkInterval = getExecutionConfig().getAutoWatermarkInterval();

        if (watermarkInterval > 0) {
            registerTimer(System.currentTimeMillis() + watermarkInterval, this); // register with the timer service
        }
    }

    @Override
    public void processElement(StreamRecord<T> element) throws Exception {
        final long newTimestamp = userFunction.extractTimestamp(element.getValue(), // extract the timestamp via AssignerWithPeriodicWatermarks
                element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE);

        output.collect(element.replace(element.getValue(), newTimestamp)); // update the element's timestamp and re-emit it
    }

    @Override
    public void trigger(long timestamp) throws Exception { // fired by the timer
        // register next timer
        Watermark newWatermark = userFunction.getCurrentWatermark(); // get the watermark
        if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
            currentWatermark = newWatermark.getTimestamp();
            // emit watermark
            output.emitWatermark(newWatermark); // emit the watermark
        }

        registerTimer(System.currentTimeMillis() + watermarkInterval, this); // re-register the timer
    }

    @Override
    public void processWatermark(Watermark mark) throws Exception {
        // if we receive a Long.MAX_VALUE watermark we forward it since it is used
        // to signal the end of input and to not block watermark progress downstream
        if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
            currentWatermark = Long.MAX_VALUE;
            output.emitWatermark(mark); // forward the watermark
        }
    }
}
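Note that open() only registers the timer when watermarkInterval > 0, and that interval comes from the ExecutionConfig. A small usage sketch (the 200 ms value is just an illustration):

// without a positive interval no timer is registered and no periodic watermark is ever emitted
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.getConfig().setAutoWatermarkInterval(200); // call getCurrentWatermark() roughly every 200 ms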
You can see that processElement calls AssignerWithPeriodicWatermarks.extractTimestamp to extract the event time and then updates the StreamRecord's timestamp before re-emitting it.
Later, in the window operator,
@Override
public void processElement(StreamRecord<IN> element) throws Exception {
    final Collection<W> elementWindows = windowAssigner.assignWindows(
            element.getValue(), element.getTimestamp(), windowAssignerContext);
windowAssigner.assignWindows uses the element's timestamp as the assignment time.
For how the window operator handles watermarks, see Flink – window operator.