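Flink - Watermark Generation
Reference: Flink - Generating Timestamps / Watermarks
Watermarks are essentially only needed when there is an event-time window, so it is enough to call assignTimestampsAndWatermarks before the window operator.
They do not have to be emitted from the source.
1. First, the source itself can emit watermarks
Let's look at how the Kafka source implements this, starting with the AbstractFetcher constructor: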
- protected AbstractFetcher(
- SourceContext<T> sourceContext,
- List<KafkaTopicPartition> assignedPartitions,
- SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic, // set via assignTimestampsAndWatermarks when the Kafka consumer is created
- SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
- ProcessingTimeService processingTimeProvider,
- long autoWatermarkInterval, // from env.getConfig().setAutoWatermarkInterval()
- ClassLoader userCodeClassLoader,
- boolean useMetrics) throws Exception
- {
- // determine the watermark mode
- if (watermarksPeriodic == null) {
- if (watermarksPunctuated == null) {
- // simple case, no watermarks involved
- timestampWatermarkMode = NO_TIMESTAMPS_WATERMARKS;
- } else {
- timestampWatermarkMode = PUNCTUATED_WATERMARKS;
- }
- } else {
- if (watermarksPunctuated == null) {
- timestampWatermarkMode = PERIODIC_WATERMARKS;
- } else {
- throw new IllegalArgumentException("Cannot have both periodic and punctuated watermarks");
- }
- }
- // create our partition state according to the timestamp/watermark mode
- this.allPartitions = initializePartitions(
- assignedPartitions,
- timestampWatermarkMode,
- watermarksPeriodic, watermarksPunctuated,
- userCodeClassLoader);
- // if we have periodic watermarks, kick off the interval scheduler
- if (timestampWatermarkMode == PERIODIC_WATERMARKS) { // watermarks are emitted periodically
- KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[] parts =
- (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[]) allPartitions;
- PeriodicWatermarkEmitter periodicEmitter=
- new PeriodicWatermarkEmitter(parts, sourceContext, processingTimeProvider, autoWatermarkInterval);
- periodicEmitter.start();
- }
- }
FlinkKafkaConsumerBase
- public FlinkKafkaConsumerBase<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> assigner) {
- checkNotNull(assigner);
- if (this.punctuatedWatermarkAssigner != null) {
- throw new IllegalStateException("A punctuated watermark emitter has already been set.");
- }
- try {
- ClosureCleaner.clean(assigner, true);
- this.periodicWatermarkAssigner = new SerializedValue<>(assigner);
- return this;
- } catch (Exception e) {
- throw new IllegalArgumentException("The given assigner is not serializable", e);
- }
- }
The core methods of this interface define how timestamps are extracted and how watermarks are generated:
- public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {
- Watermark getCurrentWatermark();
- }
- public interface TimestampAssigner<T> extends Function {
- long extractTimestamp(T element, long previousElementTimestamp);
- }
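To make the two methods concrete, here is a minimal sketch of what a typical implementation might look like: a watermark that trails the largest timestamp seen so far by a fixed bound. It deliberately does not use Flink's classes. `PeriodicAssigner` is a hypothetical stand-in for AssignerWithPeriodicWatermarks, the watermark is a plain `long` rather than a Watermark object, and the elements are assumed to be their own event times in milliseconds.

```java
// Hypothetical stand-in for AssignerWithPeriodicWatermarks<T>; not Flink's real interface.
interface PeriodicAssigner<T> {
    long extractTimestamp(T element, long previousElementTimestamp);
    long getCurrentWatermark(); // plain long instead of Flink's Watermark object
}

// Tracks the largest timestamp seen and reports a watermark that trails it by a fixed bound.
class BoundedLatenessAssigner implements PeriodicAssigner<Long> {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen;

    BoundedLatenessAssigner(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Start high enough that subtracting the bound cannot underflow.
        this.maxTimestampSeen = Long.MIN_VALUE + maxOutOfOrdernessMs;
    }

    @Override
    public long extractTimestamp(Long element, long previousElementTimestamp) {
        // Assumption for this sketch: the element itself is the event time in milliseconds.
        maxTimestampSeen = Math.max(maxTimestampSeen, element);
        return element;
    }

    @Override
    public long getCurrentWatermark() {
        return maxTimestampSeen - maxOutOfOrdernessMs;
    }
}

class PeriodicAssignerSketch {
    public static void main(String[] args) {
        BoundedLatenessAssigner assigner = new BoundedLatenessAssigner(5);
        for (long ts : new long[] {100, 103, 101}) {
            assigner.extractTimestamp(ts, Long.MIN_VALUE);
        }
        System.out.println(assigner.getCurrentWatermark()); // prints 98
    }
}
```

The out-of-order element 101 does not pull the watermark back, because only the maximum timestamp is tracked.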
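If assignTimestampsAndWatermarks is not called when the Kafka consumer is set up, the source produces no watermarks.
As the constructor shows, there are two kinds of watermarks:
PERIODIC_WATERMARKS: watermarks emitted at a fixed interval.
PUNCTUATED_WATERMARKS: watermarks triggered by elements, for example by some property of an element or by a special kind of element that signals a watermark. This gives the developer direct control over when watermarks are emitted.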
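The punctuated case can be sketched in a few lines. This is a simplified model, not Flink code: `MarkedEvent` and its `isMarker` flag are hypothetical, and the method only borrows its name and its "return null for no watermark" convention from AssignerWithPunctuatedWatermarks.checkAndGetNextWatermark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical event type: a timestamp plus a flag marking records that should trigger a watermark.
class MarkedEvent {
    final long timestamp;
    final boolean isMarker;

    MarkedEvent(long timestamp, boolean isMarker) {
        this.timestamp = timestamp;
        this.isMarker = isMarker;
    }
}

class PunctuatedSketch {
    // Returns the watermark to emit for this element, or null if the element triggers none.
    static Long checkAndGetNextWatermark(MarkedEvent lastElement, long extractedTimestamp) {
        return lastElement.isMarker ? Long.valueOf(extractedTimestamp) : null;
    }

    // Runs the check for every element and collects the watermarks that would be emitted.
    static List<Long> emittedWatermarks(List<MarkedEvent> events) {
        List<Long> watermarks = new ArrayList<>();
        for (MarkedEvent event : events) {
            Long wm = checkAndGetNextWatermark(event, event.timestamp);
            if (wm != null) {
                watermarks.add(wm);
            }
        }
        return watermarks;
    }

    public static void main(String[] args) {
        List<MarkedEvent> events = Arrays.asList(
                new MarkedEvent(10, false),
                new MarkedEvent(20, true),
                new MarkedEvent(25, false),
                new MarkedEvent(30, true));
        System.out.println(emittedWatermarks(events)); // prints [20, 30]
    }
}
```

Unlike the periodic mode, no timer is involved here: a watermark appears only when an element asks for one.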
initializePartitions
- case PERIODIC_WATERMARKS: {
- @SuppressWarnings("unchecked")
- KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[] partitions =
- (KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[])
- new KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[assignedPartitions.size()];
- int pos = 0;
- for (KafkaTopicPartition partition : assignedPartitions) {
- KPH kafkaHandle = createKafkaPartitionHandle(partition);
- AssignerWithPeriodicWatermarks<T> assignerInstance =
- watermarksPeriodic.deserializeValue(userCodeClassLoader);
- partitions[pos++] = new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
- partition, kafkaHandle, assignerInstance);
- }
- return partitions;
- }
KafkaTopicPartitionStateWithPeriodicWatermarks
The core methods of this class:
- public long getTimestampForRecord(T record, long kafkaEventTimestamp) {
- return timestampsAndWatermarks.extractTimestamp(record, kafkaEventTimestamp);
- }
- public long getCurrentWatermarkTimestamp() {
- Watermark wm = timestampsAndWatermarks.getCurrentWatermark();
- if (wm != null) {
- partitionWatermark = Math.max(partitionWatermark, wm.getTimestamp());
- }
- return partitionWatermark;
- }
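As you can see, both simply delegate to the AssignerWithPeriodicWatermarks you supplied; the Math.max also keeps each partition's watermark from moving backwards.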
PeriodicWatermarkEmitter
- private static class PeriodicWatermarkEmitter implements ProcessingTimeCallback {
- public void start() {
- timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); // start the timer so the callback fires after one interval
- }
- @Override
- public void onProcessingTime(long timestamp) throws Exception { // timer callback
- long minAcrossAll = Long.MAX_VALUE;
- for (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?> state : allPartitions) { // for each partition
- // we access the current watermark for the periodic assigners under the state
- // lock, to prevent concurrent modification to any internal variables
- final long curr;
- //noinspection SynchronizationOnLocalVariableOrMethodParameter
- synchronized (state) {
- curr = state.getCurrentWatermarkTimestamp(); // read this partition's current watermark
- }
- minAcrossAll = Math.min(minAcrossAll, curr); // take the minimum: the slowest partition determines the watermark
- }
- // emit next watermark, if there is one
- if (minAcrossAll > lastWatermarkTimestamp) {
- lastWatermarkTimestamp = minAcrossAll;
- emitter.emitWatermark(new Watermark(minAcrossAll)); // emit
- }
- // schedule the next watermark
- timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); // re-register the timer
- }
- }
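The combination of "max within a partition, min across partitions, emit only when the result advances" can be played through with a small stand-alone model. The class names below are made up for this sketch, and the timer is replaced by explicit calls to `onTimer()`.

```java
// Simplified model of one partition's watermark state; not the Flink class.
class PartitionWatermarkState {
    private long partitionWatermark = Long.MIN_VALUE;

    // Like getCurrentWatermarkTimestamp(): a partition's watermark never moves backwards.
    long update(long reportedWatermark) {
        partitionWatermark = Math.max(partitionWatermark, reportedWatermark);
        return partitionWatermark;
    }

    long current() {
        return partitionWatermark;
    }
}

// Simplified model of the periodic emitter; onTimer() stands in for onProcessingTime().
class MinWatermarkEmitter {
    private final PartitionWatermarkState[] partitions;
    private long lastWatermarkTimestamp = Long.MIN_VALUE;

    MinWatermarkEmitter(PartitionWatermarkState[] partitions) {
        this.partitions = partitions;
    }

    // Returns the watermark to emit on this tick, or null if the minimum did not advance.
    Long onTimer() {
        long minAcrossAll = Long.MAX_VALUE;
        for (PartitionWatermarkState state : partitions) {
            minAcrossAll = Math.min(minAcrossAll, state.current());
        }
        if (minAcrossAll > lastWatermarkTimestamp) {
            lastWatermarkTimestamp = minAcrossAll;
            return minAcrossAll;
        }
        return null;
    }
}

class MinWatermarkSketch {
    public static void main(String[] args) {
        PartitionWatermarkState[] parts = {
            new PartitionWatermarkState(), new PartitionWatermarkState(), new PartitionWatermarkState()
        };
        MinWatermarkEmitter emitter = new MinWatermarkEmitter(parts);

        parts[0].update(10);
        parts[1].update(7);
        parts[2].update(12);
        System.out.println(emitter.onTimer()); // prints 7

        parts[1].update(9);
        System.out.println(emitter.onTimer()); // prints 9

        parts[0].update(15);                   // the slowest partition is unchanged
        System.out.println(emitter.onTimer()); // prints null
    }
}
```

One consequence visible in the sketch: a partition that has never reported a watermark stays at Long.MIN_VALUE and holds the minimum back, so nothing is emitted until every partition has produced one.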
2. A DataStream can also be configured to emit watermarks periodically
Under the hood this just adds a chained TimestampsAndPeriodicWatermarksOperator.
DataStream
- /**
- * Assigns timestamps to the elements in the data stream and periodically creates
- * watermarks to signal event time progress.
- *
- * <p>This method creates watermarks periodically (for example every second), based
- * on the watermarks indicated by the given watermark generator. Even when no new elements
- * in the stream arrive, the given watermark generator will be periodically checked for
- * new watermarks. The interval in which watermarks are generated is defined in
- * {@link ExecutionConfig#setAutoWatermarkInterval(long)}.
- *
- * <p>Use this method for the common cases, where some characteristic over all elements
- * should generate the watermarks, or where watermarks are simply trailing behind the
- * wall clock time by a certain amount.
- *
- * <p>For the second case and when the watermarks are required to lag behind the maximum
- * timestamp seen so far in the elements of the stream by a fixed amount of time, and this
- * amount is known in advance, use the
- * {@link BoundedOutOfOrdernessTimestampExtractor}.
- *
- * <p>For cases where watermarks should be created in an irregular fashion, for example
- * based on certain markers that some element carry, use the
- * {@link AssignerWithPunctuatedWatermarks}.
- *
- * @param timestampAndWatermarkAssigner The implementation of the timestamp assigner and
- * watermark generator.
- * @return The stream after the transformation, with assigned timestamps and watermarks.
- *
- * @see AssignerWithPeriodicWatermarks
- * @see AssignerWithPunctuatedWatermarks
- * @see #assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
- */
- public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
- AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) {
- // match parallelism to input, otherwise dop=1 sources could lead to some strange
- // behaviour: the watermark will creep along very slowly because the elements
- // from the source go to each extraction operator round robin.
- final int inputParallelism = getTransformation().getParallelism();
- final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner);
- TimestampsAndPeriodicWatermarksOperator<T> operator =
- new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner);
- return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
- .setParallelism(inputParallelism);
- }
TimestampsAndPeriodicWatermarksOperator
- public class TimestampsAndPeriodicWatermarksOperator<T>
- extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
- implements OneInputStreamOperator<T, T>, Triggerable {
- private transient long watermarkInterval;
- private transient long currentWatermark;
- public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
- super(assigner); //AbstractUdfStreamOperator(F userFunction)
- this.chainingStrategy = ChainingStrategy.ALWAYS; // always chained
- }
- @Override
- public void open() throws Exception {
- super.open();
- currentWatermark = Long.MIN_VALUE;
- watermarkInterval = getExecutionConfig().getAutoWatermarkInterval();
- if (watermarkInterval > 0) {
- registerTimer(System.currentTimeMillis() + watermarkInterval, this); // register with the timer service
- }
- }
- @Override
- public void processElement(StreamRecord<T> element) throws Exception {
- final long newTimestamp = userFunction.extractTimestamp(element.getValue(), // extract the timestamp from the element via the AssignerWithPeriodicWatermarks
- element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE);
- output.collect(element.replace(element.getValue(), newTimestamp)); // replace the element's timestamp and emit it downstream
- }
- @Override
- public void trigger(long timestamp) throws Exception { // invoked when the timer fires
- // register next timer
- Watermark newWatermark = userFunction.getCurrentWatermark(); // get the current watermark
- if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
- currentWatermark = newWatermark.getTimestamp();
- // emit watermark
- output.emitWatermark(newWatermark); // emit the watermark
- }
- registerTimer(System.currentTimeMillis() + watermarkInterval, this); // re-register the timer
- }
- @Override
- public void processWatermark(Watermark mark) throws Exception {
- // if we receive a Long.MAX_VALUE watermark we forward it since it is used
- // to signal the end of input and to not block watermark progress downstream
- if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
- currentWatermark = Long.MAX_VALUE;
- output.emitWatermark(mark); //forward watermark
- }
- }
As shown, processElement calls AssignerWithPeriodicWatermarks.extractTimestamp to extract the event time
and then updates the StreamRecord's timestamp before forwarding it.
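To see how records and watermarks end up interleaved in the operator's output, here is a rough stand-alone model of the processElement / trigger pair. It is only a sketch: the class is hypothetical, the timer is replaced by explicit `onTimer()` calls, the output is a list of strings, and the watermark rule (maximum timestamp minus a fixed bound) is an assumption standing in for whatever the user's assigner does.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a periodic timestamps/watermarks operator; not the Flink class.
class PeriodicOperatorSketch {
    private final long maxOutOfOrderness; // assumption: watermark = max timestamp - this bound
    private long maxTimestamp = Long.MIN_VALUE;
    private long currentWatermark = Long.MIN_VALUE;
    final List<String> output = new ArrayList<>();

    PeriodicOperatorSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Like processElement: stamp the record with its event time and forward it immediately.
    void processElement(String value, long eventTime) {
        maxTimestamp = Math.max(maxTimestamp, eventTime);
        output.add(value + "@" + eventTime);
    }

    // Like trigger: runs on every timer tick and emits a watermark only if it advanced.
    void onTimer() {
        if (maxTimestamp == Long.MIN_VALUE) {
            return; // nothing seen yet, so there is no watermark to report
        }
        long candidate = maxTimestamp - maxOutOfOrderness;
        if (candidate > currentWatermark) {
            currentWatermark = candidate;
            output.add("WM(" + candidate + ")");
        }
    }

    public static void main(String[] args) {
        PeriodicOperatorSketch op = new PeriodicOperatorSketch(5);
        op.processElement("a", 100);
        op.processElement("b", 103);
        op.onTimer();                 // emits WM(98)
        op.processElement("c", 101);
        op.onTimer();                 // watermark unchanged, nothing emitted
        op.processElement("d", 110);
        op.onTimer();                 // emits WM(105)
        System.out.println(op.output); // prints [a@100, b@103, WM(98), c@101, d@110, WM(105)]
    }
}
```

Records pass through as soon as they arrive, while watermarks show up only at timer ticks, which is why the auto watermark interval bounds how quickly downstream operators see event time advance.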
Later, in the window operator:
- @Override
- public void processElement(StreamRecord<IN> element) throws Exception {
- final Collection<W> elementWindows = windowAssigner.assignWindows(
- element.getValue(), element.getTimestamp(), windowAssignerContext);
Here windowAssigner.assignWindows uses the element's timestamp as the time for window assignment.
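For intuition about what "assigning by timestamp" means, the arithmetic for a tumbling window can be sketched as follows. This is not Flink's implementation: it assumes a window size in milliseconds, no offset, and windows aligned to the epoch; the class and method names are made up for this example.

```java
// Sketch of tumbling-window assignment from an element timestamp (no offset, epoch-aligned).
class TumblingWindowSketch {
    // Start of the window that contains the timestamp; floorMod keeps negative timestamps correct.
    static long windowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    // Returns {start, end} of the half-open window [start, end).
    static long[] assignWindow(long timestamp, long size) {
        long start = windowStart(timestamp, size);
        return new long[] {start, start + size};
    }

    public static void main(String[] args) {
        long[] window = assignWindow(12345L, 5000L);
        System.out.println(window[0] + " .. " + window[1]); // prints 10000 .. 15000
    }
}
```

This is why the timestamp written by the assigner matters: it alone decides which window the element lands in, regardless of when the element actually reaches the window operator.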
For how watermarks are handled there, see Flink – window operator.