Reference: Flink - Generating Timestamps / Watermarks

Watermarks are only used when there is a window, so it is enough to call assignTimestampsAndWatermarks before the window operator.

They do not have to be emitted by the source.

1. First, a source can emit watermarks

Let's look at how the Kafka source implements this, starting with the AbstractFetcher constructor.

    protected AbstractFetcher(
            SourceContext<T> sourceContext,
            List<KafkaTopicPartition> assignedPartitions,
            // set via assignTimestampsAndWatermarks when the KafkaConsumer is created
            SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic,
            SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
            ProcessingTimeService processingTimeProvider,
            long autoWatermarkInterval, // env.getConfig().setAutoWatermarkInterval()
            ClassLoader userCodeClassLoader,
            boolean useMetrics) throws Exception
    {
        // determine the watermark mode
        if (watermarksPeriodic == null) {
            if (watermarksPunctuated == null) {
                // simple case, no watermarks involved
                timestampWatermarkMode = NO_TIMESTAMPS_WATERMARKS;
            } else {
                timestampWatermarkMode = PUNCTUATED_WATERMARKS;
            }
        } else {
            if (watermarksPunctuated == null) {
                timestampWatermarkMode = PERIODIC_WATERMARKS;
            } else {
                throw new IllegalArgumentException("Cannot have both periodic and punctuated watermarks");
            }
        }

        // create our partition state according to the timestamp/watermark mode
        this.allPartitions = initializePartitions(
                assignedPartitions,
                timestampWatermarkMode,
                watermarksPeriodic, watermarksPunctuated,
                userCodeClassLoader);

        // if we have periodic watermarks, kick off the interval scheduler
        if (timestampWatermarkMode == PERIODIC_WATERMARKS) { // watermarks are emitted periodically
            KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[] parts =
                    (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[]) allPartitions;

            PeriodicWatermarkEmitter periodicEmitter =
                    new PeriodicWatermarkEmitter(parts, sourceContext, processingTimeProvider, autoWatermarkInterval);
            periodicEmitter.start();
        }
    }

FlinkKafkaConsumerBase

    public FlinkKafkaConsumerBase<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> assigner) {
        checkNotNull(assigner);

        if (this.punctuatedWatermarkAssigner != null) {
            throw new IllegalStateException("A punctuated watermark emitter has already been set.");
        }
        try {
            ClosureCleaner.clean(assigner, true);
            this.periodicWatermarkAssigner = new SerializedValue<>(assigner);
            return this;
        } catch (Exception e) {
            throw new IllegalArgumentException("The given assigner is not serializable", e);
        }
    }

The core functions of the AssignerWithPeriodicWatermarks interface define how timestamps are extracted and how watermarks are generated:

    public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {
        Watermark getCurrentWatermark();
    }

    public interface TimestampAssigner<T> extends Function {
        long extractTimestamp(T element, long previousElementTimestamp);
    }

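If assignTimestampsAndWatermarks is not called when the KafkaConsumer is initialized, the source produces no watermarks.

As the constructor shows, there are two kinds of watermark:

PERIODIC_WATERMARKS: watermarks emitted at a fixed interval.

PUNCTUATED_WATERMARKS: watermarks triggered by elements, for example by a property of an element or by a special type of element that signals a watermark. This gives the developer direct control over when watermarks are emitted.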
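
To make the difference between the two modes concrete, here is a minimal plain-Java sketch with no Flink dependency. The class and method names (WatermarkStyles, BoundedLatenessPeriodic, punctuatedWatermark) and the 2-second tolerance are hypothetical and only stand in for the ideas behind AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks; they are not Flink's API.

```java
import java.util.OptionalLong;

/** Plain-Java sketch (no Flink dependency) contrasting the two watermark styles. */
final class WatermarkStyles {

    /** Periodic style: remember the max timestamp; a timer polls currentWatermark(). */
    static final class BoundedLatenessPeriodic {
        private final long maxOutOfOrdernessMs; // hypothetical lateness tolerance
        private long maxTimestampSeen = Long.MIN_VALUE;

        BoundedLatenessPeriodic(long maxOutOfOrdernessMs) {
            this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        }

        /** Called per element: returns its event time and tracks the maximum seen. */
        long extractTimestamp(long eventTimestamp) {
            maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestamp);
            return eventTimestamp;
        }

        /** Called on every timer tick, whether or not new elements arrived. */
        OptionalLong currentWatermark() {
            if (maxTimestampSeen == Long.MIN_VALUE) {
                return OptionalLong.empty(); // nothing seen yet, no watermark
            }
            return OptionalLong.of(maxTimestampSeen - maxOutOfOrdernessMs);
        }
    }

    /** Punctuated style: decided per element; only marker elements yield a watermark. */
    static OptionalLong punctuatedWatermark(boolean isMarker, long eventTimestamp) {
        return isMarker ? OptionalLong.of(eventTimestamp) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        BoundedLatenessPeriodic periodic = new BoundedLatenessPeriodic(2_000L);
        periodic.extractTimestamp(10_000L);
        periodic.extractTimestamp(8_500L); // an out-of-order element does not lower the max
        System.out.println(periodic.currentWatermark());
        System.out.println(punctuatedWatermark(false, 10_000L));
        System.out.println(punctuatedWatermark(true, 10_000L));
    }
}
```

In the periodic style the watermark depends only on state accumulated so far, so it can be polled at any time; in the punctuated style the decision is made while an element is being processed.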

initializePartitions

    case PERIODIC_WATERMARKS: {
        @SuppressWarnings("unchecked")
        KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[] partitions =
                (KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[])
                        new KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[assignedPartitions.size()];

        int pos = 0;
        for (KafkaTopicPartition partition : assignedPartitions) {
            KPH kafkaHandle = createKafkaPartitionHandle(partition);

            AssignerWithPeriodicWatermarks<T> assignerInstance =
                    watermarksPeriodic.deserializeValue(userCodeClassLoader);

            partitions[pos++] = new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
                    partition, kafkaHandle, assignerInstance);
        }

        return partitions;
    }

KafkaTopicPartitionStateWithPeriodicWatermarks

The core functions of this class are:

    public long getTimestampForRecord(T record, long kafkaEventTimestamp) {
        return timestampsAndWatermarks.extractTimestamp(record, kafkaEventTimestamp);
    }

    public long getCurrentWatermarkTimestamp() {
        Watermark wm = timestampsAndWatermarks.getCurrentWatermark();
        if (wm != null) {
            partitionWatermark = Math.max(partitionWatermark, wm.getTimestamp());
        }
        return partitionWatermark;
    }

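As you can see, both delegate to the AssignerWithPeriodicWatermarks that you defined. Each partition gets its own assigner instance, and the partition's watermark never moves backwards.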

PeriodicWatermarkEmitter

    private static class PeriodicWatermarkEmitter implements ProcessingTimeCallback {

        // (fields and constructor omitted in this excerpt)

        public void start() {
            // start the timer so that onProcessingTime fires periodically
            timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this);
        }

        @Override
        public void onProcessingTime(long timestamp) throws Exception { // logic run on each timer tick

            long minAcrossAll = Long.MAX_VALUE;
            for (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?> state : allPartitions) { // for every partition

                // we access the current watermark for the periodic assigners under the state
                // lock, to prevent concurrent modification to any internal variables
                final long curr;
                //noinspection SynchronizationOnLocalVariableOrMethodParameter
                synchronized (state) {
                    curr = state.getCurrentWatermarkTimestamp(); // current watermark of this partition
                }

                // take the minimum: the slowest partition determines the emitted watermark
                minAcrossAll = Math.min(minAcrossAll, curr);
            }

            // emit next watermark, if there is one
            if (minAcrossAll > lastWatermarkTimestamp) {
                lastWatermarkTimestamp = minAcrossAll;
                emitter.emitWatermark(new Watermark(minAcrossAll)); // emit
            }

            // schedule the next watermark (re-register the timer)
            timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this);
        }
    }
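
The rule in onProcessingTime (take the minimum watermark across all partitions and emit it only when it has advanced) can be tried out in isolation. The following is a simplified plain-Java sketch; MinWatermarkSketch and its methods are hypothetical stand-ins, with advance() playing the role of getCurrentWatermarkTimestamp's monotonic update and onTimer() the role of the timer callback, and no locking or real timers involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (no Flink dependency) of the "minimum across partitions" rule. */
final class MinWatermarkSketch {
    private final long[] partitionWatermarks; // one watermark per partition
    private final List<Long> emitted = new ArrayList<>();
    private long lastEmitted = Long.MIN_VALUE;

    MinWatermarkSketch(int partitionCount) {
        partitionWatermarks = new long[partitionCount];
        Arrays.fill(partitionWatermarks, Long.MIN_VALUE);
    }

    /** A partition's watermark never moves backwards. */
    void advance(int partition, long candidate) {
        partitionWatermarks[partition] = Math.max(partitionWatermarks[partition], candidate);
    }

    /** Stand-in for the timer callback: emit the minimum only if it has increased. */
    void onTimer() {
        long minAcrossAll = Long.MAX_VALUE;
        for (long wm : partitionWatermarks) {
            minAcrossAll = Math.min(minAcrossAll, wm);
        }
        if (minAcrossAll > lastEmitted) {
            lastEmitted = minAcrossAll;
            emitted.add(minAcrossAll);
        }
    }

    /** Watermarks emitted so far, in order. */
    List<Long> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        MinWatermarkSketch sketch = new MinWatermarkSketch(2);
        sketch.advance(0, 100L);
        sketch.onTimer();       // partition 1 has no watermark yet, nothing is emitted
        sketch.advance(1, 50L);
        sketch.onTimer();       // minimum is now 50
        sketch.advance(1, 200L);
        sketch.onTimer();       // minimum is now 100, partition 0 is the slowest
        sketch.advance(0, 90L); // ignored, partition 0 stays at 100
        sketch.onTimer();       // minimum unchanged, nothing is emitted
        System.out.println(sketch.emitted());
    }
}
```

One consequence worth noting: a partition that never produces a watermark holds the minimum at its initial value, so the source as a whole emits nothing until every partition has advanced.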

2. A DataStream can also be set up to emit watermarks periodically

Under the hood, this simply adds a chained TimestampsAndPeriodicWatermarksOperator.

DataStream

    /**
     * Assigns timestamps to the elements in the data stream and periodically creates
     * watermarks to signal event time progress.
     *
     * <p>This method creates watermarks periodically (for example every second), based
     * on the watermarks indicated by the given watermark generator. Even when no new elements
     * in the stream arrive, the given watermark generator will be periodically checked for
     * new watermarks. The interval in which watermarks are generated is defined in
     * {@link ExecutionConfig#setAutoWatermarkInterval(long)}.
     *
     * <p>Use this method for the common cases, where some characteristic over all elements
     * should generate the watermarks, or where watermarks are simply trailing behind the
     * wall clock time by a certain amount.
     *
     * <p>For the second case and when the watermarks are required to lag behind the maximum
     * timestamp seen so far in the elements of the stream by a fixed amount of time, and this
     * amount is known in advance, use the
     * {@link BoundedOutOfOrdernessTimestampExtractor}.
     *
     * <p>For cases where watermarks should be created in an irregular fashion, for example
     * based on certain markers that some element carry, use the
     * {@link AssignerWithPunctuatedWatermarks}.
     *
     * @param timestampAndWatermarkAssigner The implementation of the timestamp assigner and
     *                                      watermark generator.
     * @return The stream after the transformation, with assigned timestamps and watermarks.
     *
     * @see AssignerWithPeriodicWatermarks
     * @see AssignerWithPunctuatedWatermarks
     * @see #assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
     */
    public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
            AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) {

        // match parallelism to input, otherwise dop=1 sources could lead to some strange
        // behaviour: the watermark will creep along very slowly because the elements
        // from the source go to each extraction operator round robin.
        final int inputParallelism = getTransformation().getParallelism();
        final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner);

        TimestampsAndPeriodicWatermarksOperator<T> operator =
                new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner);

        return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
                .setParallelism(inputParallelism);
    }

TimestampsAndPeriodicWatermarksOperator

    public class TimestampsAndPeriodicWatermarksOperator<T>
            extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
            implements OneInputStreamOperator<T, T>, Triggerable {

        private transient long watermarkInterval;
        private transient long currentWatermark;

        public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
            super(assigner); // AbstractUdfStreamOperator(F userFunction)
            this.chainingStrategy = ChainingStrategy.ALWAYS; // always chained
        }

        @Override
        public void open() throws Exception {
            super.open();

            currentWatermark = Long.MIN_VALUE;
            watermarkInterval = getExecutionConfig().getAutoWatermarkInterval();

            if (watermarkInterval > 0) {
                registerTimer(System.currentTimeMillis() + watermarkInterval, this); // register with the timer service
            }
        }

        @Override
        public void processElement(StreamRecord<T> element) throws Exception {
            // extract the timestamp from the element using the AssignerWithPeriodicWatermarks
            final long newTimestamp = userFunction.extractTimestamp(element.getValue(),
                    element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE);

            // replace the element's timestamp and forward it downstream
            output.collect(element.replace(element.getValue(), newTimestamp));
        }

        @Override
        public void trigger(long timestamp) throws Exception { // called when the timer fires
            Watermark newWatermark = userFunction.getCurrentWatermark(); // fetch the current watermark
            if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
                currentWatermark = newWatermark.getTimestamp();
                // emit watermark
                output.emitWatermark(newWatermark);
            }

            // register next timer
            registerTimer(System.currentTimeMillis() + watermarkInterval, this);
        }

        @Override
        public void processWatermark(Watermark mark) throws Exception {
            // if we receive a Long.MAX_VALUE watermark we forward it since it is used
            // to signal the end of input and to not block watermark progress downstream
            if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
                currentWatermark = Long.MAX_VALUE;
                output.emitWatermark(mark); // forward watermark
            }
        }
    }

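As you can see, processElement calls AssignerWithPeriodicWatermarks.extractTimestamp to extract the event time,

and then updates the timestamp of the StreamRecord. Note also that processWatermark drops every upstream watermark except the final Long.MAX_VALUE one, so from this operator onward only the watermarks it generates itself are in effect.

Then, in the Window Operator,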

    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        final Collection<W> elementWindows = windowAssigner.assignWindows(
                element.getValue(), element.getTimestamp(), windowAssignerContext);
        // ... (rest of the method omitted)
    }

windowAssigner.assignWindows uses the element's timestamp as the time by which the element is assigned to windows.
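
As an illustration of what assigning by timestamp means, the sketch below computes the tumbling window that contains a given timestamp. It is a plain-Java approximation under the assumption of fixed-size tumbling windows with no offset; TumblingWindowSketch and its methods are hypothetical names, not Flink's window assigner code.

```java
/** Plain-Java sketch of timestamp-based window assignment for tumbling windows. */
final class TumblingWindowSketch {

    /** Start (inclusive) of the window of the given size that contains the timestamp. */
    static long windowStart(long timestampMs, long windowSizeMs) {
        // floorMod keeps the result correct for negative timestamps as well
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    /** End (exclusive) of that window. */
    static long windowEnd(long timestampMs, long windowSizeMs) {
        return windowStart(timestampMs, windowSizeMs) + windowSizeMs;
    }

    public static void main(String[] args) {
        long size = 5_000L; // hypothetical 5-second windows
        long[] timestamps = {12_345L, 14_999L, 15_000L};
        for (long ts : timestamps) {
            System.out.println(ts + " -> [" + windowStart(ts, size) + ", " + windowEnd(ts, size) + ")");
        }
    }
}
```

Only the timestamp written into the StreamRecord decides the window; the time at which the element actually arrives at the operator plays no part.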

For how watermarks are handled there, see Flink – window operator.
