Reference: Flink - Generating Timestamps / Watermarks

Watermarks are only needed when a window is involved, so it is enough to call assignTimestampsAndWatermarks somewhere before the window operator.

They do not have to be emitted by the source.

1. First, a source can emit watermarks

Let's look at how the Kafka source implements this.

    protected AbstractFetcher(
            SourceContext<T> sourceContext,
            List<KafkaTopicPartition> assignedPartitions,
            // set through assignTimestampsAndWatermarks when the KafkaConsumer is created
            SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic,
            SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
            ProcessingTimeService processingTimeProvider,
            long autoWatermarkInterval, // env.getConfig().setAutoWatermarkInterval()
            ClassLoader userCodeClassLoader,
            boolean useMetrics) throws Exception
    {
        // determine the watermark mode
        if (watermarksPeriodic == null) {
            if (watermarksPunctuated == null) {
                // simple case, no watermarks involved
                timestampWatermarkMode = NO_TIMESTAMPS_WATERMARKS;
            } else {
                timestampWatermarkMode = PUNCTUATED_WATERMARKS;
            }
        } else {
            if (watermarksPunctuated == null) {
                timestampWatermarkMode = PERIODIC_WATERMARKS;
            } else {
                throw new IllegalArgumentException("Cannot have both periodic and punctuated watermarks");
            }
        }

        // create our partition state according to the timestamp/watermark mode
        this.allPartitions = initializePartitions(
                assignedPartitions,
                timestampWatermarkMode,
                watermarksPeriodic, watermarksPunctuated,
                userCodeClassLoader);

        // if we have periodic watermarks, kick off the interval scheduler
        if (timestampWatermarkMode == PERIODIC_WATERMARKS) { // watermarks are emitted periodically
            KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[] parts =
                    (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[]) allPartitions;

            PeriodicWatermarkEmitter periodicEmitter =
                    new PeriodicWatermarkEmitter(parts, sourceContext, processingTimeProvider, autoWatermarkInterval);
            periodicEmitter.start();
        }
    }

FlinkKafkaConsumerBase

    public FlinkKafkaConsumerBase<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> assigner) {
        checkNotNull(assigner);

        if (this.punctuatedWatermarkAssigner != null) {
            throw new IllegalStateException("A punctuated watermark emitter has already been set.");
        }
        try {
            ClosureCleaner.clean(assigner, true);
            this.periodicWatermarkAssigner = new SerializedValue<>(assigner);
            return this;
        } catch (Exception e) {
            throw new IllegalArgumentException("The given assigner is not serializable", e);
        }
    }

The core methods of the AssignerWithPeriodicWatermarks interface define how timestamps are extracted and how watermarks are generated:

    public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {
        Watermark getCurrentWatermark();
    }

    public interface TimestampAssigner<T> extends Function {
        long extractTimestamp(T element, long previousElementTimestamp);
    }

If assignTimestampsAndWatermarks is not called when the KafkaConsumer is set up, the source does not produce any watermarks.
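To make the two methods concrete, here is a minimal sketch of a periodic assigner whose watermark trails the largest timestamp seen so far by a fixed bound. It is written against small stand-in types (`SensorEvent`, `PeriodicAssigner`, `BoundedLatenessAssigner`), all of which are assumptions made for this sketch so that it compiles without Flink on the classpath. In a real job the class would implement AssignerWithPeriodicWatermarks<T> and return a Watermark object rather than a plain long.

```java
// Stand-in for an application event; the field names are assumptions for this sketch.
final class SensorEvent {
    final String key;
    final long eventTimeMillis;

    SensorEvent(String key, long eventTimeMillis) {
        this.key = key;
        this.eventTimeMillis = eventTimeMillis;
    }
}

// Stand-in mirroring the two methods of AssignerWithPeriodicWatermarks,
// with a plain long in place of Flink's Watermark class.
interface PeriodicAssigner<T> {
    long extractTimestamp(T element, long previousElementTimestamp);

    long getCurrentWatermark();
}

// Watermark = (largest timestamp seen so far) - (allowed out-of-orderness).
final class BoundedLatenessAssigner implements PeriodicAssigner<SensorEvent> {
    private final long maxOutOfOrdernessMillis;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedLatenessAssigner(long maxOutOfOrdernessMillis) {
        this.maxOutOfOrdernessMillis = maxOutOfOrdernessMillis;
    }

    @Override
    public long extractTimestamp(SensorEvent element, long previousElementTimestamp) {
        // Called once per element: remember the largest event time seen so far.
        maxTimestampSeen = Math.max(maxTimestampSeen, element.eventTimeMillis);
        return element.eventTimeMillis;
    }

    @Override
    public long getCurrentWatermark() {
        // Called by the timer: before any element has arrived there is no progress to report.
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE;
        }
        return maxTimestampSeen - maxOutOfOrdernessMillis;
    }
}
```

With a bound of 1000 ms, seeing events at 5000 and then 4000 leaves the watermark at 4000: the late event does not pull it backwards.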

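As the AbstractFetcher constructor shows, there are two kinds of watermarks:

PERIODIC_WATERMARKS: watermarks that are emitted at a fixed interval.

PUNCTUATED_WATERMARKS: watermarks that are triggered by elements. For example, a property of an element, or a particular type of element, can signal that a watermark should be emitted, which gives the developer direct control over watermark generation.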
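The punctuated style can be sketched in the same way. The stand-in interface below mirrors the shape of AssignerWithPunctuatedWatermarks (extractTimestamp plus checkAndGetNextWatermark), but uses OptionalLong where Flink uses a nullable Watermark. The `MarkedEvent` type, its `isWatermarkMarker` flag, and the `PunctuatedAssigner` and `MarkerAssigner` names are assumptions for illustration only.

```java
import java.util.OptionalLong;

// Stand-in element that may carry a "emit a watermark now" marker (hypothetical fields).
final class MarkedEvent {
    final long eventTimeMillis;
    final boolean isWatermarkMarker;

    MarkedEvent(long eventTimeMillis, boolean isWatermarkMarker) {
        this.eventTimeMillis = eventTimeMillis;
        this.isWatermarkMarker = isWatermarkMarker;
    }
}

// Stand-in mirroring AssignerWithPunctuatedWatermarks, with OptionalLong
// in place of a nullable Watermark.
interface PunctuatedAssigner<T> {
    long extractTimestamp(T element, long previousElementTimestamp);

    OptionalLong checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}

final class MarkerAssigner implements PunctuatedAssigner<MarkedEvent> {
    @Override
    public long extractTimestamp(MarkedEvent element, long previousElementTimestamp) {
        return element.eventTimeMillis;
    }

    @Override
    public OptionalLong checkAndGetNextWatermark(MarkedEvent lastElement, long extractedTimestamp) {
        // Only marker elements trigger a watermark; ordinary elements produce none.
        return lastElement.isWatermarkMarker
                ? OptionalLong.of(extractedTimestamp)
                : OptionalLong.empty();
    }
}
```

Unlike the periodic variant, nothing here depends on a timer: the decision is made per element, right after its timestamp has been extracted.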

initializePartitions

    case PERIODIC_WATERMARKS: {
        @SuppressWarnings("unchecked")
        KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[] partitions =
                (KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[])
                        new KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[assignedPartitions.size()];

        int pos = 0;
        for (KafkaTopicPartition partition : assignedPartitions) {
            KPH kafkaHandle = createKafkaPartitionHandle(partition);

            AssignerWithPeriodicWatermarks<T> assignerInstance =
                    watermarksPeriodic.deserializeValue(userCodeClassLoader);

            partitions[pos++] = new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
                    partition, kafkaHandle, assignerInstance);
        }

        return partitions;
    }

KafkaTopicPartitionStateWithPeriodicWatermarks

The core methods of this class are:

    public long getTimestampForRecord(T record, long kafkaEventTimestamp) {
        return timestampsAndWatermarks.extractTimestamp(record, kafkaEventTimestamp);
    }

    public long getCurrentWatermarkTimestamp() {
        Watermark wm = timestampsAndWatermarks.getCurrentWatermark();
        if (wm != null) {
            partitionWatermark = Math.max(partitionWatermark, wm.getTimestamp());
        }
        return partitionWatermark;
    }

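As you can see, both methods delegate to the AssignerWithPeriodicWatermarks that you defined. Each partition gets its own deserialized assigner instance, and the partition's watermark never moves backwards.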

PeriodicWatermarkEmitter

    private static class PeriodicWatermarkEmitter implements ProcessingTimeCallback {

        public void start() {
            // start the timer so that it fires periodically
            timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this);
        }

        @Override
        public void onProcessingTime(long timestamp) throws Exception { // trigger logic
            long minAcrossAll = Long.MAX_VALUE;
            // for each partition
            for (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?> state : allPartitions) {
                // we access the current watermark for the periodic assigners under the state
                // lock, to prevent concurrent modification to any internal variables
                final long curr;
                //noinspection SynchronizationOnLocalVariableOrMethodParameter
                synchronized (state) {
                    curr = state.getCurrentWatermarkTimestamp(); // read this partition's watermark
                }
                // take the minimum: the smallest partition watermark becomes the source watermark
                minAcrossAll = Math.min(minAcrossAll, curr);
            }

            // emit next watermark, if there is one
            if (minAcrossAll > lastWatermarkTimestamp) {
                lastWatermarkTimestamp = minAcrossAll;
                emitter.emitWatermark(new Watermark(minAcrossAll)); // emit
            }

            // schedule the next watermark (re-arm the timer)
            timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this);
        }
    }
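The interplay of the per-partition maximum and the cross-partition minimum is easier to see in isolation. The following is a simplified model with no Flink or Kafka types and no locking. The class name `PartitionWatermarkCombiner`, its method names, and the initial values of Long.MIN_VALUE are assumptions of this sketch.

```java
import java.util.Arrays;

// Simplified model of the per-partition bookkeeping plus the periodic emitter.
final class PartitionWatermarkCombiner {
    private final long[] partitionWatermarks; // one entry per partition
    private long lastEmitted = Long.MIN_VALUE;

    PartitionWatermarkCombiner(int numPartitions) {
        partitionWatermarks = new long[numPartitions];
        Arrays.fill(partitionWatermarks, Long.MIN_VALUE);
    }

    // Mirrors getCurrentWatermarkTimestamp(): a partition's watermark never moves backwards.
    void report(int partition, long watermark) {
        partitionWatermarks[partition] = Math.max(partitionWatermarks[partition], watermark);
    }

    // Mirrors onProcessingTime(): returns the watermark to emit,
    // or null when the minimum across partitions has not advanced.
    Long onTimer() {
        long minAcrossAll = Long.MAX_VALUE;
        for (long wm : partitionWatermarks) {
            minAcrossAll = Math.min(minAcrossAll, wm);
        }
        if (minAcrossAll > lastEmitted) {
            lastEmitted = minAcrossAll;
            return minAcrossAll;
        }
        return null;
    }
}
```

One consequence of taking the minimum is that a partition which has not yet reported any watermark holds back the source as a whole: with two partitions, reporting 100 on partition 0 alone emits nothing, and only after partition 1 reports 50 does the timer emit 50.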

2. A DataStream can also be set up to emit watermarks periodically

Under the hood, this simply adds a chained TimestampsAndPeriodicWatermarksOperator.

DataStream

    /**
     * Assigns timestamps to the elements in the data stream and periodically creates
     * watermarks to signal event time progress.
     *
     * <p>This method creates watermarks periodically (for example every second), based
     * on the watermarks indicated by the given watermark generator. Even when no new elements
     * in the stream arrive, the given watermark generator will be periodically checked for
     * new watermarks. The interval in which watermarks are generated is defined in
     * {@link ExecutionConfig#setAutoWatermarkInterval(long)}.
     *
     * <p>Use this method for the common cases, where some characteristic over all elements
     * should generate the watermarks, or where watermarks are simply trailing behind the
     * wall clock time by a certain amount.
     *
     * <p>For the second case and when the watermarks are required to lag behind the maximum
     * timestamp seen so far in the elements of the stream by a fixed amount of time, and this
     * amount is known in advance, use the
     * {@link BoundedOutOfOrdernessTimestampExtractor}.
     *
     * <p>For cases where watermarks should be created in an irregular fashion, for example
     * based on certain markers that some element carry, use the
     * {@link AssignerWithPunctuatedWatermarks}.
     *
     * @param timestampAndWatermarkAssigner The implementation of the timestamp assigner and
     *                                      watermark generator.
     * @return The stream after the transformation, with assigned timestamps and watermarks.
     *
     * @see AssignerWithPeriodicWatermarks
     * @see AssignerWithPunctuatedWatermarks
     * @see #assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
     */
    public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
            AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) {

        // match parallelism to input, otherwise dop=1 sources could lead to some strange
        // behaviour: the watermark will creep along very slowly because the elements
        // from the source go to each extraction operator round robin.
        final int inputParallelism = getTransformation().getParallelism();
        final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner);

        TimestampsAndPeriodicWatermarksOperator<T> operator =
                new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner);

        return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
                .setParallelism(inputParallelism);
    }

TimestampsAndPeriodicWatermarksOperator

    public class TimestampsAndPeriodicWatermarksOperator<T>
            extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
            implements OneInputStreamOperator<T, T>, Triggerable {

        private transient long watermarkInterval;
        private transient long currentWatermark;

        public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
            super(assigner); // AbstractUdfStreamOperator(F userFunction)
            this.chainingStrategy = ChainingStrategy.ALWAYS; // always chained
        }

        @Override
        public void open() throws Exception {
            super.open();

            currentWatermark = Long.MIN_VALUE;
            watermarkInterval = getExecutionConfig().getAutoWatermarkInterval();

            if (watermarkInterval > 0) {
                registerTimer(System.currentTimeMillis() + watermarkInterval, this); // register the timer
            }
        }

        @Override
        public void processElement(StreamRecord<T> element) throws Exception {
            // extract the timestamp from the element via the AssignerWithPeriodicWatermarks
            final long newTimestamp = userFunction.extractTimestamp(element.getValue(),
                    element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE);

            // replace the element's timestamp and emit it again
            output.collect(element.replace(element.getValue(), newTimestamp));
        }

        @Override
        public void trigger(long timestamp) throws Exception { // invoked when the timer fires
            // register next timer
            Watermark newWatermark = userFunction.getCurrentWatermark(); // ask the assigner for the watermark
            if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
                currentWatermark = newWatermark.getTimestamp();
                // emit watermark
                output.emitWatermark(newWatermark);
            }

            registerTimer(System.currentTimeMillis() + watermarkInterval, this); // re-register the timer
        }

        @Override
        public void processWatermark(Watermark mark) throws Exception {
            // if we receive a Long.MAX_VALUE watermark we forward it since it is used
            // to signal the end of input and to not block watermark progress downstream
            if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
                currentWatermark = Long.MAX_VALUE;
                output.emitWatermark(mark); // forward watermark
            }
        }
    }

As shown, processElement calls AssignerWithPeriodicWatermarks.extractTimestamp to extract the event time,

and then replaces the timestamp of the StreamRecord with it.
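The re-stamping step can be sketched on its own. `StampedRecord` is a stand-in for StreamRecord, and `TimestampExtractor` and `Restamper` are hypothetical names introduced only for this illustration. The point is that the value passes through unchanged while the timestamp is replaced, and that a record without a timestamp hands Long.MIN_VALUE to the extractor as the previous timestamp.

```java
// Stand-in for StreamRecord: a value plus an optional timestamp.
final class StampedRecord<T> {
    final T value;
    final boolean hasTimestamp;
    final long timestamp;

    private StampedRecord(T value, boolean hasTimestamp, long timestamp) {
        this.value = value;
        this.hasTimestamp = hasTimestamp;
        this.timestamp = timestamp;
    }

    static <T> StampedRecord<T> withoutTimestamp(T value) {
        return new StampedRecord<>(value, false, 0L);
    }

    // Returns a copy that carries the same value with a new timestamp.
    StampedRecord<T> replace(long newTimestamp) {
        return new StampedRecord<>(value, true, newTimestamp);
    }
}

// Stand-in for the extractTimestamp half of the assigner interface.
interface TimestampExtractor<T> {
    long extractTimestamp(T element, long previousElementTimestamp);
}

final class Restamper {
    // Mirrors the shape of processElement: extract the event time, then re-stamp the record.
    static <T> StampedRecord<T> process(StampedRecord<T> in, TimestampExtractor<T> extractor) {
        long previous = in.hasTimestamp ? in.timestamp : Long.MIN_VALUE;
        long newTimestamp = extractor.extractTimestamp(in.value, previous);
        return in.replace(newTimestamp);
    }
}
```

For example, with records encoded as "1500:click", an extractor that parses the part before the colon turns an unstamped record into one with timestamp 1500.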

Then, in the Window Operator:

    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        final Collection<W> elementWindows = windowAssigner.assignWindows(
                element.getValue(), element.getTimestamp(), windowAssignerContext);

windowAssigner.assignWindows uses the element's timestamp to decide which windows the element belongs to.
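As an illustration of why the assigned timestamp matters, here is a minimal sketch of how a tumbling window could be derived from it. It assumes a tumbling event-time assigner with non-negative timestamps. The class `TumblingWindowSketch` and its methods are hypothetical, and the modulo arithmetic is a common way to align timestamps to window boundaries, not a copy of any particular assigner.

```java
final class TumblingWindowSketch {
    // Start of the window of length sizeMillis (aligned to offsetMillis) that contains timestamp.
    static long windowStart(long timestamp, long offsetMillis, long sizeMillis) {
        return timestamp - (timestamp - offsetMillis + sizeMillis) % sizeMillis;
    }

    // Returns the window as {start, end}, where end is exclusive.
    static long[] assignWindow(long timestamp, long sizeMillis) {
        long start = windowStart(timestamp, 0L, sizeMillis);
        return new long[] {start, start + sizeMillis};
    }
}
```

With 5-second windows, an element stamped 12345 falls into [10000, 15000). If the assigner had extracted a different timestamp, the same element would land in a different window.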

For how watermarks are handled there, see Flink – window operator.
