Flink - watermark生成
参考,Flink - Generating Timestamps / Watermarks
watermark,只有在有window的情况下才用到,所以在window operator前加上assignTimestampsAndWatermarks即可
不一定需要从source发出
1. 首先,source可以发出watermark
我们就看看kafka source的实现
protected AbstractFetcher(
SourceContext<T> sourceContext,
List<KafkaTopicPartition> assignedPartitions,
SerializedValue<AssignerWithPeriodicWatermarks<T>> watermarksPeriodic, //在创建KafkaConsumer的时候assignTimestampsAndWatermarks
SerializedValue<AssignerWithPunctuatedWatermarks<T>> watermarksPunctuated,
ProcessingTimeService processingTimeProvider,
long autoWatermarkInterval, //env.getConfig().setAutoWatermarkInterval()
ClassLoader userCodeClassLoader,
boolean useMetrics) throws Exception
{
//判断watermark的类型
if (watermarksPeriodic == null) {
if (watermarksPunctuated == null) {
// simple case, no watermarks involved
timestampWatermarkMode = NO_TIMESTAMPS_WATERMARKS;
} else {
timestampWatermarkMode = PUNCTUATED_WATERMARKS;
}
} else {
if (watermarksPunctuated == null) {
timestampWatermarkMode = PERIODIC_WATERMARKS;
} else {
throw new IllegalArgumentException("Cannot have both periodic and punctuated watermarks");
}
} // create our partition state according to the timestamp/watermark mode
this.allPartitions = initializePartitions(
assignedPartitions,
timestampWatermarkMode,
watermarksPeriodic, watermarksPunctuated,
userCodeClassLoader); // if we have periodic watermarks, kick off the interval scheduler
if (timestampWatermarkMode == PERIODIC_WATERMARKS) { //如果是定期发出WaterMark
KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[] parts =
(KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[]) allPartitions; PeriodicWatermarkEmitter periodicEmitter=
new PeriodicWatermarkEmitter(parts, sourceContext, processingTimeProvider, autoWatermarkInterval);
periodicEmitter.start();
}
}
FlinkKafkaConsumerBase
public FlinkKafkaConsumerBase<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> assigner) {
checkNotNull(assigner); if (this.punctuatedWatermarkAssigner != null) {
throw new IllegalStateException("A punctuated watermark emitter has already been set.");
}
try {
ClosureCleaner.clean(assigner, true);
this.periodicWatermarkAssigner = new SerializedValue<>(assigner);
return this;
} catch (Exception e) {
throw new IllegalArgumentException("The given assigner is not serializable", e);
}
}
这个接口的核心函数,定义,如何提取Timestamp和生成Watermark的逻辑
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> {
Watermark getCurrentWatermark();
}
public interface TimestampAssigner<T> extends Function {
long extractTimestamp(T element, long previousElementTimestamp);
}
如果在初始化KafkaConsumer的时候,没有assignTimestampsAndWatermarks,就不会产生watermark
可以看到watermark有两种,
PERIODIC_WATERMARKS,定期发送的watermark
PUNCTUATED_WATERMARKS,由element触发的watermark,比如有element的特征或某种类型的element来表示触发watermark,这样便于开发者来控制watermark
initializePartitions
case PERIODIC_WATERMARKS: {
@SuppressWarnings("unchecked")
KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[] partitions =
(KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH>[])
new KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?>[assignedPartitions.size()]; int pos = 0;
for (KafkaTopicPartition partition : assignedPartitions) {
KPH kafkaHandle = createKafkaPartitionHandle(partition); AssignerWithPeriodicWatermarks<T> assignerInstance =
watermarksPeriodic.deserializeValue(userCodeClassLoader); partitions[pos++] = new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
partition, kafkaHandle, assignerInstance);
} return partitions;
}
KafkaTopicPartitionStateWithPeriodicWatermarks
这个类里面最核心的函数,
public long getTimestampForRecord(T record, long kafkaEventTimestamp) {
return timestampsAndWatermarks.extractTimestamp(record, kafkaEventTimestamp);
} public long getCurrentWatermarkTimestamp() {
Watermark wm = timestampsAndWatermarks.getCurrentWatermark();
if (wm != null) {
partitionWatermark = Math.max(partitionWatermark, wm.getTimestamp());
}
return partitionWatermark;
}
可以看到是调用你定义的AssignerWithPeriodicWatermarks来实现
PeriodicWatermarkEmitter
private static class PeriodicWatermarkEmitter implements ProcessingTimeCallback { public void start() {
timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); //start定时器,定时触发
} @Override
public void onProcessingTime(long timestamp) throws Exception { //触发逻辑 long minAcrossAll = Long.MAX_VALUE;
for (KafkaTopicPartitionStateWithPeriodicWatermarks<?, ?> state : allPartitions) { //对于每个partitions // we access the current watermark for the periodic assigners under the state
// lock, to prevent concurrent modification to any internal variables
final long curr;
//noinspection SynchronizationOnLocalVariableOrMethodParameter
synchronized (state) {
curr = state.getCurrentWatermarkTimestamp(); //取出当前partition的WaterMark
} minAcrossAll = Math.min(minAcrossAll, curr); //求min,以partition中最小的partition作为watermark
} // emit next watermark, if there is one
if (minAcrossAll > lastWatermarkTimestamp) {
lastWatermarkTimestamp = minAcrossAll;
emitter.emitWatermark(new Watermark(minAcrossAll)); //emit
} // schedule the next watermark
timerService.registerTimer(timerService.getCurrentProcessingTime() + interval, this); //重新设置timer
}
}
2. DataStream也可以设置定时发送Watermark
其实实现是加了个chain的TimestampsAndPeriodicWatermarksOperator
DataStream
/**
* Assigns timestamps to the elements in the data stream and periodically creates
* watermarks to signal event time progress.
*
* <p>This method creates watermarks periodically (for example every second), based
* on the watermarks indicated by the given watermark generator. Even when no new elements
* in the stream arrive, the given watermark generator will be periodically checked for
* new watermarks. The interval in which watermarks are generated is defined in
* {@link ExecutionConfig#setAutoWatermarkInterval(long)}.
*
* <p>Use this method for the common cases, where some characteristic over all elements
* should generate the watermarks, or where watermarks are simply trailing behind the
* wall clock time by a certain amount.
*
* <p>For the second case and when the watermarks are required to lag behind the maximum
* timestamp seen so far in the elements of the stream by a fixed amount of time, and this
* amount is known in advance, use the
* {@link BoundedOutOfOrdernessTimestampExtractor}.
*
* <p>For cases where watermarks should be created in an irregular fashion, for example
* based on certain markers that some element carry, use the
* {@link AssignerWithPunctuatedWatermarks}.
*
* @param timestampAndWatermarkAssigner The implementation of the timestamp assigner and
* watermark generator.
* @return The stream after the transformation, with assigned timestamps and watermarks.
*
* @see AssignerWithPeriodicWatermarks
* @see AssignerWithPunctuatedWatermarks
* @see #assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks)
*/
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(
AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) { // match parallelism to input, otherwise dop=1 sources could lead to some strange
// behaviour: the watermark will creep along very slowly because the elements
// from the source go to each extraction operator round robin.
final int inputParallelism = getTransformation().getParallelism();
final AssignerWithPeriodicWatermarks<T> cleanedAssigner = clean(timestampAndWatermarkAssigner); TimestampsAndPeriodicWatermarksOperator<T> operator =
new TimestampsAndPeriodicWatermarksOperator<>(cleanedAssigner); return transform("Timestamps/Watermarks", getTransformation().getOutputType(), operator)
.setParallelism(inputParallelism);
}
TimestampsAndPeriodicWatermarksOperator
public class TimestampsAndPeriodicWatermarksOperator<T>
extends AbstractUdfStreamOperator<T, AssignerWithPeriodicWatermarks<T>>
implements OneInputStreamOperator<T, T>, Triggerable { private transient long watermarkInterval;
private transient long currentWatermark; public TimestampsAndPeriodicWatermarksOperator(AssignerWithPeriodicWatermarks<T> assigner) {
super(assigner); //AbstractUdfStreamOperator(F userFunction)
this.chainingStrategy = ChainingStrategy.ALWAYS; //一定是chain
} @Override
public void open() throws Exception {
super.open(); currentWatermark = Long.MIN_VALUE;
watermarkInterval = getExecutionConfig().getAutoWatermarkInterval(); if (watermarkInterval > 0) {
registerTimer(System.currentTimeMillis() + watermarkInterval, this); //注册到定时器
}
} @Override
public void processElement(StreamRecord<T> element) throws Exception {
final long newTimestamp = userFunction.extractTimestamp(element.getValue(), //由element中基于AssignerWithPeriodicWatermarks提取时间戳
element.hasTimestamp() ? element.getTimestamp() : Long.MIN_VALUE); output.collect(element.replace(element.getValue(), newTimestamp)); //更新element的时间戳,再次发出
} @Override
public void trigger(long timestamp) throws Exception { //定时器触发trigger
// register next timer
Watermark newWatermark = userFunction.getCurrentWatermark(); //取得watermark
if (newWatermark != null && newWatermark.getTimestamp() > currentWatermark) {
currentWatermark = newWatermark.getTimestamp();
// emit watermark
output.emitWatermark(newWatermark); //发出watermark
} registerTimer(System.currentTimeMillis() + watermarkInterval, this); //重新注册到定时器
} @Override
public void processWatermark(Watermark mark) throws Exception {
// if we receive a Long.MAX_VALUE watermark we forward it since it is used
// to signal the end of input and to not block watermark progress downstream
if (mark.getTimestamp() == Long.MAX_VALUE && currentWatermark != Long.MAX_VALUE) {
currentWatermark = Long.MAX_VALUE;
output.emitWatermark(mark); //forward watermark
}
}
可以看到在processElement会调用AssignerWithPeriodicWatermarks.extractTimestamp提取event time
然后更新StreamRecord的时间
然后在Window Operator中,
@Override
public void processElement(StreamRecord<IN> element) throws Exception {
final Collection<W> elementWindows = windowAssigner.assignWindows(
element.getValue(), element.getTimestamp(), windowAssignerContext);
会在windowAssigner.assignWindows时以element的timestamp作为assign时间
对于watermark的处理,参考,Flink – window operator
Flink - watermark生成的更多相关文章
- [源码分析] 从源码入手看 Flink Watermark 之传播过程
[源码分析] 从源码入手看 Flink Watermark 之传播过程 0x00 摘要 本文将通过源码分析,带领大家熟悉Flink Watermark 之传播过程,顺便也可以对Flink整体逻辑有一个 ...
- Flink Program Guide (4) -- 时间戳和Watermark生成(DataStream API编程指导 -- For Java)
时间戳和Watermark生成 本文翻译自Generating Timestamp / Watermarks --------------------------------------------- ...
- flink watermark介绍
转发请注明原创地址 http://www.cnblogs.com/dongxiao-yang/p/7610412.html 一 概念 watermark是flink为了处理eventTime窗口计算提 ...
- Flink assignAscendingTimestamps 生成水印的三个重载方法
先简单介绍一下Timestamp 和Watermark 的概念: 1. Timestamp和Watermark都是基于事件的时间字段生成的 2. Timestamp和Watermark是两个不同的东西 ...
- flink WaterMark之TumblingEventWindow
1.WaterMark,翻译成水印或水位线,水印翻译更抽象,水位线翻译接地气. watermark是用于处理乱序事件的,通常用watermark机制结合window来实现. 流处理从事件产生,到流经s ...
- 【源码解析】Flink 是如何基于事件时间生成Timestamp和Watermark
生成Timestamp和Watermark 的三个重载方法介绍可参见上一篇博客: Flink assignAscendingTimestamps 生成水印的三个重载方法 之前想研究下Flink是怎么处 ...
- Flink Program Guide (5) -- 预定义的Timestamp Extractor / Watermark Emitter (DataStream API编程指导 -- For Java)
本文翻译自Pre-defined Timestamp Extractors / Watermark Emitter ------------------------------------------ ...
- Flink的时间类型和watermark机制
一FlinkTime类型 有3类时间,分别是数据本身的产生时间.进入Flink系统的时间和被处理的时间,在Flink系统中的数据可以有三种时间属性: Event Time 是每条数据在其生产设备上发生 ...
- 老板让阿粉学习 flink 中的 Watermark,现在他出教程了
1 前言 在时间 Time 那一篇中,介绍了三种时间概念 Event.Ingestin 和 Process, 其中还简单介绍了乱序 Event Time 事件和它的解决方案 Watermark 水位线 ...
随机推荐
- centos 7 配置tomcat开机启动
1. tomcat 需要增加一个pid文件 在tomca/bin 目录下面,增加 setenv.sh 配置,catalina.sh启动的时候会调用,同时配置java内存参数. #add tomcat ...
- MySQL的binlog日志<转>
binlog 基本认识 MySQL的二进制日志可以说是MySQL最重要的日志了,它记录了所有的DDL和DML(除了数据查询语句)语句,以事件形式记录,还包含语句所执行的消耗的时间,MySQL的二进制日 ...
- RR算法 调度
RR算法是使用非常广泛的一种调度算法. 首先将所有就绪的队列按FCFS策略排成一个就绪队列,然后系统设置一定的时间片,每次给队首作业分配时间片.如果此作业运行结束,即使时间片没用完,立刻从队列中去除此 ...
- linux 如何快速的查找日志中你所要查找的信息
在工作中我总会通过日志来查找相关问题,但有时候日志太多有不知道又不知道日志什么时候打印的,这时我们可以通过一下方法来查找: 1.把目录跳到你日志文件存放的地方 2.grep 关键字 * 例如 ...
- [转]iOS 中几种定时器 - 控制了时间,就控制了一切
这篇文章是转载内容,原文地址:http://www.cocoachina.com/ios/20150519/11857.html?utm_source=tuicool 这里的知识点,其实在我们日常开发 ...
- [PyData] 03 - Data Representation
Ref: http://blog.csdn.net/u013534498/article/details/51399035 如何在Python中实现这五类强大的概率分布 考虑下在mgrid上画二维概率 ...
- lua第三方库
一.Lua 包管理工具 1.LuaRocks luarocks 是Lua常用的包管理工具(还有一个是LuaDist),其安装方式请参考官网:https://luarocks.org/#quick-st ...
- 九、K3 WISE 开发插件《工业单据老单序时薄插件工具栏按钮开发实例》
=============================== 目录: 1.添加工具栏按钮 2.查询被添加工具栏按钮的业务单据的FMenuID和FID 3.添加工具栏按钮和业务单据的映射关系 4.工具 ...
- MySQL 出现You can't specify target table for update in FROM clause错误解决方法
MySQL出现You can’t specify target table for update in FROM clause 这个错误的意思是不能在同一个sql语句中,先select同一个表的某些值 ...
- 【黑金原创教程】【FPGA那些事儿-驱动篇I 】实验六:数码管模块
实验六:数码管模块 有关数码管的驱动,想必读者已经学烂了 ... 不过,作为学习的新仪式,再烂的东西也要温故知新,不然学习就会不健全.黑金开发板上的数码管资源,由始至终都没有改变过,笔者因此由身怀念. ...