Flink Pre-defined Timestamp Extractors / Watermark Emitters(预定义的时间戳提取/水位线发射器)
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/event_timestamp_extractors.html
根据官网描述,Flink提供预定义的时间戳提取/水位线发射器。如下:
Flink provides abstractions that allow the programmer to assign their own timestamps and emit their own watermarks.
More specifically, one can do so by implementing one of the AssignerWithPeriodicWatermarks
and AssignerWithPunctuatedWatermarks
interfaces, depending on the use case.
In a nutshell, the first will emit watermarks periodically, while the second does so based on some property of the incoming records, e.g. whenever a special element is encountered in the stream.
AssignerWithPeriodicWatermarks介绍:
源码路径:flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPeriodicWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.functions; import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.streaming.api.watermark.Watermark; import javax.annotation.Nullable; /**
* The {@code AssignerWithPeriodicWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class to generate watermarks in a periodical interval.
* At most every {@code i} milliseconds (configured via
* {@link ExecutionConfig#getAutoWatermarkInterval()}), the system will call the
* {@link #getCurrentWatermark()} method to probe for the next watermark value.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>The system may call the {@link #getCurrentWatermark()} method less often than every
* {@code i} milliseconds, if no new elements arrived since the last call to the
* method.
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPeriodicWatermarks<T> extends TimestampAssigner<T> { /**
* Returns the current watermark. This method is periodically called by the
* system to retrieve the current watermark. The method may return {@code null} to
* indicate that no new Watermark is available.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If the current watermark is still
* identical to the previous one, no progress in event time has happened since
* the previous call to this method. If a null value is returned, or the timestamp
* of the returned watermark is smaller than that of the last emitted one, then no
* new watermark will be generated.
*
* <p>The interval in which this method is called and Watermarks are generated
* depends on {@link ExecutionConfig#getAutoWatermarkInterval()}.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
* @see ExecutionConfig#getAutoWatermarkInterval()
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark getCurrentWatermark();
}
AssignerWithPunctuatedWatermarks
接口介绍
源码路径 flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\functions\AssignerWithPunctuatedWatermarks.java
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/ package org.apache.flink.streaming.api.functions; import org.apache.flink.streaming.api.watermark.Watermark; import javax.annotation.Nullable; /**
* The {@code AssignerWithPunctuatedWatermarks} assigns event time timestamps to elements,
* and generates low watermarks that signal event time progress within the stream.
* These timestamps and watermarks are used by functions and operators that operate
* on event time, for example event time windows.
*
* <p>Use this class if certain special elements act as markers that signify event time
* progress, and when you want to emit watermarks specifically at certain events.
* The system will generate a new watermark, if the probed value is non-null
* and has a timestamp larger than that of the previous watermark (to preserve
* the contract of ascending watermarks).
*
* <p>For use cases that should periodically emit watermarks based on element timestamps,
* use the {@link AssignerWithPeriodicWatermarks} instead.
*
* <p>The following example illustrates how to use this timestamp extractor and watermark
* generator. It assumes elements carry a timestamp that describes when they were created,
* and that some elements carry a flag, marking them as the end of a sequence such that no
* elements with smaller timestamps can come anymore.
*
* <pre>{@code
* public class WatermarkOnFlagAssigner implements AssignerWithPunctuatedWatermarks<MyElement> {
*
* public long extractTimestamp(MyElement element, long previousElementTimestamp) {
* return element.getSequenceTimestamp();
* }
*
* public Watermark checkAndGetNextWatermark(MyElement lastElement, long extractedTimestamp) {
* return lastElement.isEndOfSequence() ? new Watermark(extractedTimestamp) : null;
* }
* }
* }</pre>
*
* <p>Timestamps and watermarks are defined as {@code longs} that represent the
* milliseconds since the Epoch (midnight, January 1, 1970 UTC).
* A watermark with a certain value {@code t} indicates that no elements with event
* timestamps {@code x}, where {@code x} is lower or equal to {@code t}, will occur any more.
*
* @param <T> The type of the elements to which this assigner assigns timestamps.
*
* @see org.apache.flink.streaming.api.watermark.Watermark
*/
public interface AssignerWithPunctuatedWatermarks<T> extends TimestampAssigner<T> { /**
* Asks this implementation if it wants to emit a watermark. This method is called right after
* the {@link #extractTimestamp(Object, long)} method.
*
* <p>The returned watermark will be emitted only if it is non-null and its timestamp
* is larger than that of the previously emitted watermark (to preserve the contract of
* ascending watermarks). If a null value is returned, or the timestamp of the returned
* watermark is smaller than that of the last emitted one, then no new watermark will
* be generated.
*
* <p>For an example how to use this method, see the documentation of
* {@link AssignerWithPunctuatedWatermarks this class}.
*
* @return {@code Null}, if no watermark should be emitted, or the next watermark to emit.
*/
@Nullable
Watermark checkAndGetNextWatermark(T lastElement, long extractedTimestamp);
}
两种接口的DEMO:
AssignerWithPeriodicWatermarks
接口DEMO 如:https://www.cnblogs.com/felixzh/p/9687214.html
AssignerWithPunctuatedWatermarks
接口DEMO如下:
package org.apache.flink.streaming.examples.wordcount; import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector; import javax.annotation.Nullable;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Properties; public class wcNew {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
Properties props = new Properties();
props.setProperty("bootstrap.servers", "127.0.0.1:9092");
props.setProperty("group.id", "flink-group-debug");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); FlinkKafkaConsumer010<String> consumer =
new FlinkKafkaConsumer010<>(args[], new SimpleStringSchema(), props);
consumer.setStartFromEarliest();
consumer.assignTimestampsAndWatermarks(new MessageWaterEmitter()); DataStream<Tuple3<String, Integer, String>> keyedStream = env
.addSource(consumer)
.flatMap(new MessageSplitter())
.keyBy()
.timeWindow(Time.seconds())
.reduce(new ReduceFunction<Tuple3<String, Integer, String>>() {
@Override
public Tuple3<String, Integer, String> reduce(Tuple3<String, Integer, String> t0, Tuple3<String, Integer, String> t1) throws Exception {
String time0 = t0.getField();
String time1 = t1.getField();
Integer count0 = t0.getField();
Integer count1 = t1.getField();
return new Tuple3<>(t0.getField(), count0 + count1, time0 +"|"+ time1);
}
}); keyedStream.writeAsText(args[], FileSystem.WriteMode.OVERWRITE);
keyedStream.print();
env.execute("Flink-Kafka num count");
} private static class MessageWaterEmitter implements AssignerWithPunctuatedWatermarks<String> { private SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd-hhmmss"); /*
* 先执行该函数,从element中提取时间戳
*@param element record行
*@param previousElementTimestamp 当前的时间
*/
@Override
public long extractTimestamp(String element, long previousElementTimestamp) {
if (element != null && element.contains(",")) {
String[] parts = element.split(",");
if (parts.length == ) {
try {
return sdf.parse(parts[]).getTime();
} catch (ParseException e) {
e.printStackTrace();
}
}
}
return 0L;
} /*
* 再执行该函数,extractedTimestamp的值是extractTimestamp的返回值
*/
@Nullable
@Override
public Watermark checkAndGetNextWatermark(String lastElement, long extractedTimestamp) {
if (lastElement != null && lastElement.contains(",")) {
String[] parts = lastElement.split(",");
if(parts.length==) {
try {
return new Watermark(sdf.parse(parts[]).getTime());
} catch (ParseException e) {
e.printStackTrace();
}
} }
return null;
}
}
private static class MessageSplitter implements FlatMapFunction<String, Tuple3<String, Integer, String>> { @Override
public void flatMap(String s, Collector<Tuple3<String, Integer, String>> collector) throws Exception {
if (s != null && s.contains(",")) {
String[] strings = s.split(",");
if(strings.length==) {
collector.collect(new Tuple3<>(strings[], Integer.parseInt(strings[]), strings[]));
}
}
}
}
}
打包成jar包后,上传到flink所在服务器,在控制台输入
flink run -c org.apache.flink.streaming.examples.wordcount.wcNew flink-kafka.jar topic_test_numcount /tmp/numcount.txt
控制台输入
eee,,-
eee,,-
eee,,-
eee,,-
eee,,-
tail -f numcount.txt 监控numcount.txt输出 当最后一条输入时,可以看到程序输出了前4条的计算结果 (eee,7,20180504-113411|20180504-113415|20180504-113412|20180504-113419)
Flink Pre-defined Timestamp Extractors / Watermark Emitters(预定义的时间戳提取/水位线发射器)的更多相关文章
- Flink Program Guide (5) -- 预定义的Timestamp Extractor / Watermark Emitter (DataStream API编程指导 -- For Java)
本文翻译自Pre-defined Timestamp Extractors / Watermark Emitter ------------------------------------------ ...
- Flink系列之Time和WaterMark
当数据进入Flink的时候,数据需要带入相应的时间,根据相应的时间进行处理. 让咱们想象一个场景,有一个队列,分别带着指定的时间,那么处理的时候,需要根据相应的时间进行处理,比如:统计最近五分钟的访问 ...
- 【源码解析】Flink 是如何基于事件时间生成Timestamp和Watermark
生成Timestamp和Watermark 的三个重载方法介绍可参见上一篇博客: Flink assignAscendingTimestamps 生成水印的三个重载方法 之前想研究下Flink是怎么处 ...
- Flink的时间类型和watermark机制
一FlinkTime类型 有3类时间,分别是数据本身的产生时间.进入Flink系统的时间和被处理的时间,在Flink系统中的数据可以有三种时间属性: Event Time 是每条数据在其生产设备上发生 ...
- flink中对于window和watermark的一些理解
package com.chenxiang.flink.demo; import java.io.IOException; import java.net.ServerSocket; import j ...
- Flink中的window、watermark和ProcessFunction
一.Flink中的window 1,window简述 window 是一种切割无限数据为有限块进行处理的手段.Window 是无限数据流处理的核心,Window 将一个无限的 stream 拆分成有 ...
- PHP预定义接口之 ArrayAccess
最近这段时间回家过年了,博客也没有更新,感觉少学习了好多东西,也错失了好多的学习机会,就像大家在春节抢红包时常说的一句话:一不留神错过了好几亿.废话少说,这篇博客给大家说说关于PHP预定义接口中常用到 ...
- VS2013 预定义的宏
Visual Studio 2013 预定义的宏 https://msdn.microsoft.com/zh-cn/library/b0084kay(v=vs.120).aspx 列出预定义的 ANS ...
- 关于标准C语言的预定义宏【转】
标准C语言预处理要求定义某些对象宏,每个预定义宏的名称一两个下划线字符开头和结尾,这些预定义宏不能被取消定义(#undef)或由编程人员重新定义.下面预定义宏表,被我抄了下来. __LINE__ 当 ...
随机推荐
- asp.net core 系列 12 选项 TOptions
一.概述 本章讲的选项模式是对Configuration配置的功能扩展. 讲这篇时有个专用名词叫“选项类(TOptions)” .该选项类作用是指:把选项类中的属性与配置来源中的键关联起来.举个例,假 ...
- asp.net core 系列 6 MVC框架路由(下)
一.URL 生成 接着上篇讲MVC的路由,MVC 应用程序可以使用路由的 URL 生成功能,生成指向操作的 URL 链接. 生成 URL 可消除硬编码 URL,使代码更稳定.更易维护. 此部分重点介绍 ...
- redis 系列14 有序集合对象
一. 有序集合概述 Redis 有序集合对象和集合对象一样也是string类型元素的集合,且不允许重复的成员.不同的是每个元素都会关联一个double类型的分数.redis正是通过分数来为集合中的成员 ...
- Lucene 09 - 什么是Lucene的高亮显示 + Java API实现高亮显示
目录 1 什么是高亮显示 2 高亮显示实现 2.1 配置pom.xml文件, 加入高亮显示支持 2.2 代码实现 2.3 自定义html标签高亮显示 1 什么是高亮显示 高亮显示是全文检索的一个特点, ...
- 知其所以然~redis的原子性
原子性 原子性是数据库的事务中的特性.在数据库事务的情景下,原子性指的是:一个事务(transaction)中的所有操作,要么全部完成,要么全部不完成,不会结束在中间某个环节. 对于Redis而言,命 ...
- Java开发知识之Java字符串类
Java开发知识之Java字符串类 一丶简介 任何语言中.字符串都是很重要的.都涉及到字符串的处理. 例如C++中. 字符串使用内存. 并提供相应的函数进行处理 strcmp strcat strcp ...
- msf中exploit的web_delivery模块
背景:目标设备存在远程文件包含漏洞或者命令注入漏洞,想在目标设备上加载webshell,但不想在目标设备硬盘上留下任何webshell文件信息 解决思路:让目标设备从远端服务器加载webshell代码 ...
- [转]GitLab-CI与GitLab-Runner
本文转自:https://www.jianshu.com/p/2b43151fb92e 一.持续集成(Continuous Integration) 要了解GitLab-CI与GitLab Runne ...
- nth-child(n)和nth-of-type(n)的区别
1.官方解释: p:nth-child(2) 选择属于其父元素的第二个子元素的每个 <p> 元素. p:nth-of-type(2) 选择属于其父元素第二个 <p> 元 ...
- 修改SublimeText3插件Emmet生成HTML中lang属性的默认值
打开Preferences → Package Settings → Emmet → Settings-User,输入如下代码并保存: { "snippets": { " ...