On a WindowedStream you can invoke operations such as reduce, aggregate, min, and max.

The key is to understand how WindowOperator uses keyed state (KVState), because that is what a window uses to store its window buffer.

Different kinds of keyed state, such as ReducingState and ListState, lead to different behavior.
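To ground the discussion, here is a minimal, self-contained usage sketch (the data and names are made up for illustration; sum() is one of the pre-aggregating operations discussed below):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowedStreamSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Long>> input = env.fromElements(
                Tuple2.of("a", 1L), Tuple2.of("a", 2L), Tuple2.of("b", 3L));

        input
            .keyBy(0)                      // KeyedStream
            .timeWindow(Time.seconds(10))  // WindowedStream
            .sum(1)                        // incrementally aggregated, back to a DataStream
            .print();

        env.execute("windowed-stream-sketch");
    }
}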

 

Reduce

 

/**
* Applies the given window function to each window. The window function is called for each
* evaluation of the window for each key individually. The output of the window function is
* interpreted as a regular non-windowed stream.
*
* <p>
* Arriving data is incrementally aggregated using the given reducer.
*
* @param reduceFunction The reduce function that is used for incremental aggregation.
* @param function The window function.
* @param resultType Type information for the result type of the window function.
* @param legacyWindowOpType When migrating from an older Flink version, this flag indicates
* the type of the previous operator whose state we inherit.
* @return The data stream that is the result of applying the window function to the window.
*/
private <R> SingleOutputStreamOperator<R> reduce(
        ReduceFunction<T> reduceFunction,
        WindowFunction<T, R, K, W> function,
        TypeInformation<R> resultType,
        LegacyWindowOperatorType legacyWindowOpType) {

    String opName;
    KeySelector<T, K> keySel = input.getKeySelector();
    OneInputStreamOperator<T, R> operator;

    if (evictor != null) {
        @SuppressWarnings({"unchecked", "rawtypes"})
        TypeSerializer<StreamRecord<T>> streamRecordSerializer =
            (TypeSerializer<StreamRecord<T>>) new StreamElementSerializer(input.getType().createSerializer(getExecutionEnvironment().getConfig()));

        // With an evictor, the state is a ListState: the entire window contents
        // must be cached so that elements can be evicted later.
        ListStateDescriptor<StreamRecord<T>> stateDesc =
            new ListStateDescriptor<>("window-contents", streamRecordSerializer);

        // The operator name for reduce is assembled like this; it exposes all window-related settings.
        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + evictor + ", " + udfName + ")";

        operator =
            new EvictingWindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalIterableWindowFunction<>(new ReduceApplyWindowFunction<>(reduceFunction, function)),
                trigger,
                evictor,
                allowedLateness);

    } else { // no evictor
        // Plain ReducingState: no need to cache the whole list, so this is more efficient.
        ReducingStateDescriptor<T> stateDesc = new ReducingStateDescriptor<>("window-contents",
            reduceFunction, // the reduce logic
            input.getType().createSerializer(getExecutionEnvironment().getConfig()));

        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + udfName + ")";

        operator =
            new WindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalSingleValueWindowFunction<>(function),
                trigger,
                allowedLateness,
                legacyWindowOpType);
    }

    return input.transform(opName, resultType, operator);
}

 

reduceFunction is the reduce logic; normally this is the only parameter you specify explicitly.

 

WindowFunction<T, R, K, W> function

TypeInformation<R> resultType

   /**
* Applies a reduce function to the window. The window function is called for each evaluation
* of the window for each key individually. The output of the reduce function is interpreted
* as a regular non-windowed stream.
*/

This function is a WindowFunction that is invoked when the window fires, and resultType is the return type of that WindowFunction; through reduce, the WindowedStream becomes a regular non-windowed stream. A combined usage sketch follows below.
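For illustration, continuing the sketch from the top of this post, here is one way a ReduceFunction and a WindowFunction can be combined (names and types are illustrative; keyBy(0) makes the key type Tuple; imports from org.apache.flink.api.java.tuple and the streaming API are assumed):

input
    .keyBy(0)
    .timeWindow(Time.seconds(10))
    .reduce(
        new ReduceFunction<Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
                return Tuple2.of(a.f0, a.f1 + b.f1); // incremental pre-aggregation
            }
        },
        new WindowFunction<Tuple2<String, Long>, Tuple3<String, Long, Long>, Tuple, TimeWindow>() {
            @Override
            public void apply(Tuple key, TimeWindow window,
                              Iterable<Tuple2<String, Long>> values,
                              Collector<Tuple3<String, Long, Long>> out) {
                // only one pre-aggregated element arrives here
                Tuple2<String, Long> reduced = values.iterator().next();
                out.collect(Tuple3.of(reduced.f0, reduced.f1, window.getEnd()));
            }
        });

Internally, when the window fires, WindowOperator emits the window contents via emitWindowContents():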

/**
 * Emits the contents of the given window using the {@link InternalWindowFunction}.
 */
@SuppressWarnings("unchecked")
private void emitWindowContents(W window, ACC contents) throws Exception {
    timestampedCollector.setAbsoluteTimestamp(window.maxTimestamp());
    userFunction.apply(context.key, context.window, contents, timestampedCollector);
}

As you can see, the WindowFunction is invoked once for each window of each key.

public void onEventTime(InternalTimer<K, W> timer) throws Exception {

    TriggerResult triggerResult = context.onEventTime(timer.getTimestamp());
    if (triggerResult.isFire()) {
        emitWindowContents(context.window, contents); // called when the window fires
    }
}

context.window holds the window's metadata; a TimeWindow, for example, records its start and end time.

contents is the windowState, which holds the actual data.
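As an illustrative check of the TimeWindow metadata mentioned above: for a tumbling 10-second window covering [0, 10000), maxTimestamp() is end - 1, which is exactly the timestamp that emitWindowContents() assigns to the emitted records:

import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

TimeWindow w = new TimeWindow(0L, 10000L);
long start = w.getStart();      // 0
long end = w.getEnd();          // 10000, exclusive
long maxTs = w.maxTimestamp();  // 9999 = end - 1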

 

If you do not specify one, the default WindowFunction is

PassThroughWindowFunction
public class PassThroughWindowFunction<K, W extends Window, T> implements WindowFunction<T, T, K, W> {

    private static final long serialVersionUID = 1L;

    @Override
    public void apply(K k, W window, Iterable<T> input, Collector<T> out) throws Exception {
        for (T in : input) {
            out.collect(in);
        }
    }
}

 

Now let's continue with WindowOperator:

@Override
public void processElement(StreamRecord<IN> element) throws Exception {
    for (W window : elementWindows) { // for each window the element was assigned to
        // drop if the window is already late
        if (isLate(window)) {
            continue;
        }

        windowState.setCurrentNamespace(window); // the window itself serves as the state namespace
        windowState.add(element.getValue());     // add the element's value

 

windowState is initialized in WindowOperator.open(). Since the window serves as the state namespace, one state object keeps a separate value for every (key, window) pair.

public void open() throws Exception {
    // create (or restore) the state that hold the actual window contents
    // NOTE - the state may be null in the case of the overriding evicting window operator
    if (windowStateDescriptor != null) {
        windowState = (InternalAppendingState<W, IN, ACC>) getOrCreateKeyedState(windowSerializer, windowStateDescriptor);
    }

 

AbstractStreamOperator
protected <N, S extends State, T> S getOrCreateKeyedState(
        TypeSerializer<N> namespaceSerializer,
        StateDescriptor<S, T> stateDescriptor) throws Exception {

    if (keyedStateStore != null) {
        return keyedStateBackend.getOrCreateKeyedState(namespaceSerializer, stateDescriptor);
    }

 

AbstractKeyedStateBackend
public <N, S extends State, V> S getOrCreateKeyedState(
        final TypeSerializer<N> namespaceSerializer,
        StateDescriptor<S, V> stateDescriptor) throws Exception {

    // create a new blank key/value state
    S state = stateDescriptor.bind(new StateBackend() {
        @Override
        public <T> ValueState<T> createValueState(ValueStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createValueState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T> ListState<T> createListState(ListStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createListState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T> ReducingState<T> createReducingState(ReducingStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createReducingState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T, ACC, R> AggregatingState<T, R> createAggregatingState(
                AggregatingStateDescriptor<T, ACC, R> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createAggregatingState(namespaceSerializer, stateDesc);
        }

As you can see, calling bind() on different StateDescriptors produces different kinds of state.

If, as above, a ReducingStateDescriptor is used:

@Override
public ReducingState<T> bind(StateBackend stateBackend) throws Exception {
    return stateBackend.createReducingState(this);
}
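The corresponding ListStateDescriptor presumably dispatches the same way, just to createListState — a sketch following the same pattern (not reproduced from the source; the exact code may differ by Flink version):

@Override
public ListState<T> bind(StateBackend stateBackend) throws Exception {
    return stateBackend.createListState(this); // hypothetical mirror of the ReducingStateDescriptor case
}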

 

So if RocksDB is the state backend, the state created is RocksDBReducingState, and the logic of add() looks like this:

public class RocksDBReducingState<K, N, V>
        extends AbstractRocksDBState<K, N, ReducingState<V>, ReducingStateDescriptor<V>, V>
        implements InternalReducingState<N, V> {

    @Override
    public void add(V value) throws IOException {
        try {
            writeCurrentKeyWithGroupAndNamespace();
            byte[] key = keySerializationStream.toByteArray();
            byte[] valueBytes = backend.db.get(columnFamily, key);

            DataOutputViewStreamWrapper out = new DataOutputViewStreamWrapper(keySerializationStream);
            if (valueBytes == null) {
                keySerializationStream.reset();
                valueSerializer.serialize(value, out);
                backend.db.put(columnFamily, writeOptions, key, keySerializationStream.toByteArray());
            } else {
                V oldValue = valueSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStream(valueBytes)));
                V newValue = reduceFunction.reduce(oldValue, value); // merge the old and new value with the reduce function
                keySerializationStream.reset();
                valueSerializer.serialize(newValue, out);
                backend.db.put(columnFamily, writeOptions, key, keySerializationStream.toByteArray()); // put the new value back into the backend
            }
        } catch (Exception e) {
            throw new RuntimeException("Error while adding data to RocksDB", e);
        }
    }

 

aggregate

Here an AggregatingStateDescriptor is used,

and there is an extra parameter, TypeInformation<ACC> accumulatorType, because aggregate keeps updating this accumulator.

/**
* Applies the given window function to each window. The window function is called for each
* evaluation of the window for each key individually. The output of the window function is
* interpreted as a regular non-windowed stream.
*
* <p>Arriving data is incrementally aggregated using the given aggregate function. This means
* that the window function typically has only a single value to process when called.
*
* @param aggregateFunction The aggregation function that is used for incremental aggregation.
* @param windowFunction The window function.
* @param accumulatorType Type information for the internal accumulator type of the aggregation function
* @param resultType Type information for the result type of the window function
*
* @return The data stream that is the result of applying the window function to the window.
*
* @param <ACC> The type of the AggregateFunction's accumulator
* @param <V> The type of AggregateFunction's result, and the WindowFunction's input
* @param <R> The type of the elements in the resulting stream, equal to the
* WindowFunction's result type
*/
public <ACC, V, R> SingleOutputStreamOperator<R> aggregate(
        AggregateFunction<T, ACC, V> aggregateFunction,
        WindowFunction<V, R, K, W> windowFunction,
        TypeInformation<ACC> accumulatorType,
        TypeInformation<V> aggregateResultType,
        TypeInformation<R> resultType) {

    if (evictor != null) {
        // with an evictor, a ListState is still used
    } else {
        AggregatingStateDescriptor<T, ACC, V> stateDesc = new AggregatingStateDescriptor<>("window-contents",
            aggregateFunction, accumulatorType.createSerializer(getExecutionEnvironment().getConfig()));

        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + udfName + ")";

        operator = new WindowOperator<>(windowAssigner,
            windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
            keySel,
            input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
            stateDesc,
            new InternalSingleValueWindowFunction<>(windowFunction),
            trigger,
            allowedLateness);
    }

    return input.transform(opName, resultType, operator);
}

Ultimately this ends up in

RocksDBAggregatingState
@Override
public R get() throws IOException {
    try {
        // prepare the current key and namespace for RocksDB lookup
        writeCurrentKeyWithGroupAndNamespace();
        final byte[] key = keySerializationStream.toByteArray();

        // get the current value
        final byte[] valueBytes = backend.db.get(columnFamily, key);

        if (valueBytes == null) {
            return null;
        }

        ACC accumulator = valueSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStreamWithPos(valueBytes)));
        return aggFunction.getResult(accumulator); // return the accumulator's result
    }
    catch (IOException | RocksDBException e) {
        throw new IOException("Error while retrieving value from RocksDB", e);
    }
}

@Override
public void add(T value) throws IOException {
    try {
        // prepare the current key and namespace for RocksDB lookup
        writeCurrentKeyWithGroupAndNamespace();
        final byte[] key = keySerializationStream.toByteArray();
        keySerializationStream.reset();

        // get the current value
        final byte[] valueBytes = backend.db.get(columnFamily, key);

        // deserialize the current accumulator, or create a blank one
        final ACC accumulator = valueBytes == null ?
                aggFunction.createAccumulator() :
                valueSerializer.deserialize(new DataInputViewStreamWrapper(new ByteArrayInputStreamWithPos(valueBytes)));

        // aggregate the value into the accumulator
        aggFunction.add(value, accumulator);

        // serialize the new accumulator
        final DataOutputViewStreamWrapper out = new DataOutputViewStreamWrapper(keySerializationStream);
        valueSerializer.serialize(accumulator, out);

        // write the new value to RocksDB
        backend.db.put(columnFamily, writeOptions, key, keySerializationStream.toByteArray());
    }
    catch (IOException | RocksDBException e) {
        throw new IOException("Error while adding value to RocksDB", e);
    }
}

 

Here is an example of such an aggFunction:

private static class AddingFunction implements AggregateFunction<Long, MutableLong, Long> {

    @Override
    public MutableLong createAccumulator() {
        return new MutableLong();
    }

    @Override
    public void add(Long value, MutableLong accumulator) {
        accumulator.value += value;
    }

    @Override
    public Long getResult(MutableLong accumulator) {
        return accumulator.value;
    }

    @Override
    public MutableLong merge(MutableLong a, MutableLong b) {
        a.value += b.value;
        return a;
    }
}

private static final class MutableLong {
    long value;
}

Compared with reduce, aggregate is more general:

reduce: A1 reduce A2 = A3 (inputs and result all share one type)

aggregate: a1, a2, ... aggregate = b (the result type may differ from the input type, as the sketch below shows)
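A classic case that aggregate can express but reduce cannot: a per-window average, where the accumulator (sum, count) differs in type from both the input and the result. A sketch, following the void-returning add() signature of the AddingFunction example above (newer Flink versions return the accumulator from add instead):

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

public class AverageAggregate implements AggregateFunction<Long, Tuple2<Long, Long>, Double> {

    @Override
    public Tuple2<Long, Long> createAccumulator() {
        return Tuple2.of(0L, 0L); // (sum, count)
    }

    @Override
    public void add(Long value, Tuple2<Long, Long> acc) {
        acc.f0 += value; // running sum
        acc.f1 += 1;     // running count
    }

    @Override
    public Double getResult(Tuple2<Long, Long> acc) {
        return acc.f1 == 0 ? 0.0 : ((double) acc.f0) / acc.f1;
    }

    @Override
    public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
        a.f0 += b.f0;
        a.f1 += b.f1;
        return a;
    }
}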

 

apply

More general still: no pre-aggregation is done as elements arrive; instead the entire window contents must be cached, and the function is applied only when the window fires.

   /**
* Applies the given window function to each window. The window function is called for each
* evaluation of the window for each key individually. The output of the window function is
* interpreted as a regular non-windowed stream.
*
* <p>
* Note that this function requires that all data in the windows is buffered until the window
* is evaluated, as the function provides no means of incremental aggregation.
*
* @param function The window function.
* @param resultType Type information for the result type of the window function
* @return The data stream that is the result of applying the window function to the window.
*/
public <R> SingleOutputStreamOperator<R> apply(WindowFunction<T, R, K, W> function, TypeInformation<R> resultType) {

    if (evictor != null) {
        //
    } else {
        // All elements must be cached, so this has to be a ListState.
        ListStateDescriptor<T> stateDesc = new ListStateDescriptor<>("window-contents",
            input.getType().createSerializer(getExecutionEnvironment().getConfig()));

        opName = "TriggerWindow(" + windowAssigner + ", " + stateDesc + ", " + trigger + ", " + udfName + ")";

        operator =
            new WindowOperator<>(windowAssigner,
                windowAssigner.getWindowSerializer(getExecutionEnvironment().getConfig()),
                keySel,
                input.getKeyType().createSerializer(getExecutionEnvironment().getConfig()),
                stateDesc,
                new InternalIterableWindowFunction<>(function),
                trigger,
                allowedLateness,
                legacyWindowOpType);
    }

    return input.transform(opName, resultType, operator);
}

This one is simple: you must supply a WindowFunction yourself to process the window contents when the window fires.

You also need to specify the resultType here.

And it uses a ListStateDescriptor; this kind of state simply appends each element to a list (see the sketch below).
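A usage sketch (continuing the illustrative stream from the beginning of this post): counting the elements of each window, which inherently needs the whole buffered window. With an anonymous class the resultType can usually be extracted automatically; the overload shown above takes it explicitly.

input
    .keyBy(0)
    .timeWindow(Time.seconds(10))
    .apply(new WindowFunction<Tuple2<String, Long>, Tuple2<String, Integer>, Tuple, TimeWindow>() {
        @Override
        public void apply(Tuple key, TimeWindow window,
                          Iterable<Tuple2<String, Long>> elements,
                          Collector<Tuple2<String, Integer>> out) {
            int count = 0;
            for (Tuple2<String, Long> ignored : elements) {
                count++;
            }
            String k = key.getField(0);
            out.collect(Tuple2.of(k, count));
        }
    });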

 

 

AggregationFunction

such as sum, min, and max:

   /**
* Applies an aggregation that sums every window of the data stream at the
* given position.
*
* @param positionToSum The position in the tuple/array to sum
* @return The transformed DataStream.
*/
public SingleOutputStreamOperator<T> sum(int positionToSum) {
    return aggregate(new SumAggregator<>(positionToSum, input.getType(), input.getExecutionConfig()));
}

 

public class SumAggregator<T> extends AggregationFunction<T> {

 

public abstract class AggregationFunction<T> implements ReduceFunction<T> {

    private static final long serialVersionUID = 1L;

    public enum AggregationType {
        SUM, MIN, MAX, MINBY, MAXBY,
    }
}

As you can see, despite what the name suggests, these AggregationFunctions are implemented with reduce, not aggregate; a simplified illustration follows below.
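A simplified sketch of what this amounts to (not the real SumAggregator, which uses a generic FieldAccessor to read and write the configured position; this illustrative version hard-codes a Tuple2 field):

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;

public class SumField1 implements ReduceFunction<Tuple2<String, Long>> {
    @Override
    public Tuple2<String, Long> reduce(Tuple2<String, Long> a, Tuple2<String, Long> b) {
        // "sum" expressed as a reduce: keep the key, add the summed field
        return Tuple2.of(a.f0, a.f1 + b.f1);
    }
}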
