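Tumbling Windows: windows whose time ranges do not overlap. They are cut by a fixed duration or by a fixed number of events, which gives tumbling time windows and tumbling count windows respectively.
Sliding Windows: windows whose time ranges overlap. Some applications need uninterrupted coverage of time and want their window aggregates to move smoothly.

    For example, computing every 30s the total number of items that users bought during the last minute is a sliding time window; computing, after every 10 customer purchase clicks, the total of the items bought by the last 100 customers is a sliding count window.
Session Windows: a window is considered complete once no data has arrived for a configured period of time.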
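As a rough illustration of the tumbling and sliding definitions above, the following plain-Java sketch (no Flink dependency) computes which time windows a timestamp falls into. The class and method names are made up for this post, timestamps are assumed to be non-negative milliseconds, and window offsets are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Flink: assigns a timestamp to time windows. */
final class WindowAssignSketch {

    /** Start of the single tumbling window [start, start + size) that contains the timestamp. */
    static long tumblingWindowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    /** Starts of every sliding window [start, start + size) that contains the timestamp, newest first. */
    static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // One-minute windows; the sliding variant moves every 30s.
        System.out.println(tumblingWindowStart(65_000L, 60_000L));
        System.out.println(slidingWindowStarts(65_000L, 60_000L, 30_000L));
    }
}
```

With a one-minute window sliding every 30s, each timestamp lands in two windows, which is exactly the overlap that separates sliding windows from tumbling ones.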

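By default, every window computes over its elements only once the window end time is reached under the chosen time semantics. In some scenarios, however, the business side needs the current aggregate before the window has closed, for example fetching the sum for the current hour every five minutes. For this case Flink ships two triggers that customize when the windows above fire: ContinuousEventTimeTrigger and ContinuousProcessingTimeTrigger.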
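Before looking at the Flink API, the idea of an early result can be sketched in plain Java: keep one running sum for the window and report it at every interval boundary instead of only at the end. Everything below is a simplified model with made-up names (`EarlyFiringSketch`, `partialSums`); it assumes events are sorted by time, and it treats the window end as the last emission point, which a real trigger may or may not do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of early firing: partial sums of one window, emitted periodically. */
final class EarlyFiringSketch {

    /**
     * events[i] = {timestampMs, value}, sorted by timestamp, all inside [windowStart, windowEnd).
     * Returns the running sum observed at each interval boundary after the first event was seen.
     */
    static List<Long> partialSums(long[][] events, long windowStart, long windowEnd, long intervalMs) {
        List<Long> emitted = new ArrayList<>();
        long sum = 0;
        int next = 0;
        for (long fireAt = windowStart + intervalMs; fireAt <= windowEnd; fireAt += intervalMs) {
            // Fold in every event that arrived before this boundary.
            while (next < events.length && events[next][0] < fireAt) {
                sum += events[next][1];
                next++;
            }
            // Nothing is emitted until the window has received at least one element.
            if (next > 0) {
                emitted.add(sum);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        long[][] events = { {60_000L, 5L}, {400_000L, 7L}, {1_000_000L, 1L} };
        // One-hour window, partial result every five minutes.
        System.out.println(partialSums(events, 0L, 3_600_000L, 300_000L));
    }
}
```

The same sum is reported several times while no new data arrives, so a downstream consumer has to treat each emission as a replacement for the previous one rather than as a new increment.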

Below is a simple example that uses ContinuousProcessingTimeTrigger:

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.triggers.ContinuousProcessingTimeTrigger;
import org.apache.flink.util.Collector;

public class ContinueTriggerDemo {

    public static void main(String[] args) throws Exception {
        String hostName = "localhost";
        int port = 8001; // the port opened with nc in the test run

        // set up the execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // read the input data from the given socket
        DataStream<String> text = env.socketTextStream(hostName, port);

        text.flatMap(new LineSplitter()) // split each line into words
            .keyBy(0) // partition the stream by word
            .window(TumblingProcessingTimeWindows.of(Time.seconds(120))) // a 120s tumbling window
            .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(20))) // emit the current result every 20s
            .sum(1) // sum the counts
            .map(new Mapdemo()) // add a timestamp to each result
            .print();

        env.execute("Java WordCount from SocketTextStream Example");
    }

    /**
     * Implements the string tokenizer that splits sentences into words as a
     * user-defined FlatMapFunction. The function takes a line (String) and
     * splits it into multiple pairs in the form of "(word,1)" (Tuple2<String,
     * Integer>).
     */
    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split("\\W+");
            // emit the pairs
            for (String token : tokens) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }

    /** Turns (word, count) into (word, current wall-clock time, count). */
    public static final class Mapdemo implements MapFunction<Tuple2<String, Integer>, Tuple3<String, String, Integer>> {
        @Override
        public Tuple3<String, String, Integer> map(Tuple2<String, Integer> value) throws Exception {
            DateFormat format2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            String s = format2.format(new Date());
            return new Tuple3<String, String, Integer>(value.f0, s, value.f1);
        }
    }
}

Open a port locally with nc -lk 8001, then start the Flink program.
Input data:

    aa
    aa
    bb

Watch the program's output log:

5> (aa,2018-07-30 16:08:20,2)
5> (bb,2018-07-30 16:08:20,1)
5> (aa,2018-07-30 16:08:40,2)
5> (bb,2018-07-30 16:08:40,1)
5> (aa,2018-07-30 16:09:00,2)
5> (bb,2018-07-30 16:09:00,1)
5> (aa,2018-07-30 16:09:20,2)
5> (bb,2018-07-30 16:09:20,1)
5> (aa,2018-07-30 16:09:40,2)
5> (bb,2018-07-30 16:09:40,1)

After the input above, enter one more line:

    aa

The log now reports:

5> (aa,2018-07-30 16:10:00,3)
5> (bb,2018-07-30 16:10:00,1)

As the log shows, Flink makes it easy to build a window 120s long that sends its current aggregate downstream every 20s.

Appendix: source code

Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousProcessingTimeTrigger.java

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.windowing.triggers;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window;

/**
 * A {@link Trigger} that continuously fires based on a given time interval as measured by
 * the clock of the machine on which the job is running.
 *
 * @param <W> The type of {@link Window Windows} on which this trigger can operate.
 */
@PublicEvolving
public class ContinuousProcessingTimeTrigger<W extends Window> extends Trigger<Object, W> {
    private static final long serialVersionUID = 1L;

    private final long interval;

    /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
    private final ReducingStateDescriptor<Long> stateDesc =
            new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE);

    private ContinuousProcessingTimeTrigger(long interval) {
        this.interval = interval;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
        ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);

        timestamp = ctx.getCurrentProcessingTime();

        if (fireTimestamp.get() == null) {
            long start = timestamp - (timestamp % interval);
            long nextFireTimestamp = start + interval;

            ctx.registerProcessingTimeTimer(nextFireTimestamp);

            fireTimestamp.add(nextFireTimestamp);
            return TriggerResult.CONTINUE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
        ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);

        if (fireTimestamp.get().equals(time)) {
            fireTimestamp.clear();
            fireTimestamp.add(time + interval);
            ctx.registerProcessingTimeTimer(time + interval);
            return TriggerResult.FIRE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
        long timestamp = fireTimestamp.get();
        ctx.deleteProcessingTimeTimer(timestamp);
        fireTimestamp.clear();
    }

    @Override
    public boolean canMerge() {
        return true;
    }

    @Override
    public void onMerge(W window,
            OnMergeContext ctx) {
        ctx.mergePartitionedState(stateDesc);
    }

    @VisibleForTesting
    public long getInterval() {
        return interval;
    }

    @Override
    public String toString() {
        return "ContinuousProcessingTimeTrigger(" + interval + ")";
    }

    /**
     * Creates a trigger that continuously fires based on the given interval.
     *
     * @param interval The time interval at which to fire.
     * @param <W> The type of {@link Window Windows} on which this trigger can operate.
     */
    public static <W extends Window> ContinuousProcessingTimeTrigger<W> of(Time interval) {
        return new ContinuousProcessingTimeTrigger<>(interval.toMilliseconds());
    }

    private static class Min implements ReduceFunction<Long> {
        private static final long serialVersionUID = 1L;

        @Override
        public Long reduce(Long value1, Long value2) throws Exception {
            return Math.min(value1, value2);
        }
    }
}
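To make the control flow of ContinuousProcessingTimeTrigger easier to follow, here is a small plain-Java model of its two main callbacks. It is only a sketch: `ContinuousTriggerModel`, its `fireTimestamp` field and its `timers` set are hypothetical stand-ins for the trigger, its "fire-time" state and the registered processing-time timers, and nothing here calls Flink.

```java
import java.util.TreeSet;

/** Hypothetical stand-in for the trigger logic; not Flink code. */
final class ContinuousTriggerModel {
    private final long interval;
    private Long fireTimestamp;                   // models the "fire-time" state (null = not set)
    final TreeSet<Long> timers = new TreeSet<>(); // models the registered processing-time timers

    ContinuousTriggerModel(long interval) {
        this.interval = interval;
    }

    /** Models onElement: only the first element schedules a timer; an element never fires by itself. */
    void onElement(long now) {
        if (fireTimestamp == null) {
            // Round down to the interval boundary, then move one interval ahead.
            long next = now - (now % interval) + interval;
            timers.add(next);
            fireTimestamp = next;
        }
    }

    /** Models onProcessingTime: returns true for FIRE and schedules the next timer. */
    boolean onProcessingTime(long time) {
        timers.remove(time);
        // The null check is an addition of this sketch; the listing above calls get().equals(time) directly.
        if (fireTimestamp != null && fireTimestamp == time) {
            fireTimestamp = time + interval;
            timers.add(time + interval);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ContinuousTriggerModel model = new ContinuousTriggerModel(20_000L);
        model.onElement(125_000L); // first element: timer at the next 20s boundary
        model.onElement(131_000L); // later elements reuse the existing timer
        System.out.println(model.timers);
        System.out.println(model.onProcessingTime(model.timers.first()));
        System.out.println(model.timers);
    }
}
```

Because the first timer is rounded to a multiple of the interval, a 20s interval produces results on :00, :20 and :40 of each minute, which matches the timestamps in the log shown earlier.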

Source path: flink\flink-streaming-java\src\main\java\org\apache\flink\streaming\api\windowing\triggers\ContinuousEventTimeTrigger.java

/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.streaming.api.windowing.triggers;

import org.apache.flink.annotation.PublicEvolving;
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.Window;

/**
 * A {@link Trigger} that continuously fires based on a given time interval. This fires based
 * on {@link org.apache.flink.streaming.api.watermark.Watermark Watermarks}.
 *
 * @see org.apache.flink.streaming.api.watermark.Watermark
 *
 * @param <W> The type of {@link Window Windows} on which this trigger can operate.
 */
@PublicEvolving
public class ContinuousEventTimeTrigger<W extends Window> extends Trigger<Object, W> {
    private static final long serialVersionUID = 1L;

    private final long interval;

    /** When merging we take the lowest of all fire timestamps as the new fire timestamp. */
    private final ReducingStateDescriptor<Long> stateDesc =
            new ReducingStateDescriptor<>("fire-time", new Min(), LongSerializer.INSTANCE);

    private ContinuousEventTimeTrigger(long interval) {
        this.interval = interval;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {

        if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
            // if the watermark is already past the window fire immediately
            return TriggerResult.FIRE;
        } else {
            ctx.registerEventTimeTimer(window.maxTimestamp());
        }

        ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
        if (fireTimestamp.get() == null) {
            long start = timestamp - (timestamp % interval);
            long nextFireTimestamp = start + interval;
            ctx.registerEventTimeTimer(nextFireTimestamp);
            fireTimestamp.add(nextFireTimestamp);
        }

        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {

        if (time == window.maxTimestamp()){
            return TriggerResult.FIRE;
        }

        ReducingState<Long> fireTimestampState = ctx.getPartitionedState(stateDesc);

        Long fireTimestamp = fireTimestampState.get();

        if (fireTimestamp != null && fireTimestamp == time) {
            fireTimestampState.clear();
            fireTimestampState.add(time + interval);
            ctx.registerEventTimeTimer(time + interval);
            return TriggerResult.FIRE;
        }

        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) throws Exception {
        ReducingState<Long> fireTimestamp = ctx.getPartitionedState(stateDesc);
        Long timestamp = fireTimestamp.get();
        if (timestamp != null) {
            ctx.deleteEventTimeTimer(timestamp);
            fireTimestamp.clear();
        }
    }

    @Override
    public boolean canMerge() {
        return true;
    }

    @Override
    public void onMerge(W window, OnMergeContext ctx) throws Exception {
        ctx.mergePartitionedState(stateDesc);
        Long nextFireTimestamp = ctx.getPartitionedState(stateDesc).get();
        if (nextFireTimestamp != null) {
            ctx.registerEventTimeTimer(nextFireTimestamp);
        }
    }

    @Override
    public String toString() {
        return "ContinuousEventTimeTrigger(" + interval + ")";
    }

    @VisibleForTesting
    public long getInterval() {
        return interval;
    }

    /**
     * Creates a trigger that continuously fires based on the given interval.
     *
     * @param interval The time interval at which to fire.
     * @param <W> The type of {@link Window Windows} on which this trigger can operate.
     */
    public static <W extends Window> ContinuousEventTimeTrigger<W> of(Time interval) {
        return new ContinuousEventTimeTrigger<>(interval.toMilliseconds());
    }

    private static class Min implements ReduceFunction<Long> {
        private static final long serialVersionUID = 1L;

        @Override
        public Long reduce(Long value1, Long value2) throws Exception {
            return Math.min(value1, value2);
        }
    }
}
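The event-time variant differs from the processing-time one mainly in that it also fires when the timer at window.maxTimestamp() goes off. The plain-Java sketch below lists the event-time timestamps at which one window would fire under a few assumptions that are mine, not the source's: the window is a time window whose max timestamp is end - 1, allowed lateness is 0 so the window is cleaned up at that point, the first element lies inside the window, and the watermark has not yet passed the window when it arrives. `EventTimeFireSketch` and `fireTimes` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model, not Flink code: expected fire points of one event-time window. */
final class EventTimeFireSketch {

    /** Interval-aligned fire timestamps before the window's max timestamp, then the max timestamp itself. */
    static List<Long> fireTimes(long firstElementTs, long windowEnd, long interval) {
        long maxTimestamp = windowEnd - 1; // assumption: last timestamp that still belongs to the window
        List<Long> fires = new ArrayList<>();
        // Same alignment arithmetic as onElement: round down, then add one interval.
        long first = firstElementTs - (firstElementTs % interval) + interval;
        for (long t = first; t < maxTimestamp; t += interval) {
            fires.add(t); // periodic early result
        }
        fires.add(maxTimestamp); // final result when the watermark reaches the end of the window
        return fires;
    }

    public static void main(String[] args) {
        // 120s window starting at 0, first element at 5s, early result every 20s of event time.
        System.out.println(fireTimes(5_000L, 120_000L, 20_000L));
    }
}
```

These timestamps are in event time, so when they are actually reached depends on how the watermark advances, not on the wall clock.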
