1. Create the environment for stream computing

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().disableSysoutLogging();
env.getConfig().setRestartStrategy(RestartStrategies.fixedDelayRestart(4, 10000)); // retry up to 4 times, 10 s apart
env.enableCheckpointing(5000); // create a checkpoint every 5 seconds
env.getConfig().setGlobalJobParameters(parameterTool); // make parameters available in the web interface
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // event time, supplied by the watermark extractor used in step 2
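
The configuration above references parameterTool before it is shown. A typical setup, assuming the job is driven by command-line arguments in the usual ParameterTool style (the argument names below are the ones consumed later by getRequired() and getProperties()), would be:

import org.apache.flink.api.java.utils.ParameterTool;

// Parse job arguments such as --input-topic, --output-topic,
// --bootstrap.servers and --group.id from the command line.
final ParameterTool parameterTool = ParameterTool.fromArgs(args);

if (parameterTool.getNumberOfParameters() < 4) {
    System.out.println("Missing parameters!\n"
            + "Usage: Kafka --input-topic <topic> --output-topic <topic> "
            + "--bootstrap.servers <kafka brokers> --group.id <some id>");
    return;
}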
Internally, StreamExecutionEnvironment.getExecutionEnvironment() picks the right concrete environment depending on how the program is run:

public static StreamExecutionEnvironment getExecutionEnvironment() {
    if (contextEnvironmentFactory != null) {
        return contextEnvironmentFactory.createExecutionEnvironment();
    }

    // because the streaming project depends on "flink-clients" (and not the other way around)
    // we currently need to intercept the data set environment and create a dependent stream env.
    // this should be fixed once we rework the project dependencies
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    if (env instanceof ContextEnvironment) {
        return new StreamContextEnvironment((ContextEnvironment) env);
    } else if (env instanceof OptimizerPlanEnvironment || env instanceof PreviewPlanEnvironment) {
        return new StreamPlanEnvironment(env);
    } else {
        return createLocalEnvironment();
    }
}

2. Now add the data source that feeds the computation

DataStream<KafkaEvent> input = env
    .addSource(
        new FlinkKafkaConsumer010<>(
            parameterTool.getRequired("input-topic"),
            new KafkaEventSchema(),
            parameterTool.getProperties())
        .assignTimestampsAndWatermarks(new CustomWatermarkExtractor()))
    .keyBy("word")
    .map(new RollingAdditionMapper());
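
CustomWatermarkExtractor is not listed in the post. A minimal sketch, assuming KafkaEvent carries an event-time timestamp via getTimestamp() and timestamps arrive roughly in ascending order, might look like this:

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

// Sketch only: assigns the event's own timestamp and emits a watermark
// that trails the highest timestamp seen so far by one millisecond.
private static class CustomWatermarkExtractor implements AssignerWithPeriodicWatermarks<KafkaEvent> {

    private static final long serialVersionUID = 1L;

    private long currentTimestamp = Long.MIN_VALUE;

    @Override
    public long extractTimestamp(KafkaEvent event, long previousElementTimestamp) {
        this.currentTimestamp = event.getTimestamp();
        return event.getTimestamp();
    }

    @Override
    public Watermark getCurrentWatermark() {
        return new Watermark(currentTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : currentTimestamp - 1);
    }
}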
addSource() determines the produced type (via ResultTypeQueryable or the TypeExtractor), wraps the user function in a StreamSource operator, and returns a DataStreamSource:

public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function) {
    return addSource(function, "Custom Source");
}

@SuppressWarnings("unchecked")
public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function, String sourceName, TypeInformation<OUT> typeInfo) {

    if (typeInfo == null) {
        if (function instanceof ResultTypeQueryable) {
            typeInfo = ((ResultTypeQueryable<OUT>) function).getProducedType();
        } else {
            try {
                typeInfo = TypeExtractor.createTypeInfo(
                        SourceFunction.class,
                        function.getClass(), 0, null, null);
            } catch (final InvalidTypesException e) {
                typeInfo = (TypeInformation<OUT>) new MissingTypeInfo(sourceName, e);
            }
        }
    }

    boolean isParallel = function instanceof ParallelSourceFunction;

    clean(function);

    StreamSource<OUT, ?> sourceOperator;
    if (function instanceof StoppableFunction) {
        sourceOperator = new StoppableStreamSource<>(cast2StoppableSourceFunction(function));
    } else {
        sourceOperator = new StreamSource<>(function);
    }

    return new DataStreamSource<>(this, typeInfo, sourceOperator, isParallel, sourceName);
}
map() extracts the mapper's output type and delegates to transform():

public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) {

    TypeInformation<R> outType = TypeExtractor.getMapReturnTypes(clean(mapper), getType(),
            Utils.getCallLocationName(), true);

    return transform("Map", outType, new StreamMap<>(clean(mapper)));
}
transform() builds a OneInputTransformation, wraps it into a new SingleOutputStreamOperator, and registers it with the execution environment:

public <R> SingleOutputStreamOperator<R> transform(String operatorName, TypeInformation<R> outTypeInfo, OneInputStreamOperator<T, R> operator) {

    // read the output type of the input Transform to coax out errors about MissingTypeInfo
    transformation.getOutputType();

    OneInputTransformation<T, R> resultTransform = new OneInputTransformation<>(
            this.transformation,
            operatorName,
            operator,
            outTypeInfo,
            environment.getParallelism());

    @SuppressWarnings({ "unchecked", "rawtypes" })
    SingleOutputStreamOperator<R> returnStream = new SingleOutputStreamOperator(environment, resultTransform);

    getExecutionEnvironment().addOperator(resultTransform);

    return returnStream;
}
addOperator() simply appends the transformation to the environment's list of transformations, from which the job graph is later built:

@Internal
public void addOperator(StreamTransformation<?> transformation) {
    Preconditions.checkNotNull(transformation, "transformation must not be null.");
    this.transformations.add(transformation);
}

protected final List<StreamTransformation<?>> transformations = new ArrayList<>();
keyBy("word") translates the field expression into a key selector and returns a KeyedStream, which partitions the stream by that key:

public KeyedStream<T, Tuple> keyBy(String... fields) {
    return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
}

private KeyedStream<T, Tuple> keyBy(Keys<T> keys) {
    return new KeyedStream<>(this, clean(KeySelectorUtil.getSelectorForKeys(keys,
            getType(), getExecutionConfig())));
}
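
keyBy("word") and the mapper in step 4 require KafkaEvent to be a POJO with word, frequency and timestamp fields. The class is not shown in the post; a plausible sketch is:

// Assumed event type: a Flink POJO (public no-arg constructor plus getters/setters).
public class KafkaEvent {

    private String word;
    private int frequency;
    private long timestamp;

    public KafkaEvent() {
    }

    public KafkaEvent(String word, int frequency, long timestamp) {
        this.word = word;
        this.frequency = frequency;
        this.timestamp = timestamp;
    }

    public String getWord() { return word; }
    public void setWord(String word) { this.word = word; }

    public int getFrequency() { return frequency; }
    public void setFrequency(int frequency) { this.frequency = frequency; }

    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }

    @Override
    public String toString() {
        return word + "," + frequency + "," + timestamp;
    }
}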

3. The data from the source is streamed through the Flink distributed computing runtime, and the computed result is transferred to the data sink.

input.addSink(
    new FlinkKafkaProducer010<>(
        parameterTool.getRequired("output-topic"),
        new KafkaEventSchema(),
        parameterTool.getProperties()));
addSink() works like addSource() in reverse: it wraps the sink function in a StreamSink operator and registers the resulting transformation:

public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction) {

    // read the output type of the input Transform to coax out errors about MissingTypeInfo
    transformation.getOutputType();

    // configure the type if needed
    if (sinkFunction instanceof InputTypeConfigurable) {
        ((InputTypeConfigurable) sinkFunction).setInputType(getType(), getExecutionConfig());
    }

    StreamSink<T> sinkOperator = new StreamSink<>(clean(sinkFunction));

    DataStreamSink<T> sink = new DataStreamSink<>(this, sinkOperator);

    getExecutionEnvironment().addOperator(sink.getTransformation());

    return sink;
}
The addOperator() call here is the same one shown in step 2: the sink's transformation is simply appended to the environment's list of transformations.
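
KafkaEventSchema, used by both the consumer and the producer, is also not shown in the post. A rough sketch, assuming records are encoded as UTF-8 "word,frequency,timestamp" strings (interface packages differ slightly between Flink versions):

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

// Sketch only: symmetric (de)serialization of KafkaEvent as a CSV-like string.
public class KafkaEventSchema implements DeserializationSchema<KafkaEvent>, SerializationSchema<KafkaEvent> {

    private static final long serialVersionUID = 1L;

    @Override
    public byte[] serialize(KafkaEvent event) {
        return event.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public KafkaEvent deserialize(byte[] message) throws IOException {
        String[] parts = new String(message, StandardCharsets.UTF_8).split(",");
        return new KafkaEvent(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    @Override
    public boolean isEndOfStream(KafkaEvent nextElement) {
        return false;
    }

    @Override
    public TypeInformation<KafkaEvent> getProducedType() {
        return TypeInformation.of(KafkaEvent.class);
    }
}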

4. The last step is to start executing.

env.execute("Kafka 0.10 Example");
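
Once the job is packaged into a jar, it can be submitted with the Flink CLI; the jar name, topic names and broker address below are placeholders:

bin/flink run kafka-example.jar \
    --input-topic test-input --output-topic test-output \
    --bootstrap.servers localhost:9092 --group.id myconsumer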

The RollingAdditionMapper used in the map() call above is defined as follows.

private static class RollingAdditionMapper extends RichMapFunction<KafkaEvent, KafkaEvent> {

    private static final long serialVersionUID = 1180234853172462378L;

    private transient ValueState<Integer> currentTotalCount;

    @Override
    public KafkaEvent map(KafkaEvent event) throws Exception {
        Integer totalCount = currentTotalCount.value();
        if (totalCount == null) {
            totalCount = 0;
        }
        totalCount += event.getFrequency();

        currentTotalCount.update(totalCount);

        return new KafkaEvent(event.getWord(), totalCount, event.getTimestamp());
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        currentTotalCount = getRuntimeContext().getState(new ValueStateDescriptor<>("currentTotalCount", Integer.class));
    }
}
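
Because the stream is keyed by word, the ValueState in RollingAdditionMapper holds a separate running total per word, and this keyed state is included in the checkpoints enabled in step 1.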

Reference: http://www.debugrun.com/a/LjK8Nni.html
