The first Flink application
Official documentation reference: https://ci.apache.org/projects/flink/flink-docs-release-1.10/#api-references
Importing the Maven dependencies
Note that if the program is written in Scala, the dependencies to import differ from the Java ones (a sketch of the Scala variant follows the Java dependencies below).
Maven Dependencies
You can add the following dependencies to your pom.xml to include Apache Flink in your project. These dependencies include a local execution environment and thus support local testing. Scala API: To use the Scala API, replace the flink-java artifact id with flink-scala_2. and flink-streaming-java_2. with flink-streaming-scala_2..
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.8.</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.</artifactId>
    <version>1.8.</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.</artifactId>
    <version>1.8.</version>
</dependency>
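For reference, a possible Scala-API equivalent of the dependencies above is sketched below; the Scala binary suffix (2.11) and the concrete patch version (1.8.0) are assumptions and must match your environment.
<!-- Sketch only: Scala binary suffix (2.11) and patch version (1.8.0) are assumed -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.8.0</version>
</dependency>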
Batch WordCount example (DataSet API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Batch processing example
public class WordCount {
    public static void main(String[] args) throws Exception {
        String inputPath = "E:\\flink\\words.txt";
        String outputPath = "E:\\flink\\result";
        // Get the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read the input file
        DataSet<String> text = env.readTextFile(inputPath);
        DataSet<Tuple2<String, Integer>> counts =
                // split up the lines in pairs (2-tuples) containing: (word, 1)
                text.flatMap(new Tokenizer())
                // group by the tuple field "0" and sum up tuple field "1"
                .groupBy(0)   // group by the first tuple field
                .sum(1);      // sum the second tuple field
        // setParallelism sets the parallelism, similar to Spark. If it is not set,
        // the sink runs with multiple threads and produces multiple output files.
        counts.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
        env.execute("Batch WordCount Example");
    }

    // User-defined function; it could also be defined inline inside flatMap() above instead
    public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split(",");
            for (String token : tokens) {
                if (token.length() > 0) {
                    // wrap the word into a Tuple2
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}
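Note that the Tokenizer splits each line on commas, so E:\flink\words.txt is expected to contain comma-separated words; a purely illustrative example of one input line:

hello,flink,hello,flink

With writeAsCsv(outputPath, "\n", " ") and a parallelism of 1, the result is a single file with one "word count" pair per line.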
Streaming WordCount example (DataStream API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Sliding-window computation:
 * word data is produced through a socket,
 * and Flink aggregates the counts.
 */
public class SocketWindowWordCount {
    public static void main(String[] args) throws Exception {
        // Get the socket port number
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.out.println("No port specified, using default port 9999");
            port = 9999;
        }
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String hostname = "master01.hadoop.mobile.cn";
        String delimiter = "\n";
        DataStreamSource<String> text = env.socketTextStream(hostname, port, delimiter);
        // As in Spark, use the flatMap operator.
        // The input is a String; the output is a user-defined WordWithCount object.
        DataStream<WordWithCount> windowCounts = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split(" ");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }).keyBy("word")
          // window size of 10 seconds, sliding every 5 seconds,
          // i.e. every 5 seconds aggregate the data of the previous 10 seconds
          .timeWindow(Time.seconds(10), Time.seconds(5))
          .sum("count");
        // Print the result to the console with a parallelism of 1
        windowCounts.print().setParallelism(1);
        System.out.println(System.currentTimeMillis());
        env.execute("Socket window count");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
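To try the job, a simple socket source can be started first, for example (assuming netcat is available on the configured host):

nc -lk 9999

The program can then be submitted with --port 9999. The hostname master01.hadoop.mobile.cn above comes from the original environment and would need to be changed (e.g. to localhost) for a local test.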
About the keyBy operator:
/**
 * Partitions the operator state of a {@link DataStream} using field expressions.
 * A field expression is either the name of a public field or a getter method with parentheses
 * of the {@link DataStream}'s underlying type. A dot can be used to drill
 * down into objects, as in {@code "field1.getInnerField2()" }.
 *
 * @param fields
 *            One or more field expressions on which the state of the {@link DataStream} operators will be
 *            partitioned.
 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
 *
 * keyBy is used for grouping; it takes varargs, so one or more key fields can be specified.
 * A key can be specified directly by field name, but the field must be public, otherwise the job fails with:
 *     Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: This type (GenericType<SocketWindowWordCount.WordWithCount>) cannot be used as key.
 *         at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
 *         at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
 *         at SocketWindowWordCount.main(SocketWindowWordCount.java:41)
 * Alternatively the key can be exposed through a getter method (see the KeySelector sketch below).
 */
public KeyedStream<T, Tuple> keyBy(String... fields) {
    return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
}
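Besides field-expression strings, a key can also be specified with a KeySelector, which avoids the public-field requirement and is type-safe. A minimal sketch, reusing text, WordWithCount and the imports from the streaming example above:

import org.apache.flink.api.java.functions.KeySelector;

// Equivalent to .keyBy("word"), but the key is extracted by code rather than by reflection on a field name
DataStream<WordWithCount> keyedCounts = text
        .flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                for (String word : value.split(" ")) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        })
        .keyBy(new KeySelector<WordWithCount, String>() {
            @Override
            public String getKey(WordWithCount wc) throws Exception {
                return wc.word;   // use the word as the key
            }
        })
        .timeWindow(Time.seconds(10), Time.seconds(5))
        .sum("count");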
Flink Table / SQL processing
package com.kong.flink;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

import java.util.ArrayList;

public class FlinkSqlWordCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Create a TableEnvironment
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
        // Wrap the words into objects
        String words = "hello,flink,hello,ksw";
        ArrayList<WordCount> list = new ArrayList<>();
        String[] split = words.split(",");
        for (String word : split) {
            list.add(new WordCount(word, 1L));
        }
        // Create a DataSet, similar to parallelizing a collection into an RDD in Spark
        DataSet<WordCount> inputDataSet = env.fromCollection(list);
        // Convert the DataSet into a Table:
        // * @param dataSet The {@link DataSet} to be converted.
        // * @param fields  The field names of the resulting {@link Table}.
        // The first argument is the DataSet to convert; the second is the field names of the resulting table.
        Table table = tableEnv.fromDataSet(inputDataSet, "word,frequency");
        table.printSchema();
        tableEnv.createTemporaryView("WordCount", table);
        // tableEnv.createTemporaryView("wordCount", inputDataSet, "word,count");
        Table table1 = tableEnv.sqlQuery("select word as word, sum(frequency) as frequency from WordCount GROUP BY word");
        DataSet<WordCount> resultDataSet = tableEnv.toDataSet(table1, WordCount.class);
        resultDataSet.printToErr();
    }

    public static class WordCount {
        public String word;
        // "count" cannot be used as the field name here: it is a reserved keyword in Flink SQL.
        // See: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/index.html#reserved-keywords
        public long frequency;

        // The no-argument constructor is required for a POJO; without it the job fails with
        // org.apache.flink.table.api.ValidationException: Too many fields referenced from an atomic type.
        // See: https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/api_concepts.html#pojo
        public WordCount() {
        }

        public WordCount(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return word + ", " + frequency;
        }
    }
}
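For comparison, the same aggregation could also be expressed with the Table API's expression strings instead of a SQL query; a sketch, reusing table and tableEnv from the example above:

// Group and aggregate with the Table API instead of sqlQuery()
Table aggregated = table
        .groupBy("word")
        .select("word, frequency.sum as frequency");
DataSet<WordCount> aggregatedDataSet = tableEnv.toDataSet(aggregated, WordCount.class);
aggregatedDataSet.printToErr();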