The first Flink application
Official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.10/#api-references
Importing the Maven dependencies
Note that if you write your program in Scala, the dependencies to import differ from the Java ones.
Maven Dependencies
You can add the following dependencies to your pom.xml to include Apache Flink in your project. These dependencies include a local execution environment and thus support local testing. Scala API: To use the Scala API, replace the flink-java artifact id with flink-scala_2.11 and flink-streaming-java_2.11 with flink-streaming-scala_2.11.
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
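For completeness, the Scala equivalents described in the quoted passage would look like the following. This is only a sketch following the substitution rule above; the 2.11 Scala suffix and 1.8.0 version are assumed to match the Java artifacts:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>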
Batch WordCount example (DataSet API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount { // batch processing example

    public static void main(String[] args) throws Exception {
        String inputPath = "E:\\flink\\words.txt";
        String outputPath = "E:\\flink\\result";
        // obtain the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // read the input file
        DataSet<String> text = env.readTextFile(inputPath);
        DataSet<Tuple2<String, Integer>> counts =
                // split up the lines into pairs (2-tuples) containing: (word, 1)
                text.flatMap(new Tokenizer())
                // group by tuple field "0" and sum up tuple field "1"
                .groupBy(0) // group by the first field of the tuple
                .sum(1);    // sum the second field of the tuple
        // setParallelism sets the parallelism, similar to Spark. Without it,
        // the sink runs with multiple threads and produces multiple output files.
        counts.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
        env.execute("Batch WordCount Example");
    }

    // user-defined function; it could also be written inline in flatMap() above
    public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split(",");
            for (String token : tokens) {
                if (token.length() > 0) {
                    // wrap into a Tuple2
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}
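As the comment above notes, the tokenizer does not have to be a named class. A minimal sketch of the same logic written inline as a Java 8 lambda; note that Flink then needs an explicit returns(...) hint, because type erasure hides the Tuple2 generics:

import org.apache.flink.api.common.typeinfo.Types;

DataSet<Tuple2<String, Integer>> counts = text
        .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
            // same tokenization as Tokenizer above
            for (String token : line.toLowerCase().split(",")) {
                if (token.length() > 0) {
                    out.collect(new Tuple2<>(token, 1));
                }
            }
        })
        // declare the produced type explicitly, since it cannot be inferred from the lambda
        .returns(Types.TUPLE(Types.STRING, Types.INT))
        .groupBy(0)
        .sum(1);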
Streaming WordCount example (DataStream API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Sliding-window computation:
 * word data is simulated through a socket,
 * and Flink computes statistics over it.
 */
public class SocketWindowWordCount {

    public static void main(String[] args) throws Exception {
        // obtain the socket port number
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.out.println("No port set. Using default port 9999");
            port = 9999;
        }
        // obtain the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String hostname = "master01.hadoop.mobile.cn";
        String delimiter = "\n";
        DataStreamSource<String> text = env.socketTextStream(hostname, port, delimiter);
        // as in Spark, use the flatMap operator;
        // the input is a String, the output a custom WordWithCount object
        DataStream<WordWithCount> windowCounts = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split(" ");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }).keyBy("word")
          // window size of 10 seconds, sliding every 5 seconds
          .timeWindow(Time.seconds(10), Time.seconds(5))
          .sum("count");
        // print the result to the console and set the parallelism
        windowCounts.print().setParallelism(1);
        System.out.println(System.currentTimeMillis());
        env.execute("Socket window count");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
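To try the streaming job, first open a socket on the configured host (for example with nc -lk 9999), then start the program with --port 9999 and type space-separated words into the socket; every 5 seconds the job prints the counts over the last 10 seconds.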
About the keyBy operator:
/**
 * Partitions the operator state of a {@link DataStream} using field expressions.
 * A field expression is either the name of a public field or a getter method with parentheses
 * of the {@link DataStream}'s underlying type. A dot can be used to drill
 * down into objects, as in {@code "field1.getInnerField2()" }.
 *
 * @param fields
 *            One or more field expressions on which the state of the {@link DataStream} operators will be
 *            partitioned.
 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
 */
public KeyedStream<T, Tuple> keyBy(String... fields) {
    return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
}

keyBy is used for grouping. It takes varargs, so one or more key fields can be specified. A field can be referenced directly by name, but it must be public; otherwise an exception like the following is thrown:

Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: This type (GenericType<SocketWindowWordCount.WordWithCount>) cannot be used as key.
    at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
    at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
    at SocketWindowWordCount.main(SocketWindowWordCount.java:41)

A field can also be accessed through its getter method.
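If you would rather not rely on string field expressions at all, the grouping can also be done with a type-safe KeySelector. A minimal sketch against the SocketWindowWordCount example above:

import org.apache.flink.api.java.functions.KeySelector;

// the key is extracted in code and checked by the compiler,
// instead of being resolved from the string "word" at runtime
DataStream<WordWithCount> windowCounts = text
        .flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) {
                for (String word : value.split(" ")) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        })
        .keyBy(new KeySelector<WordWithCount, String>() {
            @Override
            public String getKey(WordWithCount wc) {
                return wc.word;
            }
        })
        .timeWindow(Time.seconds(10), Time.seconds(5))
        .sum("count");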
Flink Table SQL processing
package com.kong.flink;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

import java.util.ArrayList;

public class FlinkSqlWordCount {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // create a TableEnvironment
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
        // wrap the words into objects
        String words = "hello,flink,hello,ksw";
        ArrayList<WordCount> list = new ArrayList<>();
        String[] split = words.split(",");
        for (String word : split) {
            list.add(new WordCount(word, 1L));
        }
        // create a DataSet, similar to parallelizing a collection into an RDD in Spark
        DataSet<WordCount> inputDataSet = env.fromCollection(list);
        // convert the DataSet to a Table:
        // * @param dataSet The {@link DataSet} to be converted.
        // * @param fields  The field names of the resulting {@link Table}.
        // the first argument is the DataSet to convert; the second is the table's field names
        Table table = tableEnv.fromDataSet(inputDataSet, "word,frequency");
        table.printSchema();
        tableEnv.createTemporaryView("WordCount", table);
        // tableEnv.createTemporaryView("wordCount",inputDataSet,"word,count");
        Table table1 = tableEnv.sqlQuery("select word as word, sum(frequency) as frequency from WordCount GROUP BY word");
        DataSet<WordCount> resultDataSet = tableEnv.toDataSet(table1, WordCount.class);
        resultDataSet.printToErr();
    }

    public static class WordCount {
        public String word;
        // the field cannot be named "count": it is a Flink SQL reserved keyword.
        // See: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/index.html#reserved-keywords
        public long frequency;

        // this no-arg constructor is required (see the POJO rules at
        // https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/api_concepts.html#pojo),
        // otherwise Flink throws, e.g.:
        // org.apache.flink.table.api.ValidationException: Too many fields referenced from an atomic type.
        public WordCount() {
        }

        public WordCount(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return word + ", " + frequency;
        }
    }
}
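The same aggregation can also be expressed with the Table API instead of a SQL string. A minimal sketch reusing the table and tableEnv variables from the program above, with string expressions as accepted by the 1.10 Java Table API:

// Table API equivalent of the sqlQuery() call above
Table aggregated = table
        .groupBy("word")
        .select("word, frequency.sum as frequency");
DataSet<WordCount> result = tableEnv.toDataSet(aggregated, WordCount.class);
result.printToErr();

With the input above, either version should print hello, 2 along with flink, 1 and ksw, 1 (in no particular order).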