The first Flink application
Official documentation reference: https://ci.apache.org/projects/flink/flink-docs-release-1.10/#api-references
Importing the Maven dependencies
Note that if the program is written in Scala, the dependencies to import differ from the Java ones (a sketch of the Scala variant follows the Java dependencies below).
Maven Dependencies
You can add the following dependencies to your pom.xml to include Apache Flink in your project. These dependencies include a local execution environment and thus support local testing. Scala API: To use the Scala API, replace the flink-java artifact id with flink-scala_2. and flink-streaming-java_2. with flink-streaming-scala_2..
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.8.</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.</artifactId>
    <version>1.8.</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.</artifactId>
    <version>1.8.</version>
</dependency>
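For reference, a possible Scala-API equivalent of the dependencies above is sketched below; the Scala binary suffix (2.11) and the concrete patch version (1.8.0) are assumptions and must match your environment.
<!-- Sketch only: Scala binary suffix (2.11) and patch version (1.8.0) are assumed -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-scala_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.8.0</version>
</dependency>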
Batch WordCount example (DataSet API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Batch processing example
public class WordCount {
    public static void main(String[] args) throws Exception {
        String inputPath = "E:\\flink\\words.txt";
        String outputPath = "E:\\flink\\result";
        // Get the execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Read the input file
        DataSet<String> text = env.readTextFile(inputPath);
        DataSet<Tuple2<String, Integer>> counts =
                // split up the lines in pairs (2-tuples) containing: (word, 1)
                text.flatMap(new Tokenizer())
                // group by the tuple field "0" and sum up tuple field "1"
                .groupBy(0)   // group by the first tuple field
                .sum(1);      // sum the second tuple field
        // setParallelism sets the parallelism, similar to Spark. If it is not set,
        // the sink runs with multiple threads and produces multiple output files.
        counts.writeAsCsv(outputPath, "\n", " ").setParallelism(1);
        env.execute("Batch WordCount Example");
    }

    // User-defined function; it could also be defined inline inside flatMap() above instead
    public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            // normalize and split the line
            String[] tokens = value.toLowerCase().split(",");
            for (String token : tokens) {
                if (token.length() > 0) {
                    // wrap the word into a Tuple2
                    out.collect(new Tuple2<String, Integer>(token, 1));
                }
            }
        }
    }
}
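Note that the Tokenizer splits each line on commas, so E:\flink\words.txt is expected to contain comma-separated words; a purely illustrative example of one input line:

hello,flink,hello,flink

With writeAsCsv(outputPath, "\n", " ") and a parallelism of 1, the result is a single file with one "word count" pair per line.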
Streaming WordCount example (DataStream API)
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Sliding-window computation:
 * word data is produced through a socket,
 * and Flink aggregates the counts.
 */
public class SocketWindowWordCount {
    public static void main(String[] args) throws Exception {
        // Get the socket port number
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.out.println("No port specified, using default port 9999");
            port = 9999;
        }
        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        String hostname = "master01.hadoop.mobile.cn";
        String delimiter = "\n";
        DataStreamSource<String> text = env.socketTextStream(hostname, port, delimiter);
        // As in Spark, use the flatMap operator.
        // The input is a String; the output is a user-defined WordWithCount object.
        DataStream<WordWithCount> windowCounts = text.flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                String[] splits = value.split(" ");
                for (String word : splits) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        }).keyBy("word")
          // window size of 10 seconds, sliding every 5 seconds,
          // i.e. every 5 seconds aggregate the data of the previous 10 seconds
          .timeWindow(Time.seconds(10), Time.seconds(5))
          .sum("count");
        // Print the result to the console with a parallelism of 1
        windowCounts.print().setParallelism(1);
        System.out.println(System.currentTimeMillis());
        env.execute("Socket window count");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
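To try the job, a simple socket source can be started first, for example (assuming netcat is available on the configured host):

nc -lk 9999

The program can then be submitted with --port 9999. The hostname master01.hadoop.mobile.cn above comes from the original environment and would need to be changed (e.g. to localhost) for a local test.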
About the keyBy operator:
/**
 * Partitions the operator state of a {@link DataStream} using field expressions.
 * A field expression is either the name of a public field or a getter method with parentheses
 * of the {@link DataStream}'s underlying type. A dot can be used to drill
 * down into objects, as in {@code "field1.getInnerField2()" }.
 *
 * @param fields
 *            One or more field expressions on which the state of the {@link DataStream} operators will be
 *            partitioned.
 * @return The {@link DataStream} with partitioned state (i.e. KeyedStream)
 *
 * keyBy is used for grouping; it takes varargs, so one or more key fields can be specified.
 * A key can be specified directly by field name, but the field must be public, otherwise the job fails with:
 *     Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: This type (GenericType<SocketWindowWordCount.WordWithCount>) cannot be used as key.
 *         at org.apache.flink.api.common.operators.Keys$ExpressionKeys.<init>(Keys.java:330)
 *         at org.apache.flink.streaming.api.datastream.DataStream.keyBy(DataStream.java:337)
 *         at SocketWindowWordCount.main(SocketWindowWordCount.java:41)
 * Alternatively the key can be exposed through a getter method (see the KeySelector sketch below).
 */
public KeyedStream<T, Tuple> keyBy(String... fields) {
    return keyBy(new Keys.ExpressionKeys<>(fields, getType()));
}
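Besides field-expression strings, a key can also be specified with a KeySelector, which avoids the public-field requirement and is type-safe. A minimal sketch, reusing text, WordWithCount and the imports from the streaming example above:

import org.apache.flink.api.java.functions.KeySelector;

// Equivalent to .keyBy("word"), but the key is extracted by code rather than by reflection on a field name
DataStream<WordWithCount> keyedCounts = text
        .flatMap(new FlatMapFunction<String, WordWithCount>() {
            public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                for (String word : value.split(" ")) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        })
        .keyBy(new KeySelector<WordWithCount, String>() {
            @Override
            public String getKey(WordWithCount wc) throws Exception {
                return wc.word;   // use the word as the key
            }
        })
        .timeWindow(Time.seconds(10), Time.seconds(5))
        .sum("count");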
Flink Table / SQL processing
package com.kong.flink;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

import java.util.ArrayList;

public class FlinkSqlWordCount {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Create a TableEnvironment
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
        // Wrap the words into objects
        String words = "hello,flink,hello,ksw";
        ArrayList<WordCount> list = new ArrayList<>();
        String[] split = words.split(",");
        for (String word : split) {
            list.add(new WordCount(word, 1L));
        }
        // Create a DataSet, similar to parallelizing a collection into an RDD in Spark
        DataSet<WordCount> inputDataSet = env.fromCollection(list);
        // Convert the DataSet into a Table:
        // * @param dataSet The {@link DataSet} to be converted.
        // * @param fields  The field names of the resulting {@link Table}.
        // The first argument is the DataSet to convert; the second is the field names of the resulting table.
        Table table = tableEnv.fromDataSet(inputDataSet, "word,frequency");
        table.printSchema();
        tableEnv.createTemporaryView("WordCount", table);
        // tableEnv.createTemporaryView("wordCount", inputDataSet, "word,count");
        Table table1 = tableEnv.sqlQuery("select word as word, sum(frequency) as frequency from WordCount GROUP BY word");
        DataSet<WordCount> resultDataSet = tableEnv.toDataSet(table1, WordCount.class);
        resultDataSet.printToErr();
    }

    public static class WordCount {
        public String word;
        // "count" cannot be used as the field name here: it is a reserved keyword in Flink SQL.
        // See: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/index.html#reserved-keywords
        public long frequency;

        // The no-argument constructor is required for a POJO; without it the job fails with
        // org.apache.flink.table.api.ValidationException: Too many fields referenced from an atomic type.
        // See: https://ci.apache.org/projects/flink/flink-docs-release-1.10/zh/dev/api_concepts.html#pojo
        public WordCount() {
        }

        public WordCount(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return word + ", " + frequency;
        }
    }
}
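For comparison, the same aggregation could also be expressed with the Table API's expression strings instead of a SQL query; a sketch, reusing table and tableEnv from the example above:

// Group and aggregate with the Table API instead of sqlQuery()
Table aggregated = table
        .groupBy("word")
        .select("word, frequency.sum as frequency");
DataSet<WordCount> aggregatedDataSet = tableEnv.toDataSet(aggregated, WordCount.class);
aggregatedDataSet.printToErr();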