From Flink's official documentation we know that Flink's programming model has four layers: SQL is the topmost API, the Table API sits in the middle, the DataStream/DataSet API is the core, and stateful stream processing is the low-level implementation.

Earlier posts in this series covered the other layers:

"Flink DataSet API: usage and internals" introduced the DataSet API.

"Flink DataStream API: usage and internals" introduced the DataStream API.

"How are timestamps used in Flink? Watermark usage and internals" introduced watermarks, the foundation of the low-level implementation.

"Flink window examples" introduced the window concept and how windows work.

"State and fault tolerance in Flink" introduced the concept of State and the checkpoint/savepoint fault-tolerance mechanisms.

0. Basic concepts

0.1 TableEnvironment

The TableEnvironment is the core concept of the Table API and SQL integration. It is mainly responsible for:

  1. Registering a Table in the internal catalog
  2. Registering an external catalog
  3. Executing SQL queries
  4. Registering a user-defined function (UDF)
  5. Converting a DataStream or DataSet into a Table
  6. Holding a reference to a BatchTableEnvironment or StreamTableEnvironment
/**
* The base class for batch and stream TableEnvironments.
*
* <p>The TableEnvironment is a central concept of the Table API and SQL integration. It is
* responsible for:
*
* <ul>
* <li>Registering a Table in the internal catalog</li>
* <li>Registering an external catalog</li>
* <li>Executing SQL queries</li>
* <li>Registering a user-defined scalar function. For the user-defined table and aggregate
* function, use the StreamTableEnvironment or BatchTableEnvironment</li>
* </ul>
*/
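
To make these responsibilities concrete, here is a minimal sketch against the Flink 1.8-era Java Table API. The class name TableEnvTour and the UDF ToUpper are hypothetical names introduced only for illustration:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.functions.ScalarFunction;

public class TableEnvTour {

    // hypothetical scalar UDF, defined here only to illustrate responsibility 4
    public static class ToUpper extends ScalarFunction {
        public String eval(String s) {
            return s == null ? null : s.toUpperCase();
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env); // (6) stream variant

        DataStream<String> words = env.fromElements("Hello", "Ciao");

        Table t = tEnv.fromDataStream(words, "word");      // (5) DataStream -> Table
        tEnv.registerTable("Words", t);                    // (1) register in the internal catalog
        tEnv.registerFunction("toUpper", new ToUpper());   // (4) register a UDF

        Table result = tEnv.sqlQuery(                      // (3) run a SQL query
            "SELECT toUpper(word) AS word FROM Words");

        tEnv.toAppendStream(result, String.class).print();
        env.execute();
    }
}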

0.2 Catalog

Catalog: all metadata about databases and tables lives in Flink's internal Catalog. It holds every piece of Table-related metadata known to Flink, including table schemas and data source information.

/**
* This interface is responsible for reading and writing metadata such as database/table/views/UDFs
* from a registered catalog. It connects a registered catalog and Flink's Table API.
*/

(Figure: internal structure of the Flink catalog.)
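
A minimal sketch of registering and querying a catalog, assuming Flink 1.9+ (where the Catalog interface quoted above and GenericInMemoryCatalog are available) and a TableEnvironment tEnv as in the examples below:

import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.catalog.GenericInMemoryCatalog;

// register an additional in-memory catalog and make it the current one
Catalog myCatalog = new GenericInMemoryCatalog("my_catalog");
tEnv.registerCatalog("my_catalog", myCatalog);
tEnv.useCatalog("my_catalog");

// tables registered from now on land in "my_catalog"
System.out.println(myCatalog.listDatabases()); // [default]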

0.3 TableSource

When using the Table API, an external data source can be registered directly as a Table; the structure that does this is called a TableSource.

/**
* Defines an external table with the schema that is provided by {@link TableSource#getTableSchema}.
*
* <p>The data of a {@link TableSource} is produced as a {@code DataSet} in case of a {@code BatchTableSource}
* or as a {@code DataStream} in case of a {@code StreamTableSource}. The type of the produced
* {@code DataSet} or {@code DataStream} is specified by the {@link TableSource#getProducedDataType()} method.
*
* <p>By default, the fields of the {@link TableSchema} are implicitly mapped by name to the fields of
* the produced {@link DataType}. An explicit mapping can be defined by implementing the
* {@link DefinedFieldMapping} interface.
*
* @param <T> The return type of the {@link TableSource}.
*/
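
As a minimal sketch of a custom source, assuming the Flink 1.8 Java APIs; OneRowSource is a hypothetical class written only for illustration:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.api.Types;
import org.apache.flink.table.sources.StreamTableSource;
import org.apache.flink.types.Row;

// a hypothetical single-row source, for illustration only
public class OneRowSource implements StreamTableSource<Row> {

    @Override
    public DataStream<Row> getDataStream(StreamExecutionEnvironment execEnv) {
        return execEnv.fromElements(Row.of("Hello", 1L)).returns(getReturnType());
    }

    @Override
    public TypeInformation<Row> getReturnType() {
        return Types.ROW(new String[]{"word", "frequency"},
            new TypeInformation[]{Types.STRING(), Types.LONG()});
    }

    @Override
    public TableSchema getTableSchema() {
        return new TableSchema(new String[]{"word", "frequency"},
            new TypeInformation[]{Types.STRING(), Types.LONG()});
    }
}

It would then be made visible to SQL with tEnv.registerTableSource("OneRow", new OneRowSource());.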

0.4 TableSink

Once processing is finished, the result needs to be written to external storage. The corresponding sink abstraction in the Table API is the TableSink.

/**
* A {@link TableSink} specifies how to emit a table to an external
* system or location.
*
* <p>The interface is generic such that it can support different storage locations and formats.
*
* @param <T> The return type of the {@link TableSink}.
*/
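
As a usage sketch, assuming the Flink 1.8 Java API, a TableEnvironment tEnv and a Table result as in the examples below, and an arbitrary output path:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.table.api.Types;
import org.apache.flink.table.sinks.CsvTableSink;

// register a CSV sink under the name "CsvOut" and emit a table into it
tEnv.registerTableSink(
    "CsvOut",
    new String[]{"word", "frequency"},
    new TypeInformation[]{Types.STRING(), Types.LONG()},
    new CsvTableSink("/tmp/result.csv", "|"));

result.insertInto("CsvOut");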

0.5 Table Connector

Since Flink 1.6, in order to let the Table API connect to external systems in a purely configuration-driven way that also works in the SQL Client, Flink has offered the Table Connector concept. Its main purpose is to separate the definition of Table Sources and Table Sinks from their use.

A Table Connector wraps the built-in Table Sources and Table Sinks into configurable components that can be used from both the Table API and the SQL Client.

/**
 * Creates a table source and/or table sink from a descriptor.
 *
 * <p>Descriptors allow for declaring the communication to external systems in an
 * implementation-agnostic way. The classpath is scanned for suitable table factories that match
 * the desired configuration.
 *
 * <p>The following example shows how to read from a connector using a JSON format and
 * register a table source as "MyTable":
 *
 * <pre>
 * {@code
 *
 * tableEnv
 *   .connect(
 *     new ExternalSystemXYZ()
 *       .version("0.11"))
 *   .withFormat(
 *     new Json()
 *       .jsonSchema("{...}")
 *       .failOnMissingField(false))
 *   .withSchema(
 *     new Schema()
 *       .field("user-name", "VARCHAR").from("u_name")
 *       .field("count", "DECIMAL"))
 *   .registerSource("MyTable");
 * }
 * </pre>
 *
 * @param connectorDescriptor connector descriptor describing the external system
 */
TableDescriptor connect(ConnectorDescriptor connectorDescriptor);
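
As a more concrete sketch, reading JSON records from a Kafka topic — assuming a StreamTableEnvironment tEnv and that the flink-connector-kafka and flink-json descriptor dependencies are on the classpath (Flink 1.8 style); the topic and server address are placeholders:

import org.apache.flink.table.api.Types;
import org.apache.flink.table.descriptors.Json;
import org.apache.flink.table.descriptors.Kafka;
import org.apache.flink.table.descriptors.Schema;

tEnv.connect(
        new Kafka()
            .version("0.11")
            .topic("orders")
            .property("bootstrap.servers", "localhost:9092"))
    .withFormat(
        new Json().deriveSchema())   // derive the JSON format from the schema below
    .withSchema(
        new Schema()
            .field("user", Types.LONG())
            .field("product", Types.STRING())
            .field("amount", Types.INT()))
    .inAppendMode()
    .registerTableSource("Orders");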

This post focuses mainly on SQL and the Table API.

1. SQL

1.1 SQL on the DataSet API

Example:

package org.apache.flink.table.examples.java;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

/**
 * Simple example that shows how the Batch SQL API is used in Java.
 *
 * <p>This example shows how to:
 * - Convert DataSets to Tables
 * - Register a Table under a name
 * - Run a SQL query on the registered Table
 */
public class WordCountSQL {

    // *************************************************************************
    // PROGRAM
    // *************************************************************************

    public static void main(String[] args) throws Exception {

        // set up execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        DataSet<WC> input = env.fromElements(
            new WC("Hello", 1),
            new WC("Ciao", 1),
            new WC("Hello", 1));

        // register the DataSet as table "WordCount"
        tEnv.registerDataSet("WordCount", input, "word, frequency");

        // run a SQL query on the Table and retrieve the result as a new Table
        Table table = tEnv.sqlQuery(
            "SELECT word, SUM(frequency) as frequency FROM WordCount GROUP BY word");

        DataSet<WC> result = tEnv.toDataSet(table, WC.class);

        result.print();
    }

    // *************************************************************************
    // USER DATA TYPES
    // *************************************************************************

    /**
     * Simple POJO containing a word and its respective count.
     */
    public static class WC {
        public String word;
        public long frequency;

        // public constructor to make it a Flink POJO
        public WC() {}

        public WC(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return "WC " + word + " " + frequency;
        }
    }
}
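
Running this job should print WC Hello 2 and WC Ciao 1 ("Hello" appears twice with frequency 1, "Ciao" once); the order of the two result rows is not guaranteed.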

Here, BatchTableEnvironment is documented as follows:

/**
* The {@link TableEnvironment} for a Java batch {@link ExecutionEnvironment} that works
* with {@link DataSet}s.
*
* <p>A TableEnvironment can be used to:
* <ul>
* <li>convert a {@link DataSet} to a {@link Table}</li>
* <li>register a {@link DataSet} in the {@link TableEnvironment}'s catalog</li>
* <li>register a {@link Table} in the {@link TableEnvironment}'s catalog</li>
* <li>scan a registered table to obtain a {@link Table}</li>
* <li>specify a SQL query on registered tables to obtain a {@link Table}</li>
* <li>convert a {@link Table} into a {@link DataSet}</li>
* <li>explain the AST and execution plan of a {@link Table}</li>
* </ul>
*/

BatchTableSource

/** Defines an external batch table and provides access to its data.
*
* @param <T> Type of the {@link DataSet} created by this {@link TableSource}.
*/

BatchTableSink

/** Defines an external {@link TableSink} to emit a batch {@link Table}.
*
* @param <T> Type of {@link DataSet} that this {@link TableSink} expects and supports.
*/

1.2 SQL on the DataStream API

Example code:

package org.apache.flink.table.examples.java;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

import java.util.Arrays;

/**
 * Simple example for demonstrating the use of SQL on a Stream Table in Java.
 *
 * <p>This example shows how to:
 * - Convert DataStreams to Tables
 * - Register a Table under a name
 * - Run a StreamSQL query on the registered Table
 */
public class StreamSQLExample {

    // *************************************************************************
    // PROGRAM
    // *************************************************************************

    public static void main(String[] args) throws Exception {

        // set up execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        DataStream<Order> orderA = env.fromCollection(Arrays.asList(
            new Order(1L, "beer", 3),
            new Order(1L, "diaper", 4),
            new Order(3L, "rubber", 2)));

        DataStream<Order> orderB = env.fromCollection(Arrays.asList(
            new Order(2L, "pen", 3),
            new Order(2L, "rubber", 3),
            new Order(4L, "beer", 1)));

        // convert DataStream to Table
        Table tableA = tEnv.fromDataStream(orderA, "user, product, amount");
        // register DataStream as Table
        tEnv.registerDataStream("OrderB", orderB, "user, product, amount");

        // union the two tables
        Table result = tEnv.sqlQuery("SELECT * FROM " + tableA + " WHERE amount > 2 UNION ALL " +
            "SELECT * FROM OrderB WHERE amount < 2");

        tEnv.toAppendStream(result, Order.class).print();

        env.execute();
    }

    // *************************************************************************
    // USER DATA TYPES
    // *************************************************************************

    /**
     * Simple POJO.
     */
    public static class Order {
        public Long user;
        public String product;
        public int amount;

        public Order() {
        }

        public Order(Long user, String product, int amount) {
            this.user = user;
            this.product = product;
            this.amount = amount;
        }

        @Override
        public String toString() {
            return "Order{" +
                "user=" + user +
                ", product='" + product + '\'' +
                ", amount=" + amount +
                '}';
        }
    }
}

Here, StreamTableEnvironment is documented as follows:

/**
* The {@link TableEnvironment} for a Java {@link StreamExecutionEnvironment} that works with
* {@link DataStream}s.
*
* <p>A TableEnvironment can be used to:
* <ul>
* <li>convert a {@link DataStream} to a {@link Table}</li>
* <li>register a {@link DataStream} in the {@link TableEnvironment}'s catalog</li>
* <li>register a {@link Table} in the {@link TableEnvironment}'s catalog</li>
* <li>scan a registered table to obtain a {@link Table}</li>
* <li>specify a SQL query on registered tables to obtain a {@link Table}</li>
* <li>convert a {@link Table} into a {@link DataStream}</li>
* <li>explain the AST and execution plan of a {@link Table}</li>
* </ul>
*/

StreamTableSource

/** Defines an external stream table and provides read access to its data.
*
* @param <T> Type of the {@link DataStream} created by this {@link TableSource}.
*/

StreamTableSink

/**
* Defines an external stream table and provides write access to its data.
*
* @param <T> Type of the {@link DataStream} created by this {@link TableSink}.
*/

2. Table API

Example:

package org.apache.flink.table.examples.java;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

/**
 * Simple example for demonstrating the use of the Table API for a Word Count in Java.
 *
 * <p>This example shows how to:
 * - Convert DataSets to Tables
 * - Apply group, aggregate, select, and filter operations
 */
public class WordCountTable {

    // *************************************************************************
    // PROGRAM
    // *************************************************************************

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        DataSet<WC> input = env.fromElements(
            new WC("Hello", 1),
            new WC("Ciao", 1),
            new WC("Hello", 1));

        Table table = tEnv.fromDataSet(input);

        Table filtered = table
            .groupBy("word")
            .select("word, frequency.sum as frequency")
            .filter("frequency = 2");

        DataSet<WC> result = tEnv.toDataSet(filtered, WC.class);

        result.print();
    }

    // *************************************************************************
    // USER DATA TYPES
    // *************************************************************************

    /**
     * Simple POJO containing a word and its respective count.
     */
    public static class WC {
        public String word;
        public long frequency;

        // public constructor to make it a Flink POJO
        public WC() {}

        public WC(String word, long frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        @Override
        public String toString() {
            return "WC " + word + " " + frequency;
        }
    }
}
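
For comparison, the same filter-after-aggregate pipeline can be written in SQL. A sketch, assuming input has first been registered as a table named "WordCount" (e.g. via tEnv.registerDataSet("WordCount", input, "word, frequency")):

Table viaSql = tEnv.sqlQuery(
    "SELECT word, SUM(frequency) AS frequency " +
    "FROM WordCount " +
    "GROUP BY word " +
    "HAVING SUM(frequency) = 2");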

3. Data conversion

  3.1 Converting between DataSet and Table

    DataSet --> Table

      By registration:

          // register the DataSet as table "WordCount"
          tEnv.registerDataSet("WordCount", input, "word, frequency");

      By conversion:

          Table table = tEnv.fromDataSet(input);

    Table --> DataSet

        DataSet<WC> result = tEnv.toDataSet(filtered, WC.class);

  3.2 Converting between DataStream and Table

    DataStream --> Table

      By registration:

          tEnv.registerDataStream("OrderB", orderB, "user, product, amount");

      By conversion:

          Table tableA = tEnv.fromDataStream(orderA, "user, product, amount");

    Table --> DataStream

        DataStream<Order> resultStream = tEnv.toAppendStream(result, Order.class);
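
Note that toAppendStream only works for queries that never update rows they have already emitted. For updating queries (for example a non-windowed GROUP BY aggregation), use toRetractStream instead; a minimal sketch, assuming the same tEnv and result as in the stream example above:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;

// each record carries a flag: true = insert/accumulate, false = retract
DataStream<Tuple2<Boolean, Order>> retractStream =
    tEnv.toRetractStream(result, Order.class);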

References

【1】https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html

【2】《Flink原理、实战与性能优化》
