From Flink's official documentation, we know that Flink's programming model has four layers: SQL is the highest-level API, the Table API is the middle layer, the DataStream/DataSet API is the core, and stateful stream processing is the lowest-level implementation.

Earlier posts in this series covered the other layers:

"flink DataSet API usage and internals" introduced the DataSet API

"flink DataStream API usage and internals" introduced the DataStream API

"How are timestamps used in Flink? --- Watermark usage and internals" introduced watermarks, a foundation of the underlying implementation

"flink window examples" introduced the window concept and how windows work

"State and fault tolerance in Flink" introduced the State concept and the checkpoint/savepoint fault-tolerance mechanisms

0. Basic concepts

0.1 TableEnvironment

TableEnvironment is the core concept of the Table API and SQL integration. It is mainly responsible for:

  1. Registering a Table in the internal catalog
  2. Registering an external catalog
  3. Executing SQL queries
  4. Registering a user-defined function (UDF)
  5. Converting a DataStream or DataSet into a Table
  6. Holding a reference to a BatchTableEnvironment or StreamTableEnvironment

Its class-level Javadoc:
    /**
     * The base class for batch and stream TableEnvironments.
     *
     * <p>The TableEnvironment is a central concept of the Table API and SQL integration. It is
     * responsible for:
     *
     * <ul>
     *     <li>Registering a Table in the internal catalog</li>
     *     <li>Registering an external catalog</li>
     *     <li>Executing SQL queries</li>
     *     <li>Registering a user-defined scalar function. For the user-defined table and aggregate
     *         function, use the StreamTableEnvironment or BatchTableEnvironment</li>
     * </ul>
     */
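Both concrete environments are obtained through a static create method on the matching execution environment, exactly as the examples later in this post do:

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.java.BatchTableEnvironment;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    // batch: a TableEnvironment that works with DataSets
    ExecutionEnvironment batchEnv = ExecutionEnvironment.getExecutionEnvironment();
    BatchTableEnvironment batchTableEnv = BatchTableEnvironment.create(batchEnv);

    // streaming: a TableEnvironment that works with DataStreams
    StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment streamTableEnv = StreamTableEnvironment.create(streamEnv);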

0.2 Catalog

Catalog: all metadata about databases and tables lives in Flink's internal catalog. It holds every piece of Table-related metadata known to Flink, including table schemas, data source information, and so on.

    /**
     * This interface is responsible for reading and writing metadata such as database/table/views/UDFs
     * from a registered catalog. It connects a registered catalog and Flink's Table API.
     */

Its structure is shown in the following figure. (Figure omitted.)
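Registering a table stores its metadata in this internal catalog; afterwards it can be looked up by name from both the Table API and SQL. A minimal sketch, assuming a TableEnvironment named tEnv and an existing Table named table (the name "MyTable" is an assumption):

    // store the table's metadata in the internal catalog under "MyTable"
    tEnv.registerTable("MyTable", table);

    // resolve the name through the catalog from the Table API ...
    Table scanned = tEnv.scan("MyTable");

    // ... or from SQL, which uses the same catalog for name resolution
    Table queried = tEnv.sqlQuery("SELECT * FROM MyTable");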

0.3 TableSource

When using the Table API, an external data source can be registered directly as a Table. The structure that represents such a source is called a TableSource.

    /**
     * Defines an external table with the schema that is provided by {@link TableSource#getTableSchema}.
     *
     * <p>The data of a {@link TableSource} is produced as a {@code DataSet} in case of a {@code BatchTableSource}
     * or as a {@code DataStream} in case of a {@code StreamTableSource}. The type of the produced
     * {@code DataSet} or {@code DataStream} is specified by the {@link TableSource#getProducedDataType()} method.
     *
     * <p>By default, the fields of the {@link TableSchema} are implicitly mapped by name to the fields of
     * the produced {@link DataType}. An explicit mapping can be defined by implementing the
     * {@link DefinedFieldMapping} interface.
     *
     * @param <T> The return type of the {@link TableSource}.
     */
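Flink ships with ready-made TableSource implementations. As a hedged illustration, a CsvTableSource (part of the 1.8-era flink-table API) can be built and registered like this; the file path and schema are assumptions:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.table.sources.CsvTableSource;

    // describe an external CSV file as a table (path and fields are assumed)
    CsvTableSource csvSource = CsvTableSource.builder()
        .path("/tmp/words.csv")
        .fieldDelimiter(",")
        .field("word", Types.STRING)
        .field("frequency", Types.LONG)
        .build();

    // register it in the catalog so the Table API and SQL can refer to it by name
    tEnv.registerTableSource("WordsCsv", csvSource);
    Table words = tEnv.scan("WordsCsv");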

0.4 TableSink

Once processing is done, the results need to be written to external storage. The Table API has a matching sink module for this: the TableSink.

    /**
     * A {@link TableSink} specifies how to emit a table to an external
     * system or location.
     *
     * <p>The interface is generic such that it can support different storage locations and formats.
     *
     * @param <T> The return type of the {@link TableSink}.
     */
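Symmetrically, a built-in TableSink can be registered and fed with insertInto. A minimal sketch (output path, field names, and types are assumptions; CsvTableSink and registerTableSink belong to the same 1.8-era API):

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.table.sinks.CsvTableSink;

    // a sink that writes '|'-delimited rows to a file (path is assumed)
    CsvTableSink csvSink = new CsvTableSink("/tmp/word_counts.csv", "|");

    // register the sink together with the schema it expects
    tEnv.registerTableSink(
        "WordCountsOut",
        new String[] {"word", "frequency"},
        new TypeInformation[] {Types.STRING, Types.LONG},
        csvSink);

    // emit a query result into the registered sink
    result.insertInto("WordCountsOut");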

0.5 Table Connector

Starting with Flink 1.6, in order to let the Table API connect to external systems through configuration alone, and to make the same connectors usable from the SQL Client, Flink introduced the Table Connector concept. Its main purpose is to separate the definition of a Table Source or Table Sink from its use.

Table Connectors wrap the various built-in Table Sources and Table Sinks into configurable components that can be used from the Table API and the SQL Client alike.

    /**
     * Creates a table source and/or table sink from a descriptor.
     *
     * <p>Descriptors allow for declaring the communication to external systems in an
     * implementation-agnostic way. The classpath is scanned for suitable table factories that match
     * the desired configuration.
     *
     * <p>The following example shows how to read from a connector using a JSON format and
     * register a table source as "MyTable":
     *
     * <pre>
     * {@code
     *
     * tableEnv
     *   .connect(
     *     new ExternalSystemXYZ()
     *       .version("0.11"))
     *   .withFormat(
     *     new Json()
     *       .jsonSchema("{...}")
     *       .failOnMissingField(false))
     *   .withSchema(
     *     new Schema()
     *       .field("user-name", "VARCHAR").from("u_name")
     *       .field("count", "DECIMAL"))
     *   .registerSource("MyTable");
     * }
     * </pre>
     *
     * @param connectorDescriptor connector descriptor describing the external system
     */
    TableDescriptor connect(ConnectorDescriptor connectorDescriptor);
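As a concrete variant of the Javadoc example above, here is a hedged sketch of wiring a Kafka source with a JSON format through descriptors; the topic, broker address, and schema are assumptions, and the flink-connector-kafka and flink-json artifacts must be on the classpath:

    import org.apache.flink.table.descriptors.Json;
    import org.apache.flink.table.descriptors.Kafka;
    import org.apache.flink.table.descriptors.Schema;

    // declare the connector purely via configuration; a matching table factory
    // is discovered on the classpath at runtime
    tableEnv
        .connect(
            new Kafka()
                .version("0.11")
                .topic("orders")                                  // assumed topic
                .property("bootstrap.servers", "localhost:9092")) // assumed address
        .withFormat(
            new Json().failOnMissingField(false))
        .withSchema(
            new Schema()
                .field("user", "BIGINT")
                .field("product", "VARCHAR")
                .field("amount", "INT"))
        .inAppendMode()
        .registerTableSource("Orders");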

This post focuses on SQL and the Table API.

1. SQL

1.1 SQL on the DataSet API

Example:

    package org.apache.flink.table.examples.java;

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.BatchTableEnvironment;

    /**
     * Simple example that shows how the Batch SQL API is used in Java.
     *
     * <p>This example shows how to:
     *  - Convert DataSets to Tables
     *  - Register a Table under a name
     *  - Run a SQL query on the registered Table
     */
    public class WordCountSQL {

        // *************************************************************************
        //     PROGRAM
        // *************************************************************************

        public static void main(String[] args) throws Exception {

            // set up execution environment
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

            DataSet<WC> input = env.fromElements(
                new WC("Hello", 1),
                new WC("Ciao", 1),
                new WC("Hello", 1));

            // register the DataSet as table "WordCount"
            tEnv.registerDataSet("WordCount", input, "word, frequency");

            // run a SQL query on the Table and retrieve the result as a new Table
            Table table = tEnv.sqlQuery(
                "SELECT word, SUM(frequency) as frequency FROM WordCount GROUP BY word");

            DataSet<WC> result = tEnv.toDataSet(table, WC.class);

            result.print();
        }

        // *************************************************************************
        //     USER DATA TYPES
        // *************************************************************************

        /**
         * Simple POJO containing a word and its respective count.
         */
        public static class WC {
            public String word;
            public long frequency;

            // public constructor to make it a Flink POJO
            public WC() {}

            public WC(String word, long frequency) {
                this.word = word;
                this.frequency = frequency;
            }

            @Override
            public String toString() {
                return "WC " + word + " " + frequency;
            }
        }
    }

Here, BatchTableEnvironment is documented as:

    /**
     * The {@link TableEnvironment} for a Java batch {@link ExecutionEnvironment} that works
     * with {@link DataSet}s.
     *
     * <p>A TableEnvironment can be used to:
     * <ul>
     *     <li>convert a {@link DataSet} to a {@link Table}</li>
     *     <li>register a {@link DataSet} in the {@link TableEnvironment}'s catalog</li>
     *     <li>register a {@link Table} in the {@link TableEnvironment}'s catalog</li>
     *     <li>scan a registered table to obtain a {@link Table}</li>
     *     <li>specify a SQL query on registered tables to obtain a {@link Table}</li>
     *     <li>convert a {@link Table} into a {@link DataSet}</li>
     *     <li>explain the AST and execution plan of a {@link Table}</li>
     * </ul>
     */

BatchTableSource

    /**
     * Defines an external batch table and provides access to its data.
     *
     * @param <T> Type of the {@link DataSet} created by this {@link TableSource}.
     */

BatchTableSink

    /**
     * Defines an external {@link TableSink} to emit a batch {@link Table}.
     *
     * @param <T> Type of {@link DataSet} that this {@link TableSink} expects and supports.
     */

1.2 SQL on the DataStream API

Example code:

    package org.apache.flink.table.examples.java;

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    import java.util.Arrays;

    /**
     * Simple example for demonstrating the use of SQL on a Stream Table in Java.
     *
     * <p>This example shows how to:
     *  - Convert DataStreams to Tables
     *  - Register a Table under a name
     *  - Run a StreamSQL query on the registered Table
     */
    public class StreamSQLExample {

        // *************************************************************************
        //     PROGRAM
        // *************************************************************************

        public static void main(String[] args) throws Exception {

            // set up execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            DataStream<Order> orderA = env.fromCollection(Arrays.asList(
                new Order(1L, "beer", 3),
                new Order(1L, "diaper", 4),
                new Order(3L, "rubber", 2)));

            DataStream<Order> orderB = env.fromCollection(Arrays.asList(
                new Order(2L, "pen", 3),
                new Order(2L, "rubber", 3),
                new Order(4L, "beer", 1)));

            // convert DataStream to Table
            Table tableA = tEnv.fromDataStream(orderA, "user, product, amount");
            // register DataStream as Table
            tEnv.registerDataStream("OrderB", orderB, "user, product, amount");

            // union the two tables
            Table result = tEnv.sqlQuery("SELECT * FROM " + tableA + " WHERE amount > 2 UNION ALL " +
                "SELECT * FROM OrderB WHERE amount < 2");

            tEnv.toAppendStream(result, Order.class).print();

            env.execute();
        }

        // *************************************************************************
        //     USER DATA TYPES
        // *************************************************************************

        /**
         * Simple POJO.
         */
        public static class Order {
            public Long user;
            public String product;
            public int amount;

            public Order() {
            }

            public Order(Long user, String product, int amount) {
                this.user = user;
                this.product = product;
                this.amount = amount;
            }

            @Override
            public String toString() {
                return "Order{" +
                    "user=" + user +
                    ", product='" + product + '\'' +
                    ", amount=" + amount +
                    '}';
            }
        }
    }

Here, StreamTableEnvironment is documented as:

    /**
     * The {@link TableEnvironment} for a Java {@link StreamExecutionEnvironment} that works with
     * {@link DataStream}s.
     *
     * <p>A TableEnvironment can be used to:
     * <ul>
     *     <li>convert a {@link DataStream} to a {@link Table}</li>
     *     <li>register a {@link DataStream} in the {@link TableEnvironment}'s catalog</li>
     *     <li>register a {@link Table} in the {@link TableEnvironment}'s catalog</li>
     *     <li>scan a registered table to obtain a {@link Table}</li>
     *     <li>specify a SQL query on registered tables to obtain a {@link Table}</li>
     *     <li>convert a {@link Table} into a {@link DataStream}</li>
     *     <li>explain the AST and execution plan of a {@link Table}</li>
     * </ul>
     */

StreamTableSource

    /**
     * Defines an external stream table and provides read access to its data.
     *
     * @param <T> Type of the {@link DataStream} created by this {@link TableSource}.
     */

StreamTableSink

    /**
     * Defines an external stream table and provides write access to its data.
     *
     * @param <T> Type of the {@link DataStream} created by this {@link TableSink}.
     */

2. Table API

Example:

    package org.apache.flink.table.examples.java;

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.java.BatchTableEnvironment;

    /**
     * Simple example for demonstrating the use of the Table API for a Word Count in Java.
     *
     * <p>This example shows how to:
     *  - Convert DataSets to Tables
     *  - Apply group, aggregate, select, and filter operations
     */
    public class WordCountTable {

        // *************************************************************************
        //     PROGRAM
        // *************************************************************************

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();
            BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

            DataSet<WC> input = env.fromElements(
                new WC("Hello", 1),
                new WC("Ciao", 1),
                new WC("Hello", 1));

            Table table = tEnv.fromDataSet(input);

            Table filtered = table
                .groupBy("word")
                .select("word, frequency.sum as frequency")
                .filter("frequency = 2");

            DataSet<WC> result = tEnv.toDataSet(filtered, WC.class);

            result.print();
        }

        // *************************************************************************
        //     USER DATA TYPES
        // *************************************************************************

        /**
         * Simple POJO containing a word and its respective count.
         */
        public static class WC {
            public String word;
            public long frequency;

            // public constructor to make it a Flink POJO
            public WC() {}

            public WC(String word, long frequency) {
                this.word = word;
                this.frequency = frequency;
            }

            @Override
            public String toString() {
                return "WC " + word + " " + frequency;
            }
        }
    }

3. Data conversion

3.1 Converting between DataSet and Table

  DataSet --> Table

    Registration:

        // register the DataSet as table "WordCount"
        tEnv.registerDataSet("WordCount", input, "word, frequency");

    Conversion:

        Table table = tEnv.fromDataSet(input);

  Table --> DataSet

        DataSet<WC> result = tEnv.toDataSet(filtered, WC.class);

3.2 Converting between DataStream and Table

  DataStream --> Table

    Registration:

        tEnv.registerDataStream("OrderB", orderB, "user, product, amount");

    Conversion:

        Table tableA = tEnv.fromDataStream(orderA, "user, product, amount");

  Table --> DataStream

        DataStream<Order> resultStream = tEnv.toAppendStream(result, Order.class);
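Note that toAppendStream only works for queries whose results are append-only. For queries that update previously emitted rows (for example, a non-windowed GROUP BY aggregation), toRetractStream must be used instead; it wraps every record in a Boolean add/retract flag:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // (true, row) marks an insertion, (false, row) marks a retraction
    DataStream<Tuple2<Boolean, Order>> retractStream =
        tEnv.toRetractStream(result, Order.class);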

References

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html

[2] Flink原理、实战与性能优化 (book: "Flink: Principles, Practice, and Performance Optimization")
