Flink应用案例:How Trackunit leverages Flink to process real-time data from industrial IoT devices
Recently there has been significant discussion about edge computing as a major technology trend in 2019. Edge computing brings computing capabilities away from the cloud, and rather close to the field, especially in the Industrial IoT sector (IIoT). In this blog post we describe how Trackunit leverages Apache Flink as the stream processing framework of choice to build data pipelines for fleet management operations in the construction industry.
Trackunit has specialized in the design, development and production of fleet management systems. The company is a world leader in telematics solutions for the construction industry and provides IoT services for a broad portfolio of companies and sectors to optimize the daily operations of its customers. In the following paragraphs, we describe how Trackunit’s data architecture evolved over time to include new features for the company’s data pipeline.
The company’s journey with Flink started in 2016 as part of a new strategy to build its technology powered by distributed, open source data processing technologies for increased scalability and efficient production deployment. The infrastructure was built on AWS, initially using Amazon Kinesis as the messaging queue and Amazon EMR for cluster management, alongside Flink1.2 (which was quickly upgraded to Flink1.3). The following diagram gives an overview of the initial pipeline:
As shown above, in this phase of the architecture the IoT devices send data through telematics to a Kinesis topic that is then passing them on to a single Flink Job for parsing and storage. During this stage, the architecture included an external parsing service that was additionally accessing data from a database asynchronously. The results were then passed back to the single Flink job and then stored in Cassandra.
However, due to location data being important for industrial IoT applications like Trackunit’s, the second iteration of the pipeline includes additional data enrichment. This is achieved using Flink’s Async I/O function that calls two separate external services: one for parsing and a second for enriching the data that is then transferred back to the pipeline as shown in the diagram below.
The third evolution of the pipeline includes separating this single job to multiple ones, each specialized in a specific pipeline task. As illustrated below, this iteration includes a Flink job responsible for parsing the data which is then moved to a Kinesis topic, followed by a second Flink job responsible for data enrichment and a third one storing the enriched data to Cassandra.
By separating the single Flink job to different ones, the team was able to reuse and add functionality to the same pipeline. Additionally, with each Flink job focusing on a single operation, it became easier to debug and fix issues. Finally, Trackunit’s team can re-use different parts of the infrastructure in different applications as required by the business. This proves to be a scalable solution that allows development work to be repeated and shared across use cases. However, with this setup, the team experienced a slowdown in Flink’s throughput that was caused by the external parsing service. As a solution, the team removed the external parsing service and embedded the code to the Flink parsing job for greater efficiency and faster parsing of the data as shown in the diagram below.
To further increase performance and minimize the number of calls to the async enrichment service, the team implemented a cache to enrich the pipeline with location data before writing to a new Kinesis topic pushing the enriched data downstream as illustrated in the diagram below. This addition managed to decrease the Async calls by 33% which was a big achievement for the team.
Trackunit is constantly looking at new upgrades and Flink features that can increase the pipeline's performance even further and make the architecture more scalable and robust. The team is currently using Flink 1.7.1 in testing and production and plans to replace all internal state to Avro to ensure better state migration.
You can find out more about our journey with Apache Flink and some specific DOs and DONTs in my Flink Forward Berlin 2018 talk here.
Lasse Nedergaard is a lead developer and system architect for reactive distributed systems at Trackunit S/A based on Mesos DC/OS, Apache Flink, Apache Akka and Akka streams, Kinesis, Cassandra, and SQL Server 2016 among others.
About Trackunit:
Since 2003, Trackunit has specialized in the design and development of fleet management systems. The company creates both hardware and software solutions within telematics and industrial IoT. Developing unique solutions to provide suppliers, owners and operators of machines with the most effective telematics solutions. We use case studies and customer feedback to generate valuable insights for developing new products and services.
Trackunit is the leading global supplier of fleet management solutions, operating out of our HQ in Denmark and eight offices worldwide.
About Apache Flink:
Apache Flink is used by developers to analyze and process data streams of very high volume. By adopting Flink and a data streaming architecture, enterprises can get real-time insights from their data in milliseconds, as well as cover existing historical data processing needs within a single platform.
Flink is developed and supported by a vibrant and growing open source community at the Apache Software Foundation with more than 460 contributors, of which dA engineers are proud participants.
Flink应用案例:How Trackunit leverages Flink to process real-time data from industrial IoT devices的更多相关文章
- Flink 从 0 到 1 学习 —— Flink Data transformation(转换)
toc: true title: Flink 从 0 到 1 学习 -- Flink Data transformation(转换) date: 2018-11-04 tags: Flink 大数据 ...
- Flink 从0到1学习—— Flink 不可以连续 Split(分流)?
前言 今天上午被 Flink 的一个算子困惑了下,具体问题是什么呢? 我有这么个需求:有不同种类型的告警数据流(包含恢复数据),然后我要将这些数据流做一个拆分,拆分后的话,每种告警里面的数据又想将告警 ...
- Flink 从0到1学习 —— Flink 中如何管理配置?
前言 如果你了解 Apache Flink 的话,那么你应该熟悉该如何像 Flink 发送数据或者如何从 Flink 获取数据.但是在某些情况下,我们需要将配置数据发送到 Flink 集群并从中接收一 ...
- Flink 源码解析 —— 深度解析 Flink 是如何管理好内存的?
前言 如今,许多用于分析大型数据集的开源系统都是用 Java 或者是基于 JVM 的编程语言实现的.最着名的例子是 Apache Hadoop,还有较新的框架,如 Apache Spark.Apach ...
- Flink 源码解析 —— 深度解析 Flink 序列化机制
Flink 序列化机制 https://t.zsxq.com/JaQfeMf 博客 1.Flink 从0到1学习 -- Apache Flink 介绍 2.Flink 从0到1学习 -- Mac 上搭 ...
- Flink 从 0 到 1 学习 —— Flink 配置文件详解
前面文章我们已经知道 Flink 是什么东西了,安装好 Flink 后,我们再来看下安装路径下的配置文件吧. 安装目录下主要有 flink-conf.yaml 配置.日志的配置文件.zk 配置.Fli ...
- Flink中案例学习--State与CheckPoint理解
1.State概念理解 在Flink中,按照基本类型,对State做了以下两类的划分:Keyed State, Operator State. Keyed State:和Key有关的状态类型,它只能被 ...
- Apache Flink 进阶(六):Flink 作业执行深度解析
本文根据 Apache Flink 系列直播课程整理而成,由 Apache Flink Contributor.网易云音乐实时计算平台研发工程师岳猛分享.主要分享内容为 Flink Job 执行作业的 ...
- Flink 的Window 操作(基于flink 1.3描述)
Window是无限数据流处理的核心,Window将一个无限的stream拆分成有限大小的”buckets”桶,我们可以在这些桶上做计算操作.本文主要聚焦于在Flink中如何进行窗口操作,以及程序员如何 ...
随机推荐
- SQL执行WebService
写了一个钉钉发送消息的类, 要发送用友等审核单据信息, 模式: 钉钉发消息功能在webservice中, 用友消息列表中有新消息时,采用触发器执行webservice. 在测试中 ,功能正常 ,但将在 ...
- postgresql 添加uuid扩展
去年用EF Core做数据迁移到psql数据库的时候遇到了缺失uuid的错误,当时帅气的脸蛋突然懵逼了.现记录一下 以备参考. 环境:Centos7.2 psql Xshell Xshell连接C ...
- App瘦身、性能优化总结
App瘦身 资源瘦身 使用tinypng压缩PNG图片.视频可以通过 Final cut等软件进行分辨率压缩.音频则降低码率即可. 非必须资源文件可以放到自己服务器上 启动图使用 LaunchScre ...
- Ambari 常用的 REST API 介绍
源码文档路径:ambari\ambari-server\docs\api\v1 swagger风格api文档:https://www.cnblogs.com/felixzh/p/10694724.ht ...
- 查看三种MySQL字符集的方法(转)
MySQL字符集多种多样,下面为您列举了其中三种最常见的MySQL字符集查看方法,该方法供您参考,希望对您学习MySQL数据库能有所启迪. 一.查看MySQL数据库服务器和数据库MySQL字符集. m ...
- 多线程工具类:CountDownLatch、CyclicBarrier、Semaphore、LockSupport
◆CountDownLatch◆ 假如有一个任务想要往下执行,但必须要等到其他的任务执行完毕后才可以.比如你想要买套房子,但是呢你现在手上没有钱.你得等这个月工资发了.然后年终奖发了.然后朋友借你得钱 ...
- 用pyinstaller打包python程序,解决打包时的错误:Cannot find existing PyQt5 plugin directories
解决方法就是用everything搜索PyQt5,找到 /Library/plugins路径下的PyQt5文件夹,将里面的dll动态库pyqt5qmlplugin.dll复制出来 按照错误提示的路径, ...
- javaScript设计模式之面向对象编程(object-oriented programming,OOP)(二)
接上一篇 面向对象编程的理解? 答:面向对象编程,就是将你的需求抽象成一个对象,然后针对这个对象分析其特征(属性)与动作(方法).这个对象我们称之为类.面向对象编程思想其中一个特点就是封装,就是把你需 ...
- Elasticsearch倒排索引结构
一切设计都是为了提高搜索的性能 倒排索引(Inverted Index)也叫反向索引,有反向索引必有正向索引.通俗地来讲,正向索引是通过key找value,反向索引则是通过value找key. 先来回 ...
- 【Android Studio安装部署系列】二十八、Android Studio查看其它APP的布局结构
概述 日常使用别家的APP过程中,会遇到一些比较好看的布局,这时候我们就想学习一下别人的布局结构,以便参考. (1)手机连接电脑.设置手机为USB调试模式 参考<[Android Studio安 ...