Flink应用案例:How Trackunit leverages Flink to process real-time data from industrial IoT devices
Recently there has been significant discussion about edge computing as a major technology trend in 2019. Edge computing brings computing capabilities away from the cloud, and rather close to the field, especially in the Industrial IoT sector (IIoT). In this blog post we describe how Trackunit leverages Apache Flink as the stream processing framework of choice to build data pipelines for fleet management operations in the construction industry.
Trackunit has specialized in the design, development and production of fleet management systems. The company is a world leader in telematics solutions for the construction industry and provides IoT services for a broad portfolio of companies and sectors to optimize the daily operations of its customers. In the following paragraphs, we describe how Trackunit’s data architecture evolved over time to include new features for the company’s data pipeline.
The company’s journey with Flink started in 2016 as part of a new strategy to build its technology powered by distributed, open source data processing technologies for increased scalability and efficient production deployment. The infrastructure was built on AWS, initially using Amazon Kinesis as the messaging queue and Amazon EMR for cluster management, alongside Flink1.2 (which was quickly upgraded to Flink1.3). The following diagram gives an overview of the initial pipeline:
As shown above, in this phase of the architecture the IoT devices send data through telematics to a Kinesis topic that is then passing them on to a single Flink Job for parsing and storage. During this stage, the architecture included an external parsing service that was additionally accessing data from a database asynchronously. The results were then passed back to the single Flink job and then stored in Cassandra.
However, due to location data being important for industrial IoT applications like Trackunit’s, the second iteration of the pipeline includes additional data enrichment. This is achieved using Flink’s Async I/O function that calls two separate external services: one for parsing and a second for enriching the data that is then transferred back to the pipeline as shown in the diagram below.
The third evolution of the pipeline includes separating this single job to multiple ones, each specialized in a specific pipeline task. As illustrated below, this iteration includes a Flink job responsible for parsing the data which is then moved to a Kinesis topic, followed by a second Flink job responsible for data enrichment and a third one storing the enriched data to Cassandra.
By separating the single Flink job to different ones, the team was able to reuse and add functionality to the same pipeline. Additionally, with each Flink job focusing on a single operation, it became easier to debug and fix issues. Finally, Trackunit’s team can re-use different parts of the infrastructure in different applications as required by the business. This proves to be a scalable solution that allows development work to be repeated and shared across use cases. However, with this setup, the team experienced a slowdown in Flink’s throughput that was caused by the external parsing service. As a solution, the team removed the external parsing service and embedded the code to the Flink parsing job for greater efficiency and faster parsing of the data as shown in the diagram below.
To further increase performance and minimize the number of calls to the async enrichment service, the team implemented a cache to enrich the pipeline with location data before writing to a new Kinesis topic pushing the enriched data downstream as illustrated in the diagram below. This addition managed to decrease the Async calls by 33% which was a big achievement for the team.
Trackunit is constantly looking at new upgrades and Flink features that can increase the pipeline's performance even further and make the architecture more scalable and robust. The team is currently using Flink 1.7.1 in testing and production and plans to replace all internal state to Avro to ensure better state migration.
You can find out more about our journey with Apache Flink and some specific DOs and DONTs in my Flink Forward Berlin 2018 talk here.
Lasse Nedergaard is a lead developer and system architect for reactive distributed systems at Trackunit S/A based on Mesos DC/OS, Apache Flink, Apache Akka and Akka streams, Kinesis, Cassandra, and SQL Server 2016 among others.
About Trackunit:
Since 2003, Trackunit has specialized in the design and development of fleet management systems. The company creates both hardware and software solutions within telematics and industrial IoT. Developing unique solutions to provide suppliers, owners and operators of machines with the most effective telematics solutions. We use case studies and customer feedback to generate valuable insights for developing new products and services.
Trackunit is the leading global supplier of fleet management solutions, operating out of our HQ in Denmark and eight offices worldwide.
About Apache Flink:
Apache Flink is used by developers to analyze and process data streams of very high volume. By adopting Flink and a data streaming architecture, enterprises can get real-time insights from their data in milliseconds, as well as cover existing historical data processing needs within a single platform.
Flink is developed and supported by a vibrant and growing open source community at the Apache Software Foundation with more than 460 contributors, of which dA engineers are proud participants.
Flink应用案例:How Trackunit leverages Flink to process real-time data from industrial IoT devices的更多相关文章
- Flink 从 0 到 1 学习 —— Flink Data transformation(转换)
toc: true title: Flink 从 0 到 1 学习 -- Flink Data transformation(转换) date: 2018-11-04 tags: Flink 大数据 ...
- Flink 从0到1学习—— Flink 不可以连续 Split(分流)?
前言 今天上午被 Flink 的一个算子困惑了下,具体问题是什么呢? 我有这么个需求:有不同种类型的告警数据流(包含恢复数据),然后我要将这些数据流做一个拆分,拆分后的话,每种告警里面的数据又想将告警 ...
- Flink 从0到1学习 —— Flink 中如何管理配置?
前言 如果你了解 Apache Flink 的话,那么你应该熟悉该如何像 Flink 发送数据或者如何从 Flink 获取数据.但是在某些情况下,我们需要将配置数据发送到 Flink 集群并从中接收一 ...
- Flink 源码解析 —— 深度解析 Flink 是如何管理好内存的?
前言 如今,许多用于分析大型数据集的开源系统都是用 Java 或者是基于 JVM 的编程语言实现的.最着名的例子是 Apache Hadoop,还有较新的框架,如 Apache Spark.Apach ...
- Flink 源码解析 —— 深度解析 Flink 序列化机制
Flink 序列化机制 https://t.zsxq.com/JaQfeMf 博客 1.Flink 从0到1学习 -- Apache Flink 介绍 2.Flink 从0到1学习 -- Mac 上搭 ...
- Flink 从 0 到 1 学习 —— Flink 配置文件详解
前面文章我们已经知道 Flink 是什么东西了,安装好 Flink 后,我们再来看下安装路径下的配置文件吧. 安装目录下主要有 flink-conf.yaml 配置.日志的配置文件.zk 配置.Fli ...
- Flink中案例学习--State与CheckPoint理解
1.State概念理解 在Flink中,按照基本类型,对State做了以下两类的划分:Keyed State, Operator State. Keyed State:和Key有关的状态类型,它只能被 ...
- Apache Flink 进阶(六):Flink 作业执行深度解析
本文根据 Apache Flink 系列直播课程整理而成,由 Apache Flink Contributor.网易云音乐实时计算平台研发工程师岳猛分享.主要分享内容为 Flink Job 执行作业的 ...
- Flink 的Window 操作(基于flink 1.3描述)
Window是无限数据流处理的核心,Window将一个无限的stream拆分成有限大小的”buckets”桶,我们可以在这些桶上做计算操作.本文主要聚焦于在Flink中如何进行窗口操作,以及程序员如何 ...
随机推荐
- Java实现发送手机验证码功能(短信+语音)
利用第三方平台可以实现发送手机短信验证码和语音验证码的功能,本文使用框架是struts2+spring+hibernate,现就action层给出核心代码功能. public class Verify ...
- SpringCloud学习系列汇总
Spring Cloud常用组件使用汇总 使用SpringBoot2.0.3整合SpringCloud 服务注册与发现Eureka 自定义Eureka集群负载均衡策略 如何使用高可用的Eureka F ...
- python --- 二分查找算法
二分查找法:在我的理解中这个查找方法为什么会叫二分呢,我认为是将要查询的一个列表分成了两份,然后在利用某个值来进行比较,在一个不断循环的过程中来找出我们要找的某一个值. 废话不多说,先上代码: def ...
- FragmentTabHostBottomDemo【FragmentTabHost + Fragment实现底部选项卡】
版权声明:本文为HaiyuKing原创文章,转载请注明出处! 前言 使用FragmentTabHost实现底部选项卡效果. 备注:该Demo主要是演示FragmentTabHost的一些设置和部分功能 ...
- 基于mapreduce实现图的三角形计数
源代码放在我的github上,想细致了解的可以访问:TriangleCount on github 一.实验要求 1.1 实验背景 图的三角形计数问题是一个基本的图计算问题,是很多复杂 ...
- 高可用实现KeepAlived原理简介
一.简介 目前主流实现web网站及数据库服务高可用软件包括:keepalived.heartbeat.corosync,cman;高可用简称HA: 官方站点:https://www.keepalive ...
- Css-移动端适配总结
前言 工作以后,大部分的业务工作都是基于移动端H5的,开发过程中学习了很多东西,遇到过许多问题,诸如rem\em\css px\device px等,本文纯属个人的归纳总结,如有问题,请指出亲喷~ P ...
- 初步认识Swiper_前端交互控制神器_滚动3D切换等特效简单制作
前言: 本人在项目的工作中负责研发,页面及交互基本都是交给前端去做的.以前前端写的东西大概都知道,都是一些JS,CSS和HTML等的一些基本控制,都懂!但是今天前端突然做了一个具有特殊效果的DOM:页 ...
- 2017年IT行业测试调查报告
在刚刚过去的2017年, 我们来一起看一下2017年IT行业测试调查报告 还是1到5名测试工程师最多 Test Architects 在北上广一线城市已经出现 https://www.lagou.co ...
- IT企业级应⽤开发模式演化
前端研发流程 传统To B类系统的研发模式 探索 & 思考设计模式库(DPL)设计语⾔设计语⾔详解基于MVVM模式的Web框架 & UI库优化后的开发模式实现价值实践 赋能 赋能授权( ...