Flink生产数据到Kafka频繁出现事务失效导致任务重启
在生产中需要将一些数据发到kafka,而且需要做到EXACTLY_ONCE,kafka使用的版本为1.1.0,flink的版本为1.8.0,但是会很经常因为提交事务引起错误,甚至导致任务重启
kafka producer的配置如下
def getKafkaProducer(kafkaAddr: String, targetTopicName: String, kafkaProducersPoolSize: Int): FlinkKafkaProducer[String] = {
val properties = new Properties()
properties.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaAddr)
properties.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 6000 * 6 + "")
// 设置了retries参数,可以在Kafka的Partition发生leader切换时,Flink不重启,而是做5次尝试:
properties.setProperty(ProducerConfig.RETRIES_CONFIG, "5")
properties.setProperty(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, String.valueOf(1048576 * 5))
val serial = new KeyedSerializationSchemaWrapper(new SimpleStringSchema())
//val producer = new FlinkKafkaProducer011[String](targetTopicName, serial, properties, Optional.of(new KafkaProducerPartitioner[String]()), Semantic.EXACTLY_ONCE, kafkaProducersPoolSize)
val producer = new FlinkKafkaProducer[String](targetTopicName, serial, properties, Optional.of(new KafkaProducerPartitioner[String]()), FlinkKafkaProducer.Semantic.EXACTLY_ONCE, kafkaProducersPoolSize)
producer.setWriteTimestampToKafka(true)
producer
}
Flink env如下
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.enableCheckpointing(60 * 1000 * 1, CheckpointingMode.EXACTLY_ONCE)
val config = env.getCheckpointConfig
//RETAIN_ON_CANCELLATION在job canceled的时候会保留externalized checkpoint state
config.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
//用于指定checkpoint coordinator上一个checkpoint完成之后最小等多久可以出发另一个checkpoint,当指定这个参数时,maxConcurrentCheckpoints的值为1
config.setMinPauseBetweenCheckpoints(3000)
//用于指定运行中的checkpoint最多可以有多少个,如果有设置了minPauseBetweenCheckpoints,则maxConcurrentCheckpoints这个参数就不起作用了(大于1的值不起作用)
config.setMaxConcurrentCheckpoints(1)
//指定checkpoint执行的超时时间(单位milliseconds),超时没完成就会被abort掉
config.setCheckpointTimeout(30000)
//用于指定在checkpoint发生异常的时候,是否应该fail该task,默认为true,如果设置为false,则task会拒绝checkpoint然后继续运行
//https://issues.apache.org/jira/browse/FLINK-11662
config.setFailOnCheckpointingErrors(false)
然后经常会出现事务失效的问题,报错有很多种,大概为以下
java.lang.RuntimeException: Error while confirming checkpoint
at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.util.FlinkRuntimeException: Committing one of transactions failed, logging first encountered failure
at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:296)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:684)
at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1213)
... 5 more
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kafka: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:1002)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:619)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:97)
at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.invoke(TwoPhaseCommitSinkFunction.java:228)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
org.apache.kafka.common.KafkaException: Cannot perform send because at least one previous transactional or idempotent request has failed with errors.
at org.apache.kafka.clients.producer.internals.TransactionManager.failIfNotReadyForSend(TransactionManager.java:278)
at org.apache.kafka.clients.producer.internals.TransactionManager.maybeAddPartitionToTransaction(TransactionManager.java:263)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:804)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
at org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaInternalProducer.send(FlinkKafkaInternalProducer.java:105)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:650)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:97)
at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.invoke(TwoPhaseCommitSinkFunction.java:228)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:202)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.
Checkpoint failed: Failed to send data to Kafka: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker. Checkpoint failed: Could not complete snapshot 11 for operator Sink: data_Sink (2/2).
这些错误基本涉及到两阶段提交、事务、checkpoint。
查看kafka documentation和研究ProducerConfig这个类后发现 kafka producer 在使用EXACTLY_ONCE的时候需要增加一些配置
the transaction timeout must be larger than the checkpoint interval, but smaller than the broker transaction.max.timeout.ms.
在getKafkaProducer增加以下配置后,出现原来的错误减少
//checkpoint 间隔时间<TRANSACTION_TIMEOUT_CONFIG<kafka transaction.max.timeout.ms (默认900秒)
properties.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 1000 * 60 * 3 + "")
properties.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1")
properties.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true")
问题得到缓解。
参考:
https://www.cnblogs.com/wangzhuxing/p/10111831.html
http://www.heartthinkdo.com/?p=2040
http://romanmarkunas.com/web/blog/kafka-transactions-in-practice-1-producer/
Flink生产数据到Kafka频繁出现事务失效导致任务重启的更多相关文章
- kafka没配置好,导致服务器重启之后,topic丢失,topic里面的消息也丢失
转,原文:https://blog.csdn.net/zfszhangyuan/article/details/53389916 ----------------------------------- ...
- Kafka科普系列 | Kafka中的事务是什么样子的?
事务,对于大家来说可能并不陌生,比如数据库事务.分布式事务,那么Kafka中的事务是什么样子的呢? 在说Kafka的事务之前,先要说一下Kafka中幂等的实现.幂等和事务是Kafka 0.11.0.0 ...
- flink⼿手动维护kafka偏移量量
flink对接kafka,官方模式方式是自动维护偏移量 但并没有考虑到flink消费kafka过程中,如果出现进程中断后的事情! 如果此时,进程中段: 1:数据可能丢失 从获取了了数据,但是在执⾏行行 ...
- java面试记录二:spring加载流程、springmvc请求流程、spring事务失效、synchronized和volatile、JMM和JVM模型、二分查找的实现、垃圾收集器、控制台顺序打印ABC的三种线程实现
注:部分答案引用网络文章 简答题 1.Spring项目启动后的加载流程 (1)使用spring框架的web项目,在tomcat下,是根据web.xml来启动的.web.xml中负责配置启动spring ...
- Mysql引起的spring事务失效
老项目加新功能,导致出现service调用service的情况..一共2张表有数据的添加删除.然后测试了一下事务,表A和表B,我在表B中抛了异常,但结果发现,表B回滚正常,但是表A并没有回滚.显示事务 ...
- spring声明式事务 同一类内方法调用事务失效
只要避开Spring目前的AOP实现上的限制,要么都声明要事务,要么分开成两个类,要么直接在方法里使用编程式事务 [问题] Spring的声明式事务,我想就不用多介绍了吧,一句话“自从用了Spring ...
- SpringMvc配置 导致实事务失效
SpringMVC回归MVC本质,简简单单的Restful式函数,没有任何基类之后,应该是传统Request-Response框架中最好用的了. Tips 1.事务失效的惨案 Spring MVC最打 ...
- Spring component-scan 的逻辑 、单例模式下多实例问题、事务失效
原创内容,转发请保留:http://www.cnblogs.com/iceJava/p/6930118.html,谢谢 之前遇到该问题,今天查看了下 spring 4.x 的代码 一,先理解下 con ...
- spring事务失效情况分析
详见:http://blog.yemou.net/article/query/info/tytfjhfascvhzxcyt113 <!--[if !supportLists]-->一.&l ...
- kafka producer生产数据到kafka异常:Got error produce response with correlation id 16 on topic-partition...Error: NETWORK_EXCEPTION
kafka producer生产数据到kafka异常:Got error produce response with correlation id 16 on topic-partition... ...
随机推荐
- 我的vim配置相关
谨以此文记录下之前的折腾.(后续可能还会折腾什么) 目标 我的目的很简单,就是希望能有一个启动快速的文本编辑器,可以简单的代码着色,vim键位,简单的文本修改,打开大点的文件不发愁,可以简单的form ...
- 1.mysql创建索引
-- 创建一个普通索引(方式①)create index 索引名 ON 表名 (列名(索引键长度) [ASC|DESC]);-- 创建一个普通索引(方式②)alter table 表名 add ind ...
- 批量获取title
1 import requests 2 from bs4 import BeautifulSoup 3 import pandas as pd 4 from openpyxl import Workb ...
- loadrunner---脚本录制常见问题
一:loadrunner录制脚本时ie浏览器不弹出? 1.IE浏览器取消勾选[启用第三方浏览器扩展]启动IE,从[工具]进入[Internet选项],切到高级,去掉[启用第三方浏览器扩展(需要重启动) ...
- MAYA专用卸载工具,完全彻底卸载删除干净maya各种残留注册表和文件的方法和步骤
maya专用卸载工具,完全彻底卸载删除干净maya各种残留注册表和文件的方法和步骤.如何卸载maya呢?有很多同学想把maya卸载后重新安装,但是发现maya安装到一半就失败了或者显示maya已安装或 ...
- BIP拓展js的使用
__app.define("common_VM_Extend.js", function () { var selectData = null; var common_VM ...
- REMOTE HOST IDENTIFICATION HAS CHANGED!服务器重置后远程连接不上
问题: 解决: 本地打开shell,重置key
- 远程连接linux桌面
https://zhuanlan.zhihu.com/p/127265014 https://zhuanlan.zhihu.com/p/93438433
- WEB的安全性测试要素【转】
原文链接:http://www.cnblogs.com/zgqys1980/archive/2009/05/13/1455710.html WEB的安全性测试要素 1.SQL Injection(SQ ...
- gorm去重查询 iris框架
写练习 demo 时遇到需要进行去重查询,gorm没有db.distinct()的写法 // 数据库的表字段 type Pro_location_relation struct { Id int64 ...