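A walkthrough of the Kafka consumer code
The Kafka consumer is a plain single-threaded program, so it is easier to follow than the producer. The key to reading the consumer code is understanding callbacks, because the consumer relies on them heavily. See "Callback functions in Kafka".
1 Overall flow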
private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollOnce(long timeout) {
    coordinator.ensureCoordinatorReady(); // keep sending the find-coordinator request until the coordinator is known
    if (subscriptions.partitionsAutoAssigned())
        coordinator.ensurePartitionAssignment(); // send joinGroup and syncGroup until this consumer's partition assignment is known; also starts the heartbeat
    if (!subscriptions.hasAllFetchPositions())
        updateFetchPositions(this.subscriptions.missingFetchPositions()); // fetch the offset and committed information so the consumer knows where to start fetching
    long now = time.milliseconds();
    Map<TopicPartition, List<ConsumerRecord<K, V>>> records = fetcher.fetchedRecords(); // reads from a local data structure, does not send a request
    if (!records.isEmpty()) // return immediately if records are already available
        return records;
    fetcher.sendFetches(); // queue the fetch requests
    client.poll(timeout, now); // this is where the requests actually go out on the network
    return fetcher.fetchedRecords(); // reads from a local data structure, does not send a request
}
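2 Rebalance: joinGroup and syncGroup
- joinGroup. The joinGroup request asks to join the consumer group. Once the coordinator has determined that all members have sent joinGroup, it returns a response to each client. The response includes the member id, the generation, whether this consumer is the leader, and similar information.
- syncGroup. If the consumer is the leader, it computes the partition assignment locally and attaches it to the request, telling the coordinator how the partitions are assigned. Note that partition assignment is performed on the consumer side. An ordinary, non-leader consumer just sends a plain request. Whether the consumer is the leader or not, the coordinator returns the list of partitions that it should consume.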
private class JoinGroupResponseHandler extends CoordinatorResponseHandler<JoinGroupResponse, ByteBuffer> {
    public JoinGroupResponse parse(ClientResponse response) {
        return new JoinGroupResponse(response.responseBody());
    }

    public void handle(JoinGroupResponse joinResponse, RequestFuture<ByteBuffer> future) {
        Errors error = Errors.forCode(joinResponse.errorCode());
        if (error == Errors.NONE) {
            log.debug("Received successful join group response for group {}: {}", groupId, joinResponse.toStruct());
            AbstractCoordinator.this.memberId = joinResponse.memberId(); // read the member id from the response
            AbstractCoordinator.this.generation = joinResponse.generationId(); // generation id
            AbstractCoordinator.this.rejoinNeeded = false;
            AbstractCoordinator.this.protocol = joinResponse.groupProtocol();
            // send the sync request
            if (joinResponse.isLeader()) {
                // ...
            } else {
                // ...
            }
        }
        // rest omitted
    }
}
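Because the leader computes the assignment on the consumer side, it may help to see what such a computation could look like. The following is only a minimal sketch, assuming a simple round-robin policy over integer partition ids. It is not one of the Kafka client's real assignors, and the names `LeaderAssignmentSketch` and `assign` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a group leader handing out partitions round-robin.
// This is NOT the Kafka client's real assignor, only a model of the idea.
class LeaderAssignmentSketch {

    // Returns memberId -> partitions, distributing partitions 0..partitionCount-1 in turn.
    static Map<String, List<Integer>> assign(List<String> memberIds, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String memberId : memberIds) {
            assignment.put(memberId, new ArrayList<>());
        }
        if (memberIds.isEmpty()) {
            return assignment; // nobody to assign to
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = memberIds.get(partition % memberIds.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // The leader would attach a result like this to its syncGroup request.
        System.out.println(assign(Arrays.asList("consumer-a", "consumer-b"), 5));
    }
}
```

In this model the leader is the only member that runs `assign`; every other member simply waits for the coordinator to relay its share in the syncGroup response.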
3 heartBeat
private class HeartbeatCompletionHandler extends CoordinatorResponseHandler<HeartbeatResponse, Void> {
    public HeartbeatResponse parse(ClientResponse response) {
        return new HeartbeatResponse(response.responseBody());
    }

    public void handle(HeartbeatResponse heartbeatResponse, RequestFuture<Void> future) {
        Errors error = Errors.forCode(heartbeatResponse.errorCode());
        if (error == Errors.NONE) {
            log.debug("Received successful heartbeat response for group {}", groupId);
        } else if (error == Errors.GROUP_COORDINATOR_NOT_AVAILABLE
                || error == Errors.NOT_COORDINATOR_FOR_GROUP) {
            log.debug("Attempt to heart beat failed for group {} since coordinator {} is either not started or not valid.",
                    groupId, coordinator);
        } else if (error == Errors.REBALANCE_IN_PROGRESS) {
            log.debug("Attempt to heart beat failed for group {} since it is rebalancing.", groupId);
            AbstractCoordinator.this.rejoinNeeded = true;
        } else if (error == Errors.ILLEGAL_GENERATION) { // the server is already on a newer generation, so the client must rebalance
            log.debug("Attempt to heart beat failed for group {} since generation id is not legal.", groupId);
            AbstractCoordinator.this.rejoinNeeded = true; // with rejoinNeeded set, the next poll resends the join and sync requests
        } else if (error == Errors.UNKNOWN_MEMBER_ID) {
            log.debug("Attempt to heart beat failed for group {} since member id is not valid.", groupId);
            memberId = JoinGroupRequest.UNKNOWN_MEMBER_ID;
            AbstractCoordinator.this.rejoinNeeded = true;
        } else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
            future.raise(new GroupAuthorizationException(groupId));
        } else {
            future.raise(new KafkaException("Unexpected error in heartbeat response: " + error.message()));
        }
    }
}
4 DelayedTask
public void schedule(DelayedTask task, long at) {
    delayedTasks.add(task, at); // DelayedTaskQueue#add
}

public class DelayedTaskQueue {
    private PriorityQueue<Entry> tasks; // priority queue

    public DelayedTaskQueue() {
        tasks = new PriorityQueue<Entry>();
    }

    /**
     * Schedule a task for execution in the future.
     * @param task the task to execute
     * @param at the time at which to
     */
    public void add(DelayedTask task, long at) {
        tasks.add(new Entry(task, at));
    }
    // ...
}
private class HeartbeatTask implements DelayedTask {
    private boolean requestInFlight = false;

    public void reset() {
        // start or restart the heartbeat task to be executed at the next chance
        long now = time.milliseconds();
        if (!requestInFlight)
            client.schedule(this, now);
    }

    public void run(final long now) {
        if (generation < 0 || needRejoin() || coordinatorUnknown()) {
            // no need to send the heartbeat we're not using auto-assignment or if we are
            // awaiting a rebalance
            return;
        }

        if (heartbeat.sessionTimeoutExpired(now)) {
            // we haven't received a successful heartbeat in one session interval
            // so mark the coordinator dead
            coordinatorDead();
            return;
        }

        if (!heartbeat.shouldHeartbeat(now)) {
            // we don't need to heartbeat now, so reschedule for when we do
            client.schedule(this, now + heartbeat.timeToNextHeartbeat(now));
        } else {
            requestInFlight = true;
            RequestFuture<Void> future = sendHeartbeatRequest();
            future.addListener(new RequestFutureListener<Void>() {
                public void onSuccess(Void value) {
                    requestInFlight = false;
                    long now = time.milliseconds();
                    long nextHeartbeatTime = now + heartbeat.timeToNextHeartbeat(now);
                    // the task re-adds itself from the callback, which gives a periodic loop
                    client.schedule(HeartbeatTask.this, nextHeartbeatTime);
                }

                public void onFailure(RuntimeException e) {
                    requestInFlight = false;
                    client.schedule(HeartbeatTask.this, time.milliseconds() + retryBackoffMs);
                }
            });
        }
    }
}
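The heartbeat is periodic only because the task schedules itself again each time it finishes. A stripped-down model of that pattern, under the assumption of a manually advanced clock and a synchronous task, might look like the sketch below. All names here (`DelayedTaskSketch`, `Task`, `simulate`) are hypothetical and do not come from the Kafka code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model of a delayed-task queue driven by a poll loop.
class DelayedTaskSketch {

    interface Task {
        void run(long now);
    }

    // A task plus its due time; ordered by due time so the earliest task is polled first.
    static final class Entry implements Comparable<Entry> {
        final Task task;
        final long at;

        Entry(Task task, long at) {
            this.task = task;
            this.at = at;
        }

        @Override
        public int compareTo(Entry other) {
            return Long.compare(at, other.at);
        }
    }

    private final PriorityQueue<Entry> tasks = new PriorityQueue<>();

    void schedule(Task task, long at) {
        tasks.add(new Entry(task, at));
    }

    // Runs every task whose due time is at or before `now`.
    void poll(long now) {
        while (!tasks.isEmpty() && tasks.peek().at <= now) {
            tasks.poll().task.run(now);
        }
    }

    // Simulates a heartbeat every intervalMs over [0, untilMs] and returns the firing times.
    static List<Long> simulate(final long intervalMs, long untilMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        final DelayedTaskSketch queue = new DelayedTaskSketch();
        final List<Long> fired = new ArrayList<>();
        Task heartbeat = new Task() {
            @Override
            public void run(long now) {
                fired.add(now);
                queue.schedule(this, now + intervalMs); // the task re-adds itself
            }
        };
        queue.schedule(heartbeat, 0);
        for (long now = 0; now <= untilMs; now++) {
            queue.poll(now); // stands in for the consumer's poll loop
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(simulate(3, 10)); // prints [0, 3, 6, 9]
    }
}
```

Nothing runs unless `poll` is called, which mirrors the single-threaded design: delayed tasks only make progress while the application keeps polling.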
5 updateFetchPositions
updateFetchPositions refreshes the committed offsets and the fetch positions. The client's consumption state is kept in SubscriptionState, which has the following main fields:
public class SubscriptionState {
    private Pattern subscribedPattern;
    // topics the consumer has subscribed to
    private final Set<String> subscription;
    private final Set<String> groupSubscription;
    private final Set<TopicPartition> userAssignment;
    // consumption state of each assigned partition
    private final Map<TopicPartition, TopicPartitionState> assignment;
    private boolean needsPartitionAssignment;
    private boolean needsFetchCommittedOffsets;
    private final OffsetResetStrategy defaultResetStrategy;
    private ConsumerRebalanceListener listener;
    // ... omitted

    private static class TopicPartitionState {
        private Long position; // consumption position; sent as the fetch offset when fetching from the broker
        private OffsetAndMetadata committed; // last committed offset
        private boolean paused; // whether this partition has been paused by the user
        private OffsetResetStrategy resetStrategy; // the strategy to use if the offset needs resetting
    }
}
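The difference between `position` and `committed` is easier to see in a toy model. The sketch below is a simplification under stated assumptions: offsets are plain `Long` values, and a missing committed offset falls back to a caller-supplied reset offset. The class `PartitionStateSketch` and its methods are hypothetical and are not part of the Kafka client.

```java
// Hypothetical, simplified model of per-partition consumer state.
class PartitionStateSketch {
    private Long position;  // next offset to fetch; null until initialised
    private Long committed; // last committed offset; null if nothing was committed

    PartitionStateSketch(Long committed) {
        this.committed = committed;
    }

    boolean hasValidPosition() {
        return position != null;
    }

    // Start from the committed offset when there is one, otherwise from resetOffset.
    void initPosition(long resetOffset) {
        position = (committed != null) ? committed : resetOffset;
    }

    // Advance the position after handing `count` records to the application.
    void advance(long count) {
        if (position == null) {
            throw new IllegalStateException("position not initialised");
        }
        position += count;
    }

    // Record the current position as committed.
    void commit() {
        committed = position;
    }

    Long position() {
        return position;
    }

    Long committed() {
        return committed;
    }

    public static void main(String[] args) {
        PartitionStateSketch state = new PartitionStateSketch(null);
        state.initPosition(0L);
        state.advance(5);
        state.commit();
        state.advance(3);
        // A restarted consumer only knows the committed offset, not the old position.
        PartitionStateSketch restarted = new PartitionStateSketch(state.committed());
        restarted.initPosition(0L);
        System.out.println(state.position() + " " + restarted.position()); // prints 8 5
    }
}
```

The position runs ahead of the committed offset between commits, so a restart in this model re-reads the three records that were consumed but never committed.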
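6 Some important parameters
- fetch.min.bytes: the minimum number of bytes the broker should return for a fetch request. The consumer pulls messages from the broker in batches, and the broker holds the response until at least this much data is available. The default is 1.
- fetch.max.wait.ms: the maximum time to wait when fetching data, used together with fetch.min.bytes. If fetch.min.bytes of data has not accumulated after fetch.max.wait.ms, the broker responds anyway. The default is 500 (milliseconds).
- max.partition.fetch.bytes: the maximum number of bytes fetched per partition. The default is 1048576, i.e. 1 MiB.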
private Map<Node, FetchRequest> createFetchRequests() {
    // create the fetch info
    Cluster cluster = metadata.fetch();
    Map<Node, Map<TopicPartition, FetchRequest.PartitionData>> fetchable = new HashMap<>();
    for (TopicPartition partition : fetchablePartitions()) {
        Node node = cluster.leaderFor(partition);
        if (node == null) {
            metadata.requestUpdate();
        } else if (this.client.pendingRequestCount(node) == 0) {
            // if there is a leader and no in-flight requests, issue a new fetch
            Map<TopicPartition, FetchRequest.PartitionData> fetch = fetchable.get(node);
            if (fetch == null) {
                fetch = new HashMap<>();
                fetchable.put(node, fetch);
            }
            long position = this.subscriptions.position(partition);
            fetch.put(partition, new FetchRequest.PartitionData(position, this.fetchSize)); // fetchSize is max.partition.fetch.bytes
            log.trace("Added fetch request for partition {} at offset {}", partition, position);
        }
    }

    // create the fetches
    Map<Node, FetchRequest> requests = new HashMap<>();
    for (Map.Entry<Node, Map<TopicPartition, FetchRequest.PartitionData>> entry : fetchable.entrySet()) {
        Node node = entry.getKey();
        // maxWaitMs is fetch.max.wait.ms, minBytes is fetch.min.bytes
        FetchRequest fetch = new FetchRequest(this.maxWaitMs, this.minBytes, entry.getValue());
        requests.put(node, fetch);
    }
    return requests;
}
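- max.poll.records: the maximum number of records returned by one call. Unlike the three parameters above, it is not sent in the fetch request. Fetched records are buffered in a local variable, and this parameter caps how many of those buffered records are handed back.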
public Map<TopicPartition, List<ConsumerRecord<K, V>>> fetchedRecords() {
    if (this.subscriptions.partitionAssignmentNeeded()) {
        return Collections.emptyMap();
    } else {
        Map<TopicPartition, List<ConsumerRecord<K, V>>> drained = new HashMap<>();
        int maxRecords = maxPollRecords;
        Iterator<PartitionRecords<K, V>> iterator = records.iterator();
        while (iterator.hasNext() && maxRecords > 0) {
            PartitionRecords<K, V> part = iterator.next();
            maxRecords -= append(drained, part, maxRecords); // maxRecords starts at max.poll.records
            if (part.isConsumed())
                iterator.remove();
        }
        return drained;
    }
}
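The draining logic above can be modelled without any Kafka classes. The following is a minimal sketch, assuming records are plain strings buffered per partition name. `DrainSketch` and `drain` are hypothetical names, and the real `Fetcher` works on `PartitionRecords` rather than on a map of queues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of returning at most maxPollRecords from a local buffer.
class DrainSketch {

    // buffered: partition name -> records fetched earlier but not yet returned.
    static Map<String, List<String>> drain(Map<String, Deque<String>> buffered, int maxPollRecords) {
        Map<String, List<String>> drained = new LinkedHashMap<>();
        int remaining = maxPollRecords;
        Iterator<Map.Entry<String, Deque<String>>> it = buffered.entrySet().iterator();
        while (it.hasNext() && remaining > 0) {
            Map.Entry<String, Deque<String>> entry = it.next();
            Deque<String> queue = entry.getValue();
            List<String> out = new ArrayList<>();
            while (!queue.isEmpty() && remaining > 0) {
                out.add(queue.poll());
                remaining--;
            }
            if (!out.isEmpty()) {
                drained.put(entry.getKey(), out);
            }
            if (queue.isEmpty()) {
                it.remove(); // this partition's buffer is fully consumed
            }
        }
        return drained;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> buffered = new LinkedHashMap<>();
        buffered.put("t-0", new ArrayDeque<>(Arrays.asList("a", "b", "c")));
        buffered.put("t-1", new ArrayDeque<>(Arrays.asList("d", "e")));
        System.out.println(drain(buffered, 4)); // prints {t-0=[a, b, c], t-1=[d]}
        System.out.println(drain(buffered, 4)); // prints {t-1=[e]}
    }
}
```

The second call returns the leftover record without any new fetch, which is why a small max.poll.records changes how many records each call returns but not how much data is pulled from the broker.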
- In addition, a timeout must be specified when calling the consumer API. If no message arrives within the timeout, an empty set of records is returned.
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(1000); // timeout in milliseconds
    // System.out.println("begin for 2");
    for (ConsumerRecord<String, String> record : records) {
        // System.out.println("hello");
        System.out.println(record.partition() + " " + record.offset());
    }
}
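For reference, the four parameters discussed in this section could be collected into a `Properties` object as sketched below. The keys are the configuration names from the text. The first three values are the defaults quoted above, while the value for max.poll.records is an arbitrary example chosen for illustration. The class name `ConsumerFetchConfigSketch` is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that gathers the fetch-related settings discussed above.
class ConsumerFetchConfigSketch {

    static Properties fetchTuning() {
        Properties props = new Properties();
        props.setProperty("fetch.min.bytes", "1");                 // default quoted in the text
        props.setProperty("fetch.max.wait.ms", "500");             // default quoted in the text
        props.setProperty("max.partition.fetch.bytes", "1048576"); // default quoted in the text
        props.setProperty("max.poll.records", "100");              // arbitrary example value (assumption)
        return props;
    }

    public static void main(String[] args) {
        fetchTuning().list(System.out);
    }
}
```

A real consumer additionally needs connection and deserializer settings such as bootstrap.servers, group.id, key.deserializer and value.deserializer before these properties can be passed to the KafkaConsumer constructor.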