kafka 的心跳是 kafka consumer 和 broker 之间的健康检查,只有当 broker coordinator 正常时,consumer 才会发送心跳。

consumer 和 rebalance 相关的 2 个配置参数:

参数名                --> MemberMetadata 字段
session.timeout.ms --> MemberMetadata.sessionTimeoutMs
max.poll.interval.ms --> MemberMetadata.rebalanceTimeoutMs

broker 端,sessionTimeoutMs 参数

broker 处理心跳的逻辑在 GroupCoordinator 类中:如果心跳超期, broker coordinator 会把消费者从 group 中移除,并触发 rebalance。

   private def completeAndScheduleNextHeartbeatExpiration(group: GroupMetadata, member: MemberMetadata) {
// complete current heartbeat expectation
member.latestHeartbeat = time.milliseconds()
val memberKey = MemberKey(member.groupId, member.memberId)
heartbeatPurgatory.checkAndComplete(memberKey) // reschedule the next heartbeat expiration deadline
// 计算心跳截止时刻
val newHeartbeatDeadline = member.latestHeartbeat + member.sessionTimeoutMs
val delayedHeartbeat = new DelayedHeartbeat(this, group, member, newHeartbeatDeadline, member.sessionTimeoutMs)
heartbeatPurgatory.tryCompleteElseWatch(delayedHeartbeat, Seq(memberKey))
} // 心跳过期
def onExpireHeartbeat(group: GroupMetadata, member: MemberMetadata, heartbeatDeadline: Long) {
group.inLock {
if (!shouldKeepMemberAlive(member, heartbeatDeadline)) {
info(s"Member ${member.memberId} in group ${group.groupId} has failed, removing it from the group")
removeMemberAndUpdateGroup(group, member)
}
}
} private def shouldKeepMemberAlive(member: MemberMetadata, heartbeatDeadline: Long) =
member.awaitingJoinCallback != null ||
member.awaitingSyncCallback != null ||
member.latestHeartbeat + member.sessionTimeoutMs > heartbeatDeadline

consumer 端:sessionTimeoutMs,rebalanceTimeoutMs 参数

如果客户端发现心跳超期,客户端会标记 coordinator 为不可用,并阻塞心跳线程;如果超过了 poll 消息的间隔超过了 rebalanceTimeoutMs,则 consumer 告知 broker 主动离开消费组,也会触发 rebalance

org.apache.kafka.clients.consumer.internals.AbstractCoordinator.HeartbeatThread 代码片段:

if (coordinatorUnknown()) {
if (findCoordinatorFuture != null || lookupCoordinator().failed())
// the immediate future check ensures that we backoff properly in the case that no
// brokers are available to connect to.
AbstractCoordinator.this.wait(retryBackoffMs);
} else if (heartbeat.sessionTimeoutExpired(now)) {
// the session timeout has expired without seeing a successful heartbeat, so we should
// probably make sure the coordinator is still healthy.
markCoordinatorUnknown();
} else if (heartbeat.pollTimeoutExpired(now)) {
// the poll timeout has expired, which means that the foreground thread has stalled
// in between calls to poll(), so we explicitly leave the group.
maybeLeaveGroup();
} else if (!heartbeat.shouldHeartbeat(now)) {
// poll again after waiting for the retry backoff in case the heartbeat failed or the
// coordinator disconnected
AbstractCoordinator.this.wait(retryBackoffMs);
} else {
heartbeat.sentHeartbeat(now); sendHeartbeatRequest().addListener(new RequestFutureListener<Void>() {
@Override
public void onSuccess(Void value) {
synchronized (AbstractCoordinator.this) {
heartbeat.receiveHeartbeat(time.milliseconds());
}
} @Override
public void onFailure(RuntimeException e) {
synchronized (AbstractCoordinator.this) {
if (e instanceof RebalanceInProgressException) {
// it is valid to continue heartbeating while the group is rebalancing. This
// ensures that the coordinator keeps the member in the group for as long
// as the duration of the rebalance timeout. If we stop sending heartbeats,
// however, then the session timeout may expire before we can rejoin.
heartbeat.receiveHeartbeat(time.milliseconds());
} else {
heartbeat.failHeartbeat(); // wake up the thread if it's sleeping to reschedule the heartbeat
AbstractCoordinator.this.notify();
}
}
}
});
}
 /**
* A helper class for managing the heartbeat to the coordinator
*/
public final class Heartbeat {
private final long sessionTimeout;
private final long heartbeatInterval;
private final long maxPollInterval;
private final long retryBackoffMs; private volatile long lastHeartbeatSend; // volatile since it is read by metrics
private long lastHeartbeatReceive;
private long lastSessionReset;
private long lastPoll;
private boolean heartbeatFailed; public Heartbeat(long sessionTimeout,
long heartbeatInterval,
long maxPollInterval,
long retryBackoffMs) {
if (heartbeatInterval >= sessionTimeout)
throw new IllegalArgumentException("Heartbeat must be set lower than the session timeout"); this.sessionTimeout = sessionTimeout;
this.heartbeatInterval = heartbeatInterval;
this.maxPollInterval = maxPollInterval;
this.retryBackoffMs = retryBackoffMs;
} public void poll(long now) {
this.lastPoll = now;
} public void sentHeartbeat(long now) {
this.lastHeartbeatSend = now;
this.heartbeatFailed = false;
} public void failHeartbeat() {
this.heartbeatFailed = true;
} public void receiveHeartbeat(long now) {
this.lastHeartbeatReceive = now;
} public boolean shouldHeartbeat(long now) {
return timeToNextHeartbeat(now) == 0;
} public long lastHeartbeatSend() {
return this.lastHeartbeatSend;
} public long timeToNextHeartbeat(long now) {
long timeSinceLastHeartbeat = now - Math.max(lastHeartbeatSend, lastSessionReset);
final long delayToNextHeartbeat;
if (heartbeatFailed)
delayToNextHeartbeat = retryBackoffMs;
else
delayToNextHeartbeat = heartbeatInterval; if (timeSinceLastHeartbeat > delayToNextHeartbeat)
return 0;
else
return delayToNextHeartbeat - timeSinceLastHeartbeat;
} public boolean sessionTimeoutExpired(long now) {
return now - Math.max(lastSessionReset, lastHeartbeatReceive) > sessionTimeout;
} public long interval() {
return heartbeatInterval;
} public void resetTimeouts(long now) {
this.lastSessionReset = now;
this.lastPoll = now;
this.heartbeatFailed = false;
} public boolean pollTimeoutExpired(long now) {
return now - lastPoll > maxPollInterval;
} }

join group 的处理逻辑:kafka.coordinator.group.GroupCoordinator#onCompleteJoin

kafka 心跳和 reblance的更多相关文章

  1. kafka consumer 分区reblance算法

    转载请注明原创地址 http://www.cnblogs.com/dongxiao-yang/p/6238029.html 最近需要详细研究下kafka reblance过程中分区计算的算法细节,网上 ...

  2. kafka consumer频繁reblance

    转载请注明地址http://www.cnblogs.com/dongxiao-yang/p/5417956.html 结论与下文相同,kafka不同topic的consumer如果用的groupid名 ...

  3. Kafka知识总结及面试题

    目录 概念 Kafka基础概念 命令行 Kafka 数据存储设计 kafka在zookeeper中存储结构 生产者 生产者设计 消费者 消费者设计 面试题 kafka设计 请说明什么是Apache K ...

  4. Kafka集成SparkStreaming

    Spark Streaming + Kafka集成指南 Kafka项目在版本0.8和0.10之间引入了一个新的消费者API,因此有两个独立的相应Spark Streaming包可用.请选择正确的包,  ...

  5. kafka Auto offset commit faild reblance

    今天在使用python消费kafka时遇到了一些问题, 特记录一下. 场景一. 特殊情况: 单独写程序只用来生产消费数据 开始时间: 10:42 Topic: t_facedec Partition: ...

  6. Kafka消费与心跳机制

    1.概述 最近有同学咨询Kafka的消费和心跳机制,今天笔者将通过这篇博客来逐一介绍这些内容. 2.内容 2.1 Kafka消费 首先,我们来看看消费.Kafka提供了非常简单的消费API,使用者只需 ...

  7. Kafka技术内幕 读书笔记之(四) 新消费者——心跳任务

    消费者拉取数据是在拉取器中完成的,发送心跳是在消费者的协调者上完成的,但并不是说拉取器和消费者的协调者就没有关联关系 . “消费者的协调者”的作用是确保客户端的消费者和服务端的协调者之间的正常通信,如 ...

  8. Kafka:主要参数详解(转)

    原文地址:http://kafka.apache.org/documentation.html ############################# System ############### ...

  9. Kafka主要参数详解(转)

    原文档地址:http://kafka.apache.org/documentation.html ############################# System ############## ...

随机推荐

  1. Matlab学以致用--模拟os任务装载情况

    2012-06-08 21:26:42 用matlab来建模,仿真不同时刻os task在队列中的装载情况.输入参数如下 作为初学者,M文件写的有点长.能实现功能就算学以致用了. clear;clc ...

  2. JS 全选、全不选、反选

    function checkReturn(obj) { var objIds = obj.value; //当没有选中某个子复选框时,checkboxall取消选中 if (!$("#sub ...

  3. php项目命名规范

    命名规范 ThinkPHP5遵循PSR-2命名规范和PSR-4自动加载规范,并且注意如下规范: 目录和文件 目录使用小写+下划线: 类库.函数文件统一以.php为后缀: 类的文件名均以命名空间定义,并 ...

  4. Springboot 配置类( @Configuration) 不能使用@Value注解从application.propertyes中加载值以及Environment为null解决方案

    最近遇到个场景,需要在使用@Bean注解定义bean的时候为对象设置一些属性,比如扫描路径,因为路径经常发布新特性的时候需要修改,所以就计划着放在配置文件中,然后通过@ConfigurationPro ...

  5. 算法(第四版)C# 习题题解——3.1

    写在前面 整个项目都托管在了 Github 上:https://github.com/ikesnowy/Algorithms-4th-Edition-in-Csharp 查找更方便的版本见:https ...

  6. gjt常用命令---chalee

    Git常用命令 一. git 基本操作流程 1. 从远程分支拉取并创建新的分支 git pull origin [远程分支名]:[本地分支名] // 从远程分支迁出本地分支,并切换到新的本地分支 gi ...

  7. NOIP 2017 逛公园 - 动态规划 - 最短路

    题目传送门 传送门 题目大意 给定一个$n$个点$m$条边的带权有向图,问从$1$到$n$的距离不超过最短路长度$K$的路径数. 跑一遍最短路. 一个点拆$K + 1$个点,变成一个DAG上路径计数问 ...

  8. Oracle使用——oracle11g安装——Oracle要求的结果: 5.0,5.1,5.2,6.0 6.1 之一 实际结果: 6.2

    问题 正在检查操作系统要求...        要求的结果: 5.0,5.1,5.2,6.0 之一        实际结果: 6.1        检查完成.此次检查的总体结果为: 失败 <&l ...

  9. The application to execute does not exist: 'C:\Users\Administrator\.dotnet\tools\.store\dotnet-aspnet-codegenerator\2.2.0-rtm-35687\dotnet-aspnet-codegenerator\2.2.0-rtm-35687\tools\netcoreapp2.1\any\

    vs code mvc搭建基架执行命令操作出现的问题解决方式重新复制拷贝一份2.2.0命名为2.2.0-rtm-35687, 修改

  10. BZOJ-1721|线性dp-缆车支柱

    Ski Lift 缆车支柱 Description Farmer Ron in Colorado is building a ski resort for his cows (though budge ...