Kafka rebalance: some partitions have no owner

If you repost this, please credit the original at http://www.cnblogs.com/dongxiao-yang/p/6234673.html

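Recently a business team reported that, after going live on Kafka, some partitions of one topic never got an owner registered. As the monitoring page showed, partitions 5 and 7 could not be claimed by any consumer, and restarting the client program to trigger a rebalance left the same two partitions unconsumed.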

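Because that team had just gone through a large data-center migration, my first guess was network connectivity. But testing the network from each consumer host turned up nothing. Changing the number of consumers, even down to a single client, still left some partitions without an owner after the rebalance finished. The unowned partition numbers also changed as the number of consumers changed, and the client programs logged no errors at any point during the rebalance.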

For this kind of problem you have to go through the client logs. With only two clients running, I found this section:

16/12/29 15:52:56 INFO consumer.ZookeeperConsumerConnector: [xxx], Consumer xxx rebalancing the following partitions: ArrayBuffer(5, 7, 3, 8, 1, 4, 6, 2, 0, 9) for topic onlineAdDemographicPredict with consumers: List(aaa-0, yyy-0, xxx-0)
16/12/29 15:52:56 INFO consumer.ZookeeperConsumerConnector: [xxx], xxx-0 attempting to claim partition 2
16/12/29 15:52:56 INFO consumer.ZookeeperConsumerConnector: [xxx], xxx-0 attempting to claim partition 0
16/12/29 15:52:56 INFO consumer.ZookeeperConsumerConnector: [xxx], xxx-0 attempting to claim partition 9
16/12/29 15:52:56 INFO consumer.ZookeeperConsumerConnector: [xxx], xxx-0 successfully owned partition 0 for topic onlineAdDemographicPredict
16/12/29 15:52:56 INFO consumer.ZookeeperConsumerConnector: [xxx], xxx-0 successfully owned partition 9 for topic onlineAdDemographicPredict
16/12/29 15:52:56 INFO consumer.ZookeeperConsumerConnector: [xxx], xxx-0 successfully owned partition 2 for topic onlineAdDemographicPredict

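The ArrayBuffer lists all 10 partitions, so the client reads the topic's partitions correctly. The problem is in the with consumers: List(...) part. At that point the business team had started only two clients, xxx and yyy, but the consumer saw three consumer thread IDs. After doing the calculation it concluded that it should claim exactly three partitions (2, 0 and 9), and the partitions assigned to the third ID were never claimed by anyone.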
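To make the arithmetic concrete, here is a minimal sketch of the range calculation applied to the values from the log above. It is not the Kafka code itself: the object name RangeAssignmentSketch, the function rangeAssign and the live set are hypothetical names I introduce for illustration, and the consumer list is taken in the order the log printed it.

```scala
object RangeAssignmentSketch {
  // Returns consumer -> partitions it would try to claim. Consumers at the
  // front of the list get one extra partition when the division is not even.
  def rangeAssign(partitions: Seq[Int], consumers: Seq[String]): Map[String, Seq[Int]] = {
    val nPartsPerConsumer = partitions.size / consumers.size
    val nConsumersWithExtraPart = partitions.size % consumers.size
    consumers.zipWithIndex.map { case (consumer, pos) =>
      val startPart = nPartsPerConsumer * pos + math.min(pos, nConsumersWithExtraPart)
      val nParts = nPartsPerConsumer + (if (pos + 1 > nConsumersWithExtraPart) 0 else 1)
      consumer -> partitions.slice(startPart, startPart + nParts)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Values copied from the log excerpt above.
    val partitions = Seq(5, 7, 3, 8, 1, 4, 6, 2, 0, 9)
    val registered = Seq("aaa-0", "yyy-0", "xxx-0")
    // Assumption for illustration: only these two are actually running.
    val live = Set("xxx-0", "yyy-0")

    val plan = rangeAssign(partitions, registered)
    // Partitions assigned to a registered consumer that is not running.
    val unowned = plan.filter { case (c, _) => !live.contains(c) }.values.flatten.toSeq

    println("xxx-0 claims: " + plan("xxx-0"))
    println("never claimed: " + unowned)
  }
}
```

With 10 partitions and 3 registered IDs, each ID gets 3 partitions and the first one gets 1 extra. xxx-0 sits at position 2, so it starts at index 7 and takes partitions 2, 0 and 9, which matches the log. The 4 partitions at the front of the list (5, 7, 3, 8) go to aaa-0, and since nothing is running under that ID they stay without an owner.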

The Kafka source code that produces this log is as follows:

class RangeAssignor() extends PartitionAssignor with Logging {

  def assign(ctx: AssignmentContext) = {
    val partitionOwnershipDecision = collection.mutable.Map[TopicAndPartition, ConsumerThreadId]()

    for ((topic, consumerThreadIdSet) <- ctx.myTopicThreadIds) {
      // every consumer thread registered in the group for this topic
      val curConsumers = ctx.consumersForTopic(topic)
      val curPartitions: Seq[Int] = ctx.partitionsForTopic(topic)

      val nPartsPerConsumer = curPartitions.size / curConsumers.size
      val nConsumersWithExtraPart = curPartitions.size % curConsumers.size

      info("Consumer " + ctx.consumerId + " rebalancing the following partitions: " + curPartitions +
        " for topic " + topic + " with consumers: " + curConsumers)

      for (consumerThreadId <- consumerThreadIdSet) {
        // this thread's position in the consumer list decides its range
        val myConsumerPosition = curConsumers.indexOf(consumerThreadId)
        assert(myConsumerPosition >= 0)
        val startPart = nPartsPerConsumer * myConsumerPosition + myConsumerPosition.min(nConsumersWithExtraPart)
        val nParts = nPartsPerConsumer + (if (myConsumerPosition + 1 > nConsumersWithExtraPart) 0 else 1)

        /**
         *   Range-partition the sorted partitions to consumers for better locality.
         *  The first few consumers pick up an extra partition, if any.
         */
        if (nParts <= 0)
          warn("No broker partitions consumed by consumer thread " + consumerThreadId + " for topic " + topic)
        else {
          for (i <- startPart until startPart + nParts) {
            val partition = curPartitions(i)
            info(consumerThreadId + " attempting to claim partition " + partition)
            // record the partition ownership decision
            partitionOwnershipDecision += (TopicAndPartition(topic, partition) -> consumerThreadId)
          }
        }
      }
    }

    partitionOwnershipDecision
  }
}

object PartitionAssignor {
  def createInstance(assignmentStrategy: String) = assignmentStrategy match {
    case "roundrobin" => new RoundRobinAssignor()
    case _ => new RangeAssignor()
  }
}

class AssignmentContext(group: String, val consumerId: String, excludeInternalTopics: Boolean, zkClient: ZkClient) {
  val myTopicThreadIds: collection.Map[String, collection.Set[ConsumerThreadId]] = {
    val myTopicCount = TopicCount.constructTopicCount(group, consumerId, zkClient, excludeInternalTopics)
    myTopicCount.getConsumerThreadIdsPerTopic
  }

  val partitionsForTopic: collection.Map[String, Seq[Int]] =
    ZkUtils.getPartitionsForTopics(zkClient, myTopicThreadIds.keySet.toSeq)

  val consumersForTopic: collection.Map[String, List[ConsumerThreadId]] =
    ZkUtils.getConsumersPerTopic(zkClient, group, excludeInternalTopics)

  val consumers: Seq[String] = ZkUtils.getConsumersInGroup(zkClient, group).sorted
}

class ZKGroupDirs(val group: String) {
  def consumerDir = ConsumersPath
  def consumerGroupDir = consumerDir + "/" + group
  def consumerRegistryDir = consumerGroupDir + "/ids"
  def consumerGroupOffsetsDir = consumerGroupDir + "/offsets"
  def consumerGroupOwnersDir = consumerGroupDir + "/owners"
}

  def getConsumersPerTopic(group: String, excludeInternalTopics: Boolean): mutable.Map[String, List[ConsumerThreadId]] = {
    val dirs = new ZKGroupDirs(group)
    // every child node under the group's ids path counts as a consumer
    val consumers = getChildrenParentMayNotExist(dirs.consumerRegistryDir)
    val consumersPerTopicMap = new mutable.HashMap[String, List[ConsumerThreadId]]
    for (consumer <- consumers) {
      val topicCount = TopicCount.constructTopicCount(group, consumer, this, excludeInternalTopics)
      for ((topic, consumerThreadIdSet) <- topicCount.getConsumerThreadIdsPerTopic) {
        for (consumerThreadId <- consumerThreadIdSet)
          consumersPerTopicMap.get(topic) match {
            case Some(curConsumers) => consumersPerTopicMap.put(topic, consumerThreadId :: curConsumers)
            case _ => consumersPerTopicMap.put(topic, List(consumerThreadId))
          }
      }
    }
    for ( (topic, consumerList) <- consumersPerTopicMap )
      consumersPerTopicMap.put(topic, consumerList.sortWith((s,t) => s < t))
    consumersPerTopicMap
  }

  def constructTopicCount(group: String, consumerId: String, zkUtils: ZkUtils, excludeInternalTopics: Boolean) : TopicCount = {
    val dirs = new ZKGroupDirs(group)
    val topicCountString = zkUtils.readData(dirs.consumerRegistryDir + "/" + consumerId)._1
    var subscriptionPattern: String = null
    var topMap: Map[String, Int] = null
    try {
      Json.parseFull(topicCountString) match {
        case Some(m) =>
          val consumerRegistrationMap = m.asInstanceOf[Map[String, Any]]
          consumerRegistrationMap.get("pattern") match {
            case Some(pattern) => subscriptionPattern = pattern.asInstanceOf[String]
            case None => throw new KafkaException("error constructing TopicCount : " + topicCountString)
          }
          consumerRegistrationMap.get("subscription") match {
            case Some(sub) => topMap = sub.asInstanceOf[Map[String, Int]]
            case None => throw new KafkaException("error constructing TopicCount : " + topicCountString)
          }
        case None => throw new KafkaException("error constructing TopicCount : " + topicCountString)
      }
    } catch {
      case e: Throwable =>

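Tracing through the code above, you can see how a consumer works out which clients belong to its group. It reads every consumer ID that exists under the ZooKeeper path /kafkachroot/consumers/{groupid}/ids, then reads the topic information registered for each of those consumer IDs, and finally returns a map from topic to List[ConsumerThreadId].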
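The lookup described above can be sketched without ZooKeeper. This is only an illustration under my own assumptions: the Registry type is a hypothetical in-memory stand-in for the child nodes under the ids path, the names ConsumerRegistrySketch and consumersPerTopic are mine, and thread IDs are modeled as plain strings of the form consumerId-threadIndex with an ordinary string sort.

```scala
object ConsumerRegistrySketch {
  // Hypothetical stand-in for the nodes under .../consumers/{groupid}/ids:
  // consumer id -> (topic -> number of streams), like the "subscription" field.
  type Registry = Map[String, Map[String, Int]]

  // Builds topic -> sorted list of consumer thread ids. Every registered
  // node contributes, whether or not the process behind it still consumes.
  def consumersPerTopic(registry: Registry): Map[String, List[String]] = {
    val pairs = for {
      (consumerId, subscription) <- registry.toList
      (topic, nThreads) <- subscription.toList
      i <- 0 until nThreads
    } yield topic -> s"$consumerId-$i"
    pairs.groupBy(_._1).map { case (topic, ps) => topic -> ps.map(_._2).sorted }
  }

  def main(args: Array[String]): Unit = {
    val topic = "onlineAdDemographicPredict"
    val registry: Registry = Map(
      "xxx" -> Map(topic -> 1),
      "yyy" -> Map(topic -> 1),
      "aaa" -> Map(topic -> 1) // leftover registration from an old program
    )
    // Three thread ids come back although only two clients are running.
    println(consumersPerTopic(registry)(topic))
  }
}
```

Nothing in this lookup checks whether a registered consumer is still doing any work. Whatever is present under ids is counted, so one leftover node is enough to shift every other consumer's range.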

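So I went to look at the node structure in ZooKeeper, and sure enough there was an ephemeral node aaa under the ids path of the problem group. After notifying the application team, we found that a very old program had once consumed this topic with the same group ID. Nobody had looked after that program for a long time and it was stuck in a half-dead state: its ZooKeeper session stayed alive, so the ephemeral node never expired. As a result, every consumer that later used the same group for the same topic saw one extra consumer on each rebalance, and some partitions were always left unconsumed.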

Appendix:

1 The consumer rebalance algorithm

2 This post is based on Kafka 0.8.2-beta. I have not studied newer versions, where the behavior may differ.
