Why is it that consumers connect to ZooKeeper to retrieve the partition locations, while Kafka producers have to connect to one of the brokers to retrieve metadata?

My point is: what exactly is the use of ZooKeeper when every broker already has all the metadata needed to tell producers where to send their messages? Couldn't the brokers serve this same information to the consumers?

I can understand why the brokers hold the metadata, so that they don't have to connect to ZooKeeper every time a new message is sent to them. Is there some function of ZooKeeper that I'm missing? I'm finding it hard to think of a reason why ZooKeeper is really needed within a Kafka cluster.

asked Jan 13 '15 at 8:49
Luckl507


2 Answers

First of all, ZooKeeper is needed only for the high-level consumer; SimpleConsumer does not require ZooKeeper to work.

The main reason ZooKeeper is needed for a high-level consumer is to track consumed offsets and handle load balancing.

Now in more detail.

Regarding offset tracking, imagine the following scenario: you start a consumer, consume 100 messages and shut the consumer down. Next time you start your consumer you'll probably want to resume from your last consumed offset (which is 100), and that means you have to store the maximum consumed offset somewhere. Here's where ZooKeeper kicks in: it stores offsets for every group/topic/partition. So next time you start your consumer it can ask, "hey ZooKeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in ZooKeeper but in other storages as well (for now only ZooKeeper and Kafka offset storages are available, and I'm not sure the Kafka storage is fully implemented).
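As a minimal sketch of how this looks with the old (0.8-era) high-level consumer API: the consumer is pointed at ZooKeeper rather than at a broker list, and offsets are committed back to ZooKeeper under the group id. The connect string, group name and topic below are placeholders:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class HighLevelConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // ZooKeeper address, not a broker list
        props.put("group.id", "my-group");                // offsets are stored per group/topic/partition
        props.put("auto.commit.enable", "true");          // periodically commit consumed offsets
        props.put("auto.offset.reset", "smallest");       // where to start if no offset is stored yet

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream for the topic; the connector resumes from the last committed offset.
        Map<String, Integer> topicCountMap = Collections.singletonMap("my-topic", 1);
        KafkaStream<byte[], byte[]> stream =
            connector.createMessageStreams(topicCountMap).get("my-topic").get(0);

        for (MessageAndMetadata<byte[], byte[]> msg : stream) {
            System.out.println("partition " + msg.partition() + " offset " + msg.offset());
        }
    }
}
```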

Regarding load balancing, the volume of messages produced can be too large to be handled by one machine, and you'll probably want to add computing power at some point. Let's say you have a topic with 100 partitions and, to handle this volume, 10 consumer machines. Several questions arise here:

  • how should these 10 machines divide the partitions among themselves?
  • what happens if one of the machines dies?
  • what happens if you want to add another machine?

And again, here's where ZooKeeper kicks in: it tracks all consumers in the group, and each high-level consumer subscribes to changes in that group. When a consumer appears or disappears, ZooKeeper notifies all consumers and triggers a rebalance so that they split the partitions near-equally (i.e. to balance the load). This guarantees that if one consumer dies, the others continue processing the partitions that were owned by it.
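The arithmetic behind that rebalance is simple. The sketch below is only an illustration of range-style assignment (not Kafka's actual code): 100 partitions over 10 consumers gives 10 each, and if one consumer drops out the same routine redistributes the partitions over the remaining 9:

```java
import java.util.*;

public class RangeAssignmentSketch {
    // Assign partition ids 0..partitions-1 across the consumers in sorted order.
    static Map<String, List<Integer>> assign(int partitions, List<String> consumers) {
        Collections.sort(consumers);               // deterministic order for all group members
        int perConsumer = partitions / consumers.size();
        int extra = partitions % consumers.size(); // the first 'extra' consumers get one more
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = 0; p < count; p++) owned.add(next++);
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>(Arrays.asList(
            "c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10"));
        System.out.println(assign(100, group)); // 10 partitions per consumer
        group.remove("c5");                     // one consumer dies...
        System.out.println(assign(100, group)); // ...the rest take over its partitions (11-12 each)
    }
}
```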

answered Jan 13 '15 at 9:46
serejja

Thanks for the answer, this clears it up; it's what I guessed but couldn't find anywhere. I also just read that as of version 0.9 the consumers will no longer use ZooKeeper, and it will only be used by the brokers for leader election etc. – Luckl507 Jan 13 '15 at 9:56

With Kafka 0.9+ the new Consumer API was introduced. New consumers do not need a connection to ZooKeeper, since group balancing is provided by Kafka itself.
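For comparison, here is a minimal sketch of the new (0.9+) consumer API: it only needs a broker bootstrap list, and group coordination and offset storage are handled by the brokers. The server address, group and topic below are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // brokers only, no ZooKeeper address
        props.put("group.id", "my-group");                // group coordination handled by a broker
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                // poll() also drives the rebalance protocol with the group coordinator
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.partition() + ":" + record.offset()
                                       + " " + record.value());
                }
            }
        }
    }
}
```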
