How to manage and balance “Huge Data Load” for big Kafka clusters (reference)
1. Add Partition Tool
Partitions act as the unit of parallelism. Messages of a single topic are distributed across multiple partitions, which can be stored and served on different servers. The number of partitions must be specified when a topic is created. Later on, more partitions may be needed as the volume of the topic grows. This tool adds partitions to a specific topic and also allows manual replica assignment of the added partitions. You can refer to the previous blog post, "Quick steps : Have a Kafka Cluster Up & Running in 3 minutes", to set up a Kafka cluster and create topics.
```
bin/kafka-add-partitions.sh
Option                                  Description
------                                  -----------
--partition <Integer: # of partitions>  REQUIRED: Number of partitions to add
                                          to the topic
--replica-assignment-list               For manually assigning replicas to
  <broker_id_for_part1_replica1 :         brokers for the new partitions
  broker_id_for_part1_replica2,           (default: )
  broker_id_for_part2_replica1 :
  broker_id_for_part2_replica2, ...>
--topic <topic>                         REQUIRED: The topic for which
                                          partitions need to be added.
--zookeeper <urls>                      REQUIRED: The connection string for
                                          the zookeeper connection in the form
                                          host:port. Multiple URLS can be
                                          given to allow fail-over.
```
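The following minimal Python sketch makes the `--replica-assignment-list` format concrete by building the tool's command line. Based on the help text above, it assumes that the replicas of one new partition are joined with `:` and that partitions are separated with `,`. The helper names, the topic `foo` and the ZooKeeper address `localhost:2181` are hypothetical. The check that there is exactly one replica list per new partition is this sketch's own safeguard; the tool does not necessarily enforce it.

```python
from typing import List, Sequence


def format_replica_assignment(assignments: Sequence[Sequence[int]]) -> str:
    """Join the replicas of one partition with ':' and separate partitions with ','."""
    parts = []
    for replicas in assignments:
        # A broker can hold at most one replica of a given partition.
        if len(set(replicas)) != len(replicas):
            raise ValueError(f"duplicate broker id in replica list {list(replicas)}")
        parts.append(":".join(str(broker) for broker in replicas))
    return ",".join(parts)


def add_partitions_command(topic: str, num_partitions: int, zookeeper: str,
                           assignments: Sequence[Sequence[int]] = ()) -> List[str]:
    """Build an argv list for bin/kafka-add-partitions.sh (hypothetical helper)."""
    cmd = ["bin/kafka-add-partitions.sh",
           "--topic", topic,
           "--partition", str(num_partitions),
           "--zookeeper", zookeeper]
    if assignments:
        # Sanity check of this sketch: one replica list per new partition.
        if len(assignments) != num_partitions:
            raise ValueError("expected exactly one replica list per new partition")
        cmd += ["--replica-assignment-list", format_replica_assignment(assignments)]
    return cmd


if __name__ == "__main__":
    # Add 2 partitions to topic "foo": replicas on brokers 1 and 2, then on 2 and 3.
    print(" ".join(add_partitions_command("foo", 2, "localhost:2181", [[1, 2], [2, 3]])))
```

Run as a script, it prints `bin/kafka-add-partitions.sh --topic foo --partition 2 --zookeeper localhost:2181 --replica-assignment-list 1:2,2:3`. Returning an argv list rather than a single string means the value can be passed to a process launcher without extra shell quoting.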
2. Reassign Partitions Tool
What does the tool do?
The goal of this tool is similar to that of the Preferred Replica Leader Election Tool: to achieve load balance across brokers. However, instead of only electing a new leader from the assigned replicas of a partition, this tool lets you change the assigned replicas of partitions. Remember that followers also need to fetch from leaders to stay in sync, so balancing only the leadership load is sometimes not enough.
The steps the tool performs are summarized below:
1. The tool updates the ZooKeeper path "/admin/reassign_partitions" with the list of topic partitions and (if specified in the JSON file) the list of their newly assigned replicas.
2. The controller watches the path above. When its data changes, the controller reads the list of topic partitions and their assigned replicas from ZooKeeper.
3. For each topic partition, the controller does the following:
3.1. Start new replicas in RAR - AR (RAR = Reassigned Replicas, AR = original list of Assigned Replicas)
3.2. Wait until the new replicas are in sync with the leader
3.3. If the leader is not in RAR, elect a new leader from RAR
3.4. Stop old replicas AR - RAR
3.5. Write new AR
3.6. Remove the partition from the /admin/reassign_partitions path
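The controller steps above can be simulated for a single partition with a few list operations. This is a minimal, hypothetical Python model, not Kafka's controller code. `PartitionState`, `reassign` and the in-memory ISR (in-sync replica list) are invented for illustration. The model assumes that new replicas catch up instantly and that the new leader is the first in-sync replica in RAR order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionState:
    assigned: List[int]                            # AR: currently assigned replicas
    leader: int
    isr: List[int] = field(default_factory=list)   # replicas currently in sync


def reassign(state: PartitionState, reassigned: List[int]) -> List[str]:
    """Apply steps 3.1-3.6 to one partition in memory; return a log of actions."""
    log = []
    ar, rar = list(state.assigned), list(reassigned)
    # 3.1 Start new replicas in RAR - AR.
    new_replicas = [b for b in rar if b not in ar]
    log.append(f"start replicas {new_replicas}")
    # 3.2 Wait until they are in sync (assumption: they catch up immediately).
    state.isr = state.isr + [b for b in new_replicas if b not in state.isr]
    log.append(f"isr now {state.isr}")
    # 3.3 If the leader is not in RAR, elect a new one from RAR (first in-sync replica).
    if state.leader not in rar:
        state.leader = next(b for b in rar if b in state.isr)
        log.append(f"elect leader {state.leader}")
    # 3.4 Stop old replicas AR - RAR; they also leave the ISR.
    old_replicas = [b for b in ar if b not in rar]
    state.isr = [b for b in state.isr if b not in old_replicas]
    log.append(f"stop replicas {old_replicas}")
    # 3.5 Write the new AR.
    state.assigned = rar
    log.append(f"write AR {state.assigned}")
    # 3.6 Remove the partition from /admin/reassign_partitions (only logged here).
    log.append("remove from /admin/reassign_partitions")
    return log


if __name__ == "__main__":
    state = PartitionState(assigned=[1, 2, 3], leader=3, isr=[1, 2, 3])
    for line in reassign(state, [1, 2, 4]):
        print(line)
```

In this model, moving a partition from replicas 1,2,3 to 1,2,4 while broker 3 leads first starts broker 4 and moves leadership to broker 1. Only after that does it stop broker 3, so the partition always keeps a leader that belongs to the final replica set.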
How to use the tool?
```
bin/kafka-reassign-partitions.sh
Option                                  Description
------                                  -----------
--broker-list <brokerlist>              The list of brokers to which the
                                          partitions need to be reassigned in
                                          the form "0,1,2". This is required
                                          for automatic topic reassignment.
--execute [execute]                     This option does the actual
                                          reassignment. By default, the tool
                                          does a dry run
--manual-assignment-json-file <manual   The JSON file with the list of manual
  assignment json file path>              reassignments. This option or
                                          topics-to-move-json-file needs to be
                                          specified. The format to use is -
                                          {"partitions":
                                          [{"topic": "foo",
                                          "partition": 1,
                                          "replicas": [1,2,3] }],
                                          "version":1
                                          }
--topics-to-move-json-file <topics to   The JSON file with the list of topics
  reassign json file path>                to reassign. This option or
                                          manual-assignment-json-file needs to
                                          be specified. The format to use is -
                                          {"topics":
                                          [{"topic": "foo"},{"topic": "foo1"}],
                                          "version":1
                                          }
--zookeeper <urls>                      REQUIRED: The connection string for
                                          the zookeeper connection in the form
                                          host:port. Multiple URLS can be
                                          given to allow fail-over.
```
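A hand-written manual assignment file is easy to get wrong, so it can help to check it before passing it to `--manual-assignment-json-file`. The Python sketch below parses the format shown in the help text and applies a few sanity checks of its own choosing: version 1, non-empty lists of distinct broker ids, and no partition listed twice. These checks are assumptions about what a sensible file looks like, not a description of the tool's own validation.

```python
import json
from typing import Any, Dict


def validate_manual_assignment(text: str) -> Dict[str, Any]:
    """Parse a manual reassignment file and apply basic sanity checks.

    Raises ValueError on any problem (json.JSONDecodeError is a subclass of it).
    """
    data = json.loads(text)
    if not isinstance(data, dict) or data.get("version") != 1:
        raise ValueError('expected a JSON object with "version": 1')
    partitions = data.get("partitions")
    if not isinstance(partitions, list) or not partitions:
        raise ValueError('"partitions" must be a non-empty list')
    seen = set()
    for entry in partitions:
        if not isinstance(entry, dict):
            raise ValueError(f"partition entry is not an object: {entry!r}")
        topic, partition, replicas = entry.get("topic"), entry.get("partition"), entry.get("replicas")
        if not isinstance(topic, str) or not isinstance(partition, int):
            raise ValueError(f"bad topic/partition in {entry}")
        if (not isinstance(replicas, list) or not replicas
                or not all(isinstance(b, int) for b in replicas)
                or len(set(replicas)) != len(replicas)):
            raise ValueError(f"replicas must be a non-empty list of distinct broker ids: {entry}")
        if (topic, partition) in seen:
            raise ValueError(f"{topic}-{partition} is listed more than once")
        seen.add((topic, partition))
    return data


if __name__ == "__main__":
    example = '{"partitions": [{"topic": "foo", "partition": 1, "replicas": [1,2,3] }], "version":1 }'
    print(validate_manual_assignment(example)["partitions"])
```

Stray or unbalanced brackets surface as a `json.JSONDecodeError` before the tool is ever invoked.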
3. Add Brokers (Cluster Expansion)
Cluster expansion means adding brokers with new broker ids to a Kafka 0.8 cluster. Typically, newly added brokers receive no data from existing topics until this tool is run to assign existing topics/partitions to them. The tool provides two options that make it easier to move topics to the new brokers in bulk: a) the topics to move and b) the list of newly added brokers. Using these two options, the tool automatically works out where the partitions of those topics should be placed on the new brokers.
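The text does not spell out how the tool places partitions on the new brokers. As a rough mental model only, the sketch below spreads replicas round-robin: replica r of partition p lands on `brokers[(offset + p + r) % len(brokers)]`. This is a simplified stand-in and not necessarily the algorithm the tool uses. The tool would read each topic's partition count and replication factor from ZooKeeper; here they are passed in directly, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def spread_partitions(topics: Dict[str, Tuple[int, int]],
                      brokers: Sequence[int]) -> List[dict]:
    """Round-robin placement (illustrative, not Kafka's exact algorithm).

    topics maps topic name -> (number of partitions, replication factor).
    Returns entries in the same shape as the manual assignment JSON.
    """
    n = len(brokers)
    plan = []
    offset = 0  # keep rotating across topics so first replicas don't pile onto brokers[0]
    for topic, (num_partitions, replication) in topics.items():
        if replication > n:
            raise ValueError(f"{topic}: replication factor {replication} exceeds {n} brokers")
        for p in range(num_partitions):
            start = offset + p
            replicas = [brokers[(start + r) % n] for r in range(replication)]
            plan.append({"topic": topic, "partition": p, "replicas": replicas})
        offset += num_partitions
    return plan


if __name__ == "__main__":
    # Two topics with 2 partitions each and replication factor 2, on brokers 5, 6, 7.
    for entry in spread_partitions({"foo1": (2, 2), "foo2": (2, 2)}, [5, 6, 7]):
        print(entry)
```

Because the starting broker shifts with every partition, the first (preferred) replica rotates across 5, 6 and 7. This is the property that matters for spreading leadership load.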
The following example moves two topics (foo1, foo2) to the newly added brokers 5, 6 and 7.
```
> ./bin/kafka-reassign-partitions.sh --topics-to-move-json-file topics-to-move.json --broker-list "5,6,7" --execute

> cat topics-to-move.json
{"topics":
 [{"topic": "foo1"},{"topic": "foo2"}],
 "version":1
}
```
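When the set of topics changes often, the topics-to-move file and the command can be generated rather than written by hand. The Python sketch below does this; the function names are hypothetical. Because the help text marks `--zookeeper` as required, the sketch also passes it, with `localhost:2181` as a placeholder. It leaves `--execute` off by default, so the first run is the dry run the help text describes.

```python
import json
from typing import List, Sequence


def topics_to_move_json(topics: Sequence[str]) -> str:
    """Render the payload expected by --topics-to-move-json-file."""
    return json.dumps({"topics": [{"topic": name} for name in topics], "version": 1})


def expansion_command(json_path: str, new_brokers: Sequence[int], zookeeper: str,
                      execute: bool = False) -> List[str]:
    """Build the kafka-reassign-partitions.sh argv for moving topics to new brokers."""
    cmd = ["./bin/kafka-reassign-partitions.sh",
           "--topics-to-move-json-file", json_path,
           "--broker-list", ",".join(str(b) for b in new_brokers),
           "--zookeeper", zookeeper]
    if execute:
        cmd.append("--execute")  # without this flag the tool only does a dry run
    return cmd


if __name__ == "__main__":
    print(topics_to_move_json(["foo1", "foo2"]))
    print(" ".join(expansion_command("topics-to-move.json", [5, 6, 7], "localhost:2181")))
```

A reasonable workflow is to write the JSON to `topics-to-move.json`, run the generated command once without `--execute` to review the proposed placement, and then rerun it with `execute=True`.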
Selectively moving some partitions to a broker
The partition movement tool can also be used to selectively move some replicas of certain partitions to a particular broker. Typically, if you end up with an unbalanced cluster, you can use the tool in this mode to move individual partitions around. In this mode, the tool takes a single file that lists the partitions to move and the replicas each of those partitions should be assigned to.
The following example moves one partition (foo-1) from replicas 1,2,3 to replicas 1,2,4.
```
> ./bin/kafka-reassign-partitions.sh --manual-assignment-json-file partitions-to-move.json --execute

> cat partitions-to-move.json
{"partitions":
 [{"topic": "foo",
   "partition": 1,
   "replicas": [1,2,4] }],
 "version":1
}
```
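In the selective mode, a typical edit is swapping one broker for another in a single partition's replica list. The Python sketch below produces the corresponding manual assignment JSON. The function name is hypothetical, and the current assignment is passed in as a plain dict. That input is also hypothetical; in practice you would read it from the cluster. The replaced broker's position in the list is kept, so the order of the remaining replicas does not change.

```python
import json
from typing import Dict, List, Tuple


def move_replica(current: Dict[Tuple[str, int], List[int]],
                 topic: str, partition: int, old: int, new: int) -> str:
    """Swap broker `old` for `new` in one partition; return manual-assignment JSON."""
    replicas = list(current[(topic, partition)])
    if old not in replicas:
        raise ValueError(f"broker {old} does not host {topic}-{partition}")
    if new in replicas:
        raise ValueError(f"broker {new} already hosts {topic}-{partition}")
    replicas[replicas.index(old)] = new  # same slot, so replica order is preserved
    return json.dumps({"partitions": [{"topic": topic, "partition": partition,
                                       "replicas": replicas}],
                       "version": 1})


if __name__ == "__main__":
    print(move_replica({("foo", 1): [1, 2, 3]}, "foo", 1, old=3, new=4))
```

For foo-1, swapping broker 3 for broker 4 in [1, 2, 3] yields a file whose replica list is [1, 2, 4], matching the example above.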
Note: these tools are available in Kafka 0.8; they do not exist in earlier versions.