1. Add Partition Tool

Partitions act as the unit of parallelism. Messages of a single topic are distributed across multiple partitions that can be stored and served on different servers. When a topic is created, the number of partitions for it has to be specified. Later on, more partitions may be needed as the volume of the topic increases. This tool helps to add more partitions for a specific topic and also allows manual replica assignment of the added partitions. You can refer to the previous blog Quick steps : Have a Kafka Cluster Up & Running in 3 minutes to set up a Kafka cluster and create topics.
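For instance, on an 0.8.x release that ships the kafka-topics.sh script, creating a topic with an explicit partition count looks roughly like the line below; the topic name, partition count, replication factor and ZooKeeper address are placeholders, not values from this post, and on 0.8.0 the script name differs:

> bin/kafka-topics.sh --create --zookeeper localhost:2181 --partitions 3 --replication-factor 2 --topic my-topic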

bin/kafka-add-partitions.sh

Option                                     Description
------                                     -----------
--partition <Integer: # of partitions>     REQUIRED: Number of partitions to add
                                             to the topic

--replica-assignment-list                  For manually assigning replicas to
  <broker_id_for_part1_replica1 :            brokers for the new partitions
   broker_id_for_part1_replica2,             (default: )
   broker_id_for_part2_replica1 :
   broker_id_for_part2_replica2, ...>

--topic <topic>                            REQUIRED: The topic for which
                                             partitions need to be added.

--zookeeper <urls>                         REQUIRED: The connection string for
                                             the zookeeper connection in the form
                                             host:port. Multiple URLS can be
                                             given to allow fail-over.
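For example, to add two more partitions to an existing topic, a minimal invocation might look like the following (my-topic and localhost:2181 are placeholders; the flags are the ones listed in the help output above). The second form additionally pins the replicas of the two new partitions to specific broker ids via --replica-assignment-list:

> bin/kafka-add-partitions.sh --topic my-topic --partition 2 --zookeeper localhost:2181

> bin/kafka-add-partitions.sh --topic my-topic --partition 2 --zookeeper localhost:2181 --replica-assignment-list "0:1,1:2"

Here "0:1,1:2" would place the first new partition's replicas on brokers 0 and 1 and the second one's on brokers 1 and 2, following the broker_id_for_partN_replicaM format shown above.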

2. Reassign Partitions Tool

What does the tool do?

The goal of this tool is similar to that of the Preferred Replica Leader Election Tool: to achieve load balance across brokers. But instead of only electing a new leader from the assigned replicas of a partition, this tool lets you change the assigned replicas of partitions. Remember that followers also need to fetch from leaders in order to keep in sync, so sometimes balancing only the leadership load is not enough.

A summary of the steps the tool performs is shown below:

1. The tool updates the ZooKeeper path "/admin/reassign_partitions" with the list of topic partitions and (if specified in the JSON file) the list of their new assigned replicas.
2. The controller listens on the path above. When a data change is triggered, the controller reads the list of topic partitions and their assigned replicas from ZooKeeper.
3. For each topic partition, the controller does the following:
3.1. Start new replicas in RAR - AR (RAR = Reassigned Replicas, AR = original list of Assigned Replicas)
3.2. Wait until the new replicas are in sync with the leader
3.3. If the leader is not in RAR, elect a new leader from RAR
3.4. Stop old replicas in AR - RAR
3.5. Write the new AR
3.6. Remove the partition from the /admin/reassign_partitions path
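To make steps 3.1-3.4 concrete with hypothetical broker ids (not values from the post): if a partition's original assignment is AR = [1,2,3] and the reassignment requests RAR = [1,2,4], the controller first starts a new replica on broker 4 (RAR - AR), waits for it to catch up with the leader, moves leadership into RAR if necessary, and finally stops the replica on broker 3 (AR - RAR). While a reassignment is in flight, the ZooKeeper path can be inspected with the zookeeper-shell.sh script that ships with Kafka; this is a sketch assuming ZooKeeper at localhost:2181:

> bin/zookeeper-shell.sh localhost:2181 get /admin/reassign_partitions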

How to use the tool?

bin/kafka-reassign-partitions.sh

Option                                 Description
------                                 -----------
--broker-list <brokerlist>             The list of brokers to which the
                                         partitions need to be reassigned in
                                         the form "0,1,2". This is required
                                         for automatic topic reassignment.
--execute [execute]                    This option does the actual
                                         reassignment. By default, the tool
                                         does a dry run.
--manual-assignment-json-file          The JSON file with the list of manual
  <manual assignment json file path>     reassignments. This option or
                                         --topics-to-move-json-file needs to
                                         be specified. The format to use is:
                                         {"partitions":
                                           [{"topic": "foo",
                                             "partition": 1,
                                             "replicas": [1,2,3] }],
                                          "version":1
                                         }
--topics-to-move-json-file             The JSON file with the list of topics
  <topics to reassign json file path>    to reassign. This option or
                                         --manual-assignment-json-file needs
                                         to be specified. The format to use is:
                                         {"topics":
                                           [{"topic": "foo"},{"topic": "foo1"}],
                                          "version":1
                                         }
--zookeeper <urls>                     REQUIRED: The connection string for
                                         the zookeeper connection in the form
                                         host:port. Multiple URLS can be
                                         given to allow fail-over.

3. Add Brokers (Cluster Expansion)

Cluster expansion involves adding brokers with new broker ids to a Kafka 0.8 cluster. Typically, when you add new brokers to a cluster, they will not receive any data from existing topics until this tool is run to assign existing topics/partitions to the new brokers. The tool provides two options to make it easier to move some topics in bulk to the new brokers: a) the topics to move and b) the list of newly added brokers. Using these two options, the tool automatically figures out the placement of partitions for those topics on the new brokers.

The following example moves two topics (foo1, foo2) to the newly added brokers (5, 6, 7) in the cluster.

> ./bin/kafka-reassign-partitions.sh --topics-to-move-json-file topics-to-move.json --broker-list "5,6,7" --execute
 
>  cat topics-to-move.json
{"topics":
[{"topic": "foo1"},{"topic": "foo2"}],
"version":1
}
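Because --execute triggers the actual data movement, it can be worth running the same command without --execute first; according to the help output above, the tool then only performs a dry run, so you can review the proposed assignment before applying it. This dry-run step is a suggestion, not part of the original post:

> ./bin/kafka-reassign-partitions.sh --topics-to-move-json-file topics-to-move.json --broker-list "5,6,7"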

Selectively moving some partitions to a broker

The partition movement tool can also be used to selectively move some replicas for certain partitions over to a particular broker. Typically, if you end up with an unbalanced cluster, you can use the tool in this mode to selectively move partitions around. In this mode, the tool takes a single file which has a list of partitions to move and the replicas that each of those partitions should be assigned to.

The following example moves one partition (foo-1) from replicas 1,2,3 to 1,2,4.

> ./bin/kafka-reassign-partitions.sh --manual-assignment-json-file partitions-to-move.json --execute
 
> cat partitions-to-move.json
{"partitions":
[{"topic": "foo",
"partition": 1,
"replicas": [1,2,4] }],
"version":1
}
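Per step 3.6 of the controller walkthrough above, each partition is removed from /admin/reassign_partitions once its move completes, so the reassignment is done when that path is empty again. The new replica list can then be confirmed with the topic-describe command of your release; the script below and the localhost:2181 address are a sketch rather than the post's own instructions (on 0.8.0 the equivalent script is kafka-list-topic.sh):

> ./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic foo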

Note: These tools are available in version 0.8, not in prior versions.

Reference: http://xebee.xebia.in/index.php/2014/12/04/how-to-manage-and-balance-huge-data-load-for-big-kafka-clusters/
