Setting Up a Distributed Kafka Cluster
This article walks through setting up a distributed Kafka cluster using the latest release at the time of writing, kafka_2.11-0.10.1.0. Server list:
```
172.31.10.1
172.31.10.2
172.31.10.3
```
1. Download the Kafka package
Go to the Kafka website at http://kafka.apache.org/:
- Click the "Download" button in the left-hand menu
- Pick the desired version: "2.11" is the Scala version (Kafka is written in Scala) and "0.10.1.0" is the Kafka version
- Choose any of the download links in the dialog that pops up
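For example, the tarball can be fetched directly with wget. The mirror URL below is only illustrative; use whichever link the download page offers:

```
# Illustrative download of the Kafka 0.10.1.0 / Scala 2.11 tarball
wget https://archive.apache.org/dist/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz
```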
2. Download the ZooKeeper package
In the overall Kafka architecture, the cluster depends on ZooKeeper for its naming service. A single-node setup can simply use the ZooKeeper bundled with the Kafka distribution, but production environments usually run a dedicated ZooKeeper cluster to keep the naming service highly available. If servers are scarce, ZooKeeper can share machines with the Kafka brokers; its resource requirements are modest.
Go to the ZooKeeper download page at http://www.apache.org/dyn/closer.cgi/zookeeper/ and follow the download links. This article uses the stable release zookeeper-3.4.8.
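As with Kafka, the tarball can be pulled straight from an Apache mirror; again, the URL below is only illustrative:

```
# Illustrative download of the ZooKeeper 3.4.8 tarball
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
```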
3. Install the ZooKeeper cluster
Upload the package zookeeper-3.4.8.tar.gz to server 172.31.10.1.
- Extract it to /opt/zookeeper/zookeeper-3.4.8:

```
tar -zxvf zookeeper-3.4.8.tar.gz
```
- Configure: switch to the conf directory and adjust dataDir and the server.x entries:

```
cd /opt/zookeeper/zookeeper-3.4.8/conf
mv zoo_sample.cfg zoo.cfg
```
The modified zoo.cfg looks like this:
```
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/logs/data/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=172.31.10.1:2888:3888
server.2=172.31.10.2:2888:3888
server.3=172.31.10.3:2888:3888
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
```
Here dataDir is the ZooKeeper data directory, and each server.x entry gives a ZooKeeper server's address together with its quorum and leader-election ports (2888 and 3888).
- Copy the installation to the other two servers, and in the dataDir directory on each server create a myid file containing the number x from the matching server.x entry (a sketch of the copy step follows the commands below). This article uses:
```
# On 172.31.10.1
cd /var/logs/data/zookeeper
echo "1" > /var/logs/data/zookeeper/myid

# On 172.31.10.2
cd /var/logs/data/zookeeper
echo "2" > /var/logs/data/zookeeper/myid

# On 172.31.10.3
cd /var/logs/data/zookeeper
echo "3" > /var/logs/data/zookeeper/myid
```
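The copy step itself is not shown above. A minimal sketch using scp, assuming password-less SSH from 172.31.10.1 to the other nodes, might look like this (note the data directory has to exist before the myid files above can be written):

```
# Run on 172.31.10.1: create the data directory locally and remotely,
# then push the ZooKeeper installation to the other two servers
mkdir -p /var/logs/data/zookeeper
for host in 172.31.10.2 172.31.10.3; do
  ssh ${host} "mkdir -p /opt/zookeeper /var/logs/data/zookeeper"
  scp -r /opt/zookeeper/zookeeper-3.4.8 ${host}:/opt/zookeeper/
done
```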
- Start the ZooKeeper cluster and verify it
```
# Start ZooKeeper on every server
cd /opt/zookeeper/zookeeper-3.4.8/bin
/opt/zookeeper/zookeeper-3.4.8/bin/zkServer.sh start

# Check the role of the ZooKeeper node on each server
cd /opt/zookeeper/zookeeper-3.4.8/bin
/opt/zookeeper/zookeeper-3.4.8/bin/zkServer.sh status
```
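zkServer.sh status should report one node as leader and the other two as followers. As an additional sanity check, the bundled CLI can be used to list the root znodes on any node; a quick check might look like:

```
# Connect to one of the servers and list the root of the tree;
# the built-in /zookeeper znode should be visible
/opt/zookeeper/zookeeper-3.4.8/bin/zkCli.sh -server 172.31.10.1:2181 ls /
```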
4. Install the Kafka cluster
- Extract the package to /opt/kafka/kafka_2.11-0.10.1.0:
```
tar -zxvf kafka_2.11-0.10.1.0.tgz
cd /opt/kafka/kafka_2.11-0.10.1.0
```
- Edit config/server.properties; the main items to change are:
```
broker.id=1
host.name=172.31.10.1
log.dirs=/var/logs/data/kafka
zookeeper.connect=172.31.10.1:2181,172.31.10.2:2181,172.31.10.3:2181/kafka
```
Note that broker.id is different on each server and must be unique across the whole cluster.
The modified server.properties is shown below:
```
############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1

# The port the socket server listens on
port=9092

# Hostname the broker will bind to. If not set, the server will bind to all interfaces
host.name=172.31.10.1

# Switch to enable topic deletion or not, default value is false
#delete.topic.enable=true

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = security_protocol://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/var/logs/data/kafka

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=172.31.10.1:2181,172.31.10.2:2181,172.31.10.3:2181/kafka

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
```
- Sync the installation to the other servers and change broker.id (a sketch follows)
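A minimal sketch of that sync step, assuming password-less SSH and that only broker.id and host.name differ between brokers:

```
# Run on 172.31.10.1: push the installation to the other brokers and
# give each one a unique broker.id and its own host.name
for i in 2 3; do
  host=172.31.10.$i
  ssh ${host} "mkdir -p /opt/kafka"
  scp -r /opt/kafka/kafka_2.11-0.10.1.0 ${host}:/opt/kafka/
  ssh ${host} "sed -i 's/^broker.id=.*/broker.id=$i/' /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties"
  ssh ${host} "sed -i 's/^host.name=.*/host.name=$host/' /opt/kafka/kafka_2.11-0.10.1.0/config/server.properties"
done
```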
- Start Kafka and verify
```
cd /opt/kafka/kafka_2.11-0.10.1.0
nohup /opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-server-start.sh config/server.properties &
```
Create a topic; if the topic can be created successfully, the cluster installation is complete. You can also run jps to check whether the Kafka process is alive.
```
/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-topics.sh --create --zookeeper 172.31.10.1:2181,172.31.10.2:2181,172.31.10.3:2181/kafka --replication-factor 3 --partitions 1 --topic test
```
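To go a step further, a quick end-to-end check with the console tools shipped in the bin directory might look like this (the broker address assumes the default port 9092 configured above):

```
# Confirm the topic has 3 replicas and 1 partition
/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-topics.sh --describe --zookeeper 172.31.10.1:2181,172.31.10.2:2181,172.31.10.3:2181/kafka --topic test

# Produce a few test messages (type lines, then Ctrl+C)
/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-console-producer.sh --broker-list 172.31.10.1:9092 --topic test

# Consume them back from the beginning
/opt/kafka/kafka_2.11-0.10.1.0/bin/kafka-console-consumer.sh --bootstrap-server 172.31.10.1:9092 --topic test --from-beginning
```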
That completes the distributed Kafka cluster setup; later posts will cover other aspects of Kafka in more depth.