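I won't go into what Kafka is or what features it offers here; see the official documentation for details.

0. Background

I need to deploy a Kafka broker instance on each of three machines and join them into one cluster.
A Kafka broker cluster relies on ZooKeeper as its coordinator and shared-state store. ZooKeeper holds cluster metadata and, for the old high-level consumer, the consumed-offset markers. It is also used to elect the controller broker, which in turn handles partition leader election.
ZooKeeper and Kafka are both installed on each of the three machines.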

10.90.7.2    Linux localhost.localdomain 2.6.-.el5 # SMP Fri Jul  :: EDT  x86_64 x86_64 x86_64 GNU/Linux
10.90.2.101   Linux bogon 3.10.-.el7.x86_64 # SMP Thu Jan :: EST x86_64 x86_64 x86_64 GNU/Linux
10.90.2.102 Linux localhost.localdomain 3.10.-.el7.x86_64 # SMP Thu Jan :: EST x86_64 x86_64 x86_64 GNU/Linux

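1. Downloading the software
Download Kafka 1.0.1:
https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.1/kafka_2.11-1.0.1.tgz
As a rule, for the sake of production stability I don't adopt the newest release right away; this one is the second-newest. The latest release is 1.1.0, released March 28, 2018.
Download ZooKeeper 3.4.9:
https://archive.apache.org/dist/zookeeper/zookeeper-3.4.9/zookeeper-3.4.9.tar.gz
This is the version I currently run stably. It is not the newest either: there are several alpha and beta releases, with version numbers going up to 3.5.4.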

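2. Installation
2.1 Installing ZooKeeper. Three servers form the smallest cluster in which leader election keeps working when one node fails (ZooKeeper elects its leader with the ZAB protocol, which is Paxos-like rather than Paxos itself). The configuration is simple, so I just paste it below.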

# The number of milliseconds of each tick
tickTime=
# The number of ticks that the initial
# synchronization phase can take
initLimit=
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/shihuc/zookeeper-3.4./zkData/data
dataLogDir=/opt/shihuc/zookeeper-3.4./zkData/logs
# the port at which the clients will connect
clientPort=
# the maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=
# Purge task interval in hours
# Set to "" to disable auto purge feature
autopurge.purgeInterval=
server.=10.90.7.2::
server.=10.90.2.101::
server.=10.90.2.102::

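Note: on every ZooKeeper machine, create a file named myid in the directory that dataDir points to. The file holds that server's id, i.e. the x in the matching server.x line of the configuration file (in this setup x is 1, 2, and 3).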
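
The myid rule can be sketched in Python as follows. This is only an illustration, not part of the original setup: the helper name myid_for_host is hypothetical, and the sample server.N lines assume ids 1 to 3 assigned in listing order and ZooKeeper's conventional 2888:3888 peer ports, because the ids and ports are not visible in the listing above.

```python
import re

# Hypothetical sample of the server.N lines from zoo.cfg. The ids (1-3) and the
# 2888:3888 peer/election ports are assumptions made for this illustration.
SAMPLE_ZOO_CFG = """\
server.1=10.90.7.2:2888:3888
server.2=10.90.2.101:2888:3888
server.3=10.90.2.102:2888:3888
"""

# Captures N and the host from a line of the form server.N=host:port:port
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):")


def myid_for_host(zoo_cfg_text, host):
    """Return the id N from the server.N=host:... line that matches host."""
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match and match.group(2) == host:
            return int(match.group(1))
    raise ValueError(f"no server.N entry for host {host!r}")


if __name__ == "__main__":
    # The number printed for each host is what its dataDir/myid file should contain.
    for ip in ("10.90.7.2", "10.90.2.101", "10.90.2.102"):
        print(ip, "->", myid_for_host(SAMPLE_ZOO_CFG, ip))
```

A mismatch between myid and the server.N entry for the same machine is a common reason a node fails to join the ensemble, so a lookup like this can help double-check each host before starting it.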

Start ZooKeeper. First, the start script's usage message:

[root@localhost bin]# ./zkServer.sh
ZooKeeper JMX enabled by default
Using config: /opt/shihuc/zookeeper-3.4./bin/../conf/zoo.cfg
Usage: ./zkServer.sh {start|start-foreground|stop|restart|status|upgrade|print-cmd}

Normal startup (run the same command on all three machines):

[root@localhost bin]# ./zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/shihuc/zookeeper-3.4./bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Check the status of each ZooKeeper node:

[root@localhost bin]# ./zkServer.sh status   #---10.90.7.2
ZooKeeper JMX enabled by default
Using config: /opt/shihuc/zookeeper-3.4./bin/../conf/zoo.cfg
Mode: leader
[root@localhost bin]# ./zkServer.sh status #---10.90.2.101
ZooKeeper JMX enabled by default
Using config: /opt/shihuc/zookeeper-3.4./bin/../conf/zoo.cfg
Mode: follower
[root@localhost bin]# ./zkServer.sh status #---10.90.2.102
ZooKeeper JMX enabled by default
Using config: /opt/shihuc/zookeeper-3.4./bin/../conf/zoo.cfg
Mode: follower
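
One leader and two followers is the expected result for a three-node ensemble. The reason three is the practical minimum comes down to majority arithmetic, shown here as a small Python sketch (the function names are hypothetical and chosen only for illustration):

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"ensemble={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

With two servers the quorum is still two, so a two-node ensemble tolerates no failures; three nodes is the first size that survives the loss of one.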

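2.2 Installing Kafka

Installation is simple: unpack the downloaded Kafka archive, then edit server.properties under config, mainly the log directory and the ZooKeeper connection addresses. Then run kafka-server-start.sh under bin.

Configuration: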

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/opt/shihuc/kafka_2.11-1.0.1/logDir

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=

############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than is recommended for to ensure availability such as .
offsets.topic.replication.factor=
transaction.state.log.replication.factor=
transaction.state.log.min.isr=

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# . Durability: Unflushed data may be lost if you are not using replication.
# . Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# . Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=10.90.7.2:2181,10.90.2.101:2181,10.90.2.102:2181

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=

############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is seconds.
# We override this to here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=

Start the Kafka service:

[root@localhost bin]# nohup ./kafka-server-start.sh ../config/server.properties &

After starting the broker on all three machines, I ran into the following problem:

[-- ::,] INFO [TransactionCoordinator id=] Startup complete. (kafka.coordinator.transaction.TransactionCoordinator)
[-- ::,] INFO Creating /brokers/ids/ (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[-- ::,] INFO Result of znode creation is: NODEEXISTS (kafka.utils.ZKCheckedEphemeral)
[2018-06-12 14:22:39,022] FATAL [KafkaServer id=0] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.RuntimeException: A broker is already registered on the path /brokers/ids/0. This probably indicates that you either have configured a brokerid that is already in use, or else you have shutdown this broker and restarted it faster than the zookeeper timeout so it appears to be re-registering.
at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:440)
at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:426)
at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:73)
at kafka.server.KafkaHealthcheck.startup(KafkaHealthcheck.scala:53)
at kafka.server.KafkaServer.startup(KafkaServer.scala:287)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
at kafka.Kafka$.main(Kafka.scala:92)
at kafka.Kafka.main(Kafka.scala)

[-- ::,] INFO [KafkaServer id=] shutting down (kafka.server.KafkaServer)
[-- ::,] INFO [SocketServer brokerId=] Stopping socket server request processors (kafka.network.SocketServer)
[-- ::,] INFO [SocketServer brokerId=] Stopped socket server request processors (kafka.network.SocketServer)
[-- ::,] INFO [Kafka Request Handler on Broker ], shutting down (kafka.server.KafkaRequestHandlerPool)
[-- ::,] INFO [Kafka Request Handler on Broker ], shut down completely (kafka.server.KafkaRequestHandlerPool)
[-- ::,] INFO [KafkaApi-] Shutdown complete. (kafka.server.KafkaApis)
[-- ::,] INFO [ExpirationReaper--topic]: Shutting down (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[-- ::,] INFO [ExpirationReaper--topic]: Stopped (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[-- ::,] INFO [ExpirationReaper--topic]: Shutdown completed (kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper)
[-- ::,] INFO [TransactionCoordinator id=] Shutting down. (kafka.coordinator.transaction.TransactionCoordinator)
[-- ::,] INFO [ProducerId Manager ]: Shutdown complete: last producerId assigned (kafka.coordinator.transaction.ProducerIdManager)
[-- ::,] INFO [Transaction State Manager ]: Shutdown complete (kafka.coordinator.transaction.TransactionStateManager)
[-- ::,] INFO [Transaction Marker Channel Manager ]: Shutting down (kafka.coordinator.transaction.TransactionMarkerChannelManager)

The cause is that the broker.id value in server.properties was duplicated across the cluster. Within one Kafka cluster every broker.id must be unique, even when the brokers run on different machines.
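
A check like the following could catch this before the brokers are started. It is a minimal sketch under assumptions: the helper names are hypothetical, the config texts are passed in as strings keyed by host, and the sample ids are invented to reproduce the collision.

```python
from collections import defaultdict


def broker_id(properties_text):
    """Extract broker.id from the text of a server.properties file, or None if absent."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "broker.id":
            return int(value.strip())
    return None


def duplicate_broker_ids(configs_by_host):
    """Map each broker.id used by more than one host to the sorted list of those hosts."""
    hosts_by_id = defaultdict(list)
    for host, text in configs_by_host.items():
        bid = broker_id(text)
        if bid is not None:  # hosts without an explicit broker.id are skipped
            hosts_by_id[bid].append(host)
    return {bid: sorted(hosts) for bid, hosts in hosts_by_id.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical sample: two hosts were left on the shipped default broker.id=0.
    sample = {
        "10.90.7.2": "# Server Basics\nbroker.id=0\n",
        "10.90.2.101": "broker.id=0\n",
        "10.90.2.102": "broker.id=2\n",
    }
    print(duplicate_broker_ids(sample))
```

An empty result means every configured id is distinct. Giving the three brokers ids such as 0, 1, and 2 (one possible choice, not taken from the original configuration) resolves the startup failure.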

3. Verifying the environment

3.1 Creating a topic

On 10.90.2.102:

[root@localhost bin]# ./kafka-topics.sh --create --zookeeper 10.90.7.2:,10.90.2.101:,10.90.2.102: --replication-factor  --partitions  --topic first
Created topic "first".

Repeat the command on the same machine:

[root@localhost bin]# ./kafka-topics.sh --create --zookeeper 10.90.7.2:,10.90.2.101:,10.90.2.102: --replication-factor  --partitions  --topic first
Error while executing topic command : Topic 'first' already exists.
[-- ::,] ERROR org.apache.kafka.common.errors.TopicExistsException: Topic 'first' already exists.
(kafka.admin.TopicCommand$)

Create the same topic from 10.90.7.2:

[root@localhost bin]# ./kafka-topics.sh --create --zookeeper 10.90.7.2:,10.90.2.101:,10.90.2.102: --replication-factor  --partitions  --topic first
Error while executing topic command : Topic 'first' already exists.
[-- ::,] ERROR org.apache.kafka.common.errors.TopicExistsException: Topic 'first' already exists.
(kafka.admin.TopicCommand$)

A topic name can be created only once within a Kafka cluster, no matter which machine the command is run from.

3.2 Starting a Kafka producer

On 10.90.2.101:

[root@localhost bin]# ./kafka-console-producer.sh --broker-list 10.90.7.2:,10.90.2.101:,10.90.2.102: --topic first
>
[-- ::,] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[-- ::,] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[-- ::,] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[-- ::,] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[-- ::,] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[-- ::,] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[-- ::,] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id : {first=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

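After repeatedly testing and re-checking the configuration, and finally consulting other people's experience, the cause turned out to be a misconfiguration in Kafka's server.properties. The problematic setting is: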

listeners=PLAINTEXT://:9092

Comment this line out, then add the following two lines to the configuration file to state the current broker's address explicitly:

port=
# Set this to the IP address of the server this broker runs on.
host.name=10.90.7.2
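
A likely reading of why this helps, offered as an assumption rather than something the logs prove: with listeners=PLAINTEXT://:9092 and no advertised.listeners, the broker advertises the value of java.net.InetAddress.getCanonicalHostName() (as the comments in the configuration state), and host names such as localhost.localdomain or bogon cannot be resolved from the other machines. In Kafka 1.0 host.name and port are deprecated in favor of listeners, so binding the listener to the machine's IP should have the same effect. The sketch below renders that line per host; the helper name is hypothetical and port 9092 is taken from the listeners line quoted above.

```python
def listener_line(host, port=9092, protocol="PLAINTEXT"):
    """Render a server.properties listeners entry bound to an explicit host."""
    return f"listeners={protocol}://{host}:{port}"


if __name__ == "__main__":
    # One line per broker; each goes into that machine's own server.properties.
    for ip in ("10.90.7.2", "10.90.2.101", "10.90.2.102"):
        print(ip, "->", listener_line(ip))
```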

After the change, restart the Kafka service and start the console producer again on one of the servers, for example on 10.90.2.102:

[root@localhost bin]# ./kafka-console-producer.sh --broker-list 10.90.7.2:,10.90.2.101:,10.90.2.102: --topic first
>hello
>good
>

Then start a console consumer on another server, for example on 10.90.7.2:

[root@localhost bin]# ./kafka-console-consumer.sh --bootstrap-server 10.90.7.2:,10.90.2.101:,10.90.2.102: --topic first
hello
good

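At this point the console producer and consumer exchange messages normally, which shows that the Kafka environment is configured correctly.

3.3 Inspecting the broker information for each topic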

[root@localhost bin]# ./kafka-topics.sh --describe --zookeeper 10.90.7.2:,10.90.2.101:,10.90.2.102: --topic first
Topic:first PartitionCount: ReplicationFactor: Configs:
Topic: first Partition: Leader: Replicas: ,, Isr: ,,
[root@localhost bin]# ./kafka-topics.sh --describe --zookeeper 10.90.7.2:,10.90.2.101:,10.90.2.102: --topic second
Topic:second PartitionCount: ReplicationFactor: Configs:
Topic: second Partition: Leader: Replicas: ,, Isr: ,,
Topic: second Partition: Leader: Replicas: ,, Isr: ,,

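Here is how to read the output. The first line summarizes all partitions of the topic; each following line describes one partition, so there are as many detail lines as partitions. (I created two topics: first, with a single partition, and second, with two partitions.)

Leader     is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions.

Replicas   is the list of nodes that replicate the log for this partition, regardless of whether they are the leader or even whether they are currently alive.

Isr        is the set of in-sync replicas: the subset of the Replicas list that is currently alive and caught up to the leader.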
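
To make the relationship between Replicas and Isr concrete, here is a minimal Python sketch that parses one per-partition line of the describe output and lists the replicas that have fallen out of sync. The function names and the sample line are hypothetical; the broker ids in the sample are invented, since the ids in the output above are not visible.

```python
import re

# Matches the "Name: value" fields of a per-partition describe line.
FIELD = re.compile(r"(Topic|Partition|Leader|Replicas|Isr):\s*(\S*)")


def parse_partition_line(line):
    """Parse one per-partition line of kafka-topics.sh --describe output."""
    fields = dict(FIELD.findall(line))

    def ids(text):
        return [int(x) for x in text.split(",") if x]

    return {
        "topic": fields["Topic"],
        "partition": int(fields["Partition"]),
        "leader": int(fields["Leader"]),
        "replicas": ids(fields["Replicas"]),
        "isr": ids(fields["Isr"]),
    }


def out_of_sync(info):
    """Replicas assigned to the partition that are missing from the ISR."""
    return [r for r in info["replicas"] if r not in info["isr"]]


if __name__ == "__main__":
    # Hypothetical line: broker 0 holds a replica but is not in sync.
    sample = "Topic: second\tPartition: 0\tLeader: 1\tReplicas: 1,2,0\tIsr: 1,2"
    info = parse_partition_line(sample)
    print(info)
    print("out of sync:", out_of_sync(info))
```

In a healthy cluster the out-of-sync list is empty for every partition, meaning Isr equals Replicas.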

Overall, setting up a Kafka environment is fairly easy, and the configuration is relatively easy to understand. With that, the environment bring-up is done.
