The docker image wurstmeister/kafka is the most stared image for kafka in hub.docker.com, but the useage is rare, so  in this post, I would take some time to talk about the usage of this docker image.

1. first, let's take some time to show the command to start the container instance in docker engine:

docker run -d  --name kafka -p :              \
-e KAFKA_ADVERTISED_HOST_NAME=kafka \
-e KAFKA_ZOOKEEPER_CONNECT=zookeeper: \
-e KAFKA_ADVERTISED_PORT= \
-e KAFKA_BROKER_ID= \
-e KAFKA_LISTENERS=PLAINTEXT://:9092 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://:9092 \
-e KAFKA_CREATE_TOPICS="stream-in:2:1,stream-out:2:1" \
--link zookeeper wurstmeister/kafka:1.1.

In the command above, we showed the flowing features about the usage of this image:

a). specify the advertised host name,whichi would regulalr be the container name and routed to the actual host ip of  the container defined in the /etc/hosts file.

b). the zookeeper list which kafka cluster used for cluster coordinate, there should be at least one zookeeper started, or else the kafka should be start failed.

c). the port number via which can accesse the kafka broker runed in the container instacne.

d).KAFKA_LISTENERS,KAFKA_ADVERTISED_LISTENERS, these two enviroment variables should be defined, or esle the container would start failed.

. advertised.listeners需要配置,如果不配置会使用listeners属性,如果listeners也不配置,
通过默认的方式获取:java.net.InetAddress.getCanonicalHostName(),该方法预计会返回hostname。
But:host.name 开始只绑定在了内部IP上,对外网卡无法访问.需要避免将Kafka broker机器的hostname注册进zookeeper . kafka的advertised.host.name参数 外网访问配置
kafka的server.properties文件 ```host.name```开始只绑定在了内部IP上,对外网卡无法访问。 把值设置为空的话会kafka监听端口在所有的网卡上绑定。但是在外网访问时,客户端又遇到了```java.nio.channels.ClosedChannelException```异常信息,
server端用tcpdump分析的时候发现客户端有传递kafka所在机器的机器名过来。在client里断点跟踪一下发现是findLeader的时候返回的元信息是机器名而不是IP。
客户端无法解析这个机器名所以出现了前面的异常。 在server.properties 里还有另一个参数是解决这个问题的, advertised.host.name参数用来配置返回的host.name值,把这个参数配置为外网IP地址即可。 这个参数默认没有启用,默认是返回的java.net.InetAddress.getCanonicalHostName的值,在我的mac上这个值并不等于hostname的值而是返回IP,但在linux上这个值就是hostname的值。 除了IP之外,还有PORT,外网对应的PORT也需要修改。以下是server.properties文件对应位置。 # Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured. Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients> # The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients> 当Kafka broker启动时,它会在ZK上注册自己的IP和端口号,客户端就通过这个IP和端口号来连接。 在AWS这种IaaS环境下,由于java.net.InetAddress.getCanonicalHostName调用拿到的HostName是类似ip----199这样的只有内网才能访问到的主机名,所以默认注册到ZK上的IP是内网才能访问的内网IP。 此时就需要显示指定 advertised.host.name, advertised.listeners参数,让注册到ZK上的IP是外网IP。 例如对于 59.64.11.22 IP对应的broker,需要在 server.properties 配置文件里增加如下三个配置:
advertised.host.name
advertised.listeners
advertised.port 新版配置
advertised.listeners=PLAINTEXT://59.64.11.22:9092 估计读者们也会跟我一样犯迷糊,为什么需要三个参数来配置IP和端口号呢,用一个advertised.listeners不就搞定了吗? 后来发现最新版本0..x broker配置弃用了advertised.host.name 和 advertised.port 这两个个配置项,就配置advertised.listeners就可以了。

2. reminder about how to install kafka cluster distributed.

for detail please refer to my blog post.

In this blog post, there are serveral key points about install kafka cluster:

a) the broker id, which represents the unique id of the broker in the kafka cluster.

b) zookeeper.connect which stores the meta data for cluster, the kafka cluster (even only has one single node kafka node) depends on zookeeper to functionate.

c) the host.name property which refer to  to actual ip address or hostname( this property has been deprected since  kafka 0.11), we can ignore it.

d) set the value of advertised.host.name,advertised.listeners,advertised.port which would publish the service to the client, so the client can access kafka via the address and port number through the value confied by these 2 parameters.

3. internal of the wurstmeister/kafka docker  images.

begin we goes deepinside the internal of the kafka image, let's take a look at the entry_point script of the image:

#!/bin/bash -e

# Store original IFS config, so we can restore it at various stages
ORIG_IFS=$IFS if [[ -z "$KAFKA_ZOOKEEPER_CONNECT" ]]; then
echo "ERROR: missing mandatory config: KAFKA_ZOOKEEPER_CONNECT"
exit
fi if [[ -z "$KAFKA_PORT" ]]; then
export KAFKA_PORT=
fi create-topics.sh &
unset KAFKA_CREATE_TOPICS # DEPRECATED: but maintained for compatibility with older brokers pre 0.9. (https://issues.apache.org/jira/browse/KAFKA-1809)
if [[ -z "$KAFKA_ADVERTISED_PORT" && \
-z "$KAFKA_LISTENERS" && \
-z "$KAFKA_ADVERTISED_LISTENERS" && \
-S /var/run/docker.sock ]]; then
KAFKA_ADVERTISED_PORT=$(docker port "$(hostname)" $KAFKA_PORT | sed -r 's/.*:(.*)/\1/g')
export KAFKA_ADVERTISED_PORT
fi if [[ -z "$KAFKA_BROKER_ID" ]]; then
if [[ -n "$BROKER_ID_COMMAND" ]]; then
KAFKA_BROKER_ID=$(eval "$BROKER_ID_COMMAND")
export KAFKA_BROKER_ID
else
# By default auto allocate broker ID
export KAFKA_BROKER_ID=-
fi
fi if [[ -z "$KAFKA_LOG_DIRS" ]]; then
export KAFKA_LOG_DIRS="/kafka/kafka-logs-$HOSTNAME"
fi if [[ -n "$KAFKA_HEAP_OPTS" ]]; then
sed -r -i 's/(export KAFKA_HEAP_OPTS)="(.*)"/\1="'"$KAFKA_HEAP_OPTS"'"/g' "$KAFKA_HOME/bin/kafka-server-start.sh"
unset KAFKA_HEAP_OPTS
fi if [[ -n "$HOSTNAME_COMMAND" ]]; then
HOSTNAME_VALUE=$(eval "$HOSTNAME_COMMAND") # Replace any occurences of _{HOSTNAME_COMMAND} with the value
IFS=$'\n'
for VAR in $(env); do
if [[ $VAR =~ ^KAFKA_ && "$VAR" =~ "_{HOSTNAME_COMMAND}" ]]; then
eval "export ${VAR//_\{HOSTNAME_COMMAND\}/$HOSTNAME_VALUE}"
fi
done
IFS=$ORIG_IFS
fi if [[ -n "$PORT_COMMAND" ]]; then
PORT_VALUE=$(eval "$PORT_COMMAND") # Replace any occurences of _{PORT_COMMAND} with the value
IFS=$'\n'
for VAR in $(env); do
if [[ $VAR =~ ^KAFKA_ && "$VAR" =~ "_{PORT_COMMAND}" ]]; then
eval "export ${VAR//_\{PORT_COMMAND\}/$PORT_VALUE}"
fi
done
IFS=$ORIG_IFS
fi if [[ -n "$RACK_COMMAND" && -z "$KAFKA_BROKER_RACK" ]]; then
KAFKA_BROKER_RACK=$(eval "$RACK_COMMAND")
export KAFKA_BROKER_RACK
fi # Try and configure minimal settings or exit with error if there isn't enough information
if [[ -z "$KAFKA_ADVERTISED_HOST_NAME$KAFKA_LISTENERS" ]]; then
if [[ -n "$KAFKA_ADVERTISED_LISTENERS" ]]; then
echo "ERROR: Missing environment variable KAFKA_LISTENERS. Must be specified when using KAFKA_ADVERTISED_LISTENERS"
exit
elif [[ -z "$HOSTNAME_VALUE" ]]; then
echo "ERROR: No listener or advertised hostname configuration provided in environment."
echo " Please define KAFKA_LISTENERS / (deprecated) KAFKA_ADVERTISED_HOST_NAME"
exit
fi # Maintain existing behaviour
# If HOSTNAME_COMMAND is provided, set that to the advertised.host.name value if listeners are not defined.
export KAFKA_ADVERTISED_HOST_NAME="$HOSTNAME_VALUE"
fi #Issue newline to config file in case there is not one already
echo "" >> "$KAFKA_HOME/config/server.properties" (
# Read in env as a new-line separated array. This handles the case of env variables have spaces and/or carriage returns. See #
IFS=$'\n'
for VAR in $(env)
do
if [[ $VAR =~ ^KAFKA_ && ! $VAR =~ ^KAFKA_HOME ]]; then
kafka_name=$(echo "$VAR" | sed -r 's/KAFKA_(.*)=.*/\1/g' | tr '[:upper:]' '[:lower:]' | tr _ .)
env_var=$(echo "$VAR" | sed -r 's/(.*)=.*/\1/g')
if grep -E -q '(^|^#)'"$kafka_name=" "$KAFKA_HOME/config/server.properties"; then
sed -r -i 's@(^|^#)('"$kafka_name"')=(.*)@\2='"${!env_var}"'@g' "$KAFKA_HOME/config/server.properties" #note that no config values may contain an '@' char
else
echo "$kafka_name=${!env_var}" >> "$KAFKA_HOME/config/server.properties"
fi
fi if [[ $VAR =~ ^LOG4J_ ]]; then
log4j_name=$(echo "$VAR" | sed -r 's/(LOG4J_.*)=.*/\1/g' | tr '[:upper:]' '[:lower:]' | tr _ .)
log4j_env=$(echo "$VAR" | sed -r 's/(.*)=.*/\1/g')
if grep -E -q '(^|^#)'"$log4j_name=" "$KAFKA_HOME/config/log4j.properties"; then
sed -r -i 's@(^|^#)('"$log4j_name"')=(.*)@\2='"${!log4j_env}"'@g' "$KAFKA_HOME/config/log4j.properties" #note that no config values may contain an '@' char
else
echo "$log4j_name=${!log4j_env}" >> "$KAFKA_HOME/config/log4j.properties"
fi
fi
done
) if [[ -n "$CUSTOM_INIT_SCRIPT" ]] ; then
eval "$CUSTOM_INIT_SCRIPT"
fi exec "$KAFKA_HOME/bin/kafka-server-start.sh" "$KAFKA_HOME/config/server.properties"

3.1 KAFKA_ZOOKEEPER_CONNECT is the required env for starting docker images.

if [[ -z "$KAFKA_ZOOKEEPER_CONNECT" ]]; then
echo "ERROR: missing mandatory config: KAFKA_ZOOKEEPER_CONNECT"
exit
fi

3.2 KAFKA_PORT is optional parameter.

if [[ -z "$KAFKA_PORT" ]]; then
export KAFKA_PORT=
fi

if env KAFKA_PORT is not provided, then would use the default port number 9092 as the kafka port number.

3.3 KAFKA_BROKER_ID is optional parameter.

if [[ -z "$KAFKA_BROKER_ID" ]]; then
if [[ -n "$BROKER_ID_COMMAND" ]]; then
KAFKA_BROKER_ID=$(eval "$BROKER_ID_COMMAND")
export KAFKA_BROKER_ID
else
# By default auto allocate broker ID
export KAFKA_BROKER_ID=-
fi
fi

if not provided, then use -1 which mean auto generated broker id.

3.4 we can define jvm parameter using KAFKA_HEAP_OPTS which would take effect by modified the corespinding segment in kafka-server-start.sh.

if [[ -n "$KAFKA_HEAP_OPTS" ]]; then
sed -r -i 's/(export KAFKA_HEAP_OPTS)="(.*)"/\1="'"$KAFKA_HEAP_OPTS"'"/g' "$KAFKA_HOME/bin/kafka-server-start.sh"
unset KAFKA_HEAP_OPTS
fi

3.5  The configuration of  KAFKA_ADVERTISED_HOST_NAME,KAFKA_LISTENERS,KAFKA_ADVERTISED_LISTENERS,HOSTNAME_VALUE

# Try and configure minimal settings or exit with error if there isn't enough information
if [[ -z "$KAFKA_ADVERTISED_HOST_NAME$KAFKA_LISTENERS" ]]; then
if [[ -n "$KAFKA_ADVERTISED_LISTENERS" ]]; then
echo "ERROR: Missing environment variable KAFKA_LISTENERS. Must be specified when using KAFKA_ADVERTISED_LISTENERS"
exit
elif [[ -z "$HOSTNAME_VALUE" ]]; then
echo "ERROR: No listener or advertised hostname configuration provided in environment."
echo " Please define KAFKA_LISTENERS / (deprecated) KAFKA_ADVERTISED_HOST_NAME"
exit
fi # Maintain existing behaviour
# If HOSTNAME_COMMAND is provided, set that to the advertised.host.name value if listeners are not defined.
export KAFKA_ADVERTISED_HOST_NAME="$HOSTNAME_VALUE"
fi

The script above indicate the following tips:

a) we should config KAFKA_ADVERTISED_HOST_NAME, this config is the reqired config parameter.

The usage of docker image wurstmeister/kafka的更多相关文章

  1. Docker搭建Zookeeper&Kafka集群

    最近在学习Kafka,准备测试集群状态的时候感觉无论是开三台虚拟机或者在一台虚拟机开辟三个不同的端口号都太麻烦了(嗯..主要是懒). 环境准备 一台可以上网且有CentOS7虚拟机的电脑 为什么使用虚 ...

  2. docker下安装kafka和kafka-manager

    1.下载镜像 这里使用了wurstmeister/kafka和wurstmeister/zookeeper这两个版本的镜像 docker pull wurstmeister/zookeeper doc ...

  3. docker 部署 zookeeper+kafka 集群

    主机三台172.16.100.61172.16.100.62172.16.100.63Docker 版本 当前最新版 # 部署zk有2种方法 ## 注意 \后不要跟空格 一 . 端口映射 172.16 ...

  4. Docker下安装kafka

    先看一下有哪些选择 额,没有官方的,但是可以根据stars来找一个,大多数人都选择第一个,我们看一下GitHub就知道了. 第一个:https://github.com/wurstmeister/ka ...

  5. docker下部署kafka集群(多个broker+多个zookeeper)

    网上关于kafka集群的搭建,基本是单个broker和单个zookeeper,测试研究的意义不大.于是折腾了下,终于把正宗的Kafka集群搭建出来了,在折腾中遇到了很多坑,后续有时间再专门整理份搭建问 ...

  6. Docker快速安装kafka

    Docker快速安装kafka | 沈健的技术博客 盒子 盒子 博客 分类 标签 友链 搜索 文章目录 同样基于docker-compose安装,Docker快速部署nginx中有讲到,不在重述 1. ...

  7. Docker实战之Kafka集群

    1. 概述 Apache Kafka 是一个快速.可扩展的.高吞吐.可容错的分布式发布订阅消息系统.其具有高吞吐量.内置分区.支持数据副本和容错的特性,适合在大规模消息处理场景中使用. 笔者之前在物联 ...

  8. docker启动服务---------------kafka+zookeeper

    docker run -d --name zookeeper -p 2181:2181 wurstmeister/zookeeperdocker run -d --name kafka -p 9092 ...

  9. 用 Docker 快速搭建 Kafka 集群

    开源Linux 一个执着于技术的公众号 版本 •JDK 14•Zookeeper•Kafka 安装 Zookeeper 和 Kafka Kafka 依赖 Zookeeper,所以我们需要在安装 Kaf ...

随机推荐

  1. IOS 生成静态库文件(.a文件)

    http://www.cnblogs.com/lyy-5518/p/5459643.html

  2. Kylin引入Spark引擎

    1 引入Spark引擎 Kylin v2开始引入了Spark引擎,可以在构建Cube步骤中替换MapReduce. 关于配置spark引擎的文档,下面给出官方链接以便查阅:http://kylin.a ...

  3. WEBBASE篇: 第十一篇, JavaScript知识6

    JavaScript 知识6 一, String 对象 1,分隔字符串, 函数: split(seperator) 作用: 将字符串,通过seperator 拆分成一个数组: eg: var msg= ...

  4. snprintf笔记

    在weibo上看到Laruence大神修复了一个使用snprintf的bug (http://t.cn/Rm6AuFh) 引起了TK教主的关注.TK教主着重提到了在windows下snprintf与_ ...

  5. vue生命周期图片

  6. django如何查看mysql已有数据库中已有表格

    首先正常创建django项目,配虚拟环境,改配置文件,正常创建models,创建迁移和启动迁移等 接着通过找到Terminal窗户,双击,打开窗口 输入命令 python manage.py in i ...

  7. http协议常见状态码含义

    状态码有三位数字组成,第一个数字定义了响应的类别,且有五种可能取值: 2xx:成功--表示请求已被成功接收.理解.接受 200(成功)  服务器已成功处理了请求.通常,这表示服务器提供了请求的网页. ...

  8. 2018.5.11 B树总结

    小结 B树:二叉树,每个结点只存储一个关键字,等于则命中,小于走左结点,大于 走右结点: B-树:多路搜索树,每个结点存储M/2到M个关键字,非叶子结点存储指向关键 字范围的子结点: 所有关键字在整颗 ...

  9. MySQL Hardware--网络测试

    使用Ping测试丢包 ## ping测试 ## -c 100表示100次 ping -c 100 192.168.1.2 输出结果: ping -c 100 192.168.1.2 PING 192. ...

  10. Pac-Man 吃豆人

    发售年份 1980 平台 街机 开发商 南梦宫(Namco) 类型 迷宫 https://www.youtube.com/watch?v=dScq4P5gn4A