Kafka 0.11 + Spark 2.1.1 streaming example
"""
Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
Usage: kafka_wordcount.py <zk> <topic>
To run this on your local machine, you need to setup Kafka and create a producer first, see
http://kafka.apache.org/documentation.html#quickstart
and then run the example
`$ bin/spark-submit --jars \
external/kafka-assembly/target/scala-*/spark-streaming-kafka-assembly-*.jar \
examples/src/main/python/streaming/kafka_wordcount.py \
localhost:2181 test`
"""
from __future__ import print_function import sys from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: kafka_wordcount.py <zk> <topic>", file=sys.stderr)
exit(-1) sc = SparkContext(appName="PythonStreamingKafkaWordCount")
ssc = StreamingContext(sc, 1) zkQuorum, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic: 1})
lines = kvs.map(lambda x: x[1])
counts = lines.flatMap(lambda line: line.split(" ")) \
.map(lambda word: (word, 1)) \
.reduceByKey(lambda a, b: a+b)
counts.pprint() ssc.start()
ssc.awaitTermination()
Run the example from the Spark installation directory:
/spark-kafka/spark-2.1.1-bin-hadoop2.6# ./bin/spark-submit --jars ~/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar examples/src/main/python/streaming/kafka_wordcount.py localhost:2181 test
Note: spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar can be downloaded from http://search.maven.org/#search%7Cga%7C1%7Cspark-streaming-kafka-0-8-assembly
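The 0-8 assembly jar also supports the receiver-less "direct" API, which reads from the Kafka brokers instead of going through ZooKeeper. The following is a minimal sketch of that variant (assuming a broker at localhost:9092, as in the quick start below); it mirrors Spark's own direct_kafka_wordcount.py example:

# direct_wordcount.py -- minimal sketch of the receiver-less direct API;
# the broker address localhost:9092 is an assumption from the quick start below
from __future__ import print_function

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
ssc = StreamingContext(sc, 1)

# The direct API takes broker addresses, not a ZooKeeper quorum
kvs = KafkaUtils.createDirectStream(
    ssc, ["test"], {"metadata.broker.list": "localhost:9092"})
counts = kvs.map(lambda x: x[1]) \
    .flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.pprint()

ssc.start()
ssc.awaitTermination()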
Kafka version 0.11 is used:
1.3 Quick Start
This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. Since Kafka console scripts are different for Unix-based and Windows platforms, on Windows platforms use bin\windows\ instead of bin/, and change the script extension to .bat.
Step 1: Download the code
Download the 0.11.0.0 release and un-tar it.
> tar -xzf kafka_2.11-0.11.0.0.tgz
> cd kafka_2.11-0.11.0.0
Step 2: Start the server
Kafka uses ZooKeeper, so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
Now start the Kafka server:
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...
Step 3: Create a topic
Let's create a topic named "test" with a single partition and only one replica:
> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --zookeeper localhost:2181
test
Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to.
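If you prefer to manage topics from code rather than the shell script, a minimal sketch using the third-party kafka-python package could look like the following (an assumption: kafka-python ships separately from Kafka and must be installed with pip):

# create_topic.py -- sketch: create the "test" topic programmatically;
# assumes `pip install kafka-python` and a broker at localhost:9092
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([NewTopic(name="test", num_partitions=1, replication_factor=1)])
admin.close()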
Step 4: Send some messages
Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default, each line will be sent as a separate message.
Run the producer and then type a few messages into the console to send to the server.
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message
This is another message
Step 5: Start a consumer
Kafka also has a command line consumer that will dump out messages to standard output.
> bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
This is a message
This is another message
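Instead of typing messages into the console producer, you can also feed the Spark wordcount job from a small script. The following is a minimal sketch, again assuming the third-party kafka-python package:

# feed_words.py -- sketch: publish a few lines to the "test" topic;
# assumes `pip install kafka-python` and a broker at localhost:9092
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for line in ["hello world", "hello kafka", "hello spark streaming"]:
    producer.send("test", line.encode("utf-8"))  # message values must be bytes
producer.flush()
producer.close()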