Integrating Storm with Kafka: reading from Kafka and writing back to Kafka

                                                      by 小闪电

0 Preface

  Storm's main job is streaming, real-time computation, and it handles a continuous flow of data very quickly. Most real data, however, does not arrive as a uniform stream; it comes in bursts, heavy at some moments and light at others. Batch processing is a poor fit for this situation, so Kafka is introduced as a message queue in front of Storm. The two pair well and make stable stream processing possible. Below is a simple example that reads data from Kafka and writes it back to Kafka, as a way to get familiar with how Storm and Kafka interact.

1 Program block diagram

  In essence, Storm's KafkaSpout acts as a Kafka consumer and the KafkaBolt acts as a Kafka producer.

  The block diagram is as follows:

  (block diagram image)

2 pom.xml

  Create a Maven project and pull in the external dependencies for Storm, Kafka, and ZooKeeper.

  

  <?xml version="1.0" encoding="UTF-8"?>
  <project xmlns="http://maven.apache.org/POM/4.0.0"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.tony</groupId>
    <artifactId>storm-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
      <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.9.3</version>
        <!--<scope>provided</scope>-->
      </dependency>

      <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka</artifactId>
        <version>0.9.3</version>
        <!--<scope>provided</scope>-->
      </dependency>

      <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>2.5.0</version>
      </dependency>

      <!-- dependencies required by the storm-kafka module -->
      <dependency>
        <groupId>org.apache.curator</groupId>
        <artifactId>curator-framework</artifactId>
        <version>2.5.0</version>
        <exclusions>
          <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
          </exclusion>
          <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
          </exclusion>
        </exclusions>
      </dependency>

      <!-- kafka -->
      <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.8.1.1</version>
        <exclusions>
          <exclusion>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
          </exclusion>
          <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
    </dependencies>

    <repositories>
      <repository>
        <id>central</id>
        <url>http://repo1.maven.org/maven2/</url>
        <snapshots>
          <enabled>false</enabled>
        </snapshots>
        <releases>
          <enabled>true</enabled>
        </releases>
      </repository>
      <repository>
        <id>clojars</id>
        <url>https://clojars.org/repo/</url>
        <snapshots>
          <enabled>true</enabled>
        </snapshots>
        <releases>
          <enabled>true</enabled>
        </releases>
      </repository>
      <repository>
        <id>scala-tools</id>
        <url>http://scala-tools.org/repo-releases</url>
        <snapshots>
          <enabled>true</enabled>
        </snapshots>
        <releases>
          <enabled>true</enabled>
        </releases>
      </repository>
      <repository>
        <id>conjars</id>
        <url>http://conjars.org/repo/</url>
        <snapshots>
          <enabled>true</enabled>
        </snapshots>
        <releases>
          <enabled>true</enabled>
        </releases>
      </repository>
    </repositories>

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.1</version>
          <configuration>
            <source>1.6</source>
            <target>1.6</target>
            <encoding>UTF-8</encoding>
            <showDeprecation>true</showDeprecation>
            <showWarnings>true</showWarnings>
          </configuration>
        </plugin>
        <plugin>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
            <archive>
              <manifest>
                <mainClass></mainClass>
              </manifest>
            </archive>
          </configuration>
          <executions>
            <execution>
              <id>make-assembly</id>
              <phase>package</phase>
              <goals>
                <goal>single</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </project>

3 The KafkaSpout's consumption logic: write a custom MessageScheme class that defines two fields, key and message, so that tuples can be conveniently handed on to the KafkaBolt. The code is as follows:

  package com.tony.storm_kafka.util;

  import java.io.UnsupportedEncodingException;
  import java.util.List;

  import backtype.storm.spout.Scheme;
  import backtype.storm.tuple.Fields;
  import backtype.storm.tuple.Values;

  /*
   * author: hi
   * public class MessageScheme{ }
   **/
  public class MessageScheme implements Scheme {

      @Override
      public List<Object> deserialize(byte[] arg0) {
          try {
              // Decode the raw Kafka message bytes as a UTF-8 string.
              String msg = new String(arg0, "UTF-8");
              // Fixed demo key; the downstream KafkaBolt expects a "key" field alongside "message".
              String msg_0 = "hello";
              return new Values(msg_0, msg);
          } catch (UnsupportedEncodingException e) {
              e.printStackTrace();
          }
          return null;
      }

      @Override
      public Fields getOutputFields() {
          // Field names must match what the KafkaBolt's tuple mapper looks up.
          return new Fields("key", "message");
      }

  }

4 Write the topology main class: configure Kafka and submit the topology to Storm. The KafkaSpout's ZooKeeper hosts can be configured either dynamically or statically; prefer the dynamic (broker auto-discovery) style. A sketch of the static alternative is given right after the topology code below.

  package org.tony.storm_kafka.common;

  import backtype.storm.Config;
  import backtype.storm.LocalCluster;
  import backtype.storm.StormSubmitter;
  import backtype.storm.generated.AlreadyAliveException;
  import backtype.storm.generated.InvalidTopologyException;
  import backtype.storm.generated.StormTopology;
  import backtype.storm.spout.SchemeAsMultiScheme;
  import backtype.storm.topology.BasicOutputCollector;
  import backtype.storm.topology.OutputFieldsDeclarer;
  import backtype.storm.topology.TopologyBuilder;
  import backtype.storm.topology.base.BaseBasicBolt;
  import backtype.storm.tuple.Tuple;
  import storm.kafka.BrokerHosts;
  import storm.kafka.KafkaSpout;
  import storm.kafka.SpoutConfig;
  import storm.kafka.ZkHosts;
  import storm.kafka.trident.TridentKafkaState;

  import java.util.Arrays;
  import java.util.Properties;
  import org.tony.storm_kafka.bolt.ToKafkaBolt;
  import com.tony.storm_kafka.util.MessageScheme;

  public class KafkaBoltTestTopology {

      // KafkaSpout configuration parameters
      public static String kafka_zk_port = null;
      public static String topic = null;
      public static String kafka_zk_rootpath = null;
      public static BrokerHosts brokerHosts;
      public static String spout_name = "spout";
      public static String kafka_consume_from_start = null;

      // Simple debugging bolt that prints the "message" field of each tuple.
      public static class PrinterBolt extends BaseBasicBolt {

          private static final long serialVersionUID = 9114512339402566580L;

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector collector) {
              System.out.println("-----" + tuple.getValue(1).toString());
          }

      }

      public StormTopology buildTopology() {
          // KafkaSpout configuration
          kafka_consume_from_start = "true";
          kafka_zk_rootpath = "/kafka08";
          String spout_id = spout_name;
          // Dynamic broker discovery: read broker metadata from ZooKeeper.
          brokerHosts = new ZkHosts("192.168.201.190:2191,192.168.201.191:2191,192.168.201.192:2191",
                  kafka_zk_rootpath + "/brokers");
          kafka_zk_port = "2191";

          SpoutConfig spoutConf = new SpoutConfig(brokerHosts, "testfromkafka", kafka_zk_rootpath, spout_id);
          spoutConf.scheme = new SchemeAsMultiScheme(new MessageScheme());
          spoutConf.zkPort = Integer.parseInt(kafka_zk_port);
          spoutConf.zkRoot = kafka_zk_rootpath;
          spoutConf.zkServers = Arrays.asList(new String[] {"10.9.201.190", "10.9.201.191", "10.9.201.192"});

          // Whether to start reading from the first message in the Kafka topic.
          if (kafka_consume_from_start == null) {
              kafka_consume_from_start = "false";
          }
          if (!"true".equals(kafka_consume_from_start) && !"false".equals(kafka_consume_from_start)) {
              System.out.println("kafka_consume_from_start must be true or false!");
          }
          boolean kafka_consume_from_start_b = Boolean.valueOf(kafka_consume_from_start);
          System.out.println("kafka_consume_from_start: " + kafka_consume_from_start_b);
          spoutConf.forceFromStart = kafka_consume_from_start_b;

          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("spout", new KafkaSpout(spoutConf));
          builder.setBolt("forwardToKafka", new ToKafkaBolt<String, String>()).shuffleGrouping("spout");
          return builder.createTopology();
      }

      public static void main(String[] args) {

          KafkaBoltTestTopology kafkaBoltTestTopology = new KafkaBoltTestTopology();
          StormTopology stormTopology = kafkaBoltTestTopology.buildTopology();

          Config conf = new Config();
          // Kafka producer settings used by the KafkaBolt.
          Properties props = new Properties();
          props.put("metadata.broker.list", "192.10.43.150:9092");
          props.put("producer.type", "async");
          props.put("request.required.acks", "0"); // valid values: 0, 1, -1
          props.put("serializer.class", "kafka.serializer.StringEncoder");
          conf.put(TridentKafkaState.KAFKA_BROKER_PROPERTIES, props);
          conf.put("topic", "testTokafka");

          if (args.length > 0) {
              // Submit to the cluster.
              try {
                  StormSubmitter.submitTopology("kafkaboltTest", conf, stormTopology);
              } catch (AlreadyAliveException e) {
                  e.printStackTrace();
              } catch (InvalidTopologyException e) {
                  e.printStackTrace();
              }
          } else {
              // Run in a local in-process cluster for testing.
              new LocalCluster().submitTopology("kafkaboltTest", conf, stormTopology);
          }

      }
  }
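
  For comparison, the static configuration mentioned in the section intro replaces the ZkHosts lines in buildTopology() with a hand-maintained broker/partition list, so the spout never asks ZooKeeper for broker metadata. A minimal sketch, assuming a single partition 0 whose leader is broker 192.168.201.190:9092 (adjust both to the real cluster):

  import storm.kafka.Broker;
  import storm.kafka.BrokerHosts;
  import storm.kafka.StaticHosts;
  import storm.kafka.trident.GlobalPartitionInformation;

  // Static broker configuration: list every partition and its leader broker explicitly.
  GlobalPartitionInformation partitionInfo = new GlobalPartitionInformation();
  partitionInfo.addPartition(0, new Broker("192.168.201.190", 9092)); // assumed partition/leader layout
  BrokerHosts staticHosts = new StaticHosts(partitionInfo);
  SpoutConfig spoutConf = new SpoutConfig(staticHosts, "testfromkafka", kafka_zk_rootpath, spout_id);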

5 Example results. Data in the testfromkafka topic can be generated continuously by writing a separate producer class (a sketch of such a producer follows the screenshots below).

  Data in topic testfromkafka

  Data in topic testTokafka
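
  A minimal sketch of such a producer, using the same Kafka 0.8 producer API as the bolt in the next section; the class name and broker address are assumptions here, and the message content is arbitrary:

  import java.util.Properties;

  import kafka.javaapi.producer.Producer;
  import kafka.producer.KeyedMessage;
  import kafka.producer.ProducerConfig;

  // Standalone producer that keeps writing test messages into the testfromkafka topic.
  public class TestDataProducer {
      public static void main(String[] args) throws InterruptedException {
          Properties props = new Properties();
          props.put("metadata.broker.list", "192.10.43.150:9092"); // assumed broker address, adjust to your cluster
          props.put("serializer.class", "kafka.serializer.StringEncoder");
          Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
          int i = 0;
          while (true) {
              // Send one message per second so the KafkaSpout always has data to read.
              producer.send(new KeyedMessage<String, String>("testfromkafka", "test message " + i++));
              Thread.sleep(1000);
          }
      }
  }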

6 Supplement: the ToKafkaBolt, which extends the basic bolt class; the main work is the rewritten execute() method, plus acking of each input tuple.

  package org.tony.storm_kafka.bolt;

  import java.util.Map;
  import java.util.Properties;

  import kafka.javaapi.producer.Producer;
  import kafka.producer.KeyedMessage;
  import kafka.producer.ProducerConfig;

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  import storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
  import storm.kafka.bolt.mapper.TupleToKafkaMapper;
  import storm.kafka.bolt.selector.KafkaTopicSelector;
  import storm.kafka.bolt.selector.DefaultTopicSelector;
  import backtype.storm.task.OutputCollector;
  import backtype.storm.task.TopologyContext;
  import backtype.storm.topology.OutputFieldsDeclarer;
  import backtype.storm.topology.base.BaseRichBolt;
  import backtype.storm.tuple.Tuple;

  /*
   * author: yue
   * public class ToKafkaBolt{ }
   **/
  public class ToKafkaBolt<K, V> extends BaseRichBolt {
      private static final Logger Log = LoggerFactory.getLogger(ToKafkaBolt.class);

      public static final String TOPIC = "topic";
      public static final String KAFKA_BROKER_PROPERTIES = "kafka.broker.properties";

      private Producer<K, V> producer;
      private OutputCollector collector;
      private TupleToKafkaMapper<K, V> Mapper;
      private KafkaTopicSelector topicselector;

      public ToKafkaBolt<K, V> withTupleToKafkaMapper(TupleToKafkaMapper<K, V> mapper) {
          this.Mapper = mapper;
          return this;
      }

      public ToKafkaBolt<K, V> withTopicSelector(KafkaTopicSelector topicSelector) {
          this.topicselector = topicSelector;
          return this;
      }

      @Override
      public void prepare(Map stormConf, TopologyContext context,
              OutputCollector collector) {

          // Default mapper reads the "key" and "message" fields emitted by MessageScheme.
          if (Mapper == null) {
              this.Mapper = new FieldNameBasedTupleToKafkaMapper<K, V>();
          }

          // Default topic selector reads the target topic from the topology configuration.
          if (topicselector == null) {
              this.topicselector = new DefaultTopicSelector((String) stormConf.get(TOPIC));
          }

          // Build the Kafka producer from the properties placed in the Storm config.
          Map configMap = (Map) stormConf.get(KAFKA_BROKER_PROPERTIES);
          Properties properties = new Properties();
          properties.putAll(configMap);
          ProducerConfig config = new ProducerConfig(properties);
          producer = new Producer<K, V>(config);
          this.collector = collector;
      }

      @Override
      public void execute(Tuple input) {

          K key = null;
          V message = null;
          String topic = null;

          try {
              key = Mapper.getKeyFromTuple(input);
              message = Mapper.getMessageFromTuple(input);
              topic = topicselector.getTopic(input);
              if (topic != null) {
                  // Forward the tuple's key and message to the selected Kafka topic.
                  producer.send(new KeyedMessage<K, V>(topic, key, message));
              } else {
                  Log.warn("skipping key = " + key + ", topic selector returned null.");
              }
          } catch (Exception e) {
              Log.error("Could not send message with key = " + key
                      + " and value = " + message + " to topic = " + topic, e);
          } finally {
              // Ack the tuple whether or not the send succeeded, so the spout does not replay it.
              collector.ack(input);
          }
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
      }

  }
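
  As a usage note, instead of relying on the "topic" entry in the Storm config, the fluent setters above can wire an explicit topic selector and tuple mapper. A minimal sketch that mirrors buildTopology() from section 4:

  // Explicit wiring: send to testTokafka and map the tuple's "key"/"message" fields.
  ToKafkaBolt<String, String> toKafkaBolt = new ToKafkaBolt<String, String>()
          .withTopicSelector(new DefaultTopicSelector("testTokafka"))
          .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<String, String>());
  builder.setBolt("forwardToKafka", toKafkaBolt).shuffleGrouping("spout");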

Author: 小闪电

Source: http://www.cnblogs.com/yueyanyu/

The copyright of this article is shared by the author and cnblogs. Reposting and discussion are welcome, but this notice must be kept and a prominent link to the original must be given on the article page; anything beyond that requires the author's consent. If you found this article useful, feel free to like it or start a discussion. Some material on this blog comes from the internet; if it infringes on your rights, please contact the author to have it removed.

