Messages reach the Kafka message broker in many ways. For example, Flume can collect log data and stage it in Kafka for routing, a real-time computation program such as Storm then analyzes it, and the results are finally written to HDFS. In that setup we read the Kafka messages in a Storm Spout and hand them to the Bolt components for processing. Below is a simple WordCount example: it reads subscribed message lines from Kafka, splits them into words on spaces, computes word frequencies, and finally stores the results in HDFS.

1. Kafka producer

package com.dxss.storm;

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

/**
 * Created by hadoop on 2017/7/29.
 */
public class KakfaProceduer {

    public static void main(String[] args) throws InterruptedException {
        Properties properties = new Properties();
        properties.put("zookeeper.connect", "12.13.41.41:2182");
        properties.put("metadata.broker.list", "12.13.41.41:9092");
        properties.put("serializer.class", "kafka.serializer.StringEncoder");
        ProducerConfig producerConfig = new ProducerConfig(properties);
        Producer<String, String> producer = new Producer<String, String>(producerConfig);
        KeyedMessage<String, String> keyedMessage =
                new KeyedMessage<String, String>("sunfei", "I am a chinese");
        // Keep publishing the same test sentence to the "sunfei" topic.
        while (true) {
            producer.send(keyedMessage);
            Thread.sleep(1000); // throttle the test producer to roughly one message per second
        }
    }
}
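Before wiring up the topology, it is worth checking that the messages actually arrive on the topic. Below is a minimal sketch using the old Kafka 0.8 high-level consumer API; the class name and the "checker" group id are made up for this check, and the ZooKeeper address matches the producer above.

package com.dxss.storm;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class TopicChecker {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "12.13.41.41:2182");
        props.put("group.id", "checker"); // hypothetical consumer group, only used for this check

        ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for one stream of the "sunfei" topic and print whatever the producer published.
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("sunfei", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams = connector.createMessageStreams(topicCountMap);

        ConsumerIterator<byte[], byte[]> it = streams.get("sunfei").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
        }
    }
}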

2. Storm -- word-split bolt (PrinterBolt)

package com.dxss.stormkafka;

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class PrinterBolt extends BaseRichBolt {

    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // System.out.println("Message published by Kafka: " + input.getString(0));
        String value = input.getString(0);
        String[] words = value.split(" ");
        // Emit each word with an initial count of 1.
        for (String word : words) {
            collector.emit(new Values(word, 1));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "num"));
    }
}
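The getString(0) call works because the KafkaSpout configured in the topology below uses StringScheme, which emits each Kafka message as a single-field tuple. If you prefer access by field name, StringScheme declares its one output field as "str", so an equivalent sketch would be:

// Positional vs. named access to the KafkaSpout's output tuple:
String value = input.getString(0);
String sameValue = input.getStringByField("str"); // "str" is the field name declared by StringScheme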

3. Storm -- word-count bolt (CountBolt)

package com.dxss.stormkafka;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class CountBolt extends BaseRichBolt {

    private OutputCollector collector;
    private Map<String, Integer> map = new HashMap<String, Integer>();

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getString(0);
        Integer num = input.getInteger(1);
        // Accumulate the running count for this word.
        if (map.containsKey(word)) {
            map.put(word, map.get(word) + num);
        } else {
            map.put(word, num);
        }
        String result = word + ":" + map.get(word);
        this.collector.emit(new Values(result));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("result"));
    }
}
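As written, neither bolt anchors its output or acks its input; since the KafkaSpout emits tuples with message IDs, enabling acking would cause those tuples to time out and be replayed. A sketch of what CountBolt.execute could look like with anchoring and acking turned on (same counting logic, just tied into Storm's reliability mechanism):

@Override
public void execute(Tuple input) {
    String word = input.getString(0);
    Integer num = input.getInteger(1);
    Integer total = map.containsKey(word) ? map.get(word) + num : num;
    map.put(word, total);
    // Anchoring ties the emitted tuple to the input, so a downstream failure
    // makes the spout replay the original Kafka message.
    collector.emit(input, new Values(word + ":" + total));
    // Ack the input once this bolt has fully processed it.
    collector.ack(input);
}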

4. Storm -- topology

package com.dxss.stormkafka;

import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.generated.StormTopology;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class StormKafka {

    public static StormTopology createTopology() {
        BrokerHosts brokerHosts = new ZkHosts("12.13.41.41:2182");
        String topic = "sunfei";
        String zkRoot = "/sunfei";
        String spoutId = "sunfei_storm";
        SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, topic, zkRoot, spoutId);
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        // Field delimiter for the output records
        RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(":");
        // Sync to HDFS every 1000 tuples
        SyncPolicy syncPolicy = new CountSyncPolicy(1000);
        // Rotate files at 100 MB
        FileRotationPolicy policy = new FileSizeRotationPolicy(100.0f, FileSizeRotationPolicy.Units.MB);
        // Output directory and file naming
        FileNameFormat fileNameFormat = new DefaultFileNameFormat().withPath("/test/storm")
                .withPrefix("storm_").withExtension(".txt");
        // HDFS namenode address
        HdfsBolt hdfsBolt = new HdfsBolt().withFsUrl("hdfs://12.13.41.43:8020")
                .withFileNameFormat(fileNameFormat)
                .withRecordFormat(format)
                .withRotationPolicy(policy)
                .withSyncPolicy(syncPolicy);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new KafkaSpout(spoutConfig));
        builder.setBolt("print", new PrinterBolt()).shuffleGrouping("words");
        builder.setBolt("totalBolt", new CountBolt()).fieldsGrouping("print", new Fields("word"));
        builder.setBolt("hdfsBolt", hdfsBolt).shuffleGrouping("totalBolt");
        return builder.createTopology();
    }

    public static void main(String[] args) throws InterruptedException {
        Config config = new Config();
        // Illustrative values; tune the batch interval, worker count and parallelism for your cluster.
        config.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 1000);
        StormTopology topology = StormKafka.createTopology();
        config.setNumWorkers(2);
        config.setMaxTaskParallelism(3);
        // Cluster mode
        // try {
        //     StormSubmitter.submitTopology("mywordcount", config, topology);
        // } catch (AlreadyAliveException e) {
        //     e.printStackTrace();
        // } catch (InvalidTopologyException e) {
        //     e.printStackTrace();
        // }
        // Local mode
        LocalCluster localCluster = new LocalCluster();
        localCluster.submitTopology("kafkastorm", config, topology);
        // Keep the local cluster alive long enough to observe the output.
        Thread.sleep(60000);
    }
}
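The local-mode run above never shuts the in-process cluster down, so the JVM keeps running after the sleep expires. An optional, tidier ending (a sketch; the topology name must match the one passed to submitTopology) would be:

// After the sleep in main(): kill the topology and tear down the in-process cluster.
localCluster.killTopology("kafkastorm");
localCluster.shutdown();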

5. Project pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.kafka</groupId>
<artifactId>StormDemo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>0.9.5</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>0.9.5</version>
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.5</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.3</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.3</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hdfs</artifactId>
<version>0.10.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>jdk.tools</groupId>
<artifactId>jdk.tools</artifactId>
<version>1.8</version>
<scope>system</scope>
<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>2.5.0</version>
</dependency>
<!-- dependencies required by the storm-kafka module -->
<dependency>
<groupId>org.apache.curator</groupId>
<artifactId>curator-framework</artifactId>
<version>2.5.0</version>
<exclusions>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-io</artifactId>
<version>1.3.2</version>
</dependency>
<!-- kafka -->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.1.1</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<repositories>
<repository>
<id>central</id>
<url>http://repo1.maven.org/maven2/</url>
<snapshots>
<enabled>false</enabled>
</snapshots>
<releases>
<enabled>true</enabled>
</releases>
</repository>
<repository>
<id>clojars</id>
<url>https://clojars.org/repo/</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
<releases>
<enabled>true</enabled>
</releases>
</repository>
<repository>
<id>scala-tools</id>
<url>http://scala-tools.org/repo-releases</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
<releases>
<enabled>true</enabled>
</releases>
</repository>
<repository>
<id>conjars</id>
<url>http://conjars.org/repo/</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
<releases>
<enabled>true</enabled>
</releases>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.6</source>
<target>1.6</target>
<encoding>UTF-8</encoding>
<showDeprecation>true</showDeprecation>
<showWarnings>true</showWarnings>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass></mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
