Part 1: The Storm Design Model

1.Topology

  A topology is Storm's abstraction of a job: it decomposes a real-time data-analysis task into distinct stages, wired together as a directed graph.

  Vertices: the computing components, Spouts and Bolts.

  Edges: the data flow; data moves from one component to the next, and every edge has a direction.

2.Tuple

  Storm wraps every record in a tuple.

  A tuple is essentially an ordered list of key-value pairs: the fields are declared once (for example new Fields("word", "count")) and values are then read by field name (for example tuple.getStringByField("word")).

  This makes it easy for downstream components to pull out exactly the data they need.

3.Spout

  The data collector.

  How does an endless stream of log records get into the topology for processing?

  The Spout pulls data from the source, does light preprocessing, wraps each record into a tuple, and emits it to the bolts downstream.

4.Bolt

  The data processor: a bolt receives tuples, transforms or aggregates them, and may emit new tuples to the bolts after it.

  

Part 2: Developing the WordCount Example

1.Draw the vertex-and-edge diagram of the whole topology

  

2.Program structure

  

3.Edit the pom file

  Note: on the cluster the Storm jars are already present, so storm-core must not be bundled into the job jar; enable the provided scope before packaging (or use the profile sketched after the pom below).

  

 <?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.cj.it</groupId>
<artifactId>storm</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hbase.version>0.98.6-cdh5.3.6</hbase.version>
<hdfs.version>2.5.0-cdh5.3.6</hdfs.version>
<storm.version>0.9.6</storm.version>
</properties>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<repository>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>${storm.version}</version>
<!-- Comment out the line below when running in IDEA; uncomment it when packaging for the cluster -->
<!--<scope>provided</scope>-->
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hbase</artifactId>
<version>${storm.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>${hdfs.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<version>3.4.6</version>
<exclusions>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>${storm.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.1.1</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
<version>1.9.5</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>cz.mallat.uasparser</groupId>
<artifactId>uasparser</artifactId>
<version>0.6.1</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.3</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<configuration>
<descriptors>
<descriptor>src/main/assembly/src.xml</descriptor>
</descriptors>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id> <!-- this is used for inheritance merges -->
<phase>package</phase> <!-- bind to the packaging phase -->
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
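
  Rather than hand-editing the scope before every build, a Maven profile can flip it. This is a sketch, not from the original post: the profile id cluster and the property storm.scope are illustrative names. Declare <storm.scope>compile</storm.scope> in <properties>, change the storm-core dependency to <scope>${storm.scope}</scope>, and add:

 <!-- Sketch (assumed names): a profile that marks storm-core as provided for cluster builds -->
<profiles>
<profile>
<id>cluster</id>
<properties>
<storm.scope>provided</storm.scope>
</properties>
</profile>
</profiles>

  mvn clean package -Pcluster then builds the cluster jar, while a plain IDEA run keeps the default compile scope.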

4.src.xml

 <?xml version="1.0" encoding="UTF-8"?>
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
<id>jar-with-dependencies</id>
<formats>
<format>jar</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<dependencySets>
<dependencySet>
<unpack>false</unpack>
<scope>runtime</scope>
</dependencySet>
</dependencySets>
<fileSets>
<fileSet>
<directory>/lib</directory>
</fileSet>
</fileSets>
</assembly>
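
  With this descriptor in place, one command builds the fat jar (the name comes from the pom's artifactId and version plus the assembly id):

 mvn clean package

  The result is target/storm-1.0-SNAPSHOT-jar-with-dependencies.jar, the file used in Part 4.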

5.log4j.properties

 log4j.rootLogger=info,console

 log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.SimpleLayout
log4j.logger.com.jun.it=INFO

6.SentenceSpout.java

 package com.jun.it;

 import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Map;
import java.util.Random;

public class SentenceSpout extends BaseRichSpout {
    private static final Logger logger = LoggerFactory.getLogger(SentenceSpout.class);
    private SpoutOutputCollector collector;

    // sample data to emit
    private static final String[] SENTENCES = {
            "hadoop oozie storm hive",
            "hadoop spark sqoop hbase",
            "error flume yarn mapreduce"
    };

    // initialize the collector
    @Override
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        this.collector = spoutOutputCollector;
    }

    // declare the output field name (the tuple "key")
    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("sentence"));
    }

    // assemble and emit one tuple per second
    @Override
    public void nextTuple() {
        String sentence = SENTENCES[new Random().nextInt(SENTENCES.length)];
        if (sentence.contains("error")) {
            logger.error("bad record: " + sentence);
        } else {
            this.collector.emit(new Values(sentence));
        }
        try {
            Thread.sleep(1000);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public SentenceSpout() {
        super();
    }

    @Override
    public void close() {
    }

    @Override
    public void activate() {
        super.activate();
    }

    @Override
    public void deactivate() {
        super.deactivate();
    }

    @Override
    public void ack(Object msgId) {
        super.ack(msgId);
    }

    @Override
    public void fail(Object msgId) {
        super.fail(msgId);
    }
}

7.SplitBolt.java

 package com.jun.it;

 import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.Map;

public class SplitBolt implements IRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.collector = outputCollector;
    }

    // split each sentence into words and emit one tuple per word
    @Override
    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        if (sentence != null && !"".equals(sentence)) {
            String[] words = sentence.split(" ");
            for (String word : words) {
                this.collector.emit(new Values(word));
            }
        }
    }

    @Override
    public void cleanup() {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
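
  IRichBolt does no automatic acking, and the execute above never acks its input tuple. That is harmless here because the spout emits unanchored tuples, but in a reliable topology (spout emitting with message IDs) the bolt would need to anchor and ack, roughly like this sketch:

 @Override
public void execute(Tuple tuple) {
    String sentence = tuple.getStringByField("sentence");
    if (sentence != null && !"".equals(sentence)) {
        for (String word : sentence.split(" ")) {
            // Anchoring ties each word tuple to the input sentence tuple.
            this.collector.emit(tuple, new Values(word));
        }
    }
    this.collector.ack(tuple); // explicit ack is required with IRichBolt
}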

8.CountBolt.java

 package com.jun.it;

 import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

import java.util.HashMap;
import java.util.Map;

public class CountBolt implements IRichBolt {
    private Map<String, Integer> counts;
    private OutputCollector collector;

    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.collector = outputCollector;
        counts = new HashMap<>();
    }

    // keep a running count per word and emit the updated (word, count) pair
    @Override
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        int count = 1;
        if (counts.containsKey(word)) {
            count = counts.get(word) + 1;
        }
        counts.put(word, count);
        this.collector.emit(new Values(word, count));
    }

    @Override
    public void cleanup() {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("word", "count"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

9.PrintBolt.java

 package com.jun.it;

 import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Tuple;

import java.util.Map;

public class PrintBolt implements IRichBolt {
    @Override
    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
    }

    // print each word with its latest count
    @Override
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        int count = tuple.getIntegerByField("count");
        System.out.println("word:" + word + ", count:" + count);
    }

    @Override
    public void cleanup() {
    }

    // terminal bolt: declares no output fields
    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

10.WordCountTopology.java

 package com.jun.it;

 import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCountTopology {
    private static final String SENTENCE_SPOUT = "sentenceSpout";
    private static final String SPLIT_BOLT = "splitBolt";
    private static final String COUNT_BOLT = "countBolt";
    private static final String PRINT_BOLT = "printBolt";

    public static void main(String[] args) {
        TopologyBuilder topologyBuilder = new TopologyBuilder();
        topologyBuilder.setSpout(SENTENCE_SPOUT, new SentenceSpout());
        topologyBuilder.setBolt(SPLIT_BOLT, new SplitBolt()).shuffleGrouping(SENTENCE_SPOUT);
        // fieldsGrouping routes every tuple with the same "word" to the same
        // CountBolt task, so the in-memory HashMap counts stay consistent
        topologyBuilder.setBolt(COUNT_BOLT, new CountBolt()).fieldsGrouping(SPLIT_BOLT, new Fields("word"));
        topologyBuilder.setBolt(PRINT_BOLT, new PrintBolt()).globalGrouping(COUNT_BOLT);
        Config config = new Config();
        if (args == null || args.length == 0) {
            // local mode: run inside an in-process cluster, no Storm installation needed
            LocalCluster localCluster = new LocalCluster();
            localCluster.submitTopology("wordcount", config, topologyBuilder.createTopology());
        } else {
            // cluster mode: the first argument is the topology name
            config.setNumWorkers(1);
            try {
                StormSubmitter.submitTopology(args[0], config, topologyBuilder.createTopology());
            } catch (AlreadyAliveException e) {
                e.printStackTrace();
            } catch (InvalidTopologyException e) {
                e.printStackTrace();
            }
        }
    }
}

Part 3: Running Locally

1.Prerequisite

  I originally assumed Storm had to be started first; it turns out the daemons are not needed for local mode.

  Just run main and the topology executes in an in-process LocalCluster.
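
  As written, the local topology runs until the process is killed. If you want it to stop by itself, the local-mode branch can be extended along these lines (a sketch; the one-minute runtime is arbitrary, and Utils is backtype.storm.utils.Utils):

 LocalCluster localCluster = new LocalCluster();
localCluster.submitTopology("wordcount", config, topologyBuilder.createTopology());
Utils.sleep(60 * 1000);                  // let it run for a minute
localCluster.killTopology("wordcount");  // stop the topology
localCluster.shutdown();                 // tear down the in-process cluster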

2.Result

  

Part 4: Running on the Cluster

1.Package in IDEA

  Use the jar with dependencies (the *-jar-with-dependencies.jar built by the assembly plugin).

 

2.Upload the jar to /opt/datas

  

3.Run

   bin/storm jar /opt/datas/storm-1.0-SNAPSHOT-jar-with-dependencies.jar com.jun.it.WordCountTopology wordcount
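
  Once submitted, the topology runs until it is killed. Two standard Storm commands for managing it:

 bin/storm list            # list the running topologies
 bin/storm kill wordcount  # kill the topology by the name given at submission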

  

4.Storm UI

  
