【Storm】storm安装、配置、使用以及Storm单词计数程序的实例分析
前言:阅读笔记
storm jar all-my-code.jar backtype.storm.MyTopology arg1 arg2
storm jar负责连接nimbus而且上传jar。
spouts和bolts是执行应用逻辑要实现的接口。
watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvc2ltb25jaGk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="">
watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvc2ltb25jaGk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="">
下载安装
http://storm.apache.org/downloads.html
1、依赖安装
2、zk集群
http://blog.csdn.net/simonchi/article/details/43019401
3、zeromq&jzmq
4、python
配置执行
storm.yaml
watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvc2ltb25jaGk=/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="">
单词计数程序
Spout
package com.cmcc.chiwei.storm; import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values; public class WordReader extends BaseRichSpout { private static final long serialVersionUID = 1L;
private SpoutOutputCollector collector;
private FileReader fileReader;
private boolean completed = false;
public boolean isDistributed() {
return false;
}
public void ack(Object msgId) {
System.out.println("OK:"+msgId);
}
public void close() {}
public void fail(Object msgId) {
System.out.println("FAIL:"+msgId);
} /**
* The only thing that the methods will do It is emit each
* file line
*/
public void nextTuple() {
if(completed){
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
//Do nothing
}
return;
}
String str;
//Open the reader
BufferedReader reader = new BufferedReader(fileReader);
try{
//Read all lines
while((str = reader.readLine()) != null){
/**
* By each line emmit a new value with the line as a their
*/
this.collector.emit(new Values(str),str);
}
}catch(Exception e){
throw new RuntimeException("Error reading tuple",e);
}finally{
completed = true;
}
} /**
* We will create the file and get the collector object
*/
public void open(Map conf, TopologyContext context,
SpoutOutputCollector collector) {
try {
this.fileReader = new FileReader(conf.get("wordsFile").toString());
} catch (FileNotFoundException e) {
throw new RuntimeException("Error reading file ["+conf.get("wordFile")+"]");
}
this.collector = collector;
} /**
* Declare the output field "word"
*/
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line"));
}
}
Bolt1
package com.cmcc.chiwei.storm; import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values; public class WordNormalizer extends BaseBasicBolt { private static final long serialVersionUID = 1L; public void cleanup() {} /**
* The bolt will receive the line from the
* words file and process it to Normalize this line
*
* The normalize will be put the words in lower case
* and split the line to get all words in this
*/
public void execute(Tuple input, BasicOutputCollector collector) {
String sentence = input.getString(0);
String[] words = sentence.split(" ");
for(String word : words){
word = word.trim();
if(!word.isEmpty()){
word = word.toLowerCase();
collector.emit(new Values(word));
}
}
} /**
* The bolt will only emit the field "word"
*/
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
Bolt2
package com.cmcc.chiwei.storm; import java.util.HashMap;
import java.util.Map; import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple; public class WordCounter extends BaseBasicBolt { private static final long serialVersionUID = 1L;
Integer id;
String name;
Map<String, Integer> counters; /**
* At the end of the spout (when the cluster is shutdown
* We will show the word counters
*/
@Override
public void cleanup() {
System.out.println("-- Word Counter ["+name+"-"+id+"] --");
for(Map.Entry<String, Integer> entry : counters.entrySet()){
System.out.println(entry.getKey()+": "+entry.getValue());
}
} /**
* On create
*/
@Override
public void prepare(Map stormConf, TopologyContext context) {
this.counters = new HashMap<String, Integer>();
this.name = context.getThisComponentId();
this.id = context.getThisTaskId();
} public void declareOutputFields(OutputFieldsDeclarer declarer) {} public void execute(Tuple input, BasicOutputCollector collector) {
String str = input.getString(0);
if(!counters.containsKey(str)){
counters.put(str, 1);
}else{
Integer c = counters.get(str) + 1;
counters.put(str, c);
}
}
}
Topology
package com.cmcc.chiwei.storm; import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields; public class TopologyMain {
public static void main(String[] args) throws InterruptedException { //Topology创建拓扑,安排storm各个节点以及它们交换数据的方式
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("word-reader",new WordReader());
builder.setBolt("word-normalizer", new WordNormalizer())
.shuffleGrouping("word-reader");
builder.setBolt("word-counter", new WordCounter(),1)
.fieldsGrouping("word-normalizer", new Fields("word")); //Configuration
Config conf = new Config();
conf.put("wordsFile", args[0]);
conf.setDebug(false);
//Topology run
conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("Getting-Started-Topology", conf, builder.createTopology());
Thread.sleep(2000);
cluster.shutdown();
}
}
words.txt
hello
world
storm flume hadoop hdfs
what's wrong flume ?
what's up hdfs ?
Hi,storm,what are you doing ?
执行结果
OK:hello
OK:world
OK:storm flume hadoop hdfs
OK:what's wrong flume ? OK:what's up hdfs ?
OK:Hi,storm,what are you doing ?
-- Word Counter [word-counter-2] --
what's: 2
flume: 2
hdfs: 2
you: 1
storm: 1
up: 1
hello: 1
hadoop: 1
hi,storm,what: 1
are: 1
doing: 1
wrong: 1
? : 3
world: 1
分析内容:
Spout
open --> nextTuple
Bolt1
declareOutputFields --> execute
Bolt2
prepare --> execute --> cleanup
更具体的内容,将在兴许慢慢解说,我也在研究中。
。。
。。
望各位不吝不吝赐教!。
【Storm】storm安装、配置、使用以及Storm单词计数程序的实例分析的更多相关文章
- Hadoop分布环境搭建步骤,及自带MapReduce单词计数程序实现
Hadoop分布环境搭建步骤: 1.软硬件环境 CentOS 7.2 64 位 JDK- 1.8 Hadoo p- 2.7.4 2.安装SSH sudo yum install openssh-cli ...
- 第一章 flex单词计数程序
学习Flex&Bison目标, 读懂SQLite中SQL解析部分代码 Flex&Bison简介Flex做词法分析Bison做语法分析 第一个Flex程序, wc.fl, 单词计数程序 ...
- ubuntu14.04LTS 下storm单机版安装配置
1.下载storm 的安装文件 http://www.apache.org/dyn/closer.cgi/incubator/storm/apache-storm-0.9.2-incubating/a ...
- storm的安装配置
一.安装Zookeeper 1.设置.profile文件: export ZOOKEEPER_HOME=/home/hadoop/streamdata/zookeeper-3.4.5-cdh4.5.0 ...
- 三:Storm设计一个Topology用来统计单词的TopN的实例
Storm的单词统计设计 一:Storm的wordCount和Hadoop的wordCount实例对比
- storm单机版安装配置
1,install zeromq 期间可能出现:configure: error: cannot link with -luuid, install uuid-dev. 因此可以先安装 sudo ap ...
- asp.Net Core免费开源分布式异常日志收集框架Exceptionless安装配置以及简单使用图文教程
最近在学习张善友老师的NanoFabric 框架的时了解到Exceptionless : https://exceptionless.com/ !因此学习了一下这个开源框架!下面对Exceptionl ...
- C#实现多级子目录Zip压缩解压实例 NET4.6下的UTC时间转换 [译]ASP.NET Core Web API 中使用Oracle数据库和Dapper看这篇就够了 asp.Net Core免费开源分布式异常日志收集框架Exceptionless安装配置以及简单使用图文教程 asp.net core异步进行新增操作并且需要判断某些字段是否重复的三种解决方案 .NET Core开发日志
C#实现多级子目录Zip压缩解压实例 参考 https://blog.csdn.net/lki_suidongdong/article/details/20942977 重点: 实现多级子目录的压缩, ...
- 【转】asp.Net Core免费开源分布式异常日志收集框架Exceptionless安装配置以及简单使用图文教程
最近在学习张善友老师的NanoFabric 框架的时了解到Exceptionless : https://exceptionless.com/ !因此学习了一下这个开源框架!下面对Exceptionl ...
随机推荐
- sublime的ctags安装
首先,是ctags的下载.在这里:http://pan.baidu.com/s/1gdAMFab 我们用sublime几乎都会首先安装这个插件,这个插件是管理插件的功能,先安装它,再安装其他插件就方便 ...
- DNS(域名系统)
DNS(Domain Name System),因特网上作为域名和IP地址相互映射的一个分布式数据库,能够使用户更方便的访问互联网,而不用去记住能够被机器直接读取的Ip数串.通过主机名,最终得到该主机 ...
- Appium + python - automator定位操作
# coding:utf-8from appium import webdriverfrom time import sleep desired_caps = { 'platformName': 'A ...
- maven添加本地jar包的方法
1.将一个本地的jar包随便放在一个放入本地文件夹中 (文件夹位置 和 jar包名称都随意) 例:F:\java\repository\a 文件夹下,名称为:icepdf-core-6.0.jar 2 ...
- JavaScript 原型和引用趣点
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <html> <head ...
- jQuery学习笔记之插件开发(4)
jQuery学习笔记之插件开发(4) github源码地址 插件:了让原有功能的增强. 1.插件的种类(3种):局部.全局.选择器插件 1.1封装对象方法的插件 这种类型的插件是把一些常用或者重复使用 ...
- jsp连接MySQL实现登录
1.下载驱动,并把jar包放到Tomcat的lib目录下 下载连接 2.把jar包添加到项目中 3.登录页面 <%@ page language="java" content ...
- CDC之fast->slow (2)
1 Open-loop solution One potential solution is to assert CDC signals for a period of time that excee ...
- Java_Web之神奇的Ajax
为什么使用Ajax? 无刷新:不刷新整个页面,只刷新局部 无刷新的好处 提供类似C/S的交互效果,操作更方面 只更新部分页面,有效利用带宽 什么是Ajax? XMLHttpRequest常用方 ...
- Ubuntu安装中文语言包
使用Ubuntu 默认的界面感觉不习惯,于是安装KDE界面. 1.安装kde 使用命令行: sudo apt-get install kubuntu-desktop 安装后发现不能使用中文, 在 se ...