storm 的分组策略深入理解(-)
storm的分组策略
洗牌分组(Shuffle grouping): 随机分配元组到Bolt的某个任务上,这样保证同一个Bolt的每个任务都能够得到相同数量的元组。
字段分组(Fields grouping): 按照指定的分组字段来进行流的分组。例如,流是用字段“user-id"来分组的,那有着相同“user-id"的元组就会分到同一个任务里,但是有不同“user-id"的元组就会分到不同的任务里。这是一种非常重要的分组方式,通过这种流分组方式,我们就可以做到让Storm产出的消息在这个"user-id"级别是严格有序的,这对一些对时序敏感的应用(例如,计费系统)是非常重要的。
Partial Key grouping: 跟字段分组一样,流也是用指定的分组字段进行分组的,但是在多个下游Bolt之间是有负载均衡的,这样当输入数据有倾斜时可以更好的利用资源。这篇论文很好的解释了这是如何工作的,有哪些优势。
All grouping: 流会复制给Bolt的所有任务。小心使用这种分组方式。在拓扑中,如果希望某类元祖发送到所有的下游消费者,就可以使用这种All grouping的流分组策略。
Global grouping: 整个流会分配给Bolt的一个任务。具体一点,会分配给有最小ID的任务。
不分组(None grouping): 说明不关心流是如何分组的。目前,None grouping等价于洗牌分组。Direct grouping:一种特殊的分组。对于这样分组的流,元组的生产者决定消费者的哪个任务会接收处理这个元组。只能在声明做直连的流(direct streams)上声明Direct groupings分组方式。只能通过使用emitDirect系列函数来吐元组给直连流。一个Bolt可以通过提供的TopologyContext来获得消费者的任务ID,也可以通过OutputCollector对象的emit函数(会返回元组被发送到的任务的ID)来跟踪消费者的任务ID。在ack的实现中,Spout有两个直连输入流,ack和ackFail,使用了这种直连分组的方式。
Local or shuffle grouping:如果目标Bolt在同一个worker进程里有一个或多个任务,元组就会通过洗牌的方式分配到这些同一个进程内的任务里。否则,就跟普通的洗牌分组一样。这种方式的好处是可以提高拓扑的处理效率,因为worker内部通信就是进程内部通信了,相比拓扑间的进程间通信要高效的多。worker进程间通信是通过使用Netty来进行网络通信的。
根据实例来分析分组策略
common配置:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.sonly.strom</groupId>
<artifactId>strom-study</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>7</source>
<target>7</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.sonly.storm.demo1.HelloToplogy</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.2.2</version>
<scope>provided</scope>
</dependency>
</dependencies>
</project>
Shuffle grouping
shuffle grouping的实例代码
package com.sonly.storm.demo1.grouppings.spout;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
/**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 20:27</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class WordSpout extends BaseRichSpout {
public static final Logger LOGGER = LoggerFactory.getLogger(WordSpout.class);
//拓扑上下文
private TopologyContext context;
private SpoutOutputCollector collector;
private Map config;
private AtomicInteger atomicInteger = new AtomicInteger(0);
public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector collector) {
this.config = conf;
this.context = topologyContext;
this.collector = collector;
LOGGER.warn("WordSpout->open:hashcode:{}->ThreadId:{},TaskId:{}", this.hashCode(), Thread.currentThread().getId(), context.getThisTaskId());
}
public void nextTuple() {
String[] sentences = new String[]{"zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","lisi","lisi"};
int i = atomicInteger.get();
if(i<10){
atomicInteger.incrementAndGet();
final String sentence = sentences[i];
collector.emit(new Values(sentence));
LOGGER.warn("WordSpout->nextTuple:hashcode:{}->ThreadId:{},TaskId:{},Values:{}", this.hashCode(), Thread.currentThread().getId(), context.getThisTaskId(), sentence);
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
}
bolt1
package com.sonly.storm.demo1.grouppings;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.HashMap;
import java.util.Map;
/**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 21:19</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class SheffleGroupingBolt extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(SheffleGroupingBolt.class);
private TopologyContext context;
private Map conf;
private OutputCollector collector;
private Map<String,Integer> counts = new HashMap(16);
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.conf=map;
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("SheffleGroupingBolt->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
}
public void execute(Tuple tuple) {
String word = tuple.getString(0);
LOGGER.warn("SheffleGroupingBolt->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt1"));
}
}
bolt
package com.sonly.storm.demo1.grouppings;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Map;
/**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 21:29</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class SheffleGrouppingBolt1 extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(SheffleGrouppingBolt1.class);
private TopologyContext context;
private Map conf;
private OutputCollector collector;
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.conf=map;
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("SheffleGrouppingBolt1->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
}
public void execute(Tuple tuple) {
String word = tuple.getStringByField("sentence");
LOGGER.warn("SheffleGroupingBolt1->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt"));
}
}
topology
package com.sonly.storm.demo1.grouppings;
import com.sonly.storm.demo1.grouppings.spout.WordSpout;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 21:55</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class ShuffleGroupingToplogy {
public static final Logger LOGGER = LoggerFactory.getLogger(ShuffleGroupingToplogy.class);
//Topology Name
//component prefix
//workers
//spout executor (parallelism_hint)
//spout task size
//bolt executor (parallelism_hint)
//bolt task size
public static void main(String[] args) throws InterruptedException {
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.setDebug(true);
if (args==null || args.length < 7) {
conf.setNumWorkers(3);
builder.setSpout("spout", new WordSpout(), 4).setNumTasks(4);
builder.setBolt("split-bolt", new SheffleGrouppingBolt1(), 4).shuffleGrouping("spout").setNumTasks(8);
builder.setBolt("count-bolt", new SheffleGroupingBolt(), 8).fieldsGrouping("split-bolt", new Fields("word")).setNumTasks(8);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
Thread.sleep(10000);
cluster.killTopology("word-count");
cluster.shutdown();
}
else {
Options options = Options.builder(args);
LOGGER.warn("The Topology Options {} is Submited ",options.toString());
conf.setNumWorkers(options.getWorkers());
builder.setSpout(options.getPrefix()+"-spout", new WordSpout(), options.getSpoutParallelismHint()).setNumTasks(options.getSpoutTaskSize());
builder.setBolt("bolt1", new SheffleGrouppingBolt1(), options.getBoltParallelismHint()).shuffleGrouping(options.getPrefix()+"-spout").setNumTasks(options.getBoltTaskSize());
builder.setBolt("bolt", new SheffleGroupingBolt(), options.getBoltParallelismHint()).shuffleGrouping(options.getPrefix()+"-spout").setNumTasks(options.getBoltTaskSize());
try {
StormSubmitter.submitTopologyWithProgressBar(options.getTopologyName(), conf, builder.createTopology());
LOGGER.warn("===========================================================");
LOGGER.warn("The Topology {} is Submited ",options.getTopologyName());
LOGGER.warn("===========================================================");
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
}
}
}
public static class Options{
private String topologyName;
private String prefix;
private Integer workers;
private Integer spoutParallelismHint;
private Integer spoutTaskSize;
private Integer boltParallelismHint;
private Integer boltTaskSize;
public Options(String topologyName, String prefix, Integer workers, Integer spoutParallelismHint, Integer spoutTaskSize, Integer boltParallelismHint, Integer boltTaskSize) {
this.topologyName = topologyName;
this.prefix = prefix;
this.workers = workers;
this.spoutParallelismHint = spoutParallelismHint;
this.spoutTaskSize = spoutTaskSize;
this.boltParallelismHint = boltParallelismHint;
this.boltTaskSize = boltTaskSize;
}
public static Options builder(String[] args){
return new Options(args[0],args[1],Integer.parseInt(args[2])
,Integer.parseInt(args[3]),Integer.parseInt(args[4]),Integer.parseInt(args[5]),Integer.parseInt(args[6])
);
}
public String getTopologyName() {
return topologyName;
}
public void setTopologyName(String topologyName) {
this.topologyName = topologyName;
}
public String getPrefix() {
return prefix;
}
public void setPrefix(String prefix) {
this.prefix = prefix;
}
public Integer getWorkers() {
return workers;
}
public void setWorkers(Integer workers) {
this.workers = workers;
}
public Integer getSpoutParallelismHint() {
return spoutParallelismHint;
}
public void setSpoutParallelismHint(Integer spoutParallelismHint) {
this.spoutParallelismHint = spoutParallelismHint;
}
public Integer getSpoutTaskSize() {
return spoutTaskSize;
}
public void setSpoutTaskSize(Integer spoutTaskSize) {
this.spoutTaskSize = spoutTaskSize;
}
public Integer getBoltParallelismHint() {
return boltParallelismHint;
}
public void setBoltParallelismHint(Integer boltParallelismHint) {
this.boltParallelismHint = boltParallelismHint;
}
public Integer getBoltTaskSize() {
return boltTaskSize;
}
public void setBoltTaskSize(Integer boltTaskSize) {
this.boltTaskSize = boltTaskSize;
}
@Override
public String toString() {
return "Options{" +
"topologyName='" + topologyName + '\'' +
", prefix='" + prefix + '\'' +
", workers=" + workers +
", spoutParallelismHint=" + spoutParallelismHint +
", spoutTaskSize=" + spoutTaskSize +
", boltParallelismHint=" + boltParallelismHint +
", boltTaskSize=" + boltTaskSize +
'}';
}
}
}
mvn package 打包,上传到storm服务器
ShuffleGrouping 样例分析
1)样例1
1.执行:
storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.ShuffleGroupingToplogy ShuffleGrouping ShuffleGrouping 1 2 1 2 1
2.参数:
topologyName='ShuffleGrouping', prefix='ShuffleGrouping', workers=1, spoutParallelismHint=2, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=1
3.拓扑图:
一个spout接了两个bolt
4.查看一下这个bolt分布情况:
5.进入服务器去看每一个bolt的日志
2019-05-07 18:09:13.109 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.110 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.110 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.111 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.112 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.115 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.116 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.117 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.118 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:lisi
2019-05-07 18:09:13.119 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:lisi
6.进入另外一个bolt的日志 10条信息被处理了
2019-05-07 18:09:00.791 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.793 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.794 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.795 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.795 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.796 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.797 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.805 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.805 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:lisi
2019-05-07 18:09:00.806 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:lisi
也是一样10条被处理了
总结:
对于spout直接对接两个bolt,sheffgrouping 分组不会随机给两个bolt分配消息,而是全量发给两个BOlT
2)样例2
1.修改一下参数看一下:
topologyName='ShuffleGrouping1', prefix='ShuffleGrouping1', workers=2, spoutParallelismHint=1, spoutTaskSize=2, boltParallelismHint=2, boltTaskSize=2
总共4个bolt,两个spout,总共发送了40条消息,spout产生消息20条。transfer了40次。
看看4个bolt的消息分配的情况。
因为只有两个worker所以会有两个bolt在同一个work上,日志会打在一起,但是从名字可以可以区分开来,同样每个bolt都是10条。
2.修改拓扑结构为:
3.修改代码:
bolt
String word = tuple.getStringByField("bolt");
topoloy:
builder.setBolt("bolt1", new SheffleGrouppingBolt1(), 1).shuffleGrouping(options.getPrefix()+"-spout");
builder.setBolt("bolt", new SheffleGroupingBolt(), options.getBoltParallelismHint()).shuffleGrouping("bolt1").setNumTasks(options.getBoltTaskSize());
4.参数:
topologyName='ShuffleGrouping2', prefix='ShuffleGrouping2', workers=2, spoutParallelismHint=1, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=2
5.查看日志:k8s-n2 这个节点只有bolt bolt1这个节点在k8s-n3上
[root@k8s-n2 6706]# grep "SheffleGroupingBolt->execute" worker.log |wc -l
3
[root@k8s-n2 6706]# grep "SheffleGroupingBolt1->execute" worker.log |wc -l
0
[root@k8s-n3 6706]# grep "SheffleGroupingBolt->execute" worker.log |wc -l
7
[root@k8s-n3 6706]# grep "SheffleGroupingBolt1->execute" worker.log |wc -l
10
可以看出来bolt1->bolt这条线上的数据被随机分配了一个三条一个两条。
总结:
对于bolt 连接bolt的shuffingGrouping,消息是随机分配到多个bolt上面的
Fields grouping
Fields grouping 的实例
代码:
public class FeildGroupingToplogy {
public static final Logger LOGGER = LoggerFactory.getLogger(FeildGroupingToplogy.class);
//Topology Name
//component prefix
//workers
//spout executor (parallelism_hint)
//spout task size
//bolt executor (parallelism_hint)
//bolt task size
public static void main(String[] args) throws InterruptedException {
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.setDebug(true);
Options options = Options.builder(args);
LOGGER.warn("The Topology Options {} is Submited ", options.toString());
conf.setNumWorkers(options.getWorkers());
String spoutName = options.getPrefix() + "-spout";
builder.setSpout(spoutName, new WordSpout(), options.getSpoutParallelismHint()).setNumTasks(options.getSpoutTaskSize());
builder.setBolt(options.getPrefix() + "bolt1", new FieldGrouppingBolt1(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
builder.setBolt(options.getPrefix() + "bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
// builder.setBolt("bolt1", new FieldGrouppingBolt1(), 1).shuffleGrouping(options.getPrefix()+"-spout");
// builder.setBolt("bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping("bolt1",new Fields("bolt")).setNumTasks(options.getBoltTaskSize());
try {
StormSubmitter.submitTopologyWithProgressBar(options.getTopologyName(), conf, builder.createTopology());
LOGGER.warn("===========================================================");
LOGGER.warn("The Topology {} is Submited ", options.getTopologyName());
LOGGER.warn("===========================================================");
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
}
}
public static class Options {
private String topologyName;
private String prefix;
private Integer workers;
private Integer spoutParallelismHint;
private Integer spoutTaskSize;
private Integer boltParallelismHint;
private Integer boltTaskSize;
public Options(String topologyName, String prefix, Integer workers, Integer spoutParallelismHint, Integer spoutTaskSize, Integer boltParallelismHint, Integer boltTaskSize) {
this.topologyName = topologyName;
this.prefix = prefix;
this.workers = workers;
this.spoutParallelismHint = spoutParallelismHint;
this.spoutTaskSize = spoutTaskSize;
this.boltParallelismHint = boltParallelismHint;
this.boltTaskSize = boltTaskSize;
}
public static Options builder(String[] args) {
return new Options(args[0], args[1], Integer.parseInt(args[2])
, Integer.parseInt(args[3]), Integer.parseInt(args[4]), Integer.parseInt(args[5]), Integer.parseInt(args[6])
);
}
public String getTopologyName() {
return topologyName;
}
public void setTopologyName(String topologyName) {
this.topologyName = topologyName;
}
public String getPrefix() {
return prefix;
}
public void setPrefix(String prefix) {
this.prefix = prefix;
}
public Integer getWorkers() {
return workers;
}
public void setWorkers(Integer workers) {
this.workers = workers;
}
public Integer getSpoutParallelismHint() {
return spoutParallelismHint;
}
public void setSpoutParallelismHint(Integer spoutParallelismHint) {
this.spoutParallelismHint = spoutParallelismHint;
}
public Integer getSpoutTaskSize() {
return spoutTaskSize;
}
public void setSpoutTaskSize(Integer spoutTaskSize) {
this.spoutTaskSize = spoutTaskSize;
}
public Integer getBoltParallelismHint() {
return boltParallelismHint;
}
public void setBoltParallelismHint(Integer boltParallelismHint) {
this.boltParallelismHint = boltParallelismHint;
}
public Integer getBoltTaskSize() {
return boltTaskSize;
}
public void setBoltTaskSize(Integer boltTaskSize) {
this.boltTaskSize = boltTaskSize;
}
@Override
public String toString() {
return "Options{" +
"topologyName='" + topologyName + '\'' +
", prefix='" + prefix + '\'' +
", workers=" + workers +
", spoutParallelismHint=" + spoutParallelismHint +
", spoutTaskSize=" + spoutTaskSize +
", boltParallelismHint=" + boltParallelismHint +
", boltTaskSize=" + boltTaskSize +
'}';
}
}
}
public class FieldGroupingBolt extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(FieldGroupingBolt.class);
private TopologyContext context;
private Map conf;
private OutputCollector collector;
private Map<String,Integer> counts = new HashMap(16);
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.conf=map;
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("FieldGroupingBolt->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
}
public void execute(Tuple tuple) {
String word = tuple.getStringByField("bolt");
LOGGER.warn("FieldGroupingBolt->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt1"));
}
}
public class FieldGrouppingBolt1 extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(FieldGrouppingBolt1.class);
private TopologyContext context;
private OutputCollector collector;
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("FieldGrouppingBolt1->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
}
public void execute(Tuple tuple) {
String word = tuple.getStringByField("sentence");
LOGGER.warn("SheffleGroupingBolt1->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt"));
}
}
2.打包上传到服务器
3.执行:
storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.fieldgrouping.FeildGroupingToplogy FieldGrouping1 FieldGrouping1 2 1 1 2 2
4.参数
topologyName='FieldGrouping1', prefix='FieldGrouping1', workers=2, spoutParallelismHint=1, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=2
5。拓扑图:
6.并发度以及组件分布图:
同样看图可以看到消息被发送了20次,但是被transfer40次。这是因为spout对bolt,对消息进行了复制,全量发送到了每个bolt,所以每个bolt都会有10条消息。
总结:
和sheffleGrouping 一样,spout->bolt是全量广播发送,每个bolt都会spout的全量消息。
样例2
1.修改拓扑的代码
public static void main(String[] args) throws InterruptedException {
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.setDebug(true);
Options options = Options.builder(args);
LOGGER.warn("The Topology Options {} is Submited ", options.toString());
conf.setNumWorkers(options.getWorkers());
String spoutName = options.getPrefix() + "-spout";
builder.setSpout(spoutName, new WordSpout(), options.getSpoutParallelismHint()).setNumTasks(options.getSpoutTaskSize());
// builder.setBolt(options.getPrefix() + "bolt1", new FieldGrouppingBolt1(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
// builder.setBolt(options.getPrefix() + "bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
builder.setBolt("bolt1", new FieldGrouppingBolt1(), 1).fieldsGrouping(spoutName, new Fields("sentence"));
builder.setBolt("bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping("bolt1",new Fields("bolt")).setNumTasks(options.getBoltTaskSize());
try {
StormSubmitter.submitTopologyWithProgressBar(options.getTopologyName(), conf, builder.createTopology());
LOGGER.warn("===========================================================");
LOGGER.warn("The Topology {} is Submited ", options.getTopologyName());
LOGGER.warn("===========================================================");
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
}
}
FieldGrouping 样例分析
1)样例1
2.将上面代码打包上传服务器执行命令
storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.fieldgrouping.FeildGroupingToplogy FieldGrouping2 FieldGrouping2 2 1 1 2 2
3.参数
topologyName='FieldGrouping2', prefix='FieldGrouping2', workers=2, spoutParallelismHint=1, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=
4.拓扑图:
6.并发度以及组件分布图:
7.根据分布情况检查各个work的日志查看消息的发送情况
k8s-n3节点上:
[root@k8s-n3 6701]# grep "SheffleGroupingBolt1->execute" worker.log|wc -l
10
[root@k8s-n3 6701]# grep "FieldGroupingBolt->execute" worker.log|wc -l
2
k8s-n2节点:
[root@k8s-n2 6701]# grep "SheffleGroupingBolt1->execute" worker.log|wc -l
0
[root@k8s-n2 6701]# grep "FieldGroupingBolt->execute" worker.log|wc -l
8
再看一下详情如何
k8s-n3:bolt1有10条消息应为bolt只有一个所以Fied分组是不会生效的。
[root@k8s-n3 6701]# grep "SheffleGroupingBolt1->execute" worker.log
2019-05-07 21:59:35.805 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.810 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.810 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.811 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.811 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.812 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.813 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.814 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.815 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:lisi
2019-05-07 21:59:35.838 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:lisi
k8s-n3:bolt 有两个实例,按照field分组。里面有两条消息。都是lisi
[root@k8s-n3 6701]# grep "FieldGroupingBolt->execute" worker.log
2019-05-07 21:59:35.855 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[4 4] [WARN] FieldGroupingBolt->execute:hashcode:281792799->ThreadId:45,TaskId:4,value:lisi
2019-05-07 21:59:35.856 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[4 4] [WARN] FieldGroupingBolt->execute:hashcode:281792799->ThreadId:45,TaskId:4,value:lisi
k8s-n2: bolt 应该就是8条消息,验证一下
[root@k8s-n2 6701]# grep "FieldGroupingBolt->execute" worker.log
2019-05-07 21:59:48.315 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.317 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.317 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.318 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.318 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.319 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.319 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.320 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
总结:
bolt->bolt节点时,feild分组会按照field字段的key值进行分组,key相同的会被分配到一个bolt里面。
如果执行
storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.fieldgrouping.FeildGroupingToplogy FieldGrouping3 FieldGrouping3 4 1 1 4 4
bolt 的分配情况是什么样子?这个留给大家去思考一下。例子中按照field分组后只有两种数据,但是这两种数据要分配给4个bolt,那这个是怎么分配的?我将在下一篇博客里揭晓答案!
下一篇我会继续分析这个分组策略,我会把我在学习这个storm的时候当时的自己思考的一个过程展现给大家,如果有什么错误的,或者没有讲清楚的地方,欢迎大家给我留言,咱们可以一起交流讨论。
storm 的分组策略深入理解(-)的更多相关文章
- 简单聊聊Storm的流分组策略
简单聊聊Storm的流分组策略 首先我要强调的是,Storm的分组策略对结果有着直接的影响,不同的分组的结果一定是不一样的.其次,不同的分组策略对资源的利用也是有着非常大的不同,本文主要讲一讲loca ...
- 大数据量场景下storm自定义分组与Hbase预分区完美结合大幅度节省内存空间
前言:在系统中向hbase中插入数据时,常常通过设置region的预分区来防止大数据量插入的热点问题,提高数据插入的效率,同时可以减少当数据猛增时由于Region split带来的资源消耗.大量的预分 ...
- storm自定义分组与Hbase预分区结合节省内存消耗
Hbas预分区 在系统中向hbase中插入数据时,常常通过设置region的预分区来防止大数据量插入的热点问题,提高数据插入的效率,同时可以减少当数据猛增时由于Region split带来的资源消耗. ...
- 【Storm篇】--Storm分组策略
一.前述 Storm由数源泉spout到bolt时,可以选择分组策略,实现对spout发出的数据的分发.对多个并行度的时候有用. 二.具体原理 1. Shuffle Grouping 随机分组,随机派 ...
- storm Tutorial 的解读 + 个人理解
参考链接: Tutorial storm Tutorial 中文解读+分析 导读.摘要: .hadoop有master与slave,Storm与之对应的节点是什么? .Storm控制节点上面运行一个后 ...
- 第1节 storm编程:8、storm的分发策略
8. Storm的分发策略 Storm当中的分组策略,一共有八种: 所谓的grouping策略就是在Spout与Bolt.Bolt与Bolt之间传递Tuple的方式.总共有八种方式: 1)shuffl ...
- Storm流分组介绍
Storm流分组介绍 流分组是拓扑定义的一部分,每个Bolt指定应该接收哪个流作为输入.流分组定义了流/元组如何在Bolt的任务之间进行分发.在设计拓扑的时候需要定义数据 ...
- Storm Grouping —— 流分组策略
Storm Grouping: Shuffle Grouping :随机分组,尽量均匀分布到下游Bolt中 将流分组定义为混排.这种混排分组意味着来自Spout的输入将混排,或随机分发给此Bolt中的 ...
- Storm中并发程度的理解
Storm中涉及到了很多组件,例如nimbus,supervisor等等,在参考了这两篇文章之后,对这个有了更好的理解. Understanding the parallelism of a Stor ...
随机推荐
- iOS——plist的创建,数据写入与读取
iOS中plist的创建,数据写入与读取 Documents:应用将数据存储在Documents中,但基于NSuserDefaults的首选项设置除外Library:基于NSUserDefaults的 ...
- postman+jmeter接口实例
接口基础 一.为什么要单独测试接口? 1. 程序是分开开发的,前端还没有开发,后端已经开发完了,可以提前进入测试2. 接口直接返回的数据------越底层发现bug,修复成本是越低的3. 接口测试能模 ...
- python-Web-flask-视图内容和模板
2 视图内容和模板: 基本使用 #设置cookie值 @app.route('/set_cookie') def set_cookie(): response = make_response(&quo ...
- git命令自动补全
git安装好后发现命令不能自动补全于是搜了很多方法,先按博客里面的方法试了下: 1.下载配置文件 git clone git://git.kernel.org/pub/scm/git/git.git ...
- 【ARM-Linux开发】wubi.exe安装Ubuntu14.04出现为/检查磁盘时发生严重错误的解决方法
原文:http://jingyan.baidu.com/article/0aa22375bbffbe88cc0d6419.html 我写过一篇叫做 如何硬盘装麒麟系统 的经验,介绍了如何不使用优盘的情 ...
- confluence导出PDF格式文件不显示中文解决
由于confluence导出PDF格式文件需要应用字体文件,下载字体文件在confluence管理员界面安装即可. 我这里使用从simhei.ttf楷体,可以从windowns主机里下载(c:/win ...
- 【51nod】2591 最终讨伐
[51nod]2591 最终讨伐 敲51nod是啥评测机啊,好几次都编译超时然后同一份代码莫名奇妙在众多0ms中忽然超时 这道题很简单就是\(M\)名既被诅咒也有石头的人,要么就把石头给没有石头被诅咒 ...
- Python解Leetcode: 539. Minimum Time Difference
题目描述:给定一个由时间字符组成的列表,找出任意两个时间之间最小的差值. 思路: 把给定的链表排序,并且在排序的同时把60进制的时间转化成十进制整数: 遍历排序的数组,求出两个相邻值之间的差值: 求出 ...
- java日期比较工具类
package com.net.util; import java.text.DateFormat; import java.text.SimpleDateFormat; import java.ut ...
- 【浅析C++11】std::function和std::bind
目录 std::function可调用对象包装器 std::function基本用法 std::function/std::bind与抽象工厂.工厂方法的一点思考 std::function可调用对象 ...