storm的分组策略

  • 洗牌分组(Shuffle grouping): 随机分配元组到Bolt的某个任务上,这样保证同一个Bolt的每个任务都能够得到相同数量的元组。

  • 字段分组(Fields grouping): 按照指定的分组字段来进行流的分组。例如,流是用字段“user-id"来分组的,那有着相同“user-id"的元组就会分到同一个任务里,但是有不同“user-id"的元组就会分到不同的任务里。这是一种非常重要的分组方式,通过这种流分组方式,我们就可以做到让Storm产出的消息在这个"user-id"级别是严格有序的,这对一些对时序敏感的应用(例如,计费系统)是非常重要的。

  • Partial Key grouping: 跟字段分组一样,流也是用指定的分组字段进行分组的,但是在多个下游Bolt之间是有负载均衡的,这样当输入数据有倾斜时可以更好的利用资源。这篇论文很好的解释了这是如何工作的,有哪些优势。

  • All grouping: 流会复制给Bolt的所有任务。小心使用这种分组方式。在拓扑中,如果希望某类元祖发送到所有的下游消费者,就可以使用这种All grouping的流分组策略。

  • Global grouping: 整个流会分配给Bolt的一个任务。具体一点,会分配给有最小ID的任务。

    不分组(None grouping): 说明不关心流是如何分组的。目前,None grouping等价于洗牌分组。

  • Direct grouping:一种特殊的分组。对于这样分组的流,元组的生产者决定消费者的哪个任务会接收处理这个元组。只能在声明做直连的流(direct streams)上声明Direct groupings分组方式。只能通过使用emitDirect系列函数来吐元组给直连流。一个Bolt可以通过提供的TopologyContext来获得消费者的任务ID,也可以通过OutputCollector对象的emit函数(会返回元组被发送到的任务的ID)来跟踪消费者的任务ID。在ack的实现中,Spout有两个直连输入流,ack和ackFail,使用了这种直连分组的方式。

  • Local or shuffle grouping:如果目标Bolt在同一个worker进程里有一个或多个任务,元组就会通过洗牌的方式分配到这些同一个进程内的任务里。否则,就跟普通的洗牌分组一样。这种方式的好处是可以提高拓扑的处理效率,因为worker内部通信就是进程内部通信了,相比拓扑间的进程间通信要高效的多。worker进程间通信是通过使用Netty来进行网络通信的。

根据实例来分析分组策略

common配置:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion> <groupId>com.sonly.strom</groupId>
<artifactId>strom-study</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>7</source>
<target>7</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.sonly.storm.demo1.HelloToplogy</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.2.2</version>
<scope>provided</scope>
</dependency>
</dependencies>
</project>

Shuffle grouping

shuffle grouping的实例代码

package com.sonly.storm.demo1.grouppings.spout;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import java.util.Map;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger; /**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 20:27</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class WordSpout extends BaseRichSpout {
public static final Logger LOGGER = LoggerFactory.getLogger(WordSpout.class);
//拓扑上下文
private TopologyContext context;
private SpoutOutputCollector collector;
private Map config;
private AtomicInteger atomicInteger = new AtomicInteger(0); public void open(Map conf, TopologyContext topologyContext, SpoutOutputCollector collector) {
this.config = conf;
this.context = topologyContext;
this.collector = collector;
LOGGER.warn("WordSpout->open:hashcode:{}->ThreadId:{},TaskId:{}", this.hashCode(), Thread.currentThread().getId(), context.getThisTaskId());
} public void nextTuple() {
String[] sentences = new String[]{"zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","zhangsan","lisi","lisi"};
int i = atomicInteger.get();
if(i<10){
atomicInteger.incrementAndGet();
final String sentence = sentences[i];
collector.emit(new Values(sentence));
LOGGER.warn("WordSpout->nextTuple:hashcode:{}->ThreadId:{},TaskId:{},Values:{}", this.hashCode(), Thread.currentThread().getId(), context.getThisTaskId(), sentence);
}
} public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
}

bolt1

package com.sonly.storm.demo1.grouppings;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import java.util.HashMap;
import java.util.Map; /**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 21:19</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class SheffleGroupingBolt extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(SheffleGroupingBolt.class);
private TopologyContext context;
private Map conf;
private OutputCollector collector;
private Map<String,Integer> counts = new HashMap(16);
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.conf=map;
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("SheffleGroupingBolt->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
} public void execute(Tuple tuple) { String word = tuple.getString(0);
LOGGER.warn("SheffleGroupingBolt->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
} public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt1"));
}
}

bolt

package com.sonly.storm.demo1.grouppings;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import java.util.Map; /**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 21:29</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class SheffleGrouppingBolt1 extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(SheffleGrouppingBolt1.class);
private TopologyContext context;
private Map conf;
private OutputCollector collector;
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.conf=map;
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("SheffleGrouppingBolt1->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
} public void execute(Tuple tuple) {
String word = tuple.getStringByField("sentence");
LOGGER.warn("SheffleGroupingBolt1->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
} public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt"));
}
}

topology

package com.sonly.storm.demo1.grouppings;

import com.sonly.storm.demo1.grouppings.spout.WordSpout;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory; /**
* <b>package:com.sonly.storm.demo1</b>
* <b>project(项目):stormstudy</b>
* <b>class(类)${CLASS_NAME}</b>
* <b>creat date(创建时间):2019-05-09 21:55</b>
* <b>author(作者):</b>xxydliuyss</br>
* <b>note(备注)):</b>
* If you want to change the file header,please modify zhe File and Code Templates.
*/
public class ShuffleGroupingToplogy {
public static final Logger LOGGER = LoggerFactory.getLogger(ShuffleGroupingToplogy.class);
//Topology Name
//component prefix
//workers
//spout executor (parallelism_hint)
//spout task size
//bolt executor (parallelism_hint)
//bolt task size
public static void main(String[] args) throws InterruptedException {
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.setDebug(true);
if (args==null || args.length < 7) {
conf.setNumWorkers(3);
builder.setSpout("spout", new WordSpout(), 4).setNumTasks(4); builder.setBolt("split-bolt", new SheffleGrouppingBolt1(), 4).shuffleGrouping("spout").setNumTasks(8);
builder.setBolt("count-bolt", new SheffleGroupingBolt(), 8).fieldsGrouping("split-bolt", new Fields("word")).setNumTasks(8);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology()); Thread.sleep(10000);
cluster.killTopology("word-count");
cluster.shutdown();
}
else {
Options options = Options.builder(args);
LOGGER.warn("The Topology Options {} is Submited ",options.toString());
conf.setNumWorkers(options.getWorkers());
builder.setSpout(options.getPrefix()+"-spout", new WordSpout(), options.getSpoutParallelismHint()).setNumTasks(options.getSpoutTaskSize()); builder.setBolt("bolt1", new SheffleGrouppingBolt1(), options.getBoltParallelismHint()).shuffleGrouping(options.getPrefix()+"-spout").setNumTasks(options.getBoltTaskSize());
builder.setBolt("bolt", new SheffleGroupingBolt(), options.getBoltParallelismHint()).shuffleGrouping(options.getPrefix()+"-spout").setNumTasks(options.getBoltTaskSize());
try {
StormSubmitter.submitTopologyWithProgressBar(options.getTopologyName(), conf, builder.createTopology());
LOGGER.warn("===========================================================");
LOGGER.warn("The Topology {} is Submited ",options.getTopologyName());
LOGGER.warn("===========================================================");
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
} }
}
public static class Options{
private String topologyName;
private String prefix;
private Integer workers;
private Integer spoutParallelismHint;
private Integer spoutTaskSize;
private Integer boltParallelismHint;
private Integer boltTaskSize; public Options(String topologyName, String prefix, Integer workers, Integer spoutParallelismHint, Integer spoutTaskSize, Integer boltParallelismHint, Integer boltTaskSize) {
this.topologyName = topologyName;
this.prefix = prefix;
this.workers = workers;
this.spoutParallelismHint = spoutParallelismHint;
this.spoutTaskSize = spoutTaskSize;
this.boltParallelismHint = boltParallelismHint;
this.boltTaskSize = boltTaskSize;
}
public static Options builder(String[] args){
return new Options(args[0],args[1],Integer.parseInt(args[2])
,Integer.parseInt(args[3]),Integer.parseInt(args[4]),Integer.parseInt(args[5]),Integer.parseInt(args[6])
);
}
public String getTopologyName() {
return topologyName;
} public void setTopologyName(String topologyName) {
this.topologyName = topologyName;
} public String getPrefix() {
return prefix;
} public void setPrefix(String prefix) {
this.prefix = prefix;
} public Integer getWorkers() {
return workers;
} public void setWorkers(Integer workers) {
this.workers = workers;
} public Integer getSpoutParallelismHint() {
return spoutParallelismHint;
} public void setSpoutParallelismHint(Integer spoutParallelismHint) {
this.spoutParallelismHint = spoutParallelismHint;
} public Integer getSpoutTaskSize() {
return spoutTaskSize;
} public void setSpoutTaskSize(Integer spoutTaskSize) {
this.spoutTaskSize = spoutTaskSize;
} public Integer getBoltParallelismHint() {
return boltParallelismHint;
} public void setBoltParallelismHint(Integer boltParallelismHint) {
this.boltParallelismHint = boltParallelismHint;
} public Integer getBoltTaskSize() {
return boltTaskSize;
} public void setBoltTaskSize(Integer boltTaskSize) {
this.boltTaskSize = boltTaskSize;
} @Override
public String toString() {
return "Options{" +
"topologyName='" + topologyName + '\'' +
", prefix='" + prefix + '\'' +
", workers=" + workers +
", spoutParallelismHint=" + spoutParallelismHint +
", spoutTaskSize=" + spoutTaskSize +
", boltParallelismHint=" + boltParallelismHint +
", boltTaskSize=" + boltTaskSize +
'}';
}
}
}

mvn package 打包,上传到storm服务器

ShuffleGrouping 样例分析

1)样例1

1.执行:

storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.ShuffleGroupingToplogy ShuffleGrouping ShuffleGrouping 1 2 1 2 1

2.参数:

topologyName='ShuffleGrouping', prefix='ShuffleGrouping', workers=1, spoutParallelismHint=2, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=1

3.拓扑图:



一个spout接了两个bolt

4.查看一下这个bolt分布情况:



5.进入服务器去看每一个bolt的日志

2019-05-07 18:09:13.109 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.110 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.110 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.111 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.112 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.115 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.116 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.117 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 18:09:13.118 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:lisi
2019-05-07 18:09:13.119 c.s.s.d.g.SheffleGrouppingBolt1 Thread-11-bolt1-executor[5 5] [WARN] SheffleGroupingBolt1->execute:hashcode:1393282516->ThreadId:45,TaskId:5,value:lisi

6.进入另外一个bolt的日志 10条信息被处理了

2019-05-07 18:09:00.791 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.793 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.794 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.795 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.795 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.796 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.797 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.805 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:zhangsan
2019-05-07 18:09:00.805 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:lisi
2019-05-07 18:09:00.806 c.s.s.d.g.SheffleGroupingBolt Thread-9-bolt-executor[4 4] [WARN] SheffleGroupingBolt->execute:hashcode:1430296959->ThreadId:43,TaskId:4,value:lisi

也是一样10条被处理了

总结:

对于spout直接对接两个bolt,sheffgrouping 分组不会随机给两个bolt分配消息,而是全量发给两个BOlT

2)样例2

1.修改一下参数看一下:

topologyName='ShuffleGrouping1', prefix='ShuffleGrouping1', workers=2, spoutParallelismHint=1, spoutTaskSize=2, boltParallelismHint=2, boltTaskSize=2



总共4个bolt,两个spout,总共发送了40条消息,spout产生消息20条。transfer了40次。

看看4个bolt的消息分配的情况。

因为只有两个worker所以会有两个bolt在同一个work上,日志会打在一起,但是从名字可以可以区分开来,同样每个bolt都是10条。

2.修改拓扑结构为:



3.修改代码:

bolt

 String word = tuple.getStringByField("bolt");

topoloy:

 builder.setBolt("bolt1", new SheffleGrouppingBolt1(), 1).shuffleGrouping(options.getPrefix()+"-spout");
builder.setBolt("bolt", new SheffleGroupingBolt(), options.getBoltParallelismHint()).shuffleGrouping("bolt1").setNumTasks(options.getBoltTaskSize());

4.参数:

topologyName='ShuffleGrouping2', prefix='ShuffleGrouping2', workers=2, spoutParallelismHint=1, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=2



5.查看日志:k8s-n2 这个节点只有bolt bolt1这个节点在k8s-n3上

[root@k8s-n2 6706]# grep "SheffleGroupingBolt->execute" worker.log |wc -l
3
[root@k8s-n2 6706]# grep "SheffleGroupingBolt1->execute" worker.log |wc -l
0
[root@k8s-n3 6706]# grep "SheffleGroupingBolt->execute" worker.log |wc -l
7
[root@k8s-n3 6706]# grep "SheffleGroupingBolt1->execute" worker.log |wc -l
10

可以看出来bolt1->bolt这条线上的数据被随机分配了一个三条一个两条。

总结:

对于bolt 连接bolt的shuffingGrouping,消息是随机分配到多个bolt上面的

Fields grouping

Fields grouping 的实例

代码:

public class FeildGroupingToplogy {
public static final Logger LOGGER = LoggerFactory.getLogger(FeildGroupingToplogy.class); //Topology Name
//component prefix
//workers
//spout executor (parallelism_hint)
//spout task size
//bolt executor (parallelism_hint)
//bolt task size
public static void main(String[] args) throws InterruptedException {
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.setDebug(true); Options options = Options.builder(args);
LOGGER.warn("The Topology Options {} is Submited ", options.toString());
conf.setNumWorkers(options.getWorkers());
String spoutName = options.getPrefix() + "-spout";
builder.setSpout(spoutName, new WordSpout(), options.getSpoutParallelismHint()).setNumTasks(options.getSpoutTaskSize());
builder.setBolt(options.getPrefix() + "bolt1", new FieldGrouppingBolt1(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
builder.setBolt(options.getPrefix() + "bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
// builder.setBolt("bolt1", new FieldGrouppingBolt1(), 1).shuffleGrouping(options.getPrefix()+"-spout");
// builder.setBolt("bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping("bolt1",new Fields("bolt")).setNumTasks(options.getBoltTaskSize());
try {
StormSubmitter.submitTopologyWithProgressBar(options.getTopologyName(), conf, builder.createTopology());
LOGGER.warn("===========================================================");
LOGGER.warn("The Topology {} is Submited ", options.getTopologyName());
LOGGER.warn("===========================================================");
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
} } public static class Options {
private String topologyName;
private String prefix;
private Integer workers;
private Integer spoutParallelismHint;
private Integer spoutTaskSize;
private Integer boltParallelismHint;
private Integer boltTaskSize; public Options(String topologyName, String prefix, Integer workers, Integer spoutParallelismHint, Integer spoutTaskSize, Integer boltParallelismHint, Integer boltTaskSize) {
this.topologyName = topologyName;
this.prefix = prefix;
this.workers = workers;
this.spoutParallelismHint = spoutParallelismHint;
this.spoutTaskSize = spoutTaskSize;
this.boltParallelismHint = boltParallelismHint;
this.boltTaskSize = boltTaskSize;
} public static Options builder(String[] args) {
return new Options(args[0], args[1], Integer.parseInt(args[2])
, Integer.parseInt(args[3]), Integer.parseInt(args[4]), Integer.parseInt(args[5]), Integer.parseInt(args[6])
);
} public String getTopologyName() {
return topologyName;
} public void setTopologyName(String topologyName) {
this.topologyName = topologyName;
} public String getPrefix() {
return prefix;
} public void setPrefix(String prefix) {
this.prefix = prefix;
} public Integer getWorkers() {
return workers;
} public void setWorkers(Integer workers) {
this.workers = workers;
} public Integer getSpoutParallelismHint() {
return spoutParallelismHint;
} public void setSpoutParallelismHint(Integer spoutParallelismHint) {
this.spoutParallelismHint = spoutParallelismHint;
} public Integer getSpoutTaskSize() {
return spoutTaskSize;
} public void setSpoutTaskSize(Integer spoutTaskSize) {
this.spoutTaskSize = spoutTaskSize;
} public Integer getBoltParallelismHint() {
return boltParallelismHint;
} public void setBoltParallelismHint(Integer boltParallelismHint) {
this.boltParallelismHint = boltParallelismHint;
} public Integer getBoltTaskSize() {
return boltTaskSize;
} public void setBoltTaskSize(Integer boltTaskSize) {
this.boltTaskSize = boltTaskSize;
} @Override
public String toString() {
return "Options{" +
"topologyName='" + topologyName + '\'' +
", prefix='" + prefix + '\'' +
", workers=" + workers +
", spoutParallelismHint=" + spoutParallelismHint +
", spoutTaskSize=" + spoutTaskSize +
", boltParallelismHint=" + boltParallelismHint +
", boltTaskSize=" + boltTaskSize +
'}';
}
}
}
public class FieldGroupingBolt extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(FieldGroupingBolt.class);
private TopologyContext context;
private Map conf;
private OutputCollector collector;
private Map<String,Integer> counts = new HashMap(16);
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.conf=map;
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("FieldGroupingBolt->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
} public void execute(Tuple tuple) { String word = tuple.getStringByField("bolt");
LOGGER.warn("FieldGroupingBolt->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
} public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt1"));
}
}
public class FieldGrouppingBolt1 extends BaseRichBolt {
public static final Logger LOGGER = LoggerFactory.getLogger(FieldGrouppingBolt1.class);
private TopologyContext context;
private OutputCollector collector;
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
this.context = topologyContext;
this.collector = outputCollector;
LOGGER.warn("FieldGrouppingBolt1->prepare:hashcode:{}->ThreadId:{},TaskId:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId());
} public void execute(Tuple tuple) {
String word = tuple.getStringByField("sentence");
LOGGER.warn("SheffleGroupingBolt1->execute:hashcode:{}->ThreadId:{},TaskId:{},value:{}",this.hashCode(),Thread.currentThread().getId(),context.getThisTaskId(),word);
collector.emit(new Values(word));
collector.ack(tuple);
} public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("bolt"));
}
}

2.打包上传到服务器

3.执行:

storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.fieldgrouping.FeildGroupingToplogy FieldGrouping1 FieldGrouping1 2 1 1 2 2

4.参数

topologyName='FieldGrouping1', prefix='FieldGrouping1', workers=2, spoutParallelismHint=1, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=2

5。拓扑图:



6.并发度以及组件分布图:



同样看图可以看到消息被发送了20次,但是被transfer40次。这是因为spout对bolt,对消息进行了复制,全量发送到了每个bolt,所以每个bolt都会有10条消息。

总结:

和sheffleGrouping 一样,spout->bolt是全量广播发送,每个bolt都会spout的全量消息。

样例2

1.修改拓扑的代码

public static void main(String[] args) throws InterruptedException {
TopologyBuilder builder = new TopologyBuilder();
Config conf = new Config();
conf.setDebug(true); Options options = Options.builder(args);
LOGGER.warn("The Topology Options {} is Submited ", options.toString());
conf.setNumWorkers(options.getWorkers());
String spoutName = options.getPrefix() + "-spout";
builder.setSpout(spoutName, new WordSpout(), options.getSpoutParallelismHint()).setNumTasks(options.getSpoutTaskSize());
// builder.setBolt(options.getPrefix() + "bolt1", new FieldGrouppingBolt1(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
// builder.setBolt(options.getPrefix() + "bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping(spoutName, new Fields("sentence")).setNumTasks(options.getBoltTaskSize());
builder.setBolt("bolt1", new FieldGrouppingBolt1(), 1).fieldsGrouping(spoutName, new Fields("sentence"));
builder.setBolt("bolt", new FieldGroupingBolt(), options.getBoltParallelismHint()).fieldsGrouping("bolt1",new Fields("bolt")).setNumTasks(options.getBoltTaskSize());
try {
StormSubmitter.submitTopologyWithProgressBar(options.getTopologyName(), conf, builder.createTopology());
LOGGER.warn("===========================================================");
LOGGER.warn("The Topology {} is Submited ", options.getTopologyName());
LOGGER.warn("===========================================================");
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
} }

FieldGrouping 样例分析

1)样例1

2.将上面代码打包上传服务器执行命令

 storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.fieldgrouping.FeildGroupingToplogy FieldGrouping2 FieldGrouping2 2 1 1 2 2

3.参数

topologyName='FieldGrouping2', prefix='FieldGrouping2', workers=2, spoutParallelismHint=1, spoutTaskSize=1, boltParallelismHint=2, boltTaskSize=

4.拓扑图:



6.并发度以及组件分布图:



7.根据分布情况检查各个work的日志查看消息的发送情况

k8s-n3节点上:

[root@k8s-n3 6701]# grep "SheffleGroupingBolt1->execute" worker.log|wc -l
10
[root@k8s-n3 6701]# grep "FieldGroupingBolt->execute" worker.log|wc -l
2

k8s-n2节点:

[root@k8s-n2 6701]# grep "SheffleGroupingBolt1->execute" worker.log|wc -l
0
[root@k8s-n2 6701]# grep "FieldGroupingBolt->execute" worker.log|wc -l
8

再看一下详情如何

k8s-n3:bolt1有10条消息应为bolt只有一个所以Fied分组是不会生效的。

[root@k8s-n3 6701]# grep "SheffleGroupingBolt1->execute" worker.log
2019-05-07 21:59:35.805 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.810 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.810 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.811 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.811 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.812 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.813 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.814 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:zhangsan
2019-05-07 21:59:35.815 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:lisi
2019-05-07 21:59:35.838 c.s.s.d.g.f.FieldGrouppingBolt1 Thread-7-bolt1-executor[6 6] [WARN] SheffleGroupingBolt1->execute:hashcode:107880849->ThreadId:41,TaskId:6,value:lisi

k8s-n3:bolt 有两个实例,按照field分组。里面有两条消息。都是lisi

[root@k8s-n3 6701]# grep "FieldGroupingBolt->execute" worker.log
2019-05-07 21:59:35.855 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[4 4] [WARN] FieldGroupingBolt->execute:hashcode:281792799->ThreadId:45,TaskId:4,value:lisi
2019-05-07 21:59:35.856 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[4 4] [WARN] FieldGroupingBolt->execute:hashcode:281792799->ThreadId:45,TaskId:4,value:lisi

k8s-n2: bolt 应该就是8条消息,验证一下

[root@k8s-n2 6701]# grep "FieldGroupingBolt->execute" worker.log
2019-05-07 21:59:48.315 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.317 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.317 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.318 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.318 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.319 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.319 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan
2019-05-07 21:59:48.320 c.s.s.d.g.f.FieldGroupingBolt Thread-11-bolt-executor[5 5] [WARN] FieldGroupingBolt->execute:hashcode:1858735164->ThreadId:45,TaskId:5,value:zhangsan

总结:

bolt->bolt节点时,feild分组会按照field字段的key值进行分组,key相同的会被分配到一个bolt里面。

如果执行

storm jar strom-study-1.0-SNAPSHOT-jar-with-dependencies.jar com.sonly.storm.demo1.grouppings.fieldgrouping.FeildGroupingToplogy FieldGrouping3 FieldGrouping3 4 1 1 4 4

bolt 的分配情况是什么样子?这个留给大家去思考一下。例子中按照field分组后只有两种数据,但是这两种数据要分配给4个bolt,那这个是怎么分配的?我将在下一篇博客里揭晓答案!

下一篇我会继续分析这个分组策略,我会把我在学习这个storm的时候当时的自己思考的一个过程展现给大家,如果有什么错误的,或者没有讲清楚的地方,欢迎大家给我留言,咱们可以一起交流讨论。

storm 的分组策略深入理解(-)的更多相关文章

  1. 简单聊聊Storm的流分组策略

    简单聊聊Storm的流分组策略 首先我要强调的是,Storm的分组策略对结果有着直接的影响,不同的分组的结果一定是不一样的.其次,不同的分组策略对资源的利用也是有着非常大的不同,本文主要讲一讲loca ...

  2. 大数据量场景下storm自定义分组与Hbase预分区完美结合大幅度节省内存空间

    前言:在系统中向hbase中插入数据时,常常通过设置region的预分区来防止大数据量插入的热点问题,提高数据插入的效率,同时可以减少当数据猛增时由于Region split带来的资源消耗.大量的预分 ...

  3. storm自定义分组与Hbase预分区结合节省内存消耗

    Hbas预分区 在系统中向hbase中插入数据时,常常通过设置region的预分区来防止大数据量插入的热点问题,提高数据插入的效率,同时可以减少当数据猛增时由于Region split带来的资源消耗. ...

  4. 【Storm篇】--Storm分组策略

    一.前述 Storm由数源泉spout到bolt时,可以选择分组策略,实现对spout发出的数据的分发.对多个并行度的时候有用. 二.具体原理 1. Shuffle Grouping 随机分组,随机派 ...

  5. storm Tutorial 的解读 + 个人理解

    参考链接: Tutorial storm Tutorial 中文解读+分析 导读.摘要: .hadoop有master与slave,Storm与之对应的节点是什么? .Storm控制节点上面运行一个后 ...

  6. 第1节 storm编程:8、storm的分发策略

    8. Storm的分发策略 Storm当中的分组策略,一共有八种: 所谓的grouping策略就是在Spout与Bolt.Bolt与Bolt之间传递Tuple的方式.总共有八种方式: 1)shuffl ...

  7. Storm流分组介绍

    Storm流分组介绍                流分组是拓扑定义的一部分,每个Bolt指定应该接收哪个流作为输入.流分组定义了流/元组如何在Bolt的任务之间进行分发.在设计拓扑的时候需要定义数据 ...

  8. Storm Grouping —— 流分组策略

    Storm Grouping: Shuffle Grouping :随机分组,尽量均匀分布到下游Bolt中 将流分组定义为混排.这种混排分组意味着来自Spout的输入将混排,或随机分发给此Bolt中的 ...

  9. Storm中并发程度的理解

    Storm中涉及到了很多组件,例如nimbus,supervisor等等,在参考了这两篇文章之后,对这个有了更好的理解. Understanding the parallelism of a Stor ...

随机推荐

  1. hello/hi的简单的网络聊天程序

    hello/hi的简单的网络聊天程序 0 Linux Socket API Berkeley套接字接口,一个应用程序接口(API),使用一个Internet套接字的概念,使主机间或者一台计算机上的进程 ...

  2. Cannon 60D 电池卡在电池槽了,拔不出来怎么办?

    事情是这样的,本来好好的电池在电池槽里的,后来拿去充电了,充满后就准备装回去,然后一个不小心,电池掉地上了,就看了一下没摔爆,所以也没特别留意有没有什么地方摔坏摔瘸角,然后就往相机里塞,突然就发现塞不 ...

  3. 【ABAP系列】ABAP CL_ABAP_CONV_IN_CE

    公众号:SAP Technical 本文作者:matinal 原文出处:http://www.cnblogs.com/SAPmatinal/ 原文链接:[ABAP系列]ABAP CL_ABAP_CON ...

  4. souce and bash 的区别

    对于一些环境变量的配置文件,如想使更改后立即生效,多用 souce +file 执行后即可.如/etc/profile 里加了配置, source 和  bash 的区别: source filena ...

  5. C语言字符串之无重复字符的最长子串

    题目描述 给定一个字符串,请你找出其中不含有重复字符的 最长子串 的长度. 示例 输入: "abcabcbb" 输出: 解释: 因为无重复字符的最长子串是 . 输入: " ...

  6. 【计算机视觉】双目测距(六)--三维重建及UI显示

    原文: http://blog.csdn.NET/chenyusiyuan/article/details/5970799 在获取到视差数据后,利用 OpenCV 的 reProjectImageTo ...

  7. eNSP使用-不同网段的互联

    就像下面这个场景: 1.基本配置 先点击左上角的——新建 然后咱们把要用的设备都拖到面板上去 成品就是这样的: 点击这个为他们添加备注 我们来配置一下实验编址 右键单击PC1设置(PC2同理,就不多演 ...

  8. shell中得到当下路径所有文件夹名称

      方法1: for dir in $(ls -al ./|awk '/^d/ {print $NF}') do echo $dir done   方法2: for dir in $(ls ./) d ...

  9. poj3977(折半枚举+二分查找)

    题目链接:https://vjudge.net/problem/POJ-3977 题意:给一个大小<=35的集合,找一个非空子集合,使得子集合元素和的绝对值最小,如果有多个这样的集合,找元素个数 ...

  10. sysconf获取系统参数

    头文件: #include <unistd.h> 原型:long sysconf(int sysnum); 示例: #include <stdio.h> #include &l ...