Introduction to Storm Trident and Its Usage
I. Introduction to Trident
"Trident" literally means a three-pronged spear. From the earlier lessons we know that processing data with a plain topology of spouts and bolts works fine; Trident is a higher-level abstraction over spouts and bolts that achieves the same functionality, only with more optimization and encapsulation built in. If the processing performance requirements are strict, raw spouts and bolts are recommended; otherwise Trident is a good fit.
You can also picture it this way: a topology naturally fans out, and one spout feeding two bolts looks rather like a trident (my personal interpretation).
Because Trident is a higher-level abstraction over Storm, it handles data streams differently from the spouts and bolts covered before: Trident processes data in units of batches (groups of tuples) rather than one tuple at a time.
II. Trident API Operations
Trident processes data in batches, and its API expresses data processing as functions applied to the stream. The available operations include filter, sum, aggregator, and so on.
Function operations all act on the tuples in the stream.
The commonly used Trident APIs are introduced below.
.each(Fields inputFields, Filter filter)
Purpose: operates on every tuple in a batch; normally used together with a Filter or a Function.

.peek(Consumer action)
Purpose: performs no transformation; it takes a Consumer and is useful for side effects such as printing, similar to System.out.println.

.partitionBy(Fields fields)
Purpose: redirects tuples to the next processing stage according to the configured fields; tuples with the same values for those fields are guaranteed to be processed by the same thread.
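To see how these three operations fit together before the full examples in section IV, here is a minimal, self-contained sketch. It is an illustration only: the package names assume Storm 1.x (org.apache.storm.*), and the spout data and the KeepPositive filter are hypothetical names introduced here, not code from the original series.

import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class ApiSketch {
    // hypothetical filter: keep tuples whose "value" field is positive
    static class KeepPositive extends BaseFilter {
        @Override
        public boolean isKeep(TridentTuple tuple) {
            return tuple.getIntegerByField("value") > 0;
        }
    }

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("id", "value"), 2,
                new Values("a", 1), new Values("b", -2), new Values("a", 3));
        TridentTopology topology = new TridentTopology();
        topology.newStream("apiSketch", spout)
                .each(new Fields("value"), new KeepPositive())        // per-tuple filtering
                .peek(t -> System.out.println("after filter: " + t))  // side effect only, no transformation
                .partitionBy(new Fields("id"));                       // same "id" -> same partition
    }
}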
III. Commonly Used Trident Functions

1. FilterFunction (filtering)
Purpose: filters the tuple data within a batch.
How: define a class that extends BaseFilter, override the isKeep() method, and pass an instance of the class to each().

2. SumFunction (summing)
Purpose: performs arithmetic (such as addition) on the data in the stream.
How: define a class that extends BaseFunction, override the execute() method, and use it via each().
3. MapFunction (one-to-one)
Purpose: applies a custom operation to each tuple, producing exactly one output tuple per input.
How: define a class that implements the MapFunction interface, override the execute() method, and use it via map().
4. ProjectionFunction (projection)
Purpose: keeps only the specified fields of the stream.
How: list the desired fields in project().
Example: given a stream with the fields ["x","y","z"], applying project(new Fields("y", "z")) yields an output stream containing only the fields ["y","z"].

5. repartition (redirection)
Purpose: repartitioning determines how tuples are routed to the next layer. The options are:

shuffle: balances tuples across the partitions using a random assignment.
broadcast: every tuple is replicated to all partitions; this is very useful with DRPC, for example when running a stateQuery on every partition.
partitionBy: partitions by the given list of fields, taking the hash of those fields modulo the number of partitions, so tuples with the same field values always land in the same partition.
global: all tuples are sent to a single partition, which processes the tuples of the entire stream in one dedicated task.
batchGlobal: all tuples of one batch go to the same partition, while different batches may go to different partitions.
partition: partitions via a custom function that implements backtype.storm.grouping.CustomStreamGrouping.
6. Aggregation
In Trident, data is processed batch by batch, so aggregation also operates on the data within a batch. Tuples that pass through an aggregation lose their original form: the aggregation output replaces the input tuples. The Aggregator interface has three methods to implement (a minimal custom aggregator is sketched below, followed by the specific aggregation operations):

init(): executed when a batch arrives, before any tuples are processed; it initializes the state for the batch.
aggregate(): executed for every tuple received in the batch, updating the state.
complete(): executed when the batch ends, emitting the result.
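As a concrete illustration of the three methods, here is a minimal per-batch counter. It is a sketch assuming the Storm 1.x BaseAggregator signatures; the class and field names are chosen for illustration only.

import org.apache.storm.trident.operation.BaseAggregator;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

// Sketch: counts the tuples of one batch; the state lives per batch.
public class CountAggregator extends BaseAggregator<CountAggregator.CountState> {
    static class CountState {
        long count = 0;
    }

    @Override
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState();                 // called once when the batch starts
    }

    @Override
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        state.count++;                           // called for every tuple in the batch
    }

    @Override
    public void complete(CountState state, TridentCollector collector) {
        collector.emit(new Values(state.count)); // called once when the batch ends
    }
}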
6.1 partitionAggregate
Runs the aggregation over each batch on the current partition. It is not a repartitioning operation: it simply aggregates the tuples of each batch where they already are.
6.2 aggregate
Aggregates the tuple data of a batch; it is a repartitioning operation, drawing the batch together before aggregating.
6.3 ReduceAggregator
Folds over the tuples of a batch, combining a running value with the n-th element of each tuple.
6.4 CombinerAggregator
Aggregates the tuples of a batch by pairwise combination of partial results.
6.5 persistentAggregate
Persistent aggregator: stores the data into a state first, then performs the aggregation against that state.
6.6 Aggregate chain (AggregateChina)
Aggregation chain: applies several aggregations, each with its own condition, to the same batch of tuples.
7. GroupBy
GroupBy splits the whole stream into grouped streams according to the specified fields. An aggregation performed on a grouped stream then happens within each group rather than across the whole batch. If groupBy is followed by an aggregator, you get a true per-group aggregation; if it is followed by partitionAggregate, the aggregation runs per partition instead and is not a per-group aggregation, as the sketch below shows.
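A short fragment to make the distinction concrete. The stream variable and the "name" field are assumptions for illustration; Count is the built-in aggregator used in the later examples.

// per-group count: one result per distinct "name" in the batch
stream.groupBy(new Fields("name"))
      .aggregate(new Fields("name"), new Count(), new Fields("count"));

// per-partition count: Count runs independently on each partition's
// slice of the stream, so the results are not per-group totals
stream.partitionBy(new Fields("name"))
      .partitionAggregate(new Fields("name"), new Count(), new Fields("count"));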
IV. Examples of Common Trident Functions
1. FilterFunction
Requirement: from a set of data, keep only the tuples whose first and second values sum to an even number.
public class FilterTrident {
    private static final Logger LOG = LoggerFactory.getLogger(FilterTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
                new Values(1,4,7,10),
                new Values(1,1,3,11),
                new Values(2,2,7,1),
                new Values(2,5,7,2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // peek: performs no transformation; its parameter is a Consumer
        // each: applies the filter to the specified fields of every tuple from the spout
        topology.newStream("filter", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a","b"), new CheckEvenSumFilter())
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 +++++++++++++++++++{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d")))
                .parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("FilterTrident", conf, topology.build());
        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "FilterTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("FilterTrident");
        cluster.shutdown();
    }

    private static class CheckEvenSumFilter extends BaseFilter {
        @Override
        public boolean isKeep(TridentTuple tuple) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            // keep only tuples whose first two values sum to an even number
            return (a + b) % 2 == 0;
        }
    }
}
2. SumFunction
Requirement: sum the first two values of each tuple.
public class SumFunctionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(SumFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
                new Values(1,4,7,10),
                new Values(1,1,3,11),
                new Values(2,2,7,1),
                new Values(2,5,7,2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // peek: performs no transformation; its parameter is a Consumer
        // each: applies SumFunction to the fields "a" and "b", appending the new field "sum"
        topology.newStream("function", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a","b"), new SumFunction(), new Fields("sum"))
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 ================{},{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d"),
                        input.getIntegerByField("sum")))
                .parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("SumFunctionTrident", conf, topology.build());
        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "SumFunctionTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("SumFunctionTrident");  // kill the topology that was submitted above
        cluster.shutdown();
    }

    private static class SumFunction extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            collector.emit(new Values(a + b)); // emitted as the new "sum" field
        }
    }
}
3. MapFunction
Requirement: case-convert the tuples of a batch (here, to uppercase).
public class MapFunctionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(MapFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException, AlreadyAliveException,
            InvalidTopologyException, AuthorizationException {
        boolean isRemoteMode = false;
        if (args.length > 0) {
            isRemoteMode = true;
        }

        FixedBatchSpout spout = new FixedBatchSpout(new Fields("line"), 3,
                new Values("hello stream"),
                new Values("hello kafka"),
                new Values("hello hadoop"),
                new Values("hello scala"),
                new Values("hello java"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        topology.newStream("hello", spout).parallelismHint(1)
                .localOrShuffle()
                .map(new MyMapFunction(), new Fields("upper"))
                .parallelismHint(2)
                .partition(Grouping.fields(ImmutableList.of("upper")))
                .peek(input -> LOG.warn("================>> peek process value:{}",
                        input.getStringByField("upper")))
                .parallelismHint(3);

        if (isRemoteMode) {
            StormSubmitter.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the remote topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
        } else {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the LocalCluster topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
            TimeUnit.SECONDS.sleep(5);
            cluster.killTopology("HelloTridentTopology");
            cluster.shutdown();
        }
    }

    private static class MyMapFunction implements MapFunction {
        private static final Logger LOG = LoggerFactory.getLogger(MyMapFunction.class);

        @Override
        public Values execute(TridentTuple input) {
            String line = input.getStringByField("line");
            LOG.warn("================>> myMapFunction process execute:value :{}", line);
            return new Values(line.toUpperCase()); // one output tuple per input tuple
        }
    }
}
4. ProjectionFunctionTrident
Requirement: keep only some of the fields of each tuple.
public class ProjectionFunctionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(ProjectionFunctionTrident.class);

    public static void main(String[] args) throws InterruptedException {
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("x","y","z"), 3,
                new Values(1,2,3),
                new Values(4,5,6),
                new Values(7,8,9),
                new Values(10,11,12));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("ProjectionTrident", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2)
                .shuffle()
                .project(new Fields("y","z"))  // keep only the "y" and "z" fields
                .parallelismHint(2)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info(">>>>>>>>>>>>>>>> {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ProjectionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("ProjectionTrident");
        cluster.shutdown();
    }
}
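5.1 Shuffle
Requirement: distribute the tuples of a batch randomly and evenly across the downstream partitions. The following is a minimal sketch in the same style as the neighboring examples; the class name ShuffleRepartitionTrident and its sample data are illustrative assumptions rather than code from the original series.

public class ShuffleRepartitionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(ShuffleRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("hadoop",3),
                new Values("java",4),
                new Values("hadoop",5));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // shuffle: tuples are assigned to downstream partitions by a random,
        // load-balancing algorithm, so no field-based affinity is preserved
        topology.newStream("ShuffleRepartitionTrident", spout).parallelismHint(1)
                .shuffle()
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ShuffleRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("ShuffleRepartitionTrident");
        cluster.shutdown();
    }
}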
5.2 Broadcast
Requirement: send every tuple of a batch to all partitions.
public class BroadcastRepartitionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(BroadcastRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("hadoop",3),
                new Values("java",4),
                new Values("hadoop",5));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BroadcastRepartitionTrident", spout).parallelismHint(1)
                .broadcast()  // every tuple is replicated to both downstream partitions
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BroadcastRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BroadcastRepartitionTrident");
        cluster.shutdown();
    }
}
5.3 PartitionBy
Requirement: route the tuples of a batch that share the configured field value to the same task.
public class PartitionByRepartitionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(PartitionByRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {
        // FixedBatchSpout() parameters:
        // 1. the field names of the spout
        // 2. how many tuples make up one batch
        // 3. the field values
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",23),
                new Values("scala",3),
                new Values("hadoop",10),
                new Values("java",23),
                new Values("hadoop",10));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language"))  // same "language" -> same partition
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("PartitionByRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("PartitionByRepartitionTrident");
        cluster.shutdown();
    }
}
5.4 Global
Requirement: send all tuples of the stream to a single partition for global counting.
public class GlobalRepatitionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(GlobalRepatitionTrident.class);

    public static void main(String[] args) throws InterruptedException {
        // FixedBatchSpout() parameters:
        // 1. the field names of the spout
        // 2. how many tuples make up one batch
        // 3. the field values
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",23),
                new Values("scala",3),
                new Values("hadoop",10),
                new Values("java",23),
                new Values("hadoop",10));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language"))
                .parallelismHint(3)  // however much parallelism is configured here, it makes no difference
                .peek(tridentTuple -> LOG.info(" ================= {}", tridentTuple))
                .global()            // everything is funneled into one partition
                .peek(tridentTuple -> LOG.info(" >>>>>>>>>>>>>>>>> {}", tridentTuple));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("GlobalRepatitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("GlobalRepatitionTrident");
        cluster.shutdown();
    }
}
5.5 batchGlobal
Requirement: send the tuples of different batches to different tasks, while keeping all tuples of one batch together.
public class BatchGlobalRepatitionTrident2 {
    private static final Logger LOG = LoggerFactory.getLogger(BatchGlobalRepatitionTrident2.class);

    public static void main(String[] args) throws InterruptedException {
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BatchGlobalRepatitionTrident2", spout).parallelismHint(1)
                .batchGlobal()  // one batch -> one partition; different batches may go elsewhere
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BatchGlobalRepatitionTrident2", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BatchGlobalRepatitionTrident2");
        cluster.shutdown();
    }
}
5.6 partition
Requirement: custom partitioning.
public class CustomRepartitionTrident {
    private static final Logger LOG = LoggerFactory.getLogger(CustomRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("hadoop",3),
                new Values("java",4),
                new Values("hadoop",5));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("CustomRepartitionTrident", spout).parallelismHint(1)
                .partition(new HighTaskIDGrouping())  // custom grouping, defined below
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("CustomRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("CustomRepartitionTrident");
        cluster.shutdown();
    }
}

/**
 * Custom grouping:
 * always route tuples to the task with the highest task ID.
 *
 * @author pengbo.zhao
 */
public class HighTaskIDGrouping implements CustomStreamGrouping {
    private int taskID;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
        // targetTasks: the set of all downstream tasks
        ArrayList<Integer> tasks = new ArrayList<>(targetTasks);
        Collections.sort(tasks);                   // sort ascending
        this.taskID = tasks.get(tasks.size() - 1); // take the largest task ID
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        return Arrays.asList(taskID);
    }
}
6.1 partitionAggregate
Requirement: count the tuples of each batch, per partition.
public class PartitionAggregateTrident {
    private static final Logger LOG = LoggerFactory.getLogger(PartitionAggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        this.spout.setCycle(false);
    }

    @Test
    public void testPartitionAggregtor() {
        TridentTopology topology = new TridentTopology();
        // the spout's parallelism is fixed at 1 internally, so the hint of 2 has no effect
        topology.newStream("PartitionAggregateTrident", spout).parallelismHint(2)
                .shuffle()
                .partitionAggregate(new Fields("name","age"), new Count(), new Fields("count"))
                .parallelismHint(2)
                // .each(new Fields("count"), new Debug());
                .peek(input -> LOG.info(" >>>>>>>>>>>>>>>>> {}", input.getLongByField("count")));
        this.submitTopology("PartitionAggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
6.2 aggregator
Requirement: count the tuple data in the stream (a repartitioning aggregation).
public class AggregateTrident {
    private static final Logger LOG = LoggerFactory.getLogger(AggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        this.spout.setCycle(false);
    }

    @Test
    public void testPartitionAggregtor() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateTrident", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("name","age"), new Count(), new Fields("count"))
                // .aggregate(new Fields("name","age"), new CountAsAggregator(), new Fields("count"))
                .parallelismHint(2)
                .each(new Fields("count"), new Debug())
                .peek(input -> LOG.info("============> count:{}", input.getLongByField("count")));
        this.submitTopology("AggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
6.3 reduceAggregator
Requirement: sum element 0 of the tuples in each batch; that is, however many tuples a batch contains, fold the specified field of every tuple into a single sum.
public class ReduceAggregatorTrident {
    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        this.spout.setCycle(false);
    }

    @Test
    public void testReduceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("ReduceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age","name"), new MyReduce(), new Fields("sum"))
                .parallelismHint(5)
                .each(new Fields("sum"), new Debug());
        this.submitTopology("ReduceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyReduce implements ReducerAggregator<Integer> {
        @Override
        public Integer init() {
            return 0; // the running total starts at 0
        }

        @Override
        public Integer reduce(Integer curr, TridentTuple tuple) {
            return curr + tuple.getInteger(0); // add element 0 ("age") to the running total
        }
    }
}
6.4 combinerAggregate
Requirement: sum a field across the tuples of a batch.
public class CombinerAggregate {
    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        this.spout.setCycle(false);
    }

    @Test
    public void testCombinerAggregate() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("CombinerAggregate", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age"), new MyCount(), new Fields("count"))
                .parallelismHint(5)
                .each(new Fields("count"), new Debug());
        this.submitTopology("CombinerAggregate", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyCount implements CombinerAggregator<Integer> {
        @Override
        public Integer init(TridentTuple tuple) {
            return tuple.getInteger(0); // the value contributed by a single tuple
        }

        @Override
        public Integer combine(Integer val1, Integer val2) {
            return val1 + val2; // merge two partial results
        }

        @Override
        public Integer zero() {
            return 0; // identity value, used when a partition is empty
        }
    }
}
6.5 persistenceAggregator
Requirement: count the tuples of each batch, persisting the aggregate into a state (here an in-memory map).
public class PersistenceAggregator {
    private static final Logger LOG = LoggerFactory.getLogger(PersistenceAggregator.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        this.spout.setCycle(false);
    }

    @Test
    public void testPersistenceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("testPersistenceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .persistentAggregate(new MemoryMapState.Factory(), new Fields("name"), new Count(), new Fields("count"))
                .parallelismHint(4)
                .newValuesStream() // stream of the freshly updated state values
                .peek(input -> LOG.info("count:{}", input.getLongByField("count")));
        this.submitTopology("testPersistenceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
6.6 Aggregate chain (AggregateChina)
Requirement: run a count, a sum, and another count over the tuples of a batch in one chained aggregation.
public class AggregateChina {
    private static final Logger LOG = LoggerFactory.getLogger(AggregateChina.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        this.spout.setCycle(false);
    }

    @Test
    public void testAggregateChina() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateChina", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .chainedAgg()  // start the aggregation chain
                .aggregate(new Fields("name"), new Count(), new Fields("count"))
                .aggregate(new Fields("age"), new Sum(), new Fields("sum"))
                .aggregate(new Fields("age"), new Count(), new Fields("count2"))
                .chainEnd()    // end the chain; the results are emitted together
                .peek(tuple -> LOG.info("{}", tuple));
        this.submitTopology("AggregateChina", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
7. GroupBy
Requirement: group the tuples of each batch by name, then count the tuples within each group.
public class GroupBy {
    private static final Logger LOG = LoggerFactory.getLogger(GroupBy.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6));
        this.spout.setCycle(false);
    }

    @Test
    public void testGroupBy() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("GroupBy", spout).parallelismHint(1)
                // .partitionBy(new Fields("name"))
                .groupBy(new Fields("name"))  // split the stream into per-"name" groups
                .aggregate(new Count(), new Fields("count"))
                .peek(tuple -> LOG.info("{},{}", tuple.getFields(), tuple));
        this.submitTopology("GroupBy", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
今天帮大一的童鞋写Java上机题 题目虽然很简单,但是刚拿到题目的时候愣了一下,然后就疯狂get set QuQ 其实这是一个特别基本的封装的题目(之前实验室面试大二的时候竟然还有蛮多人不知道封装的概 ...