Storm Trident: Introduction and Usage
I. Introduction to Trident
"Trident" is the English word for a three-pronged spear. My understanding of the name: processing data with a plain topology of spouts and bolts, as covered earlier, works perfectly well, but Trident is a higher-level abstraction over spouts and bolts. It achieves the same functionality, just with more optimization and encapsulation built in. If you have strict performance requirements, it is better to use spouts and bolts directly; otherwise Trident is a good choice.
You can also picture it this way: a topology naturally fans out, and one spout feeding two bolts looks rather like a trident. (my own interpretation)
Because Trident is a higher-level abstraction over Storm, it handles data streams differently from the spouts and bolts we studied before: Trident processes data one batch (a group of tuples) at a time.
II. Trident API Operations
Trident processes data in batches, and its API expresses each processing step as an operation or function applied to the stream, such as filters, sums, and aggregators.
Function operations act on the tuples flowing through the stream.
The commonly used Trident API calls are introduced below.
.each(Fields inputFields, Filter filter)
Purpose: operates on every tuple in a batch; normally used together with a Filter or a Function.
.peek(Consumer action)
Purpose: performs no operation on the stream; it takes a Consumer and is mainly useful for inspection, much like System.out.println.
.partitionBy(Fields fields)
Purpose: redirects tuples to the next processing stage according to the given fields; tuples with the same values for those fields are guaranteed to be processed by the same thread.
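As a quick sketch of how these calls chain together (spout and MyFilter are placeholders here: any tuple source and any BaseFilter subclass, as in the full examples of section IV):

TridentTopology topology = new TridentTopology();
topology.newStream("demo", spout)
        .each(new Fields("a", "b"), new MyFilter())           // keep or drop each tuple
        .peek(t -> System.out.println("after filter: " + t))  // inspect without changing the stream
        .partitionBy(new Fields("a"));                        // same "a" value -> same partition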
III. Common Trident Functions
1. FilterFunction (filtering)
Purpose: filters the tuples of a batch.
How: define a class that extends BaseFilter, override isKeep(), and pass an instance to each().
2. SumFunction (summing)
Purpose: performs arithmetic (such as addition or subtraction) on the data in the stream.
How: define a class that extends BaseFunction, override execute(), and use it in each().
3. MapFunction (one-to-one function)
Purpose: applies a custom transformation to a tuple, producing exactly one output per input.
How: define a class that implements the MapFunction interface, override execute(), and apply it with map().
4. ProjectionFunction (projection)
Purpose: keeps only the specified fields of the stream.
How: list the fields to keep in project().
Example: given a stream with the fields ["x","y","z"], the projection project(new Fields("y","z")) outputs a stream containing only the fields ["y","z"].
5. Repartitioning operations
Purpose: a repartitioning operation determines how tuples are routed to the next processing stage. The options are:
shuffle: balances tuples across the partitions using random distribution
broadcast: every tuple is replicated to all partitions; very useful for DRPC, for example to run a stateQuery on every partition
partitionBy: partitions by the given list of fields, taking the hash of those fields modulo the number of partitions, so tuples with the same field values are always placed in the same partition
global: every tuple is sent to the same single partition, which processes the tuples of the entire stream in its own dedicated thread
batchGlobal: all tuples of one batch go to the same partition, while different batches may go to different partitions
partition: partitions with a custom function that implements backtype.storm.grouping.CustomStreamGrouping (org.apache.storm.grouping.CustomStreamGrouping on Storm 1.x)
6. Aggregation
In Trident, data is processed in batches, so aggregation also operates on the data within a batch, and tuples that pass through an aggregation are replaced by the aggregated result rather than keeping their original form. The Aggregator interface has three methods to implement:
init(): called when a batch arrives; initializes the state for that batch
aggregate(): called for each tuple of the batch as it is received; note that the aggregate operation on a stream is a repartitioning operation, and the aggregation for a batch runs in a separate, single thread
complete(): called when the batch ends
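As a minimal sketch of these three callbacks, here is a hand-rolled batch counter (illustrative only: Trident's built-in Count aggregator does essentially the same thing; BaseAggregator, TridentCollector, TridentTuple, and Values are assumed to be the Storm 1.x Trident classes):

public class CountAggregator extends BaseAggregator<CountAggregator.CountState> {

    public static class CountState {
        long count = 0;
    }

    @Override
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState();                 // a new batch has started: create fresh state
    }

    @Override
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        state.count++;                           // called once for every tuple of the batch
    }

    @Override
    public void complete(CountState state, TridentCollector collector) {
        collector.emit(new Values(state.count)); // the batch has ended: emit the aggregate
    }
}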
6.1 partitionAggregate: runs the aggregation over each batch on its current partition; it is not a repartitioning operation, i.e. it aggregates the tuples of each batch in place.
6.2 aggregator: aggregates the tuple data within a batch.
6.3 ReducerAggregator
Reduces over the nth element (a chosen field) of every tuple in a batch.
6.4 CombinerAggregator: aggregates the tuples of a batch; used through aggregate(), which is a repartitioning operation.
6.5 persistentAggregate: a persistent aggregator; the data is first stored in a state, and the aggregation is then applied to it.
6.6 AggregateChina (aggregation chain)
An aggregation chain applies several aggregations to the tuples of one batch (chainedAgg() ... chainEnd()).
7. GroupBy
GroupBy splits the whole stream into individual grouped streams according to the given fields; an aggregation applied to a grouped stream then runs on each grouped stream rather than on the whole batch. If groupBy is followed by an aggregator, the result is a grouped aggregation; if it is followed by partitionAggregate, it is not an aggregation over the groups.
IV. Examples of Common Trident Functions
1.FilterFunction
Requirement: from a batch of tuples, keep those where the sum of the first and second values is even.
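For brevity, the example classes below omit their import statements. As a sketch, a representative set would look like the following, assuming Storm 1.x, where the Trident classes live under org.apache.storm.trident (on pre-1.0 releases the prefixes are backtype.storm and storm.trident instead):

// Shared imports for the examples in this section (adjust to your Storm version)
import java.util.concurrent.TimeUnit;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.BaseFilter;
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.operation.builtin.Debug;
import org.apache.storm.trident.operation.builtin.Sum;
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;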
public class FilterTrident {

    private static final Logger LOG = LoggerFactory.getLogger(FilterTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
                new Values(1,4,7,10),
                new Values(1,1,3,11),
                new Values(2,2,7,1),
                new Values(2,5,7,2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // peek: performs no operation; it just takes a Consumer (handy for logging)
        // each: operates on the given fields of every tuple from the spout
        topology.newStream("filter", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a","b"), new CheckEvenSumFilter())
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 +++++++++++++++++++{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d"))
                ).parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("FilterTrident", conf, topology.build());

        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "FilterTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("FilterTrident");
        cluster.shutdown();
    }

    private static class CheckEvenSumFilter extends BaseFilter {

        @Override
        public boolean isKeep(TridentTuple tuple) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            // keep only tuples whose first two values sum to an even number
            return (a + b) % 2 == 0;
        }
    }
}
2.SumFunction
Requirement: sum the first two values of each tuple in a batch.
public class SumFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(SumFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
                new Values(1,4,7,10),
                new Values(1,1,3,11),
                new Values(2,2,7,1),
                new Values(2,5,7,2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // each: reads fields "a" and "b" and appends the new field "sum"
        topology.newStream("function", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a","b"), new SumFunction(), new Fields("sum"))
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 ================{},{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d"),
                        input.getIntegerByField("sum")))
                .parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("SumFunctionTrident", conf, topology.build());

        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "SumFunctionTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("SumFunctionTrident"); // kill the topology we actually submitted
        cluster.shutdown();
    }

    private static class SumFunction extends BaseFunction {

        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            collector.emit(new Values(a + b)); // emitted as the "sum" field
        }
    }
}
3.MapFunction
Requirement: convert the tuples of a batch to upper case.
public class MapFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(MapFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException, AlreadyAliveException,
            InvalidTopologyException, AuthorizationException {
        boolean isRemoteMode = false;
        if (args.length > 0) {
            isRemoteMode = true;
        }
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("line"), 3,
                new Values("hello stream"),
                new Values("hello kafka"),
                new Values("hello hadoop"),
                new Values("hello scala"),
                new Values("hello java")
        );
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        topology.newStream("hello", spout).parallelismHint(1)
                .localOrShuffle()
                .map(new MyMapFunction(), new Fields("upper"))
                .parallelismHint(2)
                .partition(Grouping.fields(ImmutableList.of("upper")))
                .peek(input -> LOG.warn("================>> peek process value:{}",
                        input.getStringByField("upper")))
                .parallelismHint(3);

        if (isRemoteMode) {
            StormSubmitter.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the remote topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
        } else {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the LocalCluster topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
            TimeUnit.SECONDS.sleep(5);
            cluster.killTopology("HelloTridentTopology");
            cluster.shutdown();
        }
    }

    private static class MyMapFunction implements MapFunction {

        private static final Logger LOG = LoggerFactory.getLogger(MyMapFunction.class);

        @Override
        public Values execute(TridentTuple input) {
            String line = input.getStringByField("line");
            LOG.warn("================>> myMapFunction process execute:value :{}", line);
            return new Values(line.toUpperCase());
        }
    }
}
4.ProjectionFunctionTrident
Requirement: keep only a subset of the fields of each tuple.
public class ProjectionFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(ProjectionFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("x","y","z"), 3,
                new Values(1,2,3),
                new Values(4,5,6),
                new Values(7,8,9),
                new Values(10,11,12)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("ProjectionTrident", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2)
                .shuffle()
                .project(new Fields("y","z")) // keep only fields "y" and "z"
                .parallelismHint(2)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info(">>>>>>>>>>>>>>>> {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ProjectionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("ProjectionTrident");
        cluster.shutdown();
    }
}
5.2 Broadcast
Requirement: send the tuples of a batch to every partition.
public class BroadcastRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(BroadcastRepartitionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("hadoop",3),
                new Values("java",4),
                new Values("hadoop",5)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BroadcastRepartitionTrident", spout).parallelismHint(1)
                .broadcast() // every tuple is replicated to all partitions
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BroadcastRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BroadcastRepartitionTrident");
        cluster.shutdown();
    }
}
5.3 PartitionBy
Requirement: route the tuples of a batch to the same task according to the configured field.
public class PartitionByRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionByRepartitionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        // FixedBatchSpout parameters:
        // 1. the spout's output field names
        // 2. how many tuples make up one batch
        // 3. the field values
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",23),
                new Values("scala",3),
                new Values("hadoop",10),
                new Values("java",23),
                new Values("hadoop",10)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language")) // same "language" value -> same partition
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("PartitionByRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("PartitionByRepartitionTrident");
        cluster.shutdown();
    }
}
5.4 Global
Requirement: perform a global grouped count over the tuples of the batches (all tuples end up on one partition).
public class GlobalRepatitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(GlobalRepatitionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        // FixedBatchSpout parameters:
        // 1. the spout's output field names
        // 2. how many tuples make up one batch
        // 3. the field values
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",23),
                new Values("scala",3),
                new Values("hadoop",10),
                new Values("java",23),
                new Values("hadoop",10)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language"))
                .parallelismHint(3) // whatever parallelism is configured here, global() below ignores it
                .peek(tridentTuple -> LOG.info(" ================= {}", tridentTuple))
                .global() // all tuples are sent to a single partition
                .peek(tridentTuple -> LOG.info(" >>>>>>>>>>>>>>>>> {}", tridentTuple));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("GlobalRepatitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("GlobalRepatitionTrident");
        cluster.shutdown();
    }
}
5.5 batchGlobal
Requirement: tuples of different batches are sent to different tasks.
public class BatchGlobalRepatitionTrident2 {

    private static final Logger LOG = LoggerFactory.getLogger(BatchGlobalRepatitionTrident2.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BatchGlobalRepatitionTrident2", spout).parallelismHint(1)
                .batchGlobal() // all tuples of one batch go to the same partition
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BatchGlobalRepatitionTrident2", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BatchGlobalRepatitionTrident2");
        cluster.shutdown();
    }
}
5.6 partition
Requirement: a custom partitioner.
public class CustomRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(CustomRepartitionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("hadoop",3),
                new Values("java",4),
                new Values("hadoop",5)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("CustomRepartitionTrident", spout).parallelismHint(1)
                .partition(new HighTaskIDGrouping()) // route tuples with a custom grouping
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("CustomRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("CustomRepartitionTrident");
        cluster.shutdown();
    }
}

/**
 * A custom grouping:
 * always sends tuples to the task with the highest task ID.
 * @author pengbo.zhao
 */
public class HighTaskIDGrouping implements CustomStreamGrouping {

    private int taskID;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
        // targetTasks: the IDs of all downstream tasks
        ArrayList<Integer> tasks = new ArrayList<>(targetTasks);
        Collections.sort(tasks); // ascending order
        this.taskID = tasks.get(tasks.size() - 1);
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        return Arrays.asList(taskID);
    }
}
6.1 partitionAggregate
Requirement: count the number of tuples in each batch.
public class PartitionAggregateTrident {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionAggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testPartitionAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionAggregateTrident", spout).parallelismHint(2) // the spout's parallelism is fixed at 1 internally, so the hint of 2 has no effect
                .shuffle()
                .partitionAggregate(new Fields("name","age"), new Count(), new Fields("count"))
                .parallelismHint(2)
                // .each(new Fields("count"), new Debug());
                .peek(input -> LOG.info(" >>>>>>>>>>>>>>>>> {}", input.getLongByField("count")));
        this.submitTopology("PartitionAggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
6.2 aggregator
Requirement: run a count aggregation over the tuple data.
public class AggregateTrident {

    private static final Logger LOG = LoggerFactory.getLogger(AggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateTrident", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("name","age"), new Count(), new Fields("count"))
                // .aggregate(new Fields("name","age"), new CountAsAggregator(), new Fields("count"))
                .parallelismHint(2)
                .each(new Fields("count"), new Debug())
                .peek(input -> LOG.info("============> count:{}", input.getLongByField("count")));
        this.submitTopology("AggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
6.3 reduceAggregator
Requirement: sum element 0 of the tuples in a batch, i.e. however many tuples a batch contains, add up the specified field across all of them.
public class ReduceAggregatorTrident {

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testReduceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("ReduceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age","name"), new MyReduce(), new Fields("sum"))
                .parallelismHint(5)
                .each(new Fields("sum"), new Debug());
        this.submitTopology("ReduceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyReduce implements ReducerAggregator<Integer> {

        @Override
        public Integer init() {
            return 0; // the running total starts at 0
        }

        @Override
        public Integer reduce(Integer curr, TridentTuple tuple) {
            return curr + tuple.getInteger(0); // add field 0 ("age") to the running total
        }
    }
}
6.4 combinerAggregate
Requirement: sum a field of the tuples.
public class CombinerAggregate {

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testCombinerAggregate() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("CombinerAggregate", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age"), new MyCount(), new Fields("count"))
                .parallelismHint(5)
                .each(new Fields("count"), new Debug());
        this.submitTopology("CombinerAggregate", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyCount implements CombinerAggregator<Integer> {

        @Override
        public Integer init(TridentTuple tuple) {
            return tuple.getInteger(0); // the value contributed by a single tuple
        }

        @Override
        public Integer combine(Integer val1, Integer val2) {
            return val1 + val2; // merge two partial results
        }

        @Override
        public Integer zero() {
            return 0; // identity value used when a partition has no tuples
        }
    }
}
6.5 persistenceAggregator
Requirement: count the tuples of each batch, keeping the aggregate in persistent state.
public class PersistenceAggregator {

    private static final Logger LOG = LoggerFactory.getLogger(PersistenceAggregator.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testPersistenceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("testPersistenceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                // store the running count in an in-memory state, then stream the newly written values
                .persistentAggregate(new MemoryMapState.Factory(), new Fields("name"), new Count(), new Fields("count"))
                .parallelismHint(4)
                .newValuesStream()
                .peek(input -> LOG.info("count:{}", input.getLongByField("count")));
        this.submitTopology("testPersistenceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
6.6 AggregateChina
Requirement: perform a count, a sum, and another count over the tuples of each batch as one chained aggregation.
public class AggregateChina {

    private static final Logger LOG = LoggerFactory.getLogger(AggregateChina.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testAggregateChina() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateChina", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .chainedAgg() // start an aggregation chain
                .aggregate(new Fields("name"), new Count(), new Fields("count"))
                .aggregate(new Fields("age"), new Sum(), new Fields("sum"))
                .aggregate(new Fields("age"), new Count(), new Fields("count2"))
                .chainEnd() // end the chain: emits one tuple [count, sum, count2] per batch
                .peek(tuple -> LOG.info("{}", tuple));
        this.submitTopology("AggregateChina", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
7.GroupBy
Requirement: group the tuples of each batch by name, then count the tuples in each group.
public class GroupBy {

    private static final Logger LOG = LoggerFactory.getLogger(GroupBy.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("hadoop",4),
                new Values("java",5),
                new Values("hadoop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testGroupBy() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("GroupBy", spout).parallelismHint(1)
                // .partitionBy(new Fields("name"))
                .groupBy(new Fields("name")) // split the stream into grouped streams by "name"
                .aggregate(new Count(), new Fields("count")) // the count runs per group
                .peek(tuple -> LOG.info("{},{}", tuple.getFields(), tuple));
        this.submitTopology("GroupBy", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}