1. Introduction to Trident

  Trident is English for a three-pronged spear. My understanding: topologies built directly from spouts and bolts process data perfectly well, but Trident is a higher-level abstraction over spouts and bolts — it achieves the same functionality, only with more optimization and encapsulation built in. If raw processing performance is the priority, plain spouts and bolts are still the recommended route; otherwise Trident is usually the more convenient choice.

  Another way to picture the name (my own take): a topology fans out by nature, and a single spout feeding, say, two bolts looks rather like a trident.

  Because Trident is a higher-level layer on top of Storm, it handles data streams differently from the spouts and bolts covered earlier: Trident processes data in units of batches (groups of tuples) rather than tuple by tuple.

2. Trident API Operations

  Trident processes data in batches, and its API exposes data-processing steps as functions. Typical operations include filters, sums, aggregators, and so on.

  Function operations all act on the tuples flowing through the stream.

  The commonly used Trident APIs are introduced below.

  .each(Fields inputFields, Filter filter)
    Operates on every tuple of a batch; typically used together with a Filter or (in its other overload) a Function.
  .peek(Consumer action)
    Performs no transformation; it takes a Consumer and is useful for side effects, much like System.out.println.
  .partitionBy(Fields fields)
    Redirects tuples to the next processing stage by the given fields; tuples with the same values for those fields are guaranteed to be handled by the same thread.
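  A minimal sketch of how these calls chain together (the class name ApiChainSketch is made up for illustration; it borrows the FixedBatchSpout test spout used throughout the examples in section 4 and the CheckEvenSumFilter filter from example 1 there):

public class ApiChainSketch {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a", "b"), 3,
                new Values(1, 4), new Values(1, 1), new Values(2, 2));
        spout.setCycle(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("apiDemo", spout)
                .each(new Fields("a", "b"), new CheckEvenSumFilter()) // keep tuples whose a+b is even
                .partitionBy(new Fields("a"))                         // equal "a" values land in the same partition
                .peek(t -> System.out.println("saw: " + t));          // side effect only; the stream is unchanged
        // topology.build() is then submitted exactly as in the examples of section 4
    }
}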

3. Commonly Used Trident Functions

  .FilterFunction (filtering)
    Purpose: filters the tuples of a batch.
    How: define a class that extends BaseFilter, override isKeep(), and pass an instance to each().
  .SumFunction (summing)
    Purpose: performs arithmetic such as addition on values in the stream.
    How: define a class that extends BaseFunction, override execute(), and use it in each().
  .MapFunction (one-to-one)
    Purpose: applies a custom transformation to each tuple, producing exactly one output tuple.
    How: implement the MapFunction interface, override execute(), and apply it with map().
  .ProjectionFunction (projection)
    Purpose: keeps only the specified fields of the stream.
    How: list the desired fields in project().
    Example: given a stream with fields ["x","y","z"], applying project(new Fields("y","z")) yields a stream containing only ["y","z"].
  .repartition (repartitioning)
    
    Purpose: repartitioning decides how tuples are routed to the next stage. The options are:
    shuffle:     distributes tuples evenly across partitions using a random algorithm
    broadcast:   every tuple is replicated to all partitions; very useful with DRPC, e.g. running a stateQuery on every partition
    partitionBy: partitions by the given field list, taking the hash of those fields modulo the partition count, so tuples with equal field values always reach the same partition
    global:      all tuples are sent to a single partition, which then handles the entire stream; this partition runs in its own thread
    batchGlobal: all tuples of one batch go to the same partition, while different batches may go to different partitions
    partition:   partitions via a custom function implementing backtype.storm.grouping.CustomStreamGrouping
    (Most of these are demonstrated in section 4 below.)
  .Aggregation (aggregation)

    Trident processes data in batches, so aggregation too is performed over the tuples of a batch, and the tuples that come out of an aggregation carry the aggregated results rather than their original contents. The Aggregator interface has three methods to implement (a minimal sketch follows):
      init()     : called when a batch starts, before any of its tuples are processed; creates the per-batch state
      aggregate(): called once for each tuple of the batch, folding it into the state
      complete() : called when the batch ends, emitting the result
    Note that the stream-level aggregate() operation repartitions each batch first, so that the batch is aggregated as a whole.
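    For illustration, here is a minimal custom aggregator that counts the tuples of a batch (a sketch of the three-method contract above, not one of the original examples; the built-in Count used in section 4 does the same job):

public class CountAggregator extends BaseAggregator<CountAggregator.CountState> {

    static class CountState {
        long count = 0; // per-batch state, created in init()
    }

    @Override
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState(); // called once when the batch starts
    }

    @Override
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        state.count++; // called for every tuple of the batch
    }

    @Override
    public void complete(CountState state, TridentCollector collector) {
        collector.emit(new Values(state.count)); // called when the batch ends
    }
}

    It would be plugged in like the built-in aggregators below, e.g. .aggregate(new Fields("name"), new CountAggregator(), new Fields("count")).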

    6.1 partitionAggregate

      Runs the aggregation on each partition's share of a batch; it is not a repartitioning operation — e.g. counting the tuples that each partition received from a batch.
   6.2 aggregate

      Aggregates the tuples of a batch as a whole.
     
   6.3 ReducerAggregator
      Folds the tuples of a batch one at a time: init() supplies the starting value and reduce() combines the chosen field of each tuple into the running result.

   6.4 CombinerAggregator

      Aggregates the tuples of a batch; it involves a repartition, but partial results are combined within each partition first, which keeps network transfer small.

   6.5 persistentAggregate

      Persistent aggregation: the running result is kept in a state (for example an in-memory map or a database) and updated as each batch is aggregated.

   6.6 AggregateChina (aggregation chain)

     Chains several aggregations so that one pass over a batch computes multiple aggregates.

  7. GroupBy

      groupBy splits the stream into grouped streams by the given fields. An aggregation applied to a grouped stream then runs per group rather than over the whole batch. If groupBy is followed by aggregate(), the batch is aggregated group by group; if it is followed by partitionAggregate, no repartition happens and the aggregation runs within each partition instead.

     

4. Examples of Common Trident Functions

  1. FilterFunction

    Requirement: from a set of tuples, keep those where the sum of the first and second values is even.

public class FilterTrident {

    private static final Logger LOG = LoggerFactory.getLogger(FilterTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a", "b", "c", "d"), 3,
                new Values(1, 4, 7, 10),
                new Values(1, 1, 3, 11),
                new Values(2, 2, 7, 1),
                new Values(2, 5, 7, 2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // peek: performs no transformation; it just consumes each tuple
        // each: applies the filter to the given fields of every tuple
        topology.newStream("filter", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a", "b"), new CheckEvenSumFilter())
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 +++++++++++++++++++{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d")))
                .parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("FilterTrident", conf, topology.build());
        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "FilterTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("FilterTrident");
        cluster.shutdown();
    }

    private static class CheckEvenSumFilter extends BaseFilter {

        @Override
        public boolean isKeep(TridentTuple tuple) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            return (a + b) % 2 == 0;
        }
    }
}
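    With the sample data above, only the tuples (1,1,3,11) and (2,2,7,1) survive the filter, since 1+1 and 2+2 are even while 1+4 and 2+5 are odd; peek2 therefore prints just those two tuples.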

  2. SumFunction

    Requirement: sum the first two values of each tuple.

public class SumFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(SumFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a", "b", "c", "d"), 3,
                new Values(1, 4, 7, 10),
                new Values(1, 1, 3, 11),
                new Values(2, 2, 7, 1),
                new Values(2, 5, 7, 2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // peek: performs no transformation; it just consumes each tuple
        // each: applies the function to the given fields of every tuple
        topology.newStream("function", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a", "b"), new SumFunction(), new Fields("sum"))
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 ================{},{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d"),
                        input.getIntegerByField("sum")))
                .parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("SumFunctionTrident", conf, topology.build());
        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "SumFunctionTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("SumFunctionTrident");
        cluster.shutdown();
    }

    private static class SumFunction extends BaseFunction {

        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            collector.emit(new Values(a + b));
        }
    }
}
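    Note that each() with a Function appends the emitted field to the tuple rather than replacing it: the peek2 output carries the original a, b, c, d plus the new sum field (5, 2, 4 and 7 for the four sample tuples).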

3. MapFunction

  Requirement: convert the text of each tuple in a batch to upper case.

public class MapFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(MapFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException, AlreadyAliveException,
            InvalidTopologyException, AuthorizationException {
        boolean isRemoteMode = false;
        if (args.length > 0) {
            isRemoteMode = true;
        }
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("line"), 3,
                new Values("hello stream"),
                new Values("hello kafka"),
                new Values("hello hadoop"),
                new Values("hello scala"),
                new Values("hello java"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        topology.newStream("hello", spout).parallelismHint(1)
                .localOrShuffle()
                .map(new MyMapFunction(), new Fields("upper"))
                .parallelismHint(2)
                .partition(Grouping.fields(ImmutableList.of("upper")))
                .peek(input -> LOG.warn("================>> peek process value:{}",
                        input.getStringByField("upper")))
                .parallelismHint(3);

        if (isRemoteMode) {
            StormSubmitter.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the remote topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
        } else {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the LocalCluster topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
            TimeUnit.SECONDS.sleep(5);
            cluster.killTopology("HelloTridentTopology");
            cluster.shutdown();
        }
    }

    private static class MyMapFunction implements MapFunction {

        private static final Logger LOG = LoggerFactory.getLogger(MyMapFunction.class);

        @Override
        public Values execute(TridentTuple input) {
            String line = input.getStringByField("line");
            LOG.warn("================>> myMapFunction process execute:value :{}", line);
            return new Values(line.toUpperCase());
        }
    }
}

4. ProjectionFunction

  Requirement: keep only some of the fields of each tuple.

public class ProjectionFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(ProjectionFunctionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("x", "y", "z"), 3,
                new Values(1, 2, 3),
                new Values(4, 5, 6),
                new Values(7, 8, 9),
                new Values(10, 11, 12));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("ProjectionTrident", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2)
                .shuffle()
                .project(new Fields("y", "z"))
                .parallelismHint(2)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info(">>>>>>>>>>>>>>>> {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ProjectionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("ProjectionTrident");
        cluster.shutdown();
    }
}

5.2 Broadcast

  Requirement: send every tuple of a batch to all partitions.

public class BroadcastRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(BroadcastRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("hadoop", 3),
                new Values("java", 4),
                new Values("hadoop", 5));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BroadcastRepartitionTrident", spout).parallelismHint(1)
                .broadcast()
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BroadcastRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BroadcastRepartitionTrident");
        cluster.shutdown();
    }
}

5.3 PartitionBy

  Requirement: route the tuples of a batch by a chosen field, so tuples with equal values are processed by the same task.

public class PartitionByRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionByRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        // FixedBatchSpout parameters:
        // 1. the spout's output field names
        // 2. how many tuples make up one batch
        // 3. the tuple values
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language", "age"), 3,
                new Values("java", 23),
                new Values("scala", 3),
                new Values("hadoop", 10),
                new Values("java", 23),
                new Values("hadoop", 10));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language"))
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("PartitionByRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("PartitionByRepartitionTrident");
        cluster.shutdown();
    }
}

5.4 Global

   Requirement: send all tuples of the stream to one global partition.

public class GlobalRepatitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(GlobalRepatitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        // FixedBatchSpout parameters: field names, batch size, tuple values
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language", "age"), 3,
                new Values("java", 23),
                new Values("scala", 3),
                new Values("hadoop", 10),
                new Values("java", 23),
                new Values("hadoop", 10));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("GlobalRepatitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language"))
                .parallelismHint(3) // whatever parallelism is hinted here, global() below still collapses the stream to one partition
                .peek(tridentTuple -> LOG.info(" ================= {}", tridentTuple))
                .global()
                .peek(tridentTuple -> LOG.info(" >>>>>>>>>>>>>>>>> {}", tridentTuple));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("GlobalRepatitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("GlobalRepatitionTrident");
        cluster.shutdown();
    }
}

  5.5 batchGlobal

    Requirement: send the tuples of different batches to different tasks (each batch stays together).

public class BatchGlobalRepatitionTrident2 {

    private static final Logger LOG = LoggerFactory.getLogger(BatchGlobalRepatitionTrident2.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BatchGlobalRepatitionTrident2", spout).parallelismHint(1)
                .batchGlobal()
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BatchGlobalRepatitionTrident2", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BatchGlobalRepatitionTrident2");
        cluster.shutdown();
    }
}

  5.6 partition

    Requirement: repartition with a custom grouping.

public class CustomRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(CustomRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("hadoop", 3),
                new Values("java", 4),
                new Values("hadoop", 5));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("CustomRepartitionTrident", spout).parallelismHint(1)
                .partition(new HighTaskIDGrouping())
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("CustomRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("CustomRepartitionTrident");
        cluster.shutdown();
    }
}

/**
 * Custom grouping:
 * routes every tuple to the downstream task with the highest task id.
 * @author pengbo.zhao
 */
public class HighTaskIDGrouping implements CustomStreamGrouping {

    private int taskID;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
        // targetTasks: the ids of all downstream tasks
        ArrayList<Integer> tasks = new ArrayList<>(targetTasks);
        Collections.sort(tasks);                   // ascending order
        this.taskID = tasks.get(tasks.size() - 1); // the largest task id
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        return Arrays.asList(taskID);
    }
}
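    Because prepare() records the largest downstream task id and chooseTasks() always returns it, every tuple of this stream ends up on the single highest-numbered task, regardless of the parallelism hint of 2.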

  6.1 partitionAggregate

    Requirement: count the tuples of a batch within each partition.

public class PartitionAggregateTrident {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionAggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        this.spout.setCycle(false);
    }

    @Test
    public void testPartitionAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionAggregateTrident", spout).parallelismHint(2) // the spout's parallelism is fixed at 1 internally, so this hint of 2 has no effect
                .shuffle()
                .partitionAggregate(new Fields("name", "age"), new Count(), new Fields("count"))
                .parallelismHint(2)
                // .each(new Fields("count"), new Debug());
                .peek(input -> LOG.info(" >>>>>>>>>>>>>>>>> {}", input.getLongByField("count")));
        this.submitTopology("PartitionAggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}

  6.2 aggregate

    Requirement: count the tuples of each batch.

public class AggregateTrident {

    private static final Logger LOG = LoggerFactory.getLogger(AggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        this.spout.setCycle(false);
    }

    @Test
    public void testAggregate() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateTrident", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("name", "age"), new Count(), new Fields("count"))
                // .aggregate(new Fields("name", "age"), new CountAsAggregator(), new Fields("count"))
                .parallelismHint(2)
                .each(new Fields("count"), new Debug())
                .peek(input -> LOG.info("============> count:{}", input.getLongByField("count")));
        this.submitTopology("AggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}

  6.3 ReducerAggregator

     Requirement: sum field 0 of the tuples of a batch — however many tuples a batch holds, fold the chosen field of each into one total.

public class ReduceAggregatorTrident {

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        this.spout.setCycle(false);
    }

    @Test
    public void testReduceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("ReduceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age", "name"), new MyReduce(), new Fields("sum"))
                .parallelismHint(5)
                .each(new Fields("sum"), new Debug());
        this.submitTopology("ReduceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyReduce implements ReducerAggregator<Integer> {

        @Override
        public Integer init() {
            return 0; // the initial value is 0
        }

        @Override
        public Integer reduce(Integer curr, TridentTuple tuple) {
            return curr + tuple.getInteger(0); // fold field 0 ("age") into the running sum
        }
    }
}

  6.4 CombinerAggregator

    Requirement: sum a field over the tuples of a batch.

public class CombinerAggregate {

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        this.spout.setCycle(false);
    }

    @Test
    public void testCombinerAggregate() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("CombinerAggregate", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age"), new MyCount(), new Fields("count"))
                .parallelismHint(5)
                .each(new Fields("count"), new Debug());
        this.submitTopology("CombinerAggregate", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyCount implements CombinerAggregator<Integer> {

        @Override
        public Integer init(TridentTuple tuple) {
            return tuple.getInteger(0); // value contributed by a single tuple
        }

        @Override
        public Integer combine(Integer val1, Integer val2) {
            return val1 + val2; // merge two partial results
        }

        @Override
        public Integer zero() {
            return 0; // identity value for empty partitions
        }
    }
}

  6.5 persistentAggregate

    Requirement: count tuples per name and keep the running counts in a state.

public class PersistenceAggregator {

    private static final Logger LOG = LoggerFactory.getLogger(PersistenceAggregator.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        this.spout.setCycle(false);
    }

    @Test
    public void testPersistenceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("testPersistenceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .persistentAggregate(new MemoryMapState.Factory(), new Fields("name"), new Count(), new Fields("count"))
                .parallelismHint(4)
                .newValuesStream()
                .peek(input -> LOG.info("count:{}", input.getLongByField("count")));
        this.submitTopology("testPersistenceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
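    Here persistentAggregate() folds the per-name counts into the state created by MemoryMapState.Factory (an in-memory map, so the counts accumulate across batches but are lost on restart), and newValuesStream() turns the updated state values back into a stream so the peek can print the running count per name.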

  6.6 AggregateChina (aggregation chain)

    Requirement: run a count, a sum, and a second count over the tuples of a batch in one chained pass.

public class AggregateChina {

    private static final Logger LOG = LoggerFactory.getLogger(AggregateChina.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        this.spout.setCycle(false);
    }

    @Test
    public void testAggregateChina() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateChina", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .chainedAgg()
                .aggregate(new Fields("name"), new Count(), new Fields("count"))
                .aggregate(new Fields("age"), new Sum(), new Fields("sum"))
                .aggregate(new Fields("age"), new Count(), new Fields("count2"))
                .chainEnd()
                .peek(tuple -> LOG.info("{}", tuple));
        this.submitTopology("AggregateChina", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
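    The chainedAgg()...chainEnd() block runs all three aggregators over the same batch in a single pass, so that each batch yields one output tuple carrying all three result fields: count, sum and count2.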

  7. GroupBy

    Requirement: group the tuples of a batch by name, then count the tuples in each group.

public class GroupBy {

    private static final Logger LOG = LoggerFactory.getLogger(GroupBy.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name", "age"), 3,
                new Values("java", 1),
                new Values("scala", 2),
                new Values("scala", 3),
                new Values("hadoop", 4),
                new Values("java", 5),
                new Values("hadoop", 6));
        this.spout.setCycle(false);
    }

    @Test
    public void testGroupBy() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("GroupBy", spout).parallelismHint(1)
                // .partitionBy(new Fields("name"))
                .groupBy(new Fields("name"))
                .aggregate(new Count(), new Fields("count"))
                .peek(tuple -> LOG.info("{},{}", tuple.getFields(), tuple));
        this.submitTopology("GroupBy", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}
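    Since aggregation happens per batch (batch size 3 here, 6 tuples in total), the first batch yields java=1, scala=2 and the second yields hadoop=2, java=1; the counts are not merged across batches.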

  
