1. Introduction to Trident

  Trident literally means "trident", the three-pronged spear. Processing data with a topology's spouts and bolts, as covered earlier, works perfectly well; Trident is a higher-level abstraction over spouts and bolts that achieves the same functionality, just with more optimization and encapsulation built in. If you have demanding performance requirements, it is better to work with spouts and bolts directly; otherwise Trident is a good fit.

  Another way to picture it: a topology naturally fans out, and one spout feeding two bolts looks rather like a trident. (my own interpretation)

  Because Trident is a higher-level abstraction over Storm, it handles data streams differently from the spouts and bolts covered earlier: Trident processes data in units of batches (groups of tuples).

2. Trident API operations

  Trident processes data in batches, and its API exposes data-processing operations as functions, such as filter, sum, and aggregator.

  Function operations all work on the tuples in the stream.

  The commonly used Trident APIs are introduced below:

  .each(Fields inputFields, Filter filter)
    Purpose: operates on the contents of every tuple in a batch; usually paired with a Filter or a Function.
  .peek(Consumer action)
    Purpose: performs no transformation; it takes a Consumer and is handy for printing, much like System.out.println.
  .partitionBy(Fields fields)
    Purpose: redirects tuples to the next processing stage according to the given fields; tuples with the same field values are guaranteed to be handled by the same thread. (A short sketch of these three calls follows.)
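  A minimal sketch of how these three calls chain together, assuming Storm 1.x, a FixedBatchSpout named spout that emits a "name" field, and a hypothetical KeepJavaFilter (an illustration, not code from the examples below):

// hypothetical filter: keep only tuples whose "name" field equals "java"
class KeepJavaFilter extends BaseFilter {
    @Override
    public boolean isKeep(TridentTuple tuple) {
        return "java".equals(tuple.getStringByField("name"));
    }
}

TridentTopology topology = new TridentTopology();
topology.newStream("api-demo", spout)
        .each(new Fields("name"), new KeepJavaFilter())             // run the filter on every tuple
        .peek(t -> System.out.println(t.getStringByField("name")))  // observe tuples without changing the stream
        .partitionBy(new Fields("name"));                           // same "name" value -> same partition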

3. Commonly used Trident functions

  .FilterFunction (filter)
    Purpose: filters the tuples within a batch.
    How: define a class extending BaseFilter, override isKeep(), and plug the class into each().
  .SumFunction (sum)
    Purpose: performs arithmetic (addition, subtraction) on the stream's data.
    How: define a class extending BaseFunction, override execute(), and use it in each().
  .MapFunction (one-to-one function)
    Purpose: applies a custom transformation to each tuple.
    How: define a class implementing MapFunction, override execute(), and use it via map().
  .ProjectionFunction (projection)
    Purpose: keeps only the specified fields of the stream.
    How: pass the fields you need to project().
    Example: given a stream with fields ["x","y","z"], the projection project(new Fields("y","z")) outputs a stream containing only the fields ["y","z"].
  .repartition (repartitioning)
    
    Purpose: repartitioning determines how tuples are routed to the next stage. The options are:
    shuffle:     distributes tuples evenly across partitions using random allocation
    broadcast:   replicates every tuple to all partitions; very useful with DRPC, e.g. to run a stateQuery on every partition
    partitionBy: partitions by the given field list; the hash of those fields modulo the partition count picks the target, so tuples with the same field values are always routed to the same partition
    global:      all tuples are sent to a single partition, which processes the entire stream's tuples on its own thread
    batchGlobal: all tuples of one batch go to the same partition; different batches may go to different partitions
    partition:   partitions via a custom function that implements backtype.storm.grouping.CustomStreamGrouping
    A sketch of how these calls are written follows.
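    The fragment below sketches the shape of each repartitioning call, assuming an existing TridentTopology and spout; each branch would normally be followed by further processing. HighTaskIDGrouping is the custom grouping shown later in section 5.6:

Stream stream = topology.newStream("repartition-demo", spout);

stream.shuffle();                             // random, even distribution
stream.broadcast();                           // replicate every tuple to all partitions
stream.partitionBy(new Fields("language"));   // hash of "language" mod partition count
stream.global();                              // everything to a single partition
stream.batchGlobal();                         // one partition per batch
stream.partition(new HighTaskIDGrouping());   // custom CustomStreamGrouping (see 5.6)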
   .Aggregation (aggregation)

    Trident processes data in batches, so aggregation also operates on the data within a batch. Tuples that pass through an aggregation have their original contents replaced by the aggregated output. The Aggregator interface has three methods to implement (a minimal sketch follows this list):
      init()     : executed when a batch starts; initializes the aggregation state
      aggregate(): executed for every tuple received in the batch, folding it into the state
      complete() : executed when the batch ends; emits the result
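    A minimal sketch of the three callbacks, assuming Storm 1.x package names (the CountState class and its field are illustrative, not part of the Trident API):

import org.apache.storm.trident.operation.BaseAggregator;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

// a hand-rolled Aggregator that counts the tuples of one batch
public class CountAggregator extends BaseAggregator<CountAggregator.CountState> {

    static class CountState {
        long count = 0;
    }

    @Override
    public CountState init(Object batchId, TridentCollector collector) {
        // a new batch has started: create fresh state
        return new CountState();
    }

    @Override
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        // called for every tuple in the batch: fold it into the state
        state.count++;
    }

    @Override
    public void complete(CountState state, TridentCollector collector) {
        // the batch has ended: emit the aggregated result
        collector.emit(new Values(state.count));
    }
}

    It is used like the built-in aggregators, e.g. .aggregate(new Fields("name"), new CountAggregator(), new Fields("count")).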

    6.1 partitionAggregate

      Runs the aggregation over each batch on the current partition. It is not a repartitioning operation: it simply aggregates the tuples of the batch within each partition.

   6.2 aggregate

      Aggregates the tuples of a batch as a whole.

   6.3 ReducerAggregator

      Reduces the tuples of a batch one by one, starting from an initial value (for example, summing the n-th field of every tuple).

   6.4 CombinerAggregator

      Aggregates the tuples of a batch; it is a repartitioning operation.

   6.5 PersistentAggregator

      A persistent aggregator: the data is first written to a state store, and the aggregation is then performed against that state.

   6.6 AggregateChain

     An aggregation chain: performs several aggregations over the tuples of a batch in a single chained pass.

  7. GroupBy

      GroupBy splits the whole stream into individual grouped streams according to the given fields. If you aggregate a grouped stream, the aggregation runs within each grouped stream rather than over the whole batch. When groupBy is followed by aggregate, a true per-group aggregation takes place; when partitionAggregate is used instead, the operation runs per partition rather than globally, as the sketch below illustrates.
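      A minimal sketch of the contrast, assuming a stream whose tuples carry a "name" field; Count is the built-in org.apache.storm.trident.operation.builtin.Count:

// per-group aggregation: tuples are repartitioned by "name" first,
// then Count runs once for each grouped stream
stream.groupBy(new Fields("name"))
      .aggregate(new Fields("name"), new Count(), new Fields("count"));

// per-partition aggregation: no grouping; Count runs independently on
// whatever tuples each partition happens to hold
stream.partitionAggregate(new Fields("name"), new Count(), new Fields("count"));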

     

4. Examples of the common Trident functions

  1. FilterFunction

    Requirement: from a batch of tuples, keep those where the sum of the first and second values is even.

public class FilterTrident {

    private static final Logger LOG = LoggerFactory.getLogger(FilterTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
                new Values(1,4,7,10),
                new Values(1,1,3,11),
                new Values(2,2,7,1),
                new Values(2,5,7,2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // peek: performs no transformation; its argument is a Consumer
        // each: applies the filter to the given fields of every tuple from the spout
        topology.newStream("filter", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a","b"), new CheckEvenSumFilter())
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 +++++++++++++++++++{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d")))
                .parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("FilterTrident", conf, topology.build());

        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "FilterTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("FilterTrident");
        cluster.shutdown();
    }

    private static class CheckEvenSumFilter extends BaseFilter {

        @Override
        public boolean isKeep(TridentTuple tuple) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            return (a + b) % 2 == 0;
        }
    }
}

  2. SumFunction

    Requirement: sum the first two values of each tuple.

public class SumFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(SumFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException {
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("a","b","c","d"), 3,
                new Values(1,4,7,10),
                new Values(1,1,3,11),
                new Values(2,2,7,1),
                new Values(2,5,7,2));
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        // peek: performs no transformation; its argument is a Consumer
        // each: applies the function to the given fields and appends the new "sum" field
        topology.newStream("function", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(input -> LOG.info("peek1 ================{},{},{},{}",
                        input.get(0), input.get(1), input.get(2), input.get(3)))
                .parallelismHint(2)
                .localOrShuffle()
                .each(new Fields("a","b"), new SumFunction(), new Fields("sum"))
                .parallelismHint(2)
                .localOrShuffle()
                .peek(input -> LOG.info("peek2 ================{},{},{},{},{}",
                        input.getIntegerByField("a"), input.getIntegerByField("b"),
                        input.getIntegerByField("c"), input.getIntegerByField("d"),
                        input.getIntegerByField("sum")))
                .parallelismHint(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("SumFunctionTrident", conf, topology.build());

        LOG.warn("==================================================");
        LOG.warn("the LocalCluster topology {} is submitted.", "SumFunctionTrident");
        LOG.warn("==================================================");

        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("SumFunctionTrident");
        cluster.shutdown();
    }

    private static class SumFunction extends BaseFunction {

        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            Integer a = tuple.getIntegerByField("a");
            Integer b = tuple.getIntegerByField("b");
            collector.emit(new Values(a + b));
        }
    }
}

3. MapFunction

  Requirement: convert the tuples of a batch to upper case.

public class MapFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(MapFunctionTrident.class);

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws InterruptedException, AlreadyAliveException,
            InvalidTopologyException, AuthorizationException {
        boolean isRemoteMode = false;
        if (args.length > 0) {
            isRemoteMode = true;
        }

        FixedBatchSpout spout = new FixedBatchSpout(new Fields("line"), 3,
                new Values("hello stream"),
                new Values("hello kafka"),
                new Values("hello hadoop"),
                new Values("hello scala"),
                new Values("hello java")
        );
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        Config conf = new Config();
        conf.setNumWorkers(4);
        conf.setDebug(false);

        topology.newStream("hello", spout).parallelismHint(1)
                .localOrShuffle()
                .map(new MyMapFunction(), new Fields("upper"))
                .parallelismHint(2)
                .partition(Grouping.fields(ImmutableList.of("upper")))
                .peek(input -> LOG.warn("================>> peek process value:{}",
                        input.getStringByField("upper")))
                .parallelismHint(3);

        if (isRemoteMode) {
            StormSubmitter.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the remote topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
        } else {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("HelloTridentTopology", conf, topology.build());
            LOG.warn("==================================================");
            LOG.warn("the LocalCluster topology {} is submitted.", "HelloTridentTopology");
            LOG.warn("==================================================");
            TimeUnit.SECONDS.sleep(5);
            cluster.killTopology("HelloTridentTopology");
            cluster.shutdown();
        }
    }

    private static class MyMapFunction implements MapFunction {

        private static final Logger LOG = LoggerFactory.getLogger(MyMapFunction.class);

        @Override
        public Values execute(TridentTuple input) {
            String line = input.getStringByField("line");
            LOG.warn("================>> myMapFunction process execute:value :{}", line);
            return new Values(line.toUpperCase());
        }
    }
}

4. ProjectionFunctionTrident

  Requirement: keep only some of the fields from each tuple.

public class ProjectionFunctionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(ProjectionFunctionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("x","y","z"), 3,
                new Values(1,2,3),
                new Values(4,5,6),
                new Values(7,8,9),
                new Values(10,11,12)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("ProjectionTrident", spout).parallelismHint(1)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2)
                .shuffle()
                .project(new Fields("y","z"))   // keep only the "y" and "z" fields
                .parallelismHint(2)
                .localOrShuffle()
                .peek(tridentTuple -> LOG.info(">>>>>>>>>>>>>>>> {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("ProjectionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("ProjectionTrident");
        cluster.shutdown();
    }
}

5.2 Broadcast

  Requirement: send the tuples of a batch to every partition.

public class BroadcastRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(BroadcastRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("haddop",3),
                new Values("java",4),
                new Values("haddop",5)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BroadcastRepartitionTrident", spout).parallelismHint(1)
                .broadcast()   // every tuple is replicated to both downstream partitions
                .peek(tridentTuple -> LOG.info("================ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BroadcastRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BroadcastRepartitionTrident");
        cluster.shutdown();
    }
}

5.3 PartitionBy

  Requirement: route the tuples of a batch to the same task according to the configured field.

public class PartitionByRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionByRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        // FixedBatchSpout() parameters:
        // 1. the spout's output field names
        // 2. the maximum number of tuples per batch
        // 3. the tuple values
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",23),
                new Values("scala",3),
                new Values("haddop",10),
                new Values("java",23),
                new Values("haddop",10)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("PartitionByRepartitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language"))   // same "language" value -> same partition
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("PartitionByRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("PartitionByRepartitionTrident");
        cluster.shutdown();
    }
}

5.4 Global

   Requirement: route all tuples of a batch to a single partition for global counting.

public class GlobalRepatitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(GlobalRepatitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        // FixedBatchSpout() parameters:
        // 1. the spout's output field names
        // 2. the maximum number of tuples per batch
        // 3. the tuple values
        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",23),
                new Values("scala",3),
                new Values("haddop",10),
                new Values("java",23),
                new Values("haddop",10)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("GlobalRepatitionTrident", spout).parallelismHint(1)
                .partitionBy(new Fields("language"))
                .parallelismHint(3)   // whatever parallelism is configured here makes no difference after global()
                .peek(tridentTuple -> LOG.info(" ================= {}", tridentTuple))
                .global()             // all tuples are routed to a single partition
                .peek(tridentTuple -> LOG.info(" >>>>>>>>>>>>>>>>> {}", tridentTuple));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("GlobalRepatitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("GlobalRepatitionTrident");
        cluster.shutdown();
    }
}

  5.5 batchGlobal

    Requirement: send tuples of different batches to different tasks.

public class BatchGlobalRepatitionTrident2 {

    private static final Logger LOG = LoggerFactory.getLogger(BatchGlobalRepatitionTrident2.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("BatchGlobalRepatitionTrident2", spout).parallelismHint(1)
                .batchGlobal()   // all tuples of one batch go to the same partition
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(3);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("BatchGlobalRepatitionTrident2", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("BatchGlobalRepatitionTrident2");
        cluster.shutdown();
    }
}

  5.6 partition

    Requirement: a custom partitioning strategy.

public class CustomRepartitionTrident {

    private static final Logger LOG = LoggerFactory.getLogger(CustomRepartitionTrident.class);

    public static void main(String[] args) throws InterruptedException {

        @SuppressWarnings("unchecked")
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("language","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("haddop",3),
                new Values("java",4),
                new Values("haddop",5)
        );
        spout.setCycle(false);

        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);

        TridentTopology topology = new TridentTopology();
        topology.newStream("CustomRepartitionTrident", spout).parallelismHint(1)
                .partition(new HighTaskIDGrouping())
                .peek(tridentTuple -> LOG.info("++++++++++++++++ {}", tridentTuple))
                .parallelismHint(2);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("CustomRepartitionTrident", conf, topology.build());
        TimeUnit.SECONDS.sleep(30);
        cluster.killTopology("CustomRepartitionTrident");
        cluster.shutdown();
    }
}

/**
 * Custom grouping:
 * always route tuples to the task with the highest task ID.
 * @author pengbo.zhao
 */
public class HighTaskIDGrouping implements CustomStreamGrouping {

    private int taskID;

    @Override
    public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
        // targetTasks: all downstream task IDs
        ArrayList<Integer> tasks = new ArrayList<>(targetTasks);
        Collections.sort(tasks);                    // ascending order
        this.taskID = tasks.get(tasks.size() - 1);  // pick the largest task ID
    }

    @Override
    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        return Arrays.asList(taskID);
    }
}

  6.1 partitionAggregate

    Requirement: count the tuples of a batch within each partition.

public class PartitionAggregateTrident {

    private static final Logger LOG = LoggerFactory.getLogger(PartitionAggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testPartitionAggregator() {
        TridentTopology topology = new TridentTopology();
        // the spout's internal parallelism is fixed at 1, so the hint of 2 here has no effect
        topology.newStream("PartitionAggregateTrident", spout).parallelismHint(2)
                .shuffle()
                .partitionAggregate(new Fields("name","age"), new Count(), new Fields("count"))
                .parallelismHint(2)
                // .each(new Fields("count"), new Debug())
                .peek(input -> LOG.info(" >>>>>>>>>>>>>>>>> {}", input.getLongByField("count")));

        this.submitTopology("PartitionAggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}

  6.2 aggregate

    Requirement: count the tuples in a batch.

public class AggregateTrident {

    private static final Logger LOG = LoggerFactory.getLogger(AggregateTrident.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testAggregate() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateTrident", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("name","age"), new Count(), new Fields("count"))
                // .aggregate(new Fields("name","age"), new CountAsAggregator(), new Fields("count"))
                .parallelismHint(2)
                .each(new Fields("count"), new Debug())
                .peek(input -> LOG.info("============> count:{}", input.getLongByField("count")));

        this.submitTopology("AggregateTrident", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}

  6.3 ReducerAggregator

     Requirement: sum field 0 of the tuples in a batch; that is, for however many tuples the batch contains, add up the specified field across all of them.

public class ReduceAggregatorTrident {

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testReduceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("ReduceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age","name"), new MyReduce(), new Fields("sum"))
                .parallelismHint(5)
                .each(new Fields("sum"), new Debug());

        this.submitTopology("ReduceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyReduce implements ReducerAggregator<Integer> {

        @Override
        public Integer init() {
            return 0;   // the initial value is 0
        }

        @Override
        public Integer reduce(Integer curr, TridentTuple tuple) {
            // field 0 is "age", the first input field passed to aggregate()
            return curr + tuple.getInteger(0);
        }
    }
}

  6.4 CombinerAggregator

    Requirement: sum a field across the tuples in a batch.

public class CombinerAggregate {

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testCombinerAggregate() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("CombinerAggregate", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .aggregate(new Fields("age"), new MyCount(), new Fields("count"))
                .parallelismHint(5)
                .each(new Fields("count"), new Debug());

        this.submitTopology("CombinerAggregate", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }

    static class MyCount implements CombinerAggregator<Integer> {

        @Override
        public Integer init(TridentTuple tuple) {
            return tuple.getInteger(0);
        }

        @Override
        public Integer combine(Integer val1, Integer val2) {
            return val1 + val2;
        }

        @Override
        public Integer zero() {
            return 0;
        }
    }
}

  6.5 PersistentAggregator

    Requirement: count the tuple elements in a batch, persisting the running total in a state store.

public class PersistenceAggregator {

    private static final Logger LOG = LoggerFactory.getLogger(PersistenceAggregator.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testPersistenceAggregator() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("testPersistenceAggregator", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .persistentAggregate(new MemoryMapState.Factory(), new Fields("name"), new Count(), new Fields("count"))
                .parallelismHint(4)
                .newValuesStream()
                .peek(input -> LOG.info("count:{}", input.getLongByField("count")));

        this.submitTopology("testPersistenceAggregator", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}

  6.6 AggregateChina (aggregation chain)

    Requirement: run a count, a sum, and a second count over the tuples of a batch in one chained aggregation.

public class AggregateChina {

    private static final Logger LOG = LoggerFactory.getLogger(AggregateChina.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testAggregateChina() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("AggregateChina", spout).parallelismHint(2)
                .partitionBy(new Fields("name"))
                .chainedAgg()
                .aggregate(new Fields("name"), new Count(), new Fields("count"))
                .aggregate(new Fields("age"), new Sum(), new Fields("sum"))
                .aggregate(new Fields("age"), new Count(), new Fields("count2"))
                .chainEnd()
                .peek(tuple -> LOG.info("{}", tuple));

        this.submitTopology("AggregateChina", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}

  7. GroupBy

    Requirement: group the tuples of a batch by "name", then count the tuples within each group.

public class GroupBy {

    private static final Logger LOG = LoggerFactory.getLogger(GroupBy.class);

    private FixedBatchSpout spout;

    @SuppressWarnings("unchecked")
    @Before
    public void setSpout() {
        this.spout = new FixedBatchSpout(new Fields("name","age"), 3,
                new Values("java",1),
                new Values("scala",2),
                new Values("scala",3),
                new Values("haddop",4),
                new Values("java",5),
                new Values("haddop",6)
        );
        this.spout.setCycle(false);
    }

    @Test
    public void testGroupBy() {
        TridentTopology topology = new TridentTopology();
        topology.newStream("GroupBy", spout).parallelismHint(1)
                // .partitionBy(new Fields("name"))
                .groupBy(new Fields("name"))
                .aggregate(new Count(), new Fields("count"))
                .peek(tuple -> LOG.info("{},{}", tuple.getFields(), tuple));

        this.submitTopology("GroupBy", topology.build());
    }

    public void submitTopology(String name, StormTopology topology) {
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology(name, createConf(), topology);
        try {
            TimeUnit.MINUTES.sleep(1);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        cluster.killTopology(name);
        cluster.shutdown();
    }

    public Config createConf() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.setDebug(false);
        return conf;
    }
}

  
