Using DRPC to complete the required processing

1. Create a new branch of your source using the following commands:

git branch chap4
git checkout chap4

2. Create a new class named SplitAndProjectToFields that extends BaseFunction:

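import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// Splits the space-separated DRPC argument string (e.g. "documentId term")
// and emits one value per non-empty word.
public class SplitAndProjectToFields extends BaseFunction {

    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        Values vals = new Values();
        for (String word : tuple.getString(0).split(" ")) {
            // consecutive spaces produce empty strings, which are skipped
            if (word.length() > 0) {
                vals.add(word);
            }
        }
        collector.emit(vals);
    }
}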
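The splitting logic can be checked without a running topology. The following plain-Java sketch mirrors the body of execute(), using a List<String> in place of Trident's Values (which is itself an ArrayList<Object>); the class name SplitArgsDemo is hypothetical. It shows that an argument such as "doc1 storm" yields the two values that the DRPC stream maps to documentId and term, and that repeated spaces do not create empty fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone copy of the logic in SplitAndProjectToFields.execute(),
// using a plain List instead of Trident's Values and TridentCollector.
class SplitArgsDemo {

    static List<String> split(String args) {
        List<String> vals = new ArrayList<>();
        for (String word : args.split(" ")) {
            // consecutive spaces produce empty strings, which are skipped
            if (word.length() > 0) {
                vals.add(word);
            }
        }
        return vals;
    }

    public static void main(String[] argv) {
        System.out.println(split("doc1 storm"));   // [doc1, storm]
        System.out.println(split("doc1   storm")); // [doc1, storm]
    }
}
```

Because one value is emitted per word, an argument containing more than two words would not line up with the two declared output fields.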

3. Once this is complete, edit the TermTopology class and add the following method:

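import backtype.storm.LocalDRPC;
import backtype.storm.tuple.Fields;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.FilterNull;
import storm.trident.operation.builtin.MapGet;

public class TermTopology {

    // Registers a DRPC function that takes "documentId term" as its argument
    // and returns the TF-IDF score of that term within that document.
    private static void addTFIDFQueryStream(TridentState tfState, TridentState dfState, TridentState dState, TridentTopology topology, LocalDRPC drpc) {
        topology.newDRPCStream("tfidfQuery", drpc)
            .each(new Fields("args"), new SplitAndProjectToFields(), new Fields("documentId", "term"))
            // attach the constant source used as the key of the document-count state
            .each(new Fields(), new StaticSourceFunction(), new Fields("source"))
            .stateQuery(tfState, new Fields("documentId", "term"), new MapGet(), new Fields("tf"))
            .stateQuery(dfState, new Fields("term"), new MapGet(), new Fields("df"))
            // the document count lives in dState, keyed by source
            .stateQuery(dState, new Fields("source"), new MapGet(), new Fields("d"))
            .each(new Fields("term", "documentId", "tf", "d", "df"), new TfidfExpression(), new Fields("tfidf"))
            // drop results whose score could not be computed
            .each(new Fields("tfidf"), new FilterNull())
            .project(new Fields("documentId", "term", "tfidf"));
    }
}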
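TfidfExpression is defined elsewhere in the project and is not shown here. Assuming it applies the common weighting tf-idf = tf × log(d / df), with tf, d and df as read from the three states, the arithmetic can be sketched in plain Java as follows. The class name, the natural logarithm and the null handling are assumptions; returning null for missing counts is what lets a FilterNull step discard the tuple.

```java
// Hypothetical sketch of the TF-IDF weighting; the real TfidfExpression
// used by the topology is not shown and may differ (log base, smoothing).
class TfidfSketch {

    // Returns null when any count is missing or df is zero, so that
    // a FilterNull step downstream would drop the result.
    static Double tfidf(Long tf, Long d, Long df) {
        if (tf == null || d == null || df == null || df == 0) {
            return null;
        }
        return tf * Math.log(d.doubleValue() / df.doubleValue());
    }

    public static void main(String[] args) {
        // tf = 3, d = 10, df = 2  ->  3 * ln(5)
        System.out.println(tfidf(3L, 10L, 2L));
        // a missing df yields null
        System.out.println(tfidf(3L, 10L, null));
    }
}
```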

4. Then update your buildTopology method: remove the final stream definition and add a call to addTFIDFQueryStream so that the DRPC stream is created:

// Builds the TF-IDF topology: fetched documents are split into terms, three
// counts (tf, df and d) are persisted, and a DRPC stream queries them.
public static TridentTopology buildTopology(ITridentSpout spout, LocalDRPC drpc) {

    TridentTopology topology = new TridentTopology();

    // one tuple per fetched document
    Stream documentStream = getUrlStream(topology, spout)
        .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source"));

    // one tuple per cleaned term, tagged with its document and source
    Stream termStream = documentStream.parallelismHint(20)
        .each(new Fields("document"), new DocumentTokenizer(), new Fields("dirtyTerm"))
        .each(new Fields("dirtyTerm"), new TermFilter(), new Fields("term"))
        .project(new Fields("term", "documentId", "source"));

    // df: number of term tuples per term, across all documents
    TridentState dfState = termStream.groupBy(new Fields("term"))
        .persistentAggregate(getStateFactory("df"), new Count(), new Fields("df"));

    // d: number of documents per source
    TridentState dState = documentStream.groupBy(new Fields("source"))
        .persistentAggregate(getStateFactory("d"), new Count(), new Fields("d"));

    // tf: number of occurrences of each term within each document
    TridentState tfState = termStream.groupBy(new Fields("documentId", "term"))
        .persistentAggregate(getStateFactory("tf"), new Count(), new Fields("tf"));

    // expose the three counts through the DRPC query stream
    addTFIDFQueryStream(tfState, dfState, dState, topology, drpc);

    return topology;
}
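To see what the three persistentAggregate calls hold, the following plain-Java sketch applies the same groupings to a couple of documents, with in-memory HashMaps standing in for the Cassandra-backed state. The class name, the "documentId|term" composite key and the source value "twitter" are all hypothetical; only the counting rules are taken from buildTopology.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the tf, df and d states built in
// buildTopology(); each map counts tuples the way groupBy(...) + Count does.
class TermCountsSketch {
    final Map<String, Long> tf = new HashMap<>(); // key: documentId + "|" + term
    final Map<String, Long> df = new HashMap<>(); // key: term
    final Map<String, Long> d = new HashMap<>();  // key: source

    // One call per fetched document, carrying its already-filtered terms.
    void addDocument(String documentId, String source, List<String> terms) {
        d.merge(source, 1L, Long::sum);
        for (String term : terms) {
            tf.merge(documentId + "|" + term, 1L, Long::sum);
            df.merge(term, 1L, Long::sum);
        }
    }

    public static void main(String[] args) {
        TermCountsSketch s = new TermCountsSketch();
        s.addDocument("doc1", "twitter", java.util.Arrays.asList("storm", "storm", "trident"));
        s.addDocument("doc2", "twitter", java.util.Arrays.asList("storm"));
        System.out.println(s.tf.get("doc1|storm")); // 2
        System.out.println(s.df.get("storm"));      // 3
        System.out.println(s.d.get("twitter"));     // 2
    }
}
```

Note that df for "storm" comes out as 3 even though only two documents contain it: grouping the term stream by "term" and counting tallies every term tuple, so it equals the true document frequency only when a term is emitted at most once per document, and df can exceed d.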

Implementing a rolling window topology

1. To implement the rolling time window, we need a fork of the trident-cassandra state implementation. Start by cloning it, then build and install it into your local Maven repository:

git clone https://github.com/quintona/trident-cassandra.git

cd trident-cassandra

lein install

2. Then update your project dependencies to use this new version by changing the following line:

[trident-cassandra/trident-cassandra "0.0.1-wip1"]

To the following line:

[trident-cassandra/trident-cassandra "0.0.1-bucketwip1"]

Simulating time in integration testing

3. Ensure that you have updated your project dependencies in Eclipse using the process described earlier, and then create a new class called TimeBasedRowStrategy:

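import java.io.Serializable;
import java.util.Date;
import java.util.List;
// RowKeyStrategy and Options come from the trident-cassandra fork installed above.

public class TimeBasedRowStrategy implements RowKeyStrategy, Serializable {

    private static final long serialVersionUID = 6981400531506165681L;

    // Appends the current hour (yyyyMMddHH) to the configured row key, so all
    // counts written within the same hour share one Cassandra row.
    @Override
    public <T> String getRowKey(List<List<Object>> keys, Options<T> options) {
        return options.rowKey + StateUtils.formatHour(new Date());
    }
}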

4. Next, implement the static StateUtils.formatHour method:

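// Needs java.text.SimpleDateFormat and java.util.Date. A new formatter per call avoids
// sharing the non-thread-safe SimpleDateFormat. Example: 2013031514 for 14:xx, 15 March 2013.
public static String formatHour(Date date) {
    return new SimpleDateFormat("yyyyMMddHH").format(date);
}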
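Because TimeBasedRowStrategy reads new Date() directly, every update made during the same clock hour goes to the same row: the row key "df" becomes, for example, "df2013031514". To make that bucketing controllable in an integration test, one option is to pass the clock in. The sketch below is a hypothetical plain-Java helper (HourlyRowKeys, taking a Supplier<Date>); it is not part of trident-cassandra and does not implement the fork's RowKeyStrategy interface.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.function.Supplier;

// Hypothetical helper: builds hourly bucket row keys like TimeBasedRowStrategy,
// but takes the clock as a parameter so tests can simulate the passage of time.
class HourlyRowKeys {
    private final Supplier<Date> clock;

    HourlyRowKeys(Supplier<Date> clock) {
        this.clock = clock;
    }

    // Same pattern as StateUtils.formatHour, in the JVM's default time zone.
    static String formatHour(Date date) {
        return new SimpleDateFormat("yyyyMMddHH").format(date);
    }

    String rowKey(String baseKey) {
        return baseKey + formatHour(clock.get());
    }

    public static void main(String[] args) {
        Date fixed = new GregorianCalendar(2013, Calendar.MARCH, 15, 14, 5).getTime();
        HourlyRowKeys keys = new HourlyRowKeys(() -> fixed);
        System.out.println(keys.rowKey("df")); // df2013031514
    }
}
```

In production the clock would simply be `Date::new`; a test can instead advance a fixed date to push updates into the next hourly row.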

5. Finally, replace the getStateFactory method in TermTopology with the following:

// Cassandra-backed state from the trident-cassandra fork; the time-based key
// strategy stores each hour's counts for a given rowKey in a separate row.
private static StateFactory getStateFactory(String rowKey) {
    CassandraBucketState.BucketOptions options = new CassandraBucketState.BucketOptions();
    options.keyspace = "trident_test";
    options.columnFamily = "tfid";
    options.rowKey = rowKey;
    options.keyStrategy = new TimeBasedRowStrategy();
    return CassandraBucketState.nonTransactional("localhost", options);
}
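With one row per hour, a rolling-window count can be obtained by summing the buckets that fall inside the window. The sketch below illustrates that idea in memory only; RollingWindowSketch, its bucket map and windowSum are hypothetical, use java.time with the same yyyyMMddHH pattern, and do not describe how CassandraBucketState actually reads or expires rows.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory illustration of a rolling window over hourly buckets.
class RollingWindowSketch {
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyyMMddHH");

    // Counts per bucketed row key, e.g. "df2013031514" -> 1
    final Map<String, Long> buckets = new HashMap<>();

    void increment(String baseKey, LocalDateTime at) {
        buckets.merge(baseKey + HOUR.format(at), 1L, Long::sum);
    }

    // Sums the current hour's bucket and the (hours - 1) buckets before it.
    long windowSum(String baseKey, LocalDateTime now, int hours) {
        long total = 0;
        for (int i = 0; i < hours; i++) {
            total += buckets.getOrDefault(baseKey + HOUR.format(now.minusHours(i)), 0L);
        }
        return total;
    }

    public static void main(String[] args) {
        RollingWindowSketch w = new RollingWindowSketch();
        LocalDateTime now = LocalDateTime.of(2013, 3, 15, 14, 30);
        w.increment("df", now);                // this hour
        w.increment("df", now.minusHours(1));  // previous hour
        w.increment("df", now.minusHours(5));  // outside a 3-hour window
        System.out.println(w.windowSum("df", now, 3)); // 2
    }
}
```

Because buckets are aligned to clock hours, an N-hour window built this way covers between N - 1 and N hours of wall-clock time, depending on how far into the current hour the query runs.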
