Storm(4) - Distributed Remote Procedure Calls
Using DRPC to complete the required processing
- 1. Create a new branch of your source using the following commands
- git branch chap4
- git checkout chap4
- 2. Create a new class named SplitAndProjectToFields, which extends BaseFunction
- public class SplitAndProjectToFields extends BaseFunction {
- public void execute(TridentTuple tuple, TridentCollector collector) {
- Values vals = new Values();
- for(String word: tuple.getString(0).split(" ")) {
- if(word.length() > 0) {
- vals.add(word);
- }
- }
- collector.emit(vals);
- }
- }
- 3. Once this is complete, edit the TermTopology class, and add the following method
- public class TermTopology {
- private static void addTFIDFQueryStream(TridentState tfState, TridentState dfState, TridentState dState, TridentTopology topology, LocalDRPC drpc) {
- topology.newDRPCStream("tfidfQuery", drpc)
- .each(new Fields("args"), new SplitAndProjectToFields(), new Fields("documentId", "term"))
- .each(new Fields(), new StaticSourceFunction(), new Fields("source"))
- .stateQuery(tfState, new Fields("documentId", "term"), new MapGet(), new Fields("tf"))
- .stateQuery(dfState, new Fields("term"), new MapGet(), new Fields("df"))
- .stateQuery(dState, new Fields("source"), new MapGet(), new Fields("d"))
- .each(new Fields("term", "documentId", "tf", "d", "df"), new TfidfExpression(), new Fields("tfidf"))
- .each(new Fields("tfidf"), new FilterNull())
- .project(new Fields("documentId", "term", "tfidf"));
- }
- }
- 4. Then update your buildTopology method by removing the final stream definition and adding the DRPC creation:
- public static TridentTopology buildTopology(ITridentSpout spout, LocalDRPC drpc) {
- TridentTopology topology = new TridentTopology();
- Stream documentStream = getUrlStream(topology, spout)
- .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source"));
- Stream termStream = documentStream.parallelismHint(20)
- .each(new Fields("document"), new DocumentTokenizer(), new Fields("dirtyTerm"))
- .each(new Fields("dirtyTerm"), new TermFilter(), new Fields("term"))
- .project(new Fields("term","documentId","source"));
- TridentState dfState = termStream.groupBy(new Fields("term"))
- .persistentAggregate(getStateFactory("df"), new Count(), new Fields("df"));
- TridentState dState = documentStream.groupBy(new Fields("source"))
- .persistentAggregate(getStateFactory("d"), new Count(), new Fields("d"));
- TridentState tfState = termStream.groupBy(new Fields("documentId", "term"))
- .persistentAggregate(getStateFactory("tf"), new Count(), new Fields("tf"));
- addTFIDFQueryStream(tfState, dfState, dState, topology, drpc);
- return topology;
- }
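The query stream above can be hard to picture without a running cluster, so here is a minimal plain-Java sketch of what it computes for one DRPC request. It is not Storm code: the three maps are in-memory stand-ins for tfState, dfState, and dState, the class name TfidfQuerySketch and the sample data are hypothetical, and "twitter" is only an assumed value for what StaticSourceFunction emits. The notes do not show TfidfExpression, so the formula tf * log(d / (1 + df)) is an assumption too. The sketch also splits on any whitespace, whereas SplitAndProjectToFields splits on single spaces and skips empty words.

```java
import java.util.HashMap;
import java.util.Map;

public class TfidfQuerySketch {
    // In-memory stand-ins for the three Trident states (hypothetical data).
    static final Map<String, Long> TF = new HashMap<>(); // key: documentId + "|" + term
    static final Map<String, Long> DF = new HashMap<>(); // key: term
    static final Map<String, Long> D = new HashMap<>();  // key: source

    // Assumed formula; the notes do not show the body of TfidfExpression.
    static double tfidf(long tf, long d, long df) {
        return tf * Math.log((double) d / (1.0 + df));
    }

    // Mirrors the stream: split the args into documentId and term,
    // look up tf, df, and d, and return null when any lookup misses
    // (the role FilterNull plays in the topology).
    static Double query(String args, String source) {
        String[] parts = args.trim().split("\\s+");
        if (parts.length != 2) {
            return null;
        }
        Long tf = TF.get(parts[0] + "|" + parts[1]);
        Long df = DF.get(parts[1]);
        Long d = D.get(source);
        if (tf == null || df == null || d == null) {
            return null;
        }
        return tfidf(tf, d, df);
    }

    public static void main(String[] args) {
        TF.put("doc01|storm", 3L);
        DF.put("storm", 1L);
        D.put("twitter", 4L);
        System.out.println(query("doc01 storm", "twitter"));   // a finite score
        System.out.println(query("doc01 missing", "twitter")); // null, no such term
    }
}
```

In the real topology the same request would be a single string such as "doc01 storm" passed to the DRPC function, and each map lookup corresponds to one stateQuery with MapGet.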
Implementing a rolling window topology
- 1. To implement the rolling time window, we need a fork of the trident-cassandra state implementation. Start by cloning, building, and installing it into your local Maven repo
- git clone https://github.com/quintona/trident-cassandra.git
- cd trident-cassandra
- lein install
- 2. Then update your project dependencies to include this new version by changing the following code line:
- [trident-cassandra/trident-cassandra "0.0.1-wip1"]
- To the following line:
- [trident-cassandra/trident-cassandra "0.0.1-bucketwip1"]
- Simulating time in integration testing
- 3. Ensure that you have updated your project dependencies in Eclipse using the process described earlier and then create a new class called TimeBasedRowStrategy
- public class TimeBasedRowStrategy implements RowKeyStrategy, Serializable {
- private static final long serialVersionUID = 6981400531506165681L;
- @Override
- public <T> String getRowKey(List<List<Object>> keys, Options<T> options) {
- return options.rowKey + StateUtils.formatHour(new Date());
- }
- }
- 4. And implement the StateUtils.formatHour static method
- public static String formatHour(Date date){
- return new SimpleDateFormat("yyyyMMddHH").format(date);
- }
- 5. Finally, replace the getStateFactory method in TermTopology with the following
- private static StateFactory getStateFactory(String rowKey) {
- CassandraBucketState.BucketOptions options = new CassandraBucketState.BucketOptions();
- options.keyspace = "trident_test";
- options.columnFamily = "tfid";
- options.rowKey = rowKey;
- options.keyStrategy = new TimeBasedRowStrategy();
- return CassandraBucketState.nonTransactional("localhost", options);
- }
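To see why appending the formatted hour to the row key gives a rolling window, the following plain-Java sketch reproduces the key construction outside of Storm and Cassandra. The class name HourBucketSketch and the rowKey helper are hypothetical. Unlike TimeBasedRowStrategy, which calls new Date() internally, the helper takes the time as a parameter so that different hours can be tried by hand. That parameter is an assumption made for illustration, not part of the fork's API.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

public class HourBucketSketch {
    // Same pattern as StateUtils.formatHour in the steps above.
    static String formatHour(Date date) {
        return new SimpleDateFormat("yyyyMMddHH").format(date);
    }

    // Builds the row key from a base key plus the hour bucket.
    // The time is passed in here (an assumption) instead of using new Date().
    static String rowKey(String baseKey, Date now) {
        return baseKey + formatHour(now);
    }

    public static void main(String[] args) {
        // Two timestamps in the same hour and one in the next hour,
        // all expressed in the JVM's default time zone.
        Date a = new GregorianCalendar(2013, Calendar.MARCH, 5, 14, 10).getTime();
        Date b = new GregorianCalendar(2013, Calendar.MARCH, 5, 14, 55).getTime();
        Date c = new GregorianCalendar(2013, Calendar.MARCH, 5, 15, 0).getTime();

        System.out.println(rowKey("tf", a).equals(rowKey("tf", b))); // same bucket
        System.out.println(rowKey("tf", a).equals(rowKey("tf", c))); // different bucket
    }
}
```

Counts written within the same hour land in the same row, and the first write of the next hour starts a fresh row, so each of the "tf", "df", and "d" states effectively resets every hour.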