Using DRPC to complete the required processing

1. Create a new branch of your source and switch to it using the following commands:

    git branch chap4
    git checkout chap4

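2. Create a new class named SplitAndProjectToFields, which extends BaseFunction:

    // Package names are those of Storm 0.8/0.9; from Storm 1.0 onward they
    // start with org.apache.storm instead.
    import backtype.storm.tuple.Values;
    import storm.trident.operation.BaseFunction;
    import storm.trident.operation.TridentCollector;
    import storm.trident.tuple.TridentTuple;

    public class SplitAndProjectToFields extends BaseFunction {

        // Splits the incoming string on spaces and emits the non-empty
        // words as the fields of a single tuple.
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            Values vals = new Values();
            for (String word : tuple.getString(0).split(" ")) {
                if (word.length() > 0) {
                    vals.add(word);
                }
            }
            collector.emit(vals);
        }
    }
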
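The splitting step can be tried without a running topology. The following is a minimal plain-Java sketch of the same loop, assuming the query arrives as a single space-separated string; `SplitSketch`, `splitArgs`, and the sample values are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the splitting step; no Storm classes are involved.
final class SplitSketch {

    // Splits the argument string on spaces and drops empty tokens,
    // mirroring the loop in SplitAndProjectToFields.execute.
    static List<String> splitArgs(String args) {
        List<String> vals = new ArrayList<>();
        for (String word : args.split(" ")) {
            if (word.length() > 0) {
                vals.add(word);
            }
        }
        return vals;
    }

    public static void main(String[] unused) {
        // "doc01" and "area" are made-up sample values; note the double space.
        System.out.println(splitArgs("doc01  area")); // prints [doc01, area]
    }
}
```

Because empty tokens are dropped, repeated spaces in the argument do not shift the values into the wrong fields. The function emits as many values as there are words, so the query string should contain exactly the two words the stream declares as output fields (documentId and term).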
3. Once this is complete, edit the TermTopology class and add the following method:

    public class TermTopology {

        private static void addTFIDFQueryStream(TridentState tfState,
                TridentState dfState, TridentState dState,
                TridentTopology topology, LocalDRPC drpc) {
            topology.newDRPCStream("tfidfQuery", drpc)
                // Split the "documentId term" argument into two fields
                .each(new Fields("args"), new SplitAndProjectToFields(), new Fields("documentId", "term"))
                .each(new Fields(), new StaticSourceFunction(), new Fields("source"))
                // Look up the three counts: tf per document and term,
                // df per term, and d per source
                .stateQuery(tfState, new Fields("documentId", "term"), new MapGet(), new Fields("tf"))
                .stateQuery(dfState, new Fields("term"), new MapGet(), new Fields("df"))
                .stateQuery(dState, new Fields("source"), new MapGet(), new Fields("d"))
                .each(new Fields("term", "documentId", "tf", "d", "df"), new TfidfExpression(), new Fields("tfidf"))
                .each(new Fields("tfidf"), new FilterNull())
                .project(new Fields("documentId", "term", "tfidf"));
        }
    }

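The query stream passes tf, d, and df to TfidfExpression, which is not listed in this section. As a rough sketch of what such an expression might compute, the following plain-Java example uses the common formula tf * ln(d / (1 + df)); the formula, the class name `TfidfSketch`, and the sample numbers are all assumptions, not the author's implementation.

```java
// Hypothetical stand-in for the arithmetic inside TfidfExpression.
final class TfidfSketch {

    // Assumed formula: tf * ln(d / (1 + df)).
    // tf: occurrences of the term in the document
    // d:  number of documents seen for the source
    // df: count stored for the term across all documents
    static double tfidf(long tf, long d, long df) {
        return tf * Math.log((double) d / (1.0 + df));
    }

    public static void main(String[] args) {
        // Made-up counts: 3 * ln(100 / 10) = 3 * ln(10), roughly 6.91
        System.out.println(tfidf(3, 100, 9));
    }
}
```

Adding 1 to df in the denominator is one common way to avoid dividing by zero for a term that has not been counted yet; other variants of the formula exist.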
4. Then update your buildTopology method by removing the final stream definition and adding the DRPC creation:

    public static TridentTopology buildTopology(ITridentSpout spout, LocalDRPC drpc) {

        TridentTopology topology = new TridentTopology();

        Stream documentStream = getUrlStream(topology, spout)
            .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source"));

        Stream termStream = documentStream.parallelismHint(20)
            .each(new Fields("document"), new DocumentTokenizer(), new Fields("dirtyTerm"))
            .each(new Fields("dirtyTerm"), new TermFilter(), new Fields("term"))
            .project(new Fields("term", "documentId", "source"));

        // Count of each term across all documents
        TridentState dfState = termStream.groupBy(new Fields("term"))
            .persistentAggregate(getStateFactory("df"), new Count(), new Fields("df"));

        // Count of documents per source
        TridentState dState = documentStream.groupBy(new Fields("source"))
            .persistentAggregate(getStateFactory("d"), new Count(), new Fields("d"));

        // Count of each term within each document
        TridentState tfState = termStream.groupBy(new Fields("documentId", "term"))
            .persistentAggregate(getStateFactory("tf"), new Count(), new Fields("tf"));

        addTFIDFQueryStream(tfState, dfState, dState, topology, drpc);

        return topology;
    }
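To make the three persisted counts concrete, here is a small in-memory sketch of what each state holds after a few tuples. It uses plain maps in place of the Cassandra-backed states; `CountStatesSketch` and the sample source, document, and term values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the three count states built in buildTopology.
// Hypothetical class; not part of Storm or trident-cassandra.
final class CountStatesSketch {
    final Map<String, Long> df = new HashMap<>(); // key: term
    final Map<String, Long> d = new HashMap<>();  // key: source
    final Map<String, Long> tf = new HashMap<>(); // key: documentId|term

    // One call per document tuple: counts documents per source.
    void addDocument(String source) {
        d.merge(source, 1L, Long::sum);
    }

    // One call per term tuple: counts the term overall and within its document.
    void addTerm(String documentId, String term) {
        df.merge(term, 1L, Long::sum);
        tf.merge(documentId + "|" + term, 1L, Long::sum);
    }

    public static void main(String[] args) {
        CountStatesSketch s = new CountStatesSketch();
        s.addDocument("twitter");
        s.addDocument("twitter");
        s.addTerm("doc01", "storm");
        s.addTerm("doc01", "storm");
        s.addTerm("doc02", "storm");
        System.out.println("d=" + s.d + " df=" + s.df + " tf=" + s.tf);
    }
}
```

As in the topology, the df count here is incremented for every term tuple, so a term that appears twice in one document contributes two to df. The DRPC query then reads these counts back by the same keys.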

Implementing a rolling window topology

1. To implement the rolling time window, we need to use a fork of the trident-cassandra state implementation. Start by cloning, building, and installing it into the local Maven repo:

    git clone https://github.com/quintona/trident-cassandra.git
    cd trident-cassandra
    lein install

2. Then update your project dependencies to include this new version by changing the following line:

    [trident-cassandra/trident-cassandra "0.0.1-wip1"]

to the following line:

    [trident-cassandra/trident-cassandra "0.0.1-bucketwip1"]

3. Ensure that you have updated your project dependencies in Eclipse using the process described earlier, and then create a new class called TimeBasedRowStrategy:

    public class TimeBasedRowStrategy implements RowKeyStrategy, Serializable {

        private static final long serialVersionUID = 6981400531506165681L;

        // Appends the current hour to the configured row key, so each
        // hour's updates are written to a separate row.
        @Override
        public <T> String getRowKey(List<List<Object>> keys, Options<T> options) {
            return options.rowKey + StateUtils.formatHour(new Date());
        }
    }

4. Then implement the StateUtils.formatHour static method:

    // Formats a date as year, month, day, and hour of day (00-23)
    public static String formatHour(Date date) {
        return new SimpleDateFormat("yyyyMMddHH").format(date);
    }

5. Finally, replace the getStateFactory method in TermTopology with the following:

    private static StateFactory getStateFactory(String rowKey) {
        CassandraBucketState.BucketOptions options = new CassandraBucketState.BucketOptions();
        options.keyspace = "trident_test";
        options.columnFamily = "tfid";
        options.rowKey = rowKey;
        // Use a new row for every hour
        options.keyStrategy = new TimeBasedRowStrategy();
        return CassandraBucketState.nonTransactional("localhost", options);
    }
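The effect of combining the row key with the formatted hour can be seen in isolation. The sketch below builds the key for two timestamps one hour apart; `HourBucketSketch` and `rowKey` are hypothetical names, and the formatter is pinned to UTC so the result is reproducible, whereas the formatHour method above uses the JVM's default time zone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical sketch of how an hourly row key is formed.
final class HourBucketSketch {

    // Appends the hour of the given date (yyyyMMddHH) to the key prefix.
    static String rowKey(String prefix, Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHH");
        // Assumption for reproducibility: fixed UTC time zone.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return prefix + fmt.format(date);
    }

    public static void main(String[] args) {
        Date start = new Date(0L);                // 1970-01-01 00:00 UTC
        Date oneHourLater = new Date(3_600_000L); // 1970-01-01 01:00 UTC
        System.out.println(rowKey("tf", start));        // prints tf1970010100
        System.out.println(rowKey("tf", oneHourLater)); // prints tf1970010101
    }
}
```

Since the key changes when the hour changes, counts written in different hours land in different rows, which is what gives the states their time-window behavior.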
