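In the previous two posts we introduced two forms of parallel computation: building data sources in parallel and running user-defined functions in parallel, and demonstrated each separately. In this post I demonstrate a mode that combines the two. The way the parallel data source is built here also differs from what was described before: in the earlier discussion we knew in advance that the source had to be built from three independent streams. But what if we have a stream of unknown length whose elements each represent a different data stream? We know the AQMRPT table holds air-quality measurements from 1999 to 2xxx, so we can try to build a single data source by loading the per-year streams in parallel. We reuse the scaffolding code from the previous demonstration, which covers initializing the NORMAQM table and the functions that look up ids by name in STATES and COUNTIES: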

    val db = Database.forConfig("h2db")

    //drop original table schema
    val futVectorTables = db.run(MTable.getTables)

    val futDropTable = futVectorTables.flatMap { tables =>
      val tableNames = tables.map(t => t.name.name)
      if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
        db.run(NORMAQMQuery.schema.drop)
      else Future.successful(())
    }.andThen {
      case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully!")
      case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
    }
    Await.ready(futDropTable, Duration.Inf)

    //create new table to refine AQMRawTable
    val actionCreateTable = Models.NORMAQMQuery.schema.create
    val futCreateTable = db.run(actionCreateTable).andThen {
      case Success(_) => println("Table created successfully!")
      case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
    }
    //carry on even if table creation fails
    Await.ready(futCreateTable, Duration.Inf)

    //truncate data (schema.truncate is available as of Slick 3.2.1)
    val futTruncateTable = futVectorTables.flatMap { tables =>
      val tableNames = tables.map(t => t.name.name)
      if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
        db.run(NORMAQMQuery.schema.truncate)
      else Future.successful(())
    }.andThen {
      case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
      case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
    }
    //wait for the truncate (the original awaited futDropTable here by mistake)
    Await.ready(futTruncateTable, Duration.Inf)

    //a contrived task for the purpose of resource consumption
    //getting id with corresponding name from STATES table
    def getStateID(state: String): Int = {
      //create a stream for state id with state name
      implicit def toState(row: StateTable#TableElementType) = StateModel(row.id, row.name)
      val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
      val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
      //build a static stream from the loaded rows
      val stateStream = fda_staticSource(stateSeq)()
      //the initial value was lost in extraction; -1 is assumed as a "not found" sentinel
      var id = -1
      def getid: FDAUserTask[FDAROW] = row => {
        row match {
          case StateModel(stid, stname) => //target row type
            if (stname.contains(state)) {
              id = stid
              fda_break //exit
            }
            else fda_skip //take next row
          case _ => fda_skip
        }
      }
      stateStream.appendTask(getid).startRun
      id
    }

    //another contrived task for the purpose of resource consumption
    //getting id with corresponding names from COUNTIES table
    def getCountyID(state: String, county: String): Int = {
      //create a stream for county id with state name and county name
      implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id, row.name)
      val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
      val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
      //build a static stream from the loaded rows
      val countyStream = fda_staticSource(countySeq)()
      //the initial value was lost in extraction; -1 is assumed as a "not found" sentinel
      var id = -1
      def getid: FDAUserTask[FDAROW] = row => {
        row match {
          case CountyModel(cid, cname) => //target row type
            if (cname.contains(state) && cname.contains(county)) {
              id = cid
              fda_break //exit
            }
            else fda_skip //take next row
          case _ => fda_skip
        }
      }
      countyStream.appendTask(getid).startRun
      id
    }

and the two user-defined functions:

    //process input row and produce action row to insert into NORMAQM
    def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
      row match {
        case aqm: AQMRPTModel =>
          if (aqm.valid) {
            val stateId = getStateID(aqm.state)
            val countyId = getCountyID(aqm.state, aqm.county)
            //the first argument (row id) was lost in extraction; 0 is assumed as a placeholder
            val action = NORMAQMQuery += NORMAQMModel(0, aqm.mid, stateId, countyId, aqm.year, aqm.value, aqm.total)
            fda_next(FDAActionRow(action))
          }
          else fda_skip
        case _ => fda_skip
      }
    }

    //runner for the action rows
    val runner = FDAActionRunner(slick.jdbc.H2Profile)
    def runInsertAction: FDAUserTask[FDAROW] = row =>
      row match {
        case FDAActionRow(action) =>
          runner.fda_execAction(action)(db)
          fda_skip
        case _ => fda_skip
      }
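The shape of these two tasks (match on the row type, then either pass a new row downstream or skip) can be illustrated without FunDA. The following is only a minimal sketch using the Scala standard library, not FunDA's actual types: `Row`, `Reading`, `Action`, `Step`, `Skip`, `Next`, `UserTask` and `runTask` are all hypothetical names invented for the illustration.

```scala
object UserTaskSketch {
  // Hypothetical row types standing in for FDAROW and its subclasses
  sealed trait Row
  final case class Reading(state: String, value: Int, valid: Boolean) extends Row
  final case class Action(sql: String) extends Row

  // Hypothetical task result: drop the row, or pass a new row downstream
  sealed trait Step
  case object Skip extends Step
  final case class Next(row: Row) extends Step

  type UserTask = Row => Step

  // Turn valid readings into actions and skip everything else
  val toAction: UserTask = {
    case Reading(state, value, true) => Next(Action(s"INSERT ($state, $value)"))
    case _                           => Skip
  }

  // Apply a task to every row, keeping only what it passes downstream
  def runTask(rows: List[Row], task: UserTask): List[Row] =
    rows.flatMap { r =>
      task(r) match {
        case Next(out) => List(out)
        case Skip      => Nil
      }
    }

  def main(args: Array[String]): Unit = {
    val rows = List(Reading("Ohio", 42, valid = true), Reading("Utah", 7, valid = false))
    println(runTask(rows, toAction))
  }
}
```

Because a task only sees one row at a time and returns a value describing what to do next, tasks of this shape can be chained or run on several rows concurrently.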

Next comes the code that is new in this post. We first build a stream of all the years:

    //create parallel sources
    //get a stream of years
    val qryYears = AQMRPTQuery.map(_.year).distinct
    case class Years(year: Int) extends FDAROW

    implicit def toYears(y: Int) = Years(y)

    val yearViewLoader = FDAViewLoader(slick.jdbc.H2Profile)(toYears _)
    val yearSeq = yearViewLoader.fda_typedRows(qryYears.result)(db).toSeq
    val yearStream = fda_staticSource(yearSeq)()

Below is a function that reads the rows for a given year from the AQMRPT table:

    //strong row type
    implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
      AQMRPTModel(row.rid, row.mid, row.state, row.county, row.year, row.value, row.total, row.valid)

    //stream loader shared by all loads when operating in parallel mode
    val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)

    //loading rows with year yr
    def loadRowsInYear(yr: Int) = {
      //a new query
      val query = AQMRPTQuery.filter(row => row.year === yr)
      //reuse same loader
      //the two numeric arguments (presumably fetch size and queue size) were lost
      //in extraction; 256 and 256 are assumed values
      AQMRPTLoader.fda_typedStream(query.result)(db)(256, 256)()
    }

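As we can expect, multiple invocations of loadRowsInYear will share the single FDAStreamLoader instance, AQMRPTLoader. A user-defined data-loading function has the type FDASourceLoader. Here is an example FDASourceLoader: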

    //loading rows by year
    def loadRowsByYear: FDASourceLoader = row => {
      row match {
        case Years(y) => loadRowsInYear(y) //produce stream of the year
        case _ => fda_appendRow(FDANullRow)
      }
    }
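The idea behind a source loader is that one input row (here a year) expands into a whole sequence of rows. A minimal sketch of that idea with plain Scala collections, assuming an in-memory `table` in place of AQMRPT; `Row`, `Year`, `Reading`, `NullRow`, `SourceLoader` and `loadByYear` are hypothetical names and not part of FunDA:

```scala
object SourceLoaderSketch {
  // Hypothetical row types for the illustration
  sealed trait Row
  final case class Year(year: Int) extends Row
  final case class Reading(year: Int, value: Double) extends Row
  case object NullRow extends Row

  // Hypothetical in-memory table standing in for AQMRPT
  val table: Vector[Reading] = Vector(
    Reading(1999, 1.5),
    Reading(2000, 2.5),
    Reading(1999, 3.5)
  )

  // A source loader maps one input row to a sequence of output rows
  type SourceLoader = Row => Iterator[Row]

  val loadByYear: SourceLoader = {
    case Year(y) => table.iterator.filter(_.year == y) // rows of that year
    case _       => Iterator.single(NullRow)           // anything else yields a null row
  }

  def main(args: Array[String]): Unit =
    loadByYear(Year(1999)).foreach(println)
}
```

Each call to `loadByYear` is independent of the others, which is what makes it safe to start several of them at the same time.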

We use toParSource to build a parallel data source:

    //get parallel source constructor
    val parSource = yearStream.toParSource(loadRowsByYear)

and fda_par_source to turn the parallel source into a single unified stream:

    //produce a stream from parallel sources
    val source = fda_par_source(parSource)()
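Conceptually, a parallel source of this kind starts one load per year and merges whatever comes back into one collection. The sketch below shows that idea with standard-library Futures only. It is not how FunDA implements fda_par_source: `Reading`, `table`, `loadYear`, `loadAll` and `loadAllSync` are hypothetical names, an in-memory Vector stands in for the AQMRPT table, and `Future.traverse` keeps the input order whereas a real stream merge may interleave rows from different years.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParSourceSketch {
  final case class Reading(year: Int, value: Double)

  // Hypothetical in-memory table standing in for AQMRPT
  val table: Vector[Reading] = Vector(
    Reading(1999, 1.5),
    Reading(2000, 2.5),
    Reading(1999, 3.5)
  )

  // One independent load per year
  def loadYear(y: Int): Vector[Reading] = table.filter(_.year == y)

  // Start all loads concurrently, then flatten the results into one collection
  def loadAll(years: Seq[Int]): Future[Vector[Reading]] =
    Future.traverse(years.toVector)(y => Future(loadYear(y))).map(_.flatten)

  // Blocking helper for demos and tests
  def loadAllSync(years: Seq[Int]): Vector[Reading] =
    Await.result(loadAll(years), 30.seconds)

  def main(args: Array[String]): Unit =
    loadAllSync(Seq(1999, 2000)).foreach(println)
}
```

The number of years does not need to be known when the code is written, which mirrors the "stream of unknown length" case described at the start of the post.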

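source is an FDAPipeLine. It can be run directly with source.startRun, or further stages can be attached after it. Below we convert the other two user-defined functions into parallel tasks and attach them after source: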

    //the following is a process of composition of stream combinators
    //get parallel source constructor
    val parSource = yearStream.toParSource(loadRowsByYear)

    //implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
    //produce a stream from parallel sources
    val source = fda_par_source(parSource)()
    //turn getIdsThenInsertAction into parallel task
    val parTasks = source.toPar(getIdsThenInsertAction)
    //runPar to produce a new stream
    val actionStream = fda_runPar(parTasks)()
    //turn runInsertAction into parallel task
    val parRun = actionStream.toPar(runInsertAction)
    //runPar and carry out by startRun
    fda_runPar(parRun)().startRun
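The composition above has two parallel stages: rows are turned into insert actions, and the actions are then executed. A rough standard-library sketch of the same two-stage shape follows. It is an illustration under stated assumptions and not FunDA's execution model: `Reading`, `Insert`, `toInsert`, `runInsert`, `process` and `processSync` are hypothetical names, an `AtomicInteger` stands in for the target table, and stage 1 here finishes completely before stage 2 begins, whereas a stream pipeline passes rows along as they become available.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PipelineSketch {
  final case class Reading(state: String, value: Int, valid: Boolean)
  final case class Insert(state: String, value: Int)

  // Stage 1: build an insert action for each valid row, drop invalid rows
  def toInsert(r: Reading): Option[Insert] =
    if (r.valid) Some(Insert(r.state, r.value)) else None

  // Stage 2: "execute" an action; the counter stands in for the database
  def runInsert(sink: AtomicInteger)(i: Insert): Unit = {
    sink.incrementAndGet()
    ()
  }

  // Run both stages with one Future per element and report how many rows were inserted
  def process(rows: Vector[Reading]): Future[Int] = {
    val sink = new AtomicInteger(0)
    for {
      actions <- Future.traverse(rows)(r => Future(toInsert(r))).map(_.flatten)
      _       <- Future.traverse(actions)(a => Future(runInsert(sink)(a)))
    } yield sink.get()
  }

  // Blocking helper for demos and tests
  def processSync(rows: Vector[Reading]): Int =
    Await.result(process(rows), 30.seconds)

  def main(args: Array[String]): Unit = {
    val rows = Vector(Reading("Ohio", 1, valid = true), Reading("Utah", 2, valid = false))
    println(s"inserted ${processSync(rows)} rows")
  }
}
```

A thread-safe sink is used because stage 2 may run on several threads at once; the same concern applies to any shared state touched by a parallel user task.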

The complete source code for this demonstration follows:

    import slick.jdbc.meta._
    import com.bayakala.funda._
    import api._
    import scala.language.implicitConversions
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._
    import scala.concurrent.{Await, Future}
    import scala.util.{Failure, Success}
    import slick.jdbc.H2Profile.api._
    import Models._
    import fs2.Strategy

    object ParallelExecution extends App {

      val db = Database.forConfig("h2db")

      //drop original table schema
      val futVectorTables = db.run(MTable.getTables)

      val futDropTable = futVectorTables.flatMap { tables =>
        val tableNames = tables.map(t => t.name.name)
        if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
          db.run(NORMAQMQuery.schema.drop)
        else Future.successful(())
      }.andThen {
        case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully!")
        case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
      }
      Await.ready(futDropTable, Duration.Inf)

      //create new table to refine AQMRawTable
      val actionCreateTable = Models.NORMAQMQuery.schema.create
      val futCreateTable = db.run(actionCreateTable).andThen {
        case Success(_) => println("Table created successfully!")
        case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
      }
      //carry on even if table creation fails
      Await.ready(futCreateTable, Duration.Inf)

      //truncate data (schema.truncate is available as of Slick 3.2.1)
      val futTruncateTable = futVectorTables.flatMap { tables =>
        val tableNames = tables.map(t => t.name.name)
        if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
          db.run(NORMAQMQuery.schema.truncate)
        else Future.successful(())
      }.andThen {
        case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
        case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
      }
      //wait for the truncate (the original awaited futDropTable here by mistake)
      Await.ready(futTruncateTable, Duration.Inf)

      //a contrived task for the purpose of resource consumption
      //getting id with corresponding name from STATES table
      def getStateID(state: String): Int = {
        //create a stream for state id with state name
        implicit def toState(row: StateTable#TableElementType) = StateModel(row.id, row.name)
        val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
        val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
        //build a static stream from the loaded rows
        val stateStream = fda_staticSource(stateSeq)()
        //the initial value was lost in extraction; -1 is assumed as a "not found" sentinel
        var id = -1
        def getid: FDAUserTask[FDAROW] = row => {
          row match {
            case StateModel(stid, stname) => //target row type
              if (stname.contains(state)) {
                id = stid
                fda_break //exit
              }
              else fda_skip //take next row
            case _ => fda_skip
          }
        }
        stateStream.appendTask(getid).startRun
        id
      }

      //another contrived task for the purpose of resource consumption
      //getting id with corresponding names from COUNTIES table
      def getCountyID(state: String, county: String): Int = {
        //create a stream for county id with state name and county name
        implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id, row.name)
        val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
        val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
        //build a static stream from the loaded rows
        val countyStream = fda_staticSource(countySeq)()
        //the initial value was lost in extraction; -1 is assumed as a "not found" sentinel
        var id = -1
        def getid: FDAUserTask[FDAROW] = row => {
          row match {
            case CountyModel(cid, cname) => //target row type
              if (cname.contains(state) && cname.contains(county)) {
                id = cid
                fda_break //exit
              }
              else fda_skip //take next row
            case _ => fda_skip
          }
        }
        countyStream.appendTask(getid).startRun
        id
      }

      //process input row and produce action row to insert into NORMAQM
      def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
        row match {
          case aqm: AQMRPTModel =>
            if (aqm.valid) {
              val stateId = getStateID(aqm.state)
              val countyId = getCountyID(aqm.state, aqm.county)
              //the first argument (row id) was lost in extraction; 0 is assumed as a placeholder
              val action = NORMAQMQuery += NORMAQMModel(0, aqm.mid, stateId, countyId, aqm.year, aqm.value, aqm.total)
              fda_next(FDAActionRow(action))
            }
            else fda_skip
          case _ => fda_skip
        }
      }

      //runner for the action rows
      val runner = FDAActionRunner(slick.jdbc.H2Profile)
      def runInsertAction: FDAUserTask[FDAROW] = row =>
        row match {
          case FDAActionRow(action) =>
            runner.fda_execAction(action)(db)
            fda_skip
          case _ => fda_skip
        }

      //create parallel sources
      //get a stream of years
      val qryYears = AQMRPTQuery.map(_.year).distinct
      case class Years(year: Int) extends FDAROW

      implicit def toYears(y: Int) = Years(y)

      val yearViewLoader = FDAViewLoader(slick.jdbc.H2Profile)(toYears _)
      val yearSeq = yearViewLoader.fda_typedRows(qryYears.result)(db).toSeq
      val yearStream = fda_staticSource(yearSeq)()

      //strong row type
      implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
        AQMRPTModel(row.rid, row.mid, row.state, row.county, row.year, row.value, row.total, row.valid)

      //stream loader shared by all loads when operating in parallel mode
      val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)

      //loading rows with year yr
      def loadRowsInYear(yr: Int) = {
        //a new query
        val query = AQMRPTQuery.filter(row => row.year === yr)
        //reuse same loader
        //the two numeric arguments (presumably fetch size and queue size) were lost
        //in extraction; 256 and 256 are assumed values
        AQMRPTLoader.fda_typedStream(query.result)(db)(256, 256)()
      }

      //loading rows by year
      def loadRowsByYear: FDASourceLoader = row => {
        row match {
          case Years(y) => loadRowsInYear(y) //produce stream of the year
          case _ => fda_appendRow(FDANullRow)
        }
      }

      //start counter
      val cnt_start = System.currentTimeMillis()

      //debugging aid: print whatever row passes through
      def showRecord: FDAUserTask[FDAROW] = row => {
        row match {
          case Years(y) => println(y); fda_skip
          case aqm: AQMRPTModel =>
            println(s"${aqm.year} $aqm")
            fda_skip
          case FDAActionRow(action) =>
            println(s"${action}")
            fda_skip
          case _ => fda_skip
        }
      }

      //the following is a process of composition of stream combinators
      //get parallel source constructor
      val parSource = yearStream.toParSource(loadRowsByYear)

      //implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
      //produce a stream from parallel sources
      val source = fda_par_source(parSource)()
      //turn getIdsThenInsertAction into parallel task
      val parTasks = source.toPar(getIdsThenInsertAction)
      //runPar to produce a new stream
      val actionStream = fda_runPar(parTasks)()
      //turn runInsertAction into parallel task
      val parRun = actionStream.toPar(runInsertAction)
      //runPar and carry out by startRun
      fda_runPar(parRun)().startRun

      println(s"processing 219400 rows in parallel in ${(System.currentTimeMillis - cnt_start)/1000} seconds")

    }
