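FunDA (16) - Demonstration: integrating parallel processing - total parallelism solution
In the previous two posts we introduced two forms of parallel processing: building a data source in parallel and running user-defined functions in parallel, and we demonstrated each of them separately. In this post I demonstrate a mode that integrates both. The way the data source is built in parallel also differs from what was described before: in the earlier discussion we knew in advance that the source had to be built from three independent streams. But what if we have a stream of unknown length whose every element stands for a different data stream? We know that the AQMRPT table holds air quality measurements from 1999 to 2xxx, so we can try to build a single data source in parallel from streams generated per year. We reuse the groundwork code from the previous demonstration as is, including the initialization of the NORMAQM table and the functions that look up the matching id by name in STATES and COUNTIES: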
- val db = Database.forConfig("h2db")
- //drop original table schema
- val futVectorTables = db.run(MTable.getTables)
- val futDropTable = futVectorTables.flatMap{ tables => {
- val tableNames = tables.map(t => t.name.name)
- if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
- db.run(NORMAQMQuery.schema.drop)
- else Future()
- }
- }.andThen {
- case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
- case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
- }
- Await.ready(futDropTable,Duration.Inf)
- //create new table to refine AQMRawTable
- val actionCreateTable = Models.NORMAQMQuery.schema.create
- val futCreateTable = db.run(actionCreateTable).andThen {
- case Success(_) => println("Table created successfully!")
- case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
- }
- //carries on even if creating the table fails
- Await.ready(futCreateTable,Duration.Inf)
- //truncate data, only available in slick 3.2.1
- val futTruncateTable = futVectorTables.flatMap{ tables => {
- val tableNames = tables.map(t => t.name.name)
- if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
- db.run(NORMAQMQuery.schema.truncate)
- else Future()
- }
- }.andThen {
- case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
- case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
- }
- Await.ready(futTruncateTable,Duration.Inf)
- //a conceived task for the purpose of resource consumption
- //getting id with corresponding name from STATES table
- def getStateID(state: String): Int = {
- //create a stream for state id with state name
- implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
- val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
- val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
- //constructed a Stream[Task,String]
- val stateStream = fda_staticSource(stateSeq)()
- var id = -
- def getid: FDAUserTask[FDAROW] = row => {
- row match {
- case StateModel(stid,stname) => //target row type
- if (stname.contains(state)) {
- id = stid
- fda_break //exit
- }
- else fda_skip //take next row
- case _ => fda_skip
- }
- }
- stateStream.appendTask(getid).startRun
- id
- }
- //another conceived task for the purpose of resource consumption
- //getting id with corresponding names from COUNTIES table
- def getCountyID(state: String, county: String): Int = {
- //create a stream for county id with state name and county name
- implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
- val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
- val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
- //constructed a Stream[Task,String]
- val countyStream = fda_staticSource(countySeq)()
- var id = -
- def getid: FDAUserTask[FDAROW] = row => {
- row match {
- case CountyModel(cid,cname) => //target row type
- if (cname.contains(state) && cname.contains(county)) {
- id = cid
- fda_break //exit
- }
- else fda_skip //take next row
- case _ => fda_skip
- }
- }
- countyStream.appendTask(getid).startRun
- id
- }
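The drop, create and truncate steps in the setup code above all follow the same shape: run an action as a Future, attach logging with andThen, then block with Await.ready before moving on. The sketch below isolates that shape using only the Scala standard library. It is not FunDA or Slick code: `runStep` and `logged` are hypothetical names, and `runStep` merely stands in for `db.run(...)`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object SetupStepSketch {
  // Hypothetical stand-in for db.run(...): succeeds or fails depending on the flag
  def runStep(name: String, ok: Boolean): Future[String] =
    if (ok) Future.successful(name)
    else Future.failed(new RuntimeException(s"$name failed"))

  // andThen only adds a side effect (logging here);
  // the returned future still carries the original result or failure
  def logged(name: String, ok: Boolean): Future[String] =
    runStep(name, ok).andThen {
      case Success(n) => println(s"$n finished")
      case Failure(e) => println(s"Error: ${e.getMessage}")
    }

  def main(args: Array[String]): Unit = {
    val drop = logged("drop", ok = false)
    // Await.ready blocks until completion but does not rethrow a failure,
    // so the script carries on even when a step fails
    Await.ready(drop, Duration.Inf)
    val truncate = logged("truncate", ok = true)
    // wait on the future of the step that was just started
    Await.ready(truncate, Duration.Inf)
  }
}
```

Because Await.ready never throws the step's own exception, a failed drop does not stop the create step from running, which matches the "carry on" intent of the setup code. Each wait has to name the future of the step it follows, otherwise that step may still be running when the next one starts.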
We also reuse the two user-defined functions:
- //process input row and produce action row to insert into NORMAQM
- def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
- row match {
- case aqm: AQMRPTModel =>
- if (aqm.valid) {
- val stateId = getStateID(aqm.state)
- val countyId = getCountyID(aqm.state,aqm.county)
- val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
- fda_next(FDAActionRow(action))
- }
- else fda_skip
- case _ => fda_skip
- }
- }
- //runner for the action rows
- val runner = FDAActionRunner(slick.jdbc.H2Profile)
- def runInsertAction: FDAUserTask[FDAROW] = row =>
- row match {
- case FDAActionRow(action) =>
- runner.fda_execAction(action)(db)
- fda_skip
- case _ => fda_skip
- }
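Both functions above share one idea: a user task looks at a single row and either passes a new row downstream or passes nothing. The following sketch models that idea in plain Scala so it can be run without FunDA. Everything in it is an assumption made for illustration: `Row`, `Measurement`, `InsertAction`, `UserTask` and `runPipeline` are hypothetical names, and returning a `List` (empty for skip, one element for next) is only a stand-in for what fda_skip and fda_next express.

```scala
object UserTaskSketch {
  sealed trait Row
  final case class Measurement(state: String, value: Int, valid: Boolean) extends Row
  final case class InsertAction(text: String) extends Row

  // A user task sees one row and emits zero or more rows for the next stage
  type UserTask = Row => List[Row]

  // First stage: turn valid measurements into action rows, skip everything else
  val toAction: UserTask = {
    case Measurement(state, value, true) => List(InsertAction(s"insert $state=$value"))
    case _                               => Nil
  }

  // Second stage: "run" each action row through the sink and pass nothing on
  def runAction(sink: String => Unit): UserTask = {
    case InsertAction(text) => sink(text); Nil
    case _                  => Nil
  }

  // Chain the two stages; flatMap drops the rows a stage chose to skip
  def runPipeline(rows: List[Row], sink: String => Unit): List[Row] =
    rows.flatMap(toAction).flatMap(runAction(sink))

  def main(args: Array[String]): Unit = {
    val rows = List(Measurement("A", 1, valid = true), Measurement("B", 2, valid = false))
    runPipeline(rows, println)
  }
}
```

Splitting the work into a stage that only describes the action and a stage that only runs it is what later allows each stage to be parallelized on its own.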
Next comes the code that is new in this post. We first build a stream of all the years:
- //create parallel sources
- //get a stream of years
- val qryYears = AQMRPTQuery.map(_.year).distinct
- case class Years(year: Int) extends FDAROW
- implicit def toYears(y: Int) = Years(y)
- val yearViewLoader = FDAViewLoader(slick.jdbc.H2Profile)(toYears _)
- val yearSeq = yearViewLoader.fda_typedRows(qryYears.result)(db).toSeq
- val yearStream = fda_staticSource(yearSeq)()
Below is a function that reads data from the AQMRPT table by year:
- //strong row type
- implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
- AQMRPTModel(row.rid, row.mid, row.state, row.county, row.year, row.value, row.total, row.valid)
- //shared stream loader when operate in parallel mode
- val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
- //loading rows with year yr
- def loadRowsInYear(yr: Int) = {
- //a new query
- val query = AQMRPTQuery.filter(row => row.year === yr)
- //reuse same loader
- AQMRPTLoader.fda_typedStream(query.result)(db)(, )()
- }
As can be expected, multiple instances of loadRowsInYear will share the single FDAStreamLoader, AQMRPTLoader. The type of a user-defined data loading function is FDASourceLoader. Here is a sample FDASourceLoader:
- //loading rows by year
- def loadRowsByYear: FDASourceLoader = row => {
- row match {
- case Years(y) => loadRowsInYear(y) //produce stream of the year
- case _ => fda_appendRow(FDANullRow)
- }
- }
We use toParSource to build a parallel data source:
- //get parallel source constructor
- val parSource = yearStream.toParSource(loadRowsByYear)
Then we use fda_par_source to turn the parallel data source into a single unified stream:
- //produce a stream from parallel sources
- val source = fda_par_source(parSource)()
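The steps so far (a stream of years, a loader that turns each year into its own stream of rows, and a merge into one source) can be pictured with a small standard-library sketch. This is not how FunDA or fs2 implement it: Futures replace streams, the in-memory `table` replaces AQMRPT, and `Reading`, `years` and `loadAll` are hypothetical names introduced only for this example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParSourceSketch {
  // Hypothetical row type standing in for a row of AQMRPT
  final case class Reading(year: Int, value: Int)

  // Hypothetical in-memory table standing in for AQMRPT
  val table: Vector[Reading] = Vector(
    Reading(1999, 10), Reading(2000, 20), Reading(1999, 30), Reading(2001, 40)
  )

  // The "stream of streams": the years are discovered at run time, not known in advance
  def years: Vector[Int] = table.map(_.year).distinct

  // One sub-source per year, loaded asynchronously
  def loadRowsInYear(year: Int): Future[Vector[Reading]] =
    Future(table.filter(_.year == year))

  // Start every per-year load concurrently, then merge the results into one source
  def loadAll(): Vector[Reading] = {
    val loads: Vector[Future[Vector[Reading]]] = years.map(loadRowsInYear)
    Await.result(Future.sequence(loads), 10.seconds).flatten
  }

  def main(args: Array[String]): Unit =
    loadAll().foreach(println)
}
```

One difference is worth keeping in mind: Future.sequence returns the per-year results in the order of `years`, whereas a merge of concurrently running streams generally interleaves rows in whatever order they arrive.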
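source is an FDAPipeLine. It can be run directly with source.startRun, or more stages can be attached after it. Below we turn the other two user-defined functions into parallel functions and attach them after source: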
- //the following is a process of composition of stream combinators
- //get parallel source constructor
- val parSource = yearStream.toParSource(loadRowsByYear)
- //implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
- //produce a stream from parallel sources
- val source = fda_par_source(parSource)()
- //turn getIdsThenInsertAction into parallel task
- val parTasks = source.toPar(getIdsThenInsertAction)
- //runPar to produce a new stream
- val actionStream =fda_runPar(parTasks)()
- //turn runInsertAction into parallel task
- val parRun = actionStream.toPar(runInsertAction)
- //runPar and carry out by startRun
- fda_runPar(parRun)().startRun
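The commented-out Strategy line hints that the thread pool behind the parallel stages is a choice of its own. As a rough illustration of where such a pool enters a two-stage parallel run, here is a standard-library sketch. It does not use FunDA or fs2: `lookup` and `store` are hypothetical stand-ins for the id lookups and the insert, and a fixed thread pool is simply one possible choice.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParStagesSketch {
  // Stage 1: hypothetical per-row work (stands in for the id lookups)
  def lookup(value: Int): Int = value * 2

  // Stage 2: hypothetical per-row effect (stands in for running the insert)
  def store(value: Int): String = s"stored $value"

  def run(rows: Vector[Int], threads: Int): Vector[String] = {
    val pool = Executors.newFixedThreadPool(threads)
    // Every Future below is scheduled on this pool
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Each row passes through both stages; different rows are processed concurrently
      val done = Future.traverse(rows)(row => Future(lookup(row)).map(store))
      Await.result(done, 30.seconds)
    } finally pool.shutdown() // release the threads so the JVM can exit
  }

  def main(args: Array[String]): Unit =
    run(Vector(1, 2, 3), threads = 4).foreach(println)
}
```

The number of threads bounds how many rows are in flight at once, which matters when each row triggers database work as it does in this demonstration.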
Below is the complete source code of this demonstration:
- import slick.jdbc.meta._
- import com.bayakala.funda._
- import api._
- import scala.language.implicitConversions
- import scala.concurrent.ExecutionContext.Implicits.global
- import scala.concurrent.duration._
- import scala.concurrent.{Await, Future}
- import scala.util.{Failure, Success}
- import slick.jdbc.H2Profile.api._
- import Models._
- import fs2.Strategy
- object ParallelExecution extends App {
- val db = Database.forConfig("h2db")
- //drop original table schema
- val futVectorTables = db.run(MTable.getTables)
- val futDropTable = futVectorTables.flatMap{ tables => {
- val tableNames = tables.map(t => t.name.name)
- if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
- db.run(NORMAQMQuery.schema.drop)
- else Future()
- }
- }.andThen {
- case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
- case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
- }
- Await.ready(futDropTable,Duration.Inf)
- //create new table to refine AQMRawTable
- val actionCreateTable = Models.NORMAQMQuery.schema.create
- val futCreateTable = db.run(actionCreateTable).andThen {
- case Success(_) => println("Table created successfully!")
- case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
- }
- //carries on even if creating the table fails
- Await.ready(futCreateTable,Duration.Inf)
- //truncate data, only available in slick 3.2.1
- val futTruncateTable = futVectorTables.flatMap{ tables => {
- val tableNames = tables.map(t => t.name.name)
- if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
- db.run(NORMAQMQuery.schema.truncate)
- else Future()
- }
- }.andThen {
- case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
- case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
- }
- Await.ready(futTruncateTable,Duration.Inf)
- //a conceived task for the purpose of resource consumption
- //getting id with corresponding name from STATES table
- def getStateID(state: String): Int = {
- //create a stream for state id with state name
- implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
- val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
- val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
- //constructed a Stream[Task,String]
- val stateStream = fda_staticSource(stateSeq)()
- var id = -
- def getid: FDAUserTask[FDAROW] = row => {
- row match {
- case StateModel(stid,stname) => //target row type
- if (stname.contains(state)) {
- id = stid
- fda_break //exit
- }
- else fda_skip //take next row
- case _ => fda_skip
- }
- }
- stateStream.appendTask(getid).startRun
- id
- }
- //another conceived task for the purpose of resource consumption
- //getting id with corresponding names from COUNTIES table
- def getCountyID(state: String, county: String): Int = {
- //create a stream for county id with state name and county name
- implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
- val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
- val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
- //constructed a Stream[Task,String]
- val countyStream = fda_staticSource(countySeq)()
- var id = -
- def getid: FDAUserTask[FDAROW] = row => {
- row match {
- case CountyModel(cid,cname) => //target row type
- if (cname.contains(state) && cname.contains(county)) {
- id = cid
- fda_break //exit
- }
- else fda_skip //take next row
- case _ => fda_skip
- }
- }
- countyStream.appendTask(getid).startRun
- id
- }
- //process input row and produce action row to insert into NORMAQM
- def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
- row match {
- case aqm: AQMRPTModel =>
- if (aqm.valid) {
- val stateId = getStateID(aqm.state)
- val countyId = getCountyID(aqm.state,aqm.county)
- val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
- fda_next(FDAActionRow(action))
- }
- else fda_skip
- case _ => fda_skip
- }
- }
- //runner for the action rows
- val runner = FDAActionRunner(slick.jdbc.H2Profile)
- def runInsertAction: FDAUserTask[FDAROW] = row =>
- row match {
- case FDAActionRow(action) =>
- runner.fda_execAction(action)(db)
- fda_skip
- case _ => fda_skip
- }
- //create parallel sources
- //get a stream of years
- val qryYears = AQMRPTQuery.map(_.year).distinct
- case class Years(year: Int) extends FDAROW
- implicit def toYears(y: Int) = Years(y)
- val yearViewLoader = FDAViewLoader(slick.jdbc.H2Profile)(toYears _)
- val yearSeq = yearViewLoader.fda_typedRows(qryYears.result)(db).toSeq
- val yearStream = fda_staticSource(yearSeq)()
- //strong row type
- implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
- AQMRPTModel(row.rid, row.mid, row.state, row.county, row.year, row.value, row.total, row.valid)
- //shared stream loader when operate in parallel mode
- val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
- //loading rows with year yr
- def loadRowsInYear(yr: Int) = {
- //a new query
- val query = AQMRPTQuery.filter(row => row.year === yr)
- //reuse same loader
- AQMRPTLoader.fda_typedStream(query.result)(db)(, )()
- }
- //loading rows by year
- def loadRowsByYear: FDASourceLoader = row => {
- row match {
- case Years(y) => loadRowsInYear(y) //produce stream of the year
- case _ => fda_appendRow(FDANullRow)
- }
- }
- //start counter
- val cnt_start = System.currentTimeMillis()
- def showRecord: FDAUserTask[FDAROW] = row => {
- row match {
- case Years(y) => println(y); fda_skip
- case aqm: AQMRPTModel =>
- println(s"${aqm.year} $aqm")
- fda_skip
- case FDAActionRow(action) =>
- println(s"${action}")
- fda_skip
- case _ => fda_skip
- }
- }
- //the following is a process of composition of stream combinators
- //get parallel source constructor
- val parSource = yearStream.toParSource(loadRowsByYear)
- //implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
- //produce a stream from parallel sources
- val source = fda_par_source(parSource)()
- //turn getIdsThenInsertAction into parallel task
- val parTasks = source.toPar(getIdsThenInsertAction)
- //runPar to produce a new stream
- val actionStream =fda_runPar(parTasks)()
- //turn runInsertAction into parallel task
- val parRun = actionStream.toPar(runInsertAction)
- //runPar and carry out by startRun
- fda_runPar(parRun)().startRun
- println(s"processing 219400 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
- }