FunDA(15)- 示范:任务并行运算 - user task parallel execution
FunDA的并行运算施用就是对用户自定义函数的并行运算。原理上就是把一个输入流截分成多个输入流并行地输入到一个自定义函数的多个运行实例。这些函数运行实例同时在各自不同的线程里同步运算直至耗尽所有输入。并行运算的具体函数实例数是用fs2-nondeterminism的算法根据CPU内核数、线程池配置和用户指定的最大运算实例数来决定的。我们在这次示范里可以对比一下同样工作内容的并行运算和串形运算效率。在前面示范里我们获取了一个AQMRPT表。但这个表不够合理化(normalized):state和county还没有实现编码与STATES和COUNTIES表的连接。在这次示范里我们就创建一个新表NORMAQM,把AQMRPT表内数据都搬进来。并在这个过程中把STATENAME和COUNTYNAME字段转换成STATES和COUNTIES表的id字段。下面就是NORMAQM表结构:
case class NORMAQMModel(rid: Long
, mid: Int
, state: Int
, county: Int
, year: Int
, value: Int
, average: Int
) extends FDAROW class NORMAQMTable(tag: Tag) extends Table[NORMAQMModel](tag, "NORMAQM") {
def rid = column[Long]("ROWID",O.AutoInc,O.PrimaryKey)
def mid = column[Int]("MEASUREID")
def state = column[Int]("STATID")
def county = column[Int]("COUNTYID")
def year = column[Int]("REPORTYEAR")
def value = column[Int]("VALUE")
def average = column[Int]("AVG") def * = (rid,mid,state,county,year,value,average) <> (NORMAQMModel.tupled, NORMAQMModel.unapply)
} val NORMAQMQuery = TableQuery[NORMAQMTable]
下面是这个表的初始化铺垫代码:
val db = Database.forConfig("h2db") //drop original table schema
val futVectorTables = db.run(MTable.getTables) val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf) //create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//would carry on even fail to create table
Await.ready(futCreateTable,Duration.Inf) //truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf)
我们需要设计一个函数从STATES表里用AQMRPT表的STATENAME查询ID。我故意把这个函数设计成一个完整的FunDA程序。这样可以模拟一个比较消耗io和计算资源的独立过程(不要理会任何合理性,目标是增加io和运算消耗):
//a conceived task for the purpose of resource consumption
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val stateStream = fda_staticSource(stateSeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}
可以看到getStateID函数每次运算都重复构建stateStream。这样可以达到增加io操作的目的。
同样,我们也需要设计另一个函数来从COUNTIES表里获取id字段:
//another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val countyStream = fda_staticSource(countySeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
}
我们可以如下这样获取这个程序的数据源:
//original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)()
按照正常的FunDA流程我们设计了两个用户自定义函数:一个根据数据行内的state和county字段调用函数getStateID和getCountyID获取相应id后构建一条新的NORMAQM表插入指令行,然后传给下个自定义函数。下个自定义函数就直接运算收到的动作行:
def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
}
像前面几篇示范那样我们把这两个用户自定义函数与数据源组合起来成为完整的FunDA程序后startRun就可以得到实际效果了:
AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun
这个程序运算了579秒,不过这是个单一线程运算。我们想知道并行运算结果。那么我们首先要把这个getIdsThenInsertAction转成一个并行运算函数FDAParTask:
AQMRPTStream.toPar(getIdsThenInsertAction)
FunDA提供了并行运算器fda_runPar:
implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take().toPar(getIdsThenInsertAction))() //max 8 open computations
.appendTask(runInsertAction)
.startRun
我们可以自定义线程池。fda_runPar返回标准的FunDA FDAPipeLine,所以我们可以在后面挂上runInsertAction函数。下面是不同行数的运算时间对比结果:
//processing 10000 rows in a single thread in 570 seconds
// processing 10000 rows parallelly in 316 seconds //processing 20000 rows in a single thread in 1090 seconds
//processing 20000 rows parallelly in 614 seconds //processing 100000 rows in a single thread in 2+ hrs
//processing 100000 rows parallelly in 3885 seconds
可以得出,并行运算对越大数据集有更大的效率提高。下面就是这次示范的源代码:
import slick.jdbc.meta._
import com.bayakala.funda._
import api._
import scala.language.implicitConversions
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success}
import slick.jdbc.H2Profile.api._
import Models._
import fs2.Strategy object ParallelTasks extends App { val db = Database.forConfig("h2db") //drop original table schema
val futVectorTables = db.run(MTable.getTables) val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf) //create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//would carry on even fail to create table
Await.ready(futCreateTable,Duration.Inf) //truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf) //a conceived task for the purpose of resource consumption
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val stateStream = fda_staticSource(stateSeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}
//another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val countyStream = fda_staticSource(countySeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
} //original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)() def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
} val cnt_start = System.currentTimeMillis()
/*
AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun
//println(s"processing 10000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 10000 rows in a single thread in 570 seconds
//println(s"processing 20000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows in a single thread in 1090 seconds
//println(s"processing 100000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows in a single thread in 2+ hrs implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take().toPar(getIdsThenInsertAction))()
.appendTask(runInsertAction)
.startRun //println(s"processing 10000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
// processing 10000 rows parallelly in 316 seconds
//println(s"processing 20000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows parallelly in 614 seconds
println(s"processing 100000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows parallelly in 3885 seconds }
FunDA(15)- 示范:任务并行运算 - user task parallel execution的更多相关文章
- Winform Global exception and task parallel library exception;
static class Program { /// <summary> /// 应用程序的主入口点. /// </summary> [STAThread] static vo ...
- C#5.0之后推荐使用TPL(Task Parallel Libray 任务并行库) 和PLINQ(Parallel LINQ, 并行Linq). 其次是TAP(Task-based Asynchronous Pattern, 基于任务的异步模式)
学习书籍: <C#本质论> 1--C#5.0之后推荐使用TPL(Task Parallel Libray 任务并行库) 和PLINQ(Parallel LINQ, 并行Linq). 其次是 ...
- Using the Task Parallel Library (TPL) for Events
Using the Task Parallel Library (TPL) for Events The parallel tasks library was introduced with the ...
- TPL(Task Parallel Library)多线程、并发功能
The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System ...
- Task Parallel Library01,基本用法
我们知道,每个应用程序就是一个进程,一个进程有多个线程.Task Parallel Library为我们的异步编程.多线程编程提供了强有力的支持,它允许一个主线程运行的同时,另外的一些线程或Task也 ...
- System and method for parallel execution of memory transactions using multiple memory models, including SSO, TSO, PSO and RMO
A data processor supports the use of multiple memory models by computer programs. At a device extern ...
- CMU Database Systems - Parallel Execution
并发执行,主要为了增大吞吐,降低延迟,提高数据库的可用性 先区分一组概念,parallel和distributed的区别 总的来说,parallel是指在物理上很近的节点,比如本机的多个线程或进程,不 ...
- FunDA(14)- 示范:并行运算,并行数据库读取 - parallel data loading
FunDA的并行数据库读取功能是指在多个线程中同时对多个独立的数据源进行读取.这些独立的数据源可以是在不同服务器上的数据库表,又或者把一个数据库表分成几个独立部分形成的独立数据源.当然,并行读取的最终 ...
- 异步和多线程,委托异步调用,Thread,ThreadPool,Task,Parallel,CancellationTokenSource
1 进程-线程-多线程,同步和异步2 异步使用和回调3 异步参数4 异步等待5 异步返回值 5 多线程的特点:不卡主线程.速度快.无序性7 thread:线程等待,回调,前台线程/后台线程, 8 th ...
随机推荐
- Devexpress RichEditControl 导入word文件后字体变为方正姚体的解决方案
最近在做一个排版软件,用过RichEditControl 导入外部Word文件的时候,发现导的文件后字体会变成“方正姚体”,官方这个BUG至少在V16.1版本尚未解决,翻阅了大量资料,发现 DevEx ...
- PAT 1048 数字加密(20)(代码+思路)
1048 数字加密(20)(20 分) 本题要求实现一种数字加密方法.首先固定一个加密用正整数A,对任一正整数B,将其每1位数字与A的对应位置上的数字进行以下运算:对奇数位,对应位的数字相加后对13取 ...
- struts2 的 ServletActionContext 和 actionContext,服务器代码测试, redirect 、dispatcher、chain、redirectAction
一.ServletActionContext 和 actionContext HttpServletRequest request=ServletActionContext.getRequest() ...
- Java 关键字有哪些
数据类型: Boolean(布尔型) int long short byte float double char class interface( ...
- 使用delphi 开发多层应用(二十三)KbmMW 的WIB
解释WIB 是什么之前,先回顾以下我们前面的各种服务工作方式.前面的各种服务的工作方式都是请求/应答方式. 客户端发送请求,服务器端根据客户端的请求,返回相应的结果.这种方式是一种顺序式访问,是一种紧 ...
- python数据类型2
一 文件格式补充 在python3中,除字符串外,所有数据类型在内存中的编码格式都是utf-8,而字符串在内存中的格式是Unicode的格式. 由于Unicode的格式无法存入硬盘中,所以这里还有一种 ...
- 2018.07.20 洛谷P4178 Tree(点分治)
传送门 又一道点分治. 直接维护子树内到根的所有路径长度,然后排序+双指针统计答案. 代码如下: #include<bits/stdc++.h> #define N 40005 using ...
- redis学习-事务命令
multi:开启事务 exec:提交事务 discard:取消事务 1.开启事务之后,每次执行命令之后,都要先进入事务队列中,只有在执行 exec之后才开始执行 2.开启事务之后,每次执行命令之后,都 ...
- hadoop学习笔记(五):java api 操作hdfs
HDFS的Java访问接口 1)org.apache.hadoop.fs.FileSystem 是一个通用的文件系统API,提供了不同文件系统的统一访问方式. 2)org.apache.hadoop. ...
- 如何使用 Visual C# 2005 或 Visual C# .NET 向 Excel 工作簿传输数据
本文分步介绍了多种从 Microsoft Visual C# 2005 或 Microsoft Visual C# .NET 程序向 Microsoft Excel 2002 传输数据的方法.本文还提 ...