FunDA的并行运算施用就是对用户自定义函数的并行运算。原理上就是把一个输入流截分成多个输入流并行地输入到一个自定义函数的多个运行实例。这些函数运行实例同时在各自不同的线程里同步运算直至耗尽所有输入。并行运算的具体函数实例数是用fs2-nondeterminism的算法根据CPU内核数、线程池配置和用户指定的最大运算实例数来决定的。我们在这次示范里可以对比一下同样工作内容的并行运算和串形运算效率。在前面示范里我们获取了一个AQMRPT表。但这个表不够合理化(normalized):state和county还没有实现编码与STATES和COUNTIES表的连接。在这次示范里我们就创建一个新表NORMAQM,把AQMRPT表内数据都搬进来。并在这个过程中把STATENAME和COUNTYNAME字段转换成STATES和COUNTIES表的id字段。下面就是NORMAQM表结构:

  case class NORMAQMModel(rid: Long
, mid: Int
, state: Int
, county: Int
, year: Int
, value: Int
, average: Int
) extends FDAROW class NORMAQMTable(tag: Tag) extends Table[NORMAQMModel](tag, "NORMAQM") {
def rid = column[Long]("ROWID",O.AutoInc,O.PrimaryKey)
def mid = column[Int]("MEASUREID")
def state = column[Int]("STATID")
def county = column[Int]("COUNTYID")
def year = column[Int]("REPORTYEAR")
def value = column[Int]("VALUE")
def average = column[Int]("AVG") def * = (rid,mid,state,county,year,value,average) <> (NORMAQMModel.tupled, NORMAQMModel.unapply)
} val NORMAQMQuery = TableQuery[NORMAQMTable]

下面是这个表的初始化铺垫代码:

  val db = Database.forConfig("h2db")

  //drop original table schema
val futVectorTables = db.run(MTable.getTables) val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf) //create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//would carry on even fail to create table
Await.ready(futCreateTable,Duration.Inf) //truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf)

我们需要设计一个函数从STATES表里用AQMRPT表的STATENAME查询ID。我故意把这个函数设计成一个完整的FunDA程序。这样可以模拟一个比较消耗io和计算资源的独立过程(不要理会任何合理性,目标是增加io和运算消耗):

  //a conceived task for the purpose of resource consumption
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val stateStream = fda_staticSource(stateSeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}

可以看到getStateID函数每次运算都重复构建stateStream。这样可以达到增加io操作的目的。

同样,我们也需要设计另一个函数来从COUNTIES表里获取id字段:

  //another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val countyStream = fda_staticSource(countySeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
}

我们可以如下这样获取这个程序的数据源:

  //original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)()

按照正常的FunDA流程我们设计了两个用户自定义函数:一个根据数据行内的state和county字段调用函数getStateID和getCountyID获取相应id后构建一条新的NORMAQM表插入指令行,然后传给下个自定义函数。下个自定义函数就直接运算收到的动作行:

  def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
}

像前面几篇示范那样我们把这两个用户自定义函数与数据源组合起来成为完整的FunDA程序后startRun就可以得到实际效果了:

    AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun

这个程序运算了579秒,不过这是个单一线程运算。我们想知道并行运算结果。那么我们首先要把这个getIdsThenInsertAction转成一个并行运算函数FDAParTask:

AQMRPTStream.toPar(getIdsThenInsertAction)

FunDA提供了并行运算器fda_runPar:

      implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take().toPar(getIdsThenInsertAction))() //max 8 open computations
.appendTask(runInsertAction)
.startRun

我们可以自定义线程池。fda_runPar返回标准的FunDA FDAPipeLine,所以我们可以在后面挂上runInsertAction函数。下面是不同行数的运算时间对比结果:

    //processing 10000 rows in a single thread in 570 seconds
// processing 10000 rows parallelly in 316 seconds //processing 20000 rows in a single thread in 1090 seconds
//processing 20000 rows parallelly in 614 seconds //processing 100000 rows in a single thread in 2+ hrs
//processing 100000 rows parallelly in 3885 seconds

可以得出,并行运算对越大数据集有更大的效率提高。下面就是这次示范的源代码:

import slick.jdbc.meta._
import com.bayakala.funda._
import api._
import scala.language.implicitConversions
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success}
import slick.jdbc.H2Profile.api._
import Models._
import fs2.Strategy object ParallelTasks extends App { val db = Database.forConfig("h2db") //drop original table schema
val futVectorTables = db.run(MTable.getTables) val futDropTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.drop)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} dropped successfully! ")
case Failure(e) => println(s"Failed to drop Table ${NORMAQMQuery.baseTableRow.tableName}, it may not exist! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf) //create new table to refine AQMRawTable
val actionCreateTable = Models.NORMAQMQuery.schema.create
val futCreateTable = db.run(actionCreateTable).andThen {
case Success(_) => println("Table created successfully!")
case Failure(e) => println(s"Table may exist already! Error: ${e.getMessage}")
}
//would carry on even fail to create table
Await.ready(futCreateTable,Duration.Inf) //truncate data, only available in slick 3.2.1
val futTruncateTable = futVectorTables.flatMap{ tables => {
val tableNames = tables.map(t => t.name.name)
if (tableNames.contains(NORMAQMQuery.baseTableRow.tableName))
db.run(NORMAQMQuery.schema.truncate)
else Future()
}
}.andThen {
case Success(_) => println(s"Table ${NORMAQMQuery.baseTableRow.tableName} truncated successfully!")
case Failure(e) => println(s"Failed to truncate Table ${NORMAQMQuery.baseTableRow.tableName}! Error: ${e.getMessage}")
}
Await.ready(futDropTable,Duration.Inf) //a conceived task for the purpose of resource consumption
//getting id with corresponding name from STATES table
def getStateID(state: String): Int = {
//create a stream for state id with state name
implicit def toState(row: StateTable#TableElementType) = StateModel(row.id,row.name)
val stateLoader = FDAViewLoader(slick.jdbc.H2Profile)(toState _)
val stateSeq = stateLoader.fda_typedRows(StateQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val stateStream = fda_staticSource(stateSeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case StateModel(stid,stname) => //target row type
if (stname.contains(state)) {
id = stid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
stateStream.appendTask(getid).startRun
id
}
//another conceived task for the purpose of resource consumption
//getting id with corresponding names from COUNTIES table
def getCountyID(state: String, county: String): Int = {
//create a stream for county id with state name and county name
implicit def toCounty(row: CountyTable#TableElementType) = CountyModel(row.id,row.name)
val countyLoader = FDAViewLoader(slick.jdbc.H2Profile)(toCounty _)
val countySeq = countyLoader.fda_typedRows(CountyQuery.result)(db).toSeq
//constructed a Stream[Task,String]
val countyStream = fda_staticSource(countySeq)()
var id = -
def getid: FDAUserTask[FDAROW] = row => {
row match {
case CountyModel(cid,cname) => //target row type
if (cname.contains(state) && cname.contains(county)) {
id = cid
fda_break //exit
}
else fda_skip //take next row
case _ => fda_skip
}
}
countyStream.appendTask(getid).startRun
id
} //original table listing
implicit def toAQMRPT(row: AQMRPTTable#TableElementType) =
AQMRPTModel(row.rid,row.mid,row.state,row.county,row.year,row.value,row.total,row.valid)
val AQMRPTLoader = FDAStreamLoader(slick.jdbc.H2Profile)(toAQMRPT _)
val AQMRPTStream = AQMRPTLoader.fda_typedStream(AQMRPTQuery.result)(db)(,)() def getIdsThenInsertAction: FDAUserTask[FDAROW] = row => {
row match {
case aqm: AQMRPTModel =>
if (aqm.valid) {
val stateId = getStateID(aqm.state)
val countyId = getCountyID(aqm.state,aqm.county)
val action = NORMAQMQuery += NORMAQMModel(,aqm.mid, stateId, countyId, aqm.year,aqm.value,aqm.total)
fda_next(FDAActionRow(action))
}
else fda_skip
case _ => fda_skip
}
}
val runner = FDAActionRunner(slick.jdbc.H2Profile)
def runInsertAction: FDAUserTask[FDAROW] = row =>
row match {
case FDAActionRow(action) =>
runner.fda_execAction(action)(db)
fda_skip
case _ => fda_skip
} val cnt_start = System.currentTimeMillis()
/*
AQMRPTStream.take()
.appendTask(getIdsThenInsertAction)
.appendTask(runInsertAction)
.startRun
//println(s"processing 10000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 10000 rows in a single thread in 570 seconds
//println(s"processing 20000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows in a single thread in 1090 seconds
//println(s"processing 100000 rows in a single thread in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows in a single thread in 2+ hrs implicit val strategy = Strategy.fromCachedDaemonPool("cachedPool")
fda_runPar(AQMRPTStream.take().toPar(getIdsThenInsertAction))()
.appendTask(runInsertAction)
.startRun //println(s"processing 10000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
// processing 10000 rows parallelly in 316 seconds
//println(s"processing 20000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 20000 rows parallelly in 614 seconds
println(s"processing 100000 rows parallelly in ${(System.currentTimeMillis - cnt_start)/1000} seconds")
//processing 100000 rows parallelly in 3885 seconds }

FunDA(15)- 示范:任务并行运算 - user task parallel execution的更多相关文章

  1. Winform Global exception and task parallel library exception;

    static class Program { /// <summary> /// 应用程序的主入口点. /// </summary> [STAThread] static vo ...

  2. C#5.0之后推荐使用TPL(Task Parallel Libray 任务并行库) 和PLINQ(Parallel LINQ, 并行Linq). 其次是TAP(Task-based Asynchronous Pattern, 基于任务的异步模式)

    学习书籍: <C#本质论> 1--C#5.0之后推荐使用TPL(Task Parallel Libray 任务并行库) 和PLINQ(Parallel LINQ, 并行Linq). 其次是 ...

  3. Using the Task Parallel Library (TPL) for Events

    Using the Task Parallel Library (TPL) for Events The parallel tasks library was introduced with the ...

  4. TPL(Task Parallel Library)多线程、并发功能

    The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System ...

  5. Task Parallel Library01,基本用法

    我们知道,每个应用程序就是一个进程,一个进程有多个线程.Task Parallel Library为我们的异步编程.多线程编程提供了强有力的支持,它允许一个主线程运行的同时,另外的一些线程或Task也 ...

  6. System and method for parallel execution of memory transactions using multiple memory models, including SSO, TSO, PSO and RMO

    A data processor supports the use of multiple memory models by computer programs. At a device extern ...

  7. CMU Database Systems - Parallel Execution

    并发执行,主要为了增大吞吐,降低延迟,提高数据库的可用性 先区分一组概念,parallel和distributed的区别 总的来说,parallel是指在物理上很近的节点,比如本机的多个线程或进程,不 ...

  8. FunDA(14)- 示范:并行运算,并行数据库读取 - parallel data loading

    FunDA的并行数据库读取功能是指在多个线程中同时对多个独立的数据源进行读取.这些独立的数据源可以是在不同服务器上的数据库表,又或者把一个数据库表分成几个独立部分形成的独立数据源.当然,并行读取的最终 ...

  9. 异步和多线程,委托异步调用,Thread,ThreadPool,Task,Parallel,CancellationTokenSource

    1 进程-线程-多线程,同步和异步2 异步使用和回调3 异步参数4 异步等待5 异步返回值 5 多线程的特点:不卡主线程.速度快.无序性7 thread:线程等待,回调,前台线程/后台线程, 8 th ...

随机推荐

  1. Response.Redirect原理图解

  2. 使用dos 作为中介实现cpython 和c# 交互

    最近在使用python 处理一些图形的东西. 实现:对一些512 的图进行像素遍历RGBA 变量, 查询通道不是 255 255 255 颜色 的矩阵,进行切图到空白 之前使用c#进行 确实快10 倍 ...

  3. jQuery使用大全

    我的程序人生 提供基于Lesktop的IM二次开发,联系QQ:76159179 CnBlogs Home New Post Contact Admin Rss Posts - 476  Article ...

  4. hdu-1069(dp)

    题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=1069 题意:一群猴子,给出n块砖的长x宽y高z,用这些砖拼起的高度最高是多少, 要求底下的砖的长宽都要 ...

  5. 手动安装jar到maven

    mvn install:install-file -DgroupId=com.chinacloud.mir.common -DartifactId=one-aa-sdk -Dversion=1.0-S ...

  6. IntelliJ IDEA 2017版 编译器使用学习笔记(一) (图文详尽版);IDE快捷键使用;IDE多行代码同时编写

    IntellJ是一款强大的编译器,那么它有很多实用的功能 一.鼠标点击减少效率,快捷键实现各种跳转 (1)项目之间的跳转 快捷键位置: 操作:首先要有两个项目,然后,在不同窗口打开:如图: 然后使用快 ...

  7. MySQL外键使用及说明详解

    一.外键约束 MySQL通过外键约束来保证表与表之间的数据的完整性和准确性. 外键的使用条件: 1.两个表必须是InnoDB表,MyISAM表暂时不支持外键(据说以后的版本有可能支持,但至少目前不支持 ...

  8. Sql Server R8 密码问题及5102错误

    登录的两种方式: 集成登录连接:con = new SqlConnection("server=.\\SQLEXPRESS;database=db_news;Trusted_Connecti ...

  9. NSDictionary NSMutableDictionary NSSet NSMutableSet

    //description只是返回了一个字符串 //    [person description]; //        //如果想要打印需要NSLog //    NSLog(@"%@& ...

  10. AugularJS, Responsive, phonegap, BAE, SAE,GAE, Paas

    http://freewind.me/blog/20121226/1167.html http://88250.b3log.org/bae-sae-gae http://www.ruanyifeng. ...