Scalaz (57) - scalaz-stream: fs2 multithreaded programming, fs2 concurrency

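fs2's concurrency model does more than provide non-blocking I/O (Java NIO). It also gives us good tools for parallel computation. Before we get to parallelism, let's demonstrate some of the Stream-combining functions in the pipe2 object. First we define two helper functions: one traces evaluation and the other simulates a computing environment.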

   def log[A](prompt: String): Pipe[Task,A,A] = _.evalMap {a =>

     Task.delay { println(prompt + a); a}}         //> log: [A](prompt: String)fs2.Pipe[fs2.Task,A,A]

   Stream(1,2,3).through(log(">")).run.unsafeRun   //> >1

                                                   //| >2

                                                   //| >3

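log is a tracing function. It prints each element with the given prompt and passes the element through unchanged.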
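fs2 0.9 defines `Pipe[F,I,O]` as a plain function `Stream[F,I] => Stream[F,O]`, which is why `log` can be written as `_.evalMap {...}`. That means a pipe can be pictured without fs2 at all. The sketch below models the same shape over Scala's `Iterator`. `IterPipe` and `PipeSketch` are hypothetical names, and the side effect runs directly inside `map` instead of inside a `Task`.

```scala
import scala.collection.mutable.ListBuffer

object PipeSketch {
  // Same shape as fs2's Pipe (a function from stream to stream), using Iterator instead.
  type IterPipe[I, O] = Iterator[I] => Iterator[O]

  // Records what was printed so the trace can be inspected afterwards.
  val trace: ListBuffer[String] = ListBuffer.empty[String]

  // Counterpart of `log`: every element passes through unchanged, and
  // `prompt + element` is printed and recorded as a side effect.
  def log[A](prompt: String): IterPipe[A, A] =
    _.map { a =>
      trace += (prompt + a)
      println(prompt + a)
      a
    }

  // `through` is just function application; toList forces the lazy iterator.
  def demo(): List[Int] = log[Int](">")(Iterator(1, 2, 3)).toList
}
```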

  implicit val strategy = Strategy.fromFixedDaemonPool()

   //> strategy  : fs2.Strategy = Strategy

  implicit val scheduler = Scheduler.fromFixedDaemonPool()

   //> scheduler  : fs2.Scheduler = Scheduler(java.util.concurrent.ScheduledThreadPoolExecutor@16022d9d[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0])

   def randomDelay[A](max: FiniteDuration): Pipe[Task, A, A] = _.evalMap { a => {

     val delay: Task[Int] = Task.delay {

       scala.util.Random.nextInt(max.toMillis.toInt)

     }

     delay.flatMap { d => Task.now(a).schedule(d.millis) }

    }

   }      //> randomDelay: [A](max: scala.concurrent.duration.FiniteDuration)fs2.Pipe[fs2.Task,A,A]

   Stream(1,2,3).through(randomDelay(1.second)).through(log("delayed>")).run.unsafeRun

                                                   //> delayed>1

                                                   //| delayed>2

                                                   //| delayed>3

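randomDelay simulates an environment in which each computation takes a random amount of time. We can also trace elements both before and after randomDelay: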

 Stream(1,2,3).through(log("before delay>"))

                .through(randomDelay(1.second))

                .through(log("after delay>")).run.unsafeRun

                                                   //> before delay>1

                                                   //| after delay>1

                                                   //| before delay>2

                                                   //| after delay>2

                                                   //| before delay>3

                                                   //| after delay>3

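Note that randomDelay does not block the current thread. The delay comes from Task's schedule running on the implicit Scheduler, not from Thread.sleep.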
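The same kind of non-blocking delay can be shown with the JDK alone. This is a plain-Scala analogy, not fs2 code. A `ScheduledExecutorService` stands in for fs2's `Scheduler`, and a `Promise` stands in for the `Task` that `schedule` returns. `delayed` and `DelaySketch` are hypothetical names.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object DelaySketch {
  // Completes the returned Future with `a` once `d` has elapsed. The caller is
  // never parked: a scheduler thread fires the completion later.
  def delayed[A](a: A, d: FiniteDuration)(scheduler: ScheduledExecutorService): Future[A] = {
    val p = Promise[A]()
    scheduler.schedule(new Runnable { def run(): Unit = { p.success(a); () } },
      d.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }

  // Returns how long the scheduling call itself took (ms) and the value delivered later.
  def demo(): (Long, String) = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    try {
      val start = System.nanoTime()
      val f = delayed("done", 200.millis)(scheduler)
      val elapsedMillis = (System.nanoTime() - start) / 1000000
      (elapsedMillis, Await.result(f, 2.seconds)) // only this line actually waits
    } finally scheduler.shutdown()
  }
}
```

The scheduling call returns almost immediately, and only `Await.result` waits. This mirrors the fs2 examples, where `unsafeRun` is the step that blocks.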

Next, let's look at interleave, one of the merging functions in the pipe2 object:

 val sa = Stream(1,2,3).through(randomDelay(1.second)).through(log("A>"))

         //> sa  : fs2.Stream[fs2.Task,Int] = Segment(Emit(Chunk(1, 2, 3))).flatMap(<function1>).flatMap(<function1>)

 val sb = Stream(1,2,3).through(randomDelay(1.second)).through(log("B>"))

         //> sb  : fs2.Stream[fs2.Task,Int] = Segment(Emit(Chunk(1, 2, 3))).flatMap(<function1>).flatMap(<function1>)

 (sa interleave sb).through(log("AB>")).run.unsafeRun

                                                   //> A>1

                                                   //| B>1

                                                   //| AB>1

                                                   //| AB>1

                                                   //| A>2

                                                   //| B>2

                                                   //| AB>2

                                                   //| AB>2

                                                   //| A>3

                                                   //| B>3

                                                   //| AB>3

                                                   //| AB>3
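The fixed alternation in this output can be modelled on plain lists. The sketch assumes that fs2's `interleave` stops as soon as either side is exhausted; fs2 also has `interleaveAll`, which keeps going until both sides are done. `InterleaveSketch` is a hypothetical name, and timing is ignored because only the order matters here.

```scala
object InterleaveSketch {
  // Deterministic merge: strictly left, right, left, right, ...
  // Stops when the shorter side runs out (assumed to match fs2's `interleave`).
  def interleave[A](left: List[A], right: List[A]): List[A] =
    left.zip(right).flatMap { case (l, r) => List(l, r) }
}
```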

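In the combined stream, each element is emitted only after both sa and sb have emitted their corresponding element, and the order is always one from sa and then one from sb. interleave is therefore a merge with a fixed, deterministic order. merge, by contrast, combines streams in no fixed order. Here is how it is used: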

 (sa merge sb).through(log("AB>")).run.unsafeRun   //> B>1

                                                   //| AB>1

                                                   //| B>2

                                                   //| AB>2

                                                   //| B>3

                                                   //| AB>3

                                                   //| A>1

                                                   //| AB>1

                                                   //| A>2

                                                   //| AB>2

                                                   //| A>3

                                                   //| AB>3

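merge does not wait for both sa and sb. As soon as either side emits an element, the merged stream passes it on. In other words, merge follows whichever stream is faster, so the output order is irregular and unpredictable (nondeterministic). Timing differs too. interleave takes from sa and sb in turn, so each step is held back by the slower of the two. merge never waits for the slower side before emitting what the faster side has already produced. The total amount of work is about the same either way, but with merge we can process elements concurrently as they arrive, which can shorten the overall running time considerably. One problem with merge is that we cannot tell which stream an element came from. We can solve this with either: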

 (sa either sb).through(log("AB>")).run.unsafeRun  //> A>1

                                                   //| AB>Left(1)

                                                   //| B>1

                                                   //| AB>Right(1)

                                                   //| A>2

                                                   //| AB>Left(2)

                                                   //| B>2

                                                   //| AB>Right(2)

                                                   //| B>3

                                                   //| AB>Right(3)

                                                   //| A>3

                                                   //| AB>Left(3)
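Here is a rough plain-Scala picture of `merge` combined with `either`. Two producers push tagged values into one shared queue as soon as each value is ready. This analogy rests on two assumptions: threads come from the global `ExecutionContext`, and random sleeps stand in for `randomDelay`. It is not how fs2 implements merge, and `MergeSketch` and `runMerged` are hypothetical names.

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object MergeSketch {
  // Each producer emits 1, 2, 3 with random delays into one shared queue.
  // Arrival order depends on timing (like merge); Left/Right tags record the source (like either).
  def runMerged(): List[Either[Int, Int]] = {
    val out = new LinkedBlockingQueue[Either[Int, Int]]()
    def produce(tag: Int => Either[Int, Int]): Future[Unit] = Future {
      for (i <- 1 to 3) {
        Thread.sleep(Random.nextInt(50).toLong) // stand-in for randomDelay
        out.put(tag(i))
      }
    }
    val both = produce(x => Left(x)).zip(produce(x => Right(x)))
    Await.result(both, 5.seconds)
    List.fill(out.size)(out.take()) // drain in arrival order
  }
}
```

The mix of `Left` and `Right` values changes from run to run, but each side keeps its own order. The `A>` and `B>` lines in the output above show the same behaviour.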

Left and Right tell us which source each element came from. If we add a third source stream, we can still combine all three with merge:

 val sc = Stream.range(1,10).through(randomDelay(1.second)).through(log("C>"))

     //> sc  : fs2.Stream[fs2.Task,Int] = Segment(Emit(Chunk(()))).flatMap(<function1>).flatMap(<function1>).flatMap(<function1>)

 ((sa merge sb) merge sc).through(log("ABC>")).run.unsafeRun

                                                   //> B>1

                                                   //| ABC>1

                                                   //| C>1

                                                   //| ABC>1

                                                   //| A>1

                                                   //| ABC>1

                                                   //| B>2

                                                   //| ABC>2

                                                   //| A>2

                                                   //| ABC>2

                                                   //| B>3

                                                   //| ABC>3

                                                   //| C>2

                                                   //| ABC>2

                                                   //| A>3

                                                   //| ABC>3

                                                   //| C>3

                                                   //| ABC>3

                                                   //| C>4

                                                   //| ABC>4

                                                   //| C>5

                                                   //| ABC>5

                                                   //| C>6

                                                   //| ABC>6

                                                   //| C>7

                                                   //| ABC>7

                                                   //| C>8

                                                   //| ABC>8

                                                   //| C>9

                                                   //| ABC>9

If we cannot know the number of source streams in advance, we can use the following type:

Stream[Task,Stream[Task,A]]

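This type represents a stream of streams: the outer Stream contains an unknown number of inner Streams. For example, the outer Stream could represent client connections, and each inner Stream the data read from one client. Here are the three streams above expressed with this type: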

 val streams:Stream[Task,Stream[Task,Int]] = Stream(sa,sb,sc)

      //> streams  : fs2.Stream[fs2.Task,fs2.Stream[fs2.Task,Int]] = Segment(Emit(Chunk(Segment(Emit(Chunk(1, 2, 3))).flatMap(<function1>).flatMap(<function1>),Segment(Emit(Chunk(1, 2, 3))).flatMap(<function1>).flatMap(<function1>), Segment(Emit(Chunk(()))).flatMap(<function1>).flatMap(<function1>).flatMap(<function1>))))

Now we need to run the inner Streams and also flatten their results into a single Stream[Task,A]. The fs2.concurrent package provides a combinator for exactly this:

  def join[F[_],O](maxOpen: Int)(outer: Stream[F,Stream[F,O]])(implicit F: Async[F]): Stream[F,O] = {...}

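Both the parameter outer and the result type match what we need. maxOpen is the maximum number of inner streams run concurrently. We can use join to combine sa, sb and sc from the example above: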

 val ms = concurrent.join()(streams)              //> ms  : fs2.Stream[fs2.Task,Int] = attemptEval(Task).flatMap(<function1>).flatMap(<function1>)

 ms.through(log("ABC>")).run.unsafeRun             //> C>1

                                                   //| ABC>1

                                                   //| A>1

                                                   //| ABC>1

                                                   //| C>2

                                                   //| ABC>2

                                                   //| B>1

                                                   //| ABC>1

                                                   //| C>3

                                                   //| ABC>3

                                                   //| A>2

                                                   //| ABC>2

                                                   //| B>2

                                                   //| ABC>2

                                                   //| C>4

                                                   //| ABC>4

                                                   //| A>3

                                                   //| ABC>3

                                                   //| B>3

                                                   //| ABC>3

                                                   //| C>5

                                                   //| ABC>5

                                                   //| C>6

                                                   //| ABC>6

                                                   //| C>7

                                                   //| ABC>7

                                                   //| C>8

                                                   //| ABC>8

                                                   //| C>9

                                                   //| ABC>9

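The result is what we expected. As mentioned above, maxOpen is the maximum number of inner streams running concurrently. Let's observe its effect with another example: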

 val rangedStreams = Stream.range(0,5).map {id =>

       Stream.range(1,5).through(randomDelay(1.second)).through(log((('A'+id).toChar).toString +">")) }

       //> rangedStreams  : fs2.Stream[Nothing,fs2.Stream[fs2.Task,Int]] = Segment(Emit(Chunk(()))).flatMap(<function1>).mapChunks(<function1>)

 concurrent.join(3)(rangedStreams).run.unsafeRun   //> B>1

                                                   //| A>1

                                                   //| C>1

                                                   //| B>2

                                                   //| C>2

                                                   //| A>2

                                                   //| B>3

                                                   //| C>3

                                                   //| C>4

                                                   //| D>1

                                                   //| A>3

                                                   //| A>4

                                                   //| B>4

                                                   //| E>1

                                                   //| E>2

                                                   //| E>3

                                                   //| D>2

                                                   //| D>3

                                                   //| E>4

                                                   //| D>4

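At most three inner streams run at any one time. A, B and C start first, and D and E start only after earlier streams have finished.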
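The effect of `maxOpen` can be pictured with a semaphore that holds `maxOpen` permits. Each inner stream takes a permit before it runs and returns it when it finishes, so later streams (D and E above) start only when a slot frees up. This is a plain-Scala sketch built on `java.util.concurrent.Semaphore`, not fs2's implementation of `join`. `joinBounded` is a hypothetical name, and plain lists stand in for the inner streams.

```scala
import java.util.concurrent.{Executors, Semaphore}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object JoinSketch {
  // Runs every inner "stream" concurrently, but never more than `maxOpen` at once.
  // Returns all elements and the peak number of inner streams open at the same time.
  def joinBounded(maxOpen: Int, inners: List[List[Int]]): (List[Int], Int) = {
    val pool = Executors.newFixedThreadPool(math.max(inners.size, 1))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val slots = new Semaphore(maxOpen) // one permit per open inner stream
    val lock = new Object
    var open = 0
    var peak = 0
    val running = inners.map { inner =>
      Future {
        slots.acquire() // wait until fewer than maxOpen inners are open
        lock.synchronized { open += 1; peak = math.max(peak, open) }
        try inner.map { x => Thread.sleep(10); x } // simulated per-element work
        finally {
          lock.synchronized { open -= 1 }
          slots.release()
        }
      }
    }
    try {
      val results = Await.result(Future.sequence(running), 10.seconds)
      (results.flatten, lock.synchronized(peak))
    } finally pool.shutdown()
  }
}
```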

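When our program needs to interact with an external program, the interaction can take one of the following forms:

1. The side-effecting computation runs synchronously. This is the easiest case, because the result is available directly.

2. The side-effecting computation is asynchronous and delivers its result through a single callback invocation.

3. The side-effecting computation is asynchronous, but delivers its results in batches through multiple callback invocations.

Let's go through these cases one by one:

1. A synchronous computation is the easiest to handle. We just wrap it in Stream.eval, or in Stream.eval_ when, as here, we do not need its result: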

 def destroyUniverse: Unit = println("BOOOOM!!!")  //> destroyUniverse: => Unit

 val s = Stream.eval_(Task.delay(destroyUniverse)) ++ Stream("...move on")

     //> s  : fs2.Stream[fs2.Task,String] = append(attemptEval(Task).flatMap(<function1>).flatMap(<function1>), Segment(Emit(Chunk(()))).flatMap(<function1>))

 s.runLog.unsafeRun                        //> BOOOOM!!!

                                           //| res8: Vector[String] = Vector(...move on)
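Wrapping the call in `Task.delay` and `Stream.eval` means the side effect is only described at first, and it happens only when the program is run. The toy `Effect` type below shows that property. It is a hypothetical name and far simpler than fs2's `Task`, with no error handling or asynchrony. Building the program does not call `destroyUniverse`; running it does.

```scala
object SuspendSketch {
  // A toy effect type: it stores *how* to produce a value, not the value itself.
  final case class Effect[A](unsafeRun: () => A) {
    def map[B](f: A => B): Effect[B] = Effect(() => f(unsafeRun()))
    def flatMap[B](f: A => Effect[B]): Effect[B] = Effect(() => f(unsafeRun()).unsafeRun())
  }

  // Like Task.delay: the by-name argument is not evaluated here.
  def delay[A](a: => A): Effect[A] = Effect(() => a)

  var explosions = 0
  def destroyUniverse(): Unit = { explosions += 1; println("BOOOOM!!!") }
}
```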

2. For the second case, fs2's Async trait (fs2.util.Async) has a method, async, for registering a callback:

trait Async[F[_]] extends Effect[F] { self =>

/**

   Create an `F[A]` from an asynchronous computation, which takes the form

   of a function with which we can register a callback. This can be used

   to translate from a callback-based API to a straightforward monadic

   version.

   */

  def async[A](register: (Either[Throwable,A] => Unit) => F[Unit]): F[A] =

    bind(ref[A]) { ref =>

    bind(register { e => runSet(ref)(e) }) { _ => get(ref) }}

...

Let's demonstrate with a concrete example. Suppose we have a callback-based function readBytes:

 trait Connection {

   def readBytes(onSuccess: Array[Byte] => Unit, onFailure: Throwable => Unit): Unit

 }

This Connection is an interface to the external program. Suppose it is instantiated like this:

 val conn = new Connection {

   def readBytes(onSuccess: Array[Byte] => Unit, onFailure: Throwable => Unit): Unit = {

     Thread.sleep()

     onSuccess(Array(1,2,3,4,5))

   }

 }  //> conn  : demo.ws.fs2Concurrent.connection = demo.ws.fs2Concurrent$$anonfun$main$1$$anon$1@4c40b76e

We can register this callback with async and turn it into a composable, monadic value of type Task[Array[Byte]]:

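 val T = implicitly[fs2.util.Async[Task]]   // T: the Async[Task] instance; not defined in the original, assumed here

 val bytes = T.async[Array[Byte]] { (cb: Either[Throwable,Array[Byte]] => Unit) => {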

    Task.delay { conn.readBytes (

      ready => cb(Right(ready)),

      fail => cb(Left(fail))

    ) }

 }}             //> bytes  : fs2.Task[Array[Byte]] = Task

Now we can evaluate bytes with Stream.eval:

 Stream.eval(bytes).map(_.toList).runLog.unsafeRun //> res9: Vector[List[Byte]] = Vector(List(1, 2, 3, 4, 5))
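The same one-shot callback adaptation can be written with the standard library's `Promise`, which plays the role here that `Async.async` plays for `Task`. The callback completes the promise, and the caller gets a composable `Future`. The `Connection` stub repeats the interface above. `readOnce` and `CallbackSketch` are hypothetical names. Having the stub answer from a new thread is an assumption meant to mimic a real asynchronous driver.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object CallbackSketch {
  // Same shape as the Connection interface in the text (a stub, not a real API).
  trait Connection {
    def readBytes(onSuccess: Array[Byte] => Unit, onFailure: Throwable => Unit): Unit
  }

  // Registers the one-shot callbacks and exposes the eventual result as a Future.
  def readOnce(conn: Connection): Future[Array[Byte]] = {
    val p = Promise[Array[Byte]]()
    conn.readBytes(bytes => { p.success(bytes); () }, err => { p.failure(err); () })
    p.future
  }

  // Stub connection that answers from another thread.
  val conn: Connection = new Connection {
    def readBytes(onSuccess: Array[Byte] => Unit, onFailure: Throwable => Unit): Unit =
      new Thread(new Runnable {
        def run(): Unit = onSuccess(Array[Byte](1, 2, 3, 4, 5))
      }).start()
  }
}
```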

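This single-callback case is also fairly easy to handle: if we cannot keep up with the data, we simply do not issue the next read. The callback may instead be invoked many times. For example, the external program may itself be a streaming API that calls the callback every time a batch of data is ready. In that case our program may not be able to process the incoming data fast enough. We can solve this with the queue provided by the fs2.async package: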

 import fs2.async

   import fs2.util.Async

   type Row = List[String]

   // defined type alias Row

   trait CSVHandle {

     def withRows(cb: Either[Throwable,Row] => Unit): Unit

   }

   // defined trait CSVHandle

   def rows[F[_]](h: CSVHandle)(implicit F: Async[F]): Stream[F,Row] =

     for {

       q <- Stream.eval(async.unboundedQueue[F,Either[Throwable,Row]])

       _ <- Stream.suspend { h.withRows { e => F.unsafeRunAsync(q.enqueue1(e))(_ => ()) }; Stream.emit(()) }

       row <- q.dequeue through pipe.rethrow

     } yield row

   // rows: [F[_]](h: CSVHandle)(implicit F: fs2.util.Async[F])fs2.Stream[F,Row]

enqueue1 and dequeue are defined in the Queue trait as follows:

/**

 * Asynchronous queue interface. Operations are all nonblocking in their

 * implementations, but may be 'semantically' blocking. For instance,

 * a queue may have a bound on its size, in which case enqueuing may

 * block until there is an offsetting dequeue.

 */

trait Queue[F[_],A] {

/**

   * Enqueues one element in this `Queue`.

   * If the queue is `full` this waits until queue is empty.

   *

   * This completes after `a`  has been successfully enqueued to this `Queue`

   */

  def enqueue1(a: A): F[Unit]

/** Repeatedly call `dequeue1` forever. */

  def dequeue: Stream[F, A] = Stream.repeatEval(dequeue1)

  /** Dequeue one `A` from this queue. Completes once one is ready. */

  def dequeue1: F[A]

...

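Each callback invocation stores its result in the queue through enqueue1. dequeue turns the queue into a Stream[F,Either[Throwable,Row]], and pipe.rethrow converts that into a Stream[F,Row], failing the stream when it meets a Left. The consumer therefore pulls data out of the queue at its own pace.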
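The queue-based bridge can also be sketched with a JDK `LinkedBlockingQueue`. The callback only enqueues, and the consumer dequeues at its own pace and rethrows any `Left`, which is the job `pipe.rethrow` does above. Unlike the fs2 version, this sketch needs a way to stop. It therefore assumes the source sends a final `Right(None)` end marker, which the `CSVHandle` in the text does not have. `withRows` here and `QueueBridgeSketch` are hypothetical.

```scala
import java.util.concurrent.LinkedBlockingQueue

object QueueBridgeSketch {
  type Row = List[String]

  // Hypothetical push-style source: calls `cb` once per row from its own thread,
  // then once with Right(None) as an (assumed) end-of-data marker.
  def withRows(data: List[Either[Throwable, Row]])(cb: Either[Throwable, Option[Row]] => Unit): Unit =
    new Thread(new Runnable {
      def run(): Unit = {
        data.foreach(e => cb(e.map(Some(_))))
        cb(Right(None))
      }
    }).start()

  // The callback side only enqueues; the consumer dequeues until the end marker
  // and rethrows the first failure it meets.
  def rows(data: List[Either[Throwable, Row]]): List[Row] = {
    val q = new LinkedBlockingQueue[Either[Throwable, Option[Row]]]()
    withRows(data)(e => q.put(e))
    Iterator.continually(q.take())
      .map {
        case Left(err)  => throw err
        case Right(opt) => opt
      }
      .takeWhile(_.isDefined)
      .map(_.get)
      .toList
  }
}
```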

fs2 provides data types such as signal, queue and semaphore. Here are some usage examples, starting with async.signal:

 Stream.eval(async.signalOf[Task,Int](0)).flatMap {s =>

     val monitor: Stream[Task,Nothing] =

       s.discrete.through(log("s updated>")).drain

     val data: Stream[Task,Int] =

       Stream.range(10,16).through(randomDelay(1.second))

     val writer: Stream[Task,Unit] =

       data.evalMap {d => s.set(d)}

     monitor merge writer

   }.run.unsafeRun                                 //> s updated>0

                                                   //| s updated>10

                                                   //| s updated>11

                                                   //| s updated>12

                                                   //| s updated>13

                                                   //| s updated>14

                                                   //| s updated>15

An example of using async.queue:

 Stream.eval(async.boundedQueue[Task,Int]()).flatMap {q =>

     val monitor: Stream[Task,Nothing] =

       q.dequeue.through(log("dequeued>")).drain

     val data: Stream[Task,Int] =

       Stream.range(10,16).through(randomDelay(1.second))

     val writer: Stream[Task,Unit] =

       data.to(q.enqueue)

     monitor mergeHaltBoth  writer

   }.run.unsafeRun                                 //> dequeued>10

                                                   //| dequeued>11

                                                   //| dequeued>12

                                                   //| dequeued>13

                                                   //| dequeued>14

                                                   //| dequeued>15
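The two examples differ in what a slow reader sees. As I understand fs2's `Signal`, it holds only the current value, so `discrete` may skip intermediate updates when its reader falls behind. A queue keeps every element until it is dequeued. The outputs above show every update only because `randomDelay` slows the writer down. The plain-Scala sketch below shows the difference, with an `AtomicReference` standing in for the signal and a `LinkedBlockingQueue` for the queue. `SignalVsQueueSketch` is a hypothetical name.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicReference

object SignalVsQueueSketch {
  // The writer pushes all updates before any reader looks.
  // Returns what the signal reader sees and what the queue reader sees.
  def compare(updates: List[Int]): (Int, List[Int]) = {
    val signal = new AtomicReference[Int](0) // initial value, like signalOf(0)
    val queue = new LinkedBlockingQueue[Int]()
    updates.foreach { u => signal.set(u); queue.put(u) }
    val seenBySignal = signal.get                        // only the latest value survives
    val seenByQueue = List.fill(queue.size)(queue.take()) // every value, in order
    (seenBySignal, seenByQueue)
  }
}
```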

fs2's time package also provides functions and types that generate data on a timer. The following code demonstrates how to use them:

 time.awakeEvery[Task](1.second)

    .through(log("time:"))

    .take(5).run.unsafeRun                         //> time:1002983266 nanoseconds

                                                   //| time:2005972864 nanoseconds

                                                   //| time:3004831159 nanoseconds

                                                   //| time:4002104307 nanoseconds

                                                   //| time:5005091850 nanoseconds

awakeEvery produces an infinite stream, so we use take(5) to take the first 5 elements. We can also let it run for 5 seconds:

  val tick = time.awakeEvery[Task](1.second).through(log("time:"))

     //> tick  : fs2.Stream[fs2.Task,scala.concurrent.duration.FiniteDuration] = Segment(Emit(Chunk(()))).flatMap(<function1>).flatMap(<function1>).flatMap(<function1>)

  tick.run.unsafeRunFor(5.seconds)                 //> time:1005685270 nanoseconds

                                                   //| time:2004331473 nanoseconds

                                                   //| time:3005046945 nanoseconds

                                                   //| time:4002795227 nanoseconds

                                                   //| time:5002807816 nanoseconds

                                                   //| java.util.concurrent.TimeoutException

To avoid the TimeoutException, we can stop the stream with interruptWhen and use Task.schedule to produce the stop signal:

 val tick = time.awakeEvery[Task](1.second).through(log("time:"))

    //> tick  : fs2.Stream[fs2.Task,scala.concurrent.duration.FiniteDuration] = Segment(Emit(Chunk(()))).flatMap(<function1>).flatMap(<function1>).flatMap(<function1>)

  tick.interruptWhen(Stream.eval(Task.schedule(true,10.seconds)))

       .run.unsafeRun                              //> time:1004963839 nanoseconds

                                                   //| time:2005325025 nanoseconds

                                                   //| time:3005238921 nanoseconds

                                                   //| time:4004240985 nanoseconds

                                                   //| time:5001334732 nanoseconds

                                                   //| time:6003586673 nanoseconds

                                                   //| time:7004728267 nanoseconds

                                                   //| time:8004333608 nanoseconds

                                                   //| time:9003907670 nanoseconds

                                                   //| time:10002624970 nanoseconds

The most direct approach is fs2's time.sleep:

  (time.sleep[Task](5.seconds) ++ Stream.emit(true)).runLog.unsafeRun

                                                   //> res14: Vector[Boolean] = Vector(true)

  tick.interruptWhen(time.sleep[Task](5.seconds) ++ Stream.emit(true))

     .run.unsafeRun                                //> time:1002078506 nanoseconds

                                                   //| time:2005144318 nanoseconds

                                                   //| time:3004049135 nanoseconds

                                                   //| time:4002963861 nanoseconds

                                                   //| time:5000088103 nanoseconds
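These last examples follow one pattern: a periodic ticker that is stopped cleanly after a deadline, so the caller does not time out. The same pattern can be built with `ScheduledExecutorService.scheduleAtFixedRate`. This is a JDK analogy, not fs2's `time` API. `tickFor` and `TickSketch` are hypothetical names, and cancelling the scheduled handle plays the role of `interruptWhen`.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object TickSketch {
  // Ticks every `periodMillis` (like awakeEvery) and stops after `runForMillis`
  // (like interruptWhen(sleep(d) ++ emit(true))). Returns the number of ticks seen.
  def tickFor(periodMillis: Long, runForMillis: Long): Int = {
    val ticks = new AtomicInteger(0)
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val handle = scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = println(s"tick ${ticks.incrementAndGet()}") },
      periodMillis, periodMillis, TimeUnit.MILLISECONDS)
    try Thread.sleep(runForMillis)
    finally {
      handle.cancel(false) // stop ticking instead of failing with a timeout
      scheduler.shutdown()
      scheduler.awaitTermination(1, TimeUnit.SECONDS)
    }
    ticks.get
  }
}
```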
