FunDA (6) - Reactive Streams: Play with Iteratees, Enumerator and Enumeratees

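In the previous section we introduced the Iteratee. Its job is to consume data elements pushed to it from some data source, and different ways of consuming the data give Iteratees with different functions. That data source is the subject of this section: the Enumerator. An Enumerator actively pushes data elements downstream according to the current state of the consumer (the Iteratee). We have already discussed the Step type that represents the Iteratee's state: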

trait Step[E,+A]
case class Done[+A,E](a: A, remain: Input[E]) extends Step[E,A]
case class Cont[E,+A](k: Input[E] => InputStreamHandler[E,A]) extends Step[E,A]
case class Error[E](msg: String, loc:Input[E]) extends Step[E,Nothing]

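Here the Iteratee uses the Cont state to tell the Enumerator that it can accept more data, and it supplies the function k for the Enumerator to push data through. What the Enumerator pushes, i.e. the Iteratee's input Input[E], carries not only plain data elements but also the state of the data source: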

trait Input[+E]
case class El[E](e: E) extends Input[E]
case object EOF extends Input[Nothing]
case object Empty extends Input[Nothing]
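The interplay between Cont, the function k and Input can be tried out without Play. The following is a minimal synchronous sketch using only the Scala standard library; MiniIteratee, sum, push and run are hypothetical names invented for illustration, and the types are simplified stand-ins (no Future, no Error state), not Play's actual classes.

```scala
object MiniIteratee {
  // What the data source can send: an element, "nothing yet", or end of stream.
  sealed trait Input[+E]
  final case class El[E](e: E) extends Input[E]
  case object Empty extends Input[Nothing]
  case object EOF extends Input[Nothing]

  // Simplified consumer states (assumption: no Error state, no Future).
  sealed trait Step[E, A]
  final case class Done[E, A](a: A, remain: Input[E]) extends Step[E, A]
  final case class Cont[E, A](k: Input[E] => Step[E, A]) extends Step[E, A]

  // A consumer that adds up Ints until it is told the stream has ended.
  def sum(acc: Int = 0): Step[Int, Int] = Cont[Int, Int] {
    case El(n) => sum(acc + n)
    case Empty => sum(acc)
    case EOF   => Done[Int, Int](acc, EOF)
  }

  // The data source: it pushes the next element through k only while the
  // consumer is in the Cont state, and returns the consumer's new state.
  def push[E, A](xs: List[E], it: Step[E, A]): Step[E, A] = it match {
    case Cont(k) if xs.nonEmpty => push(xs.tail, k(El(xs.head)))
    case _                      => it
  }

  // Signal end of stream and extract the result, if the consumer finished.
  def run[E, A](it: Step[E, A]): Option[A] = it match {
    case Done(a, _) => Some(a)
    case Cont(k) => k(EOF) match {
      case Done(a, _) => Some(a)
      case Cont(_)    => None // the consumer ignored EOF
    }
  }
}
```

MiniIteratee.run(MiniIteratee.push(List(1, 2, 3), MiniIteratee.sum())) evaluates to Some(6). Note that push leaves the consumer in Cont after the last element; the result only appears once run sends EOF.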

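The Enumerator uses Input[E] to tell the Iteratee about the current state of the data source: whether all data has been pushed (EOF), or which element is being pushed now (El[E](e: E)). The Enumerator actively feeds data to the Iteratee and then returns the Iteratee in its new state. This is visible in the Enumerator's type signature: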

trait Enumerator[E] {
  /**
   * Apply this Enumerator to an Iteratee
   */
  def apply[A](i: Iteratee[E, A]): Future[Iteratee[E, A]]
}

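The Future in the return type of apply is mainly there to avoid tying up a thread. In practice, an Enumerator can ultimately be implemented by calling the Iteratee's fold function, for example: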

  /**
   * Creates an enumerator which produces the one supplied
   * input and nothing else. This enumerator will NOT
   * automatically produce Input.EOF after the given input.
   */
  def enumInput[E](e: Input[E]) = new Enumerator[E] {
    def apply[A](i: Iteratee[E, A]): Future[Iteratee[E, A]] =
      i.fold {
        case Step.Cont(k) => eagerFuture(k(e))
        case _ => Future.successful(i)
      }(dec)
  }

Alternatively, an Enumerator can be built through the constructor-style apply of the companion object:

  /**
   * Create an Enumerator from a set of values
   *
   * Example:
   * {{{
   *   val enumerator: Enumerator[String] = Enumerator("kiki", "foo", "bar")
   * }}}
   */
  def apply[E](in: E*): Enumerator[E] = in.length match {
    case 0 => Enumerator.empty
    case 1 => new Enumerator[E] {
      def apply[A](i: Iteratee[E, A]): Future[Iteratee[E, A]] = i.pureFoldNoEC {
        case Step.Cont(k) => k(Input.El(in.head))
        case _ => i
      }
    }
    case _ => new Enumerator[E] {
      def apply[A](i: Iteratee[E, A]): Future[Iteratee[E, A]] = enumerateSeq(in, i)
    }
  }

  /**
   * Create an Enumerator from any TraversableOnce like collection of elements.
   *
   * Example of an iterator of lines of a file :
   * {{{
   *  val enumerator: Enumerator[String] = Enumerator( scala.io.Source.fromFile("myfile.txt").getLines )
   * }}}
   */
  def enumerate[E](traversable: TraversableOnce[E])(implicit ctx: scala.concurrent.ExecutionContext): Enumerator[E] = {
    val it = traversable.toIterator
    Enumerator.unfoldM[scala.collection.Iterator[E], E](it: scala.collection.Iterator[E])({ currentIt =>
      if (currentIt.hasNext)
        Future[Option[(scala.collection.Iterator[E], E)]]({
          val next = currentIt.next
          Some((currentIt -> next))
        })(ctx)
      else
        Future.successful[Option[(scala.collection.Iterator[E], E)]]({
          None
        })
    })(dec)
  }

  /**
   * An empty enumerator
   */
  def empty[E]: Enumerator[E] = new Enumerator[E] {
    def apply[A](i: Iteratee[E, A]) = Future.successful(i)
  }

  private def enumerateSeq[E, A]: (Seq[E], Iteratee[E, A]) => Future[Iteratee[E, A]] = { (l, i) =>
    l.foldLeft(Future.successful(i))((i, e) =>
      i.flatMap(it => it.pureFold {
        case Step.Cont(k) => k(Input.El(e))
        case _ => it
      }(dec))(dec))
  }
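The foldLeft in enumerateSeq threads a Future of the consumer's state through the sequence, one element per step. Below is a stand-alone sketch of that pattern under simplifying assumptions: Iter, consume, enumerateSeq and run are hypothetical stand-ins written against the standard library only, and the per-element step here is pure, so map is enough where the library code uses flatMap with pureFold.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EnumerateSeqSketch {
  sealed trait Input[+E]
  final case class El[E](e: E) extends Input[E]
  case object EOF extends Input[Nothing]

  // Simplified consumer states (assumption: no Error state).
  sealed trait Iter[E, A]
  final case class Done[E, A](a: A) extends Iter[E, A]
  final case class Cont[E, A](k: Input[E] => Iter[E, A]) extends Iter[E, A]

  // Concatenates every String it receives, finishing on EOF.
  def consume(acc: String = ""): Iter[String, String] = Cont[String, String] {
    case El(s) => consume(acc + s)
    case EOF   => Done[String, String](acc)
  }

  // Start from an already-completed Future of the initial state, then chain
  // one step per element: push the element if the consumer is still in Cont,
  // otherwise pass the state along unchanged.
  def enumerateSeq[E, A](xs: Seq[E], it: Iter[E, A]): Future[Iter[E, A]] =
    xs.foldLeft(Future.successful(it)) { (fut, e) =>
      fut.map {
        case Cont(k) => k(El(e))
        case done    => done
      }
    }

  // Send EOF and extract the result, if there is one.
  def run[E, A](it: Iter[E, A]): Option[A] = it match {
    case Done(a) => Some(a)
    case Cont(k) => k(EOF) match {
      case Done(a) => Some(a)
      case Cont(_) => None
    }
  }

  def demo(): Option[String] = {
    val fed = enumerateSeq(Seq("Tiger", "Hover", "Grand", "John"), consume())
    run(Await.result(fed, 5.seconds))
  }
}
```

EnumerateSeqSketch.demo() returns Some("TigerHoverGrandJohn"). Await is used only to make the sketch observable from a worksheet; nothing inside enumerateSeq blocks.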

Here is an example that builds an Enumerator directly:

 val enumUsers: Enumerator[String] = {
   Enumerator("Tiger","Hover","Grand","John")
       //> enumUsers  : play.api.libs.iteratee.Enumerator[String] = play.api.libs.iteratee.Enumerator$$anon$19@2ef9b8bc
 }

The Enumerator in this example is built with the apply shown above. We now connect enumUsers to the consume Iteratee:

 val consume = Iteratee.consume[String]()        //> consume  : play.api.libs.iteratee.Iteratee[String,String] = Cont(<function1>)

 val consumeUsers = enumUsers.apply(consume)      //> consumeUsers  : scala.concurrent.Future[play.api.libs.iteratee.Iteratee[String,String]] = Success(play.api.libs.iteratee.FutureIteratee@1dfe2924)

We used apply(consume) to connect the Enumerator to the Iteratee. apply is declared as follows:

  /**
   * Attaches this Enumerator to an [[play.api.libs.iteratee.Iteratee]], driving the
   * Iteratee to (asynchronously) consume the input. The Iteratee may enter its
   * [[play.api.libs.iteratee.Done]] or [[play.api.libs.iteratee.Error]]
   * state, or it may be left in a [[play.api.libs.iteratee.Cont]] state (allowing it
   * to consume more input after that sent by the enumerator).
   *
   * If the Iteratee reaches a [[play.api.libs.iteratee.Done]] state, it will
   * contain a computed result and the remaining (unconsumed) input.
   */
  def apply[A](i: Iteratee[E, A]): Future[Iteratee[E, A]]

This is an abstract method. Here is one implementation of apply:

  /**
   * Creates an enumerator which produces the one supplied
   * input and nothing else. This enumerator will NOT
   * automatically produce Input.EOF after the given input.
   */
  def enumInput[E](e: Input[E]) = new Enumerator[E] {
    def apply[A](i: Iteratee[E, A]): Future[Iteratee[E, A]] =
      i.fold {
        case Step.Cont(k) => eagerFuture(k(e))
        case _ => Future.successful(i)
      }(dec)
  }

consumeUsers has type Future[Iteratee[String,String]], so we use Future's combinators to display the data that was sent:

 val futPrint = consumeUsers.flatMap { i => i.run }.map(println)
    //> futPrint  : scala.concurrent.Future[Unit] = List()
 Await.ready(futPrint,Duration.Inf)     //> TigerHoverGrandJohn res0: demo.worksheet.enumerator.futPrint.type = Success(())

A more direct way:

 val futUsers = Iteratee.flatten(consumeUsers).run.map(println)
      //> futUsers  : scala.concurrent.Future[Unit] = List()
 Await.ready(futUsers,Duration.Inf)
      //> TigerHoverGrandJohn res1: demo.worksheet.enumerator.futUsers.type = Success(())

We can also use the operator |>> :

 val futPrintUsers = {
  Iteratee.flatten(enumUsers |>> consume).run.map(println)
     //> futPrintUsers  : scala.concurrent.Future[Unit] = List()
 }
 Await.ready(futPrintUsers,Duration.Inf)
     //> TigerHoverGrandJohn res2: demo.worksheet.enumerator.futPrintUsers.type = Success(())

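We can also chain two Enumerators so that they feed a single Iteratee one after the other. enumColors is not defined in this section; judging from the output below it is presumably Enumerator("Red","White","Blue","Yellow"):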

 val futEnums = {
   Iteratee.flatten {
     (enumUsers >>> enumColors) |>> consume
   }.run.map(println)                       //> futEnums  : scala.concurrent.Future[Unit] = List()
 }
 Await.ready(futEnums,Duration.Inf)
      //> TigerHoverGrandJohnRedWhiteBlueYellow res3: demo.worksheet.enumerator.futEnums.type = Success(())
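Since an Enumerator maps a consumer state to a Future of the next state, running two of them in sequence amounts to a flatMap on that Future. The sketch below shows the idea with a deliberately reduced model: Source is a hypothetical stand-in for Enumerator, and the consumer state is collapsed to the vector of elements received so far. It is an illustration of the sequencing idea, not the library's implementation of >>>.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcatSketch {
  // Hypothetical stand-in for an Enumerator. The consumer state is reduced
  // to the elements received so far.
  trait Source[E] { self =>
    def apply(received: Vector[E]): Future[Vector[E]]

    // Sequencing: run this source first, then hand the resulting state
    // to the next source.
    def >>>(next: Source[E]): Source[E] = new Source[E] {
      def apply(received: Vector[E]): Future[Vector[E]] =
        self.apply(received).flatMap(state => next.apply(state))
    }
  }

  // A source that appends a fixed set of values asynchronously.
  def source[E](xs: E*): Source[E] = new Source[E] {
    def apply(received: Vector[E]): Future[Vector[E]] = Future(received ++ xs)
  }

  def demo(): String = {
    val users  = source("Tiger", "Hover", "Grand", "John")
    val colors = source("Red", "White", "Blue", "Yellow")
    val all    = (users >>> colors).apply(Vector.empty[String])
    Await.result(all, 5.seconds).mkString
  }
}
```

ConcatSketch.demo() returns "TigerHoverGrandJohnRedWhiteBlueYellow". The second source only starts once the first Future has completed, which is what keeps the two sequences in order.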

Of course, the most practical use is pushing data from an InputStream to an Iteratee, for example sending the contents of a file:

  /**
   * Create an enumerator from the given input stream.
   *
   * Note that this enumerator will block when it reads from the file.
   *
   * @param file The file to create the enumerator from.
   * @param chunkSize The size of chunks to read from the file.
   */
  def fromFile(file: java.io.File, chunkSize: Int = 1024 * 8)(implicit ec: ExecutionContext): Enumerator[Array[Byte]] = {
    fromStream(new java.io.FileInputStream(file), chunkSize)(ec)
  }

  /**
   * Create an enumerator from the given input stream.
   *
   * This enumerator will block on reading the input stream, in the supplied ExecutionContext.  Care must therefore
   * be taken to ensure that this isn't a slow stream.  If using this with slow input streams, make sure the
   * ExecutionContext is appropriately configured to handle the blocking.
   *
   * @param input The input stream
   * @param chunkSize The size of chunks to read from the stream.
   * @param ec The ExecutionContext to execute blocking code.
   */
  def fromStream(input: java.io.InputStream, chunkSize: Int = 1024 * 8)(implicit ec: ExecutionContext): Enumerator[Array[Byte]] = {
    implicit val pec = ec.prepare()
    generateM({
      val buffer = new Array[Byte](chunkSize)
      val bytesRead = blocking { input.read(buffer) }
      val chunk = bytesRead match {
        case -1 => None
        case `chunkSize` => Some(buffer)
        case read =>
          val input = new Array[Byte](read)
          System.arraycopy(buffer, 0, input, 0, read)
          Some(input)
      }
      Future.successful(chunk)
    })(pec).onDoneEnumerating(input.close)(pec)
  }

The core of this feature is generateM, which is defined as follows:

  /**
   * Like [[play.api.libs.iteratee.Enumerator.repeatM]], but the callback returns an Option, which allows the stream
   * to be eventually terminated by returning None.
   *
   * @param e The input function.  Returns a future eventually redeemed with Some value if there is input to pass, or a
   *          future eventually redeemed with None if the end of the stream has been reached.
   */
  def generateM[E](e: => Future[Option[E]])(implicit ec: ExecutionContext): Enumerator[E] = checkContinue0(new TreatCont0[E] {
    private val pec = ec.prepare()
    def apply[A](loop: Iteratee[E, A] => Future[Iteratee[E, A]], k: Input[E] => Iteratee[E, A]) = executeFuture(e)(pec).flatMap {
      case Some(e) => loop(k(Input.El(e)))
      case None => Future.successful(Cont(k))
    }(dec)
  })

checkContinue0 is defined like this:

  trait TreatCont0[E] {
    def apply[A](loop: Iteratee[E, A] => Future[Iteratee[E, A]], k: Input[E] => Iteratee[E, A]): Future[Iteratee[E, A]]
  }

  def checkContinue0[E](inner: TreatCont0[E]) = new Enumerator[E] {
    def apply[A](it: Iteratee[E, A]): Future[Iteratee[E, A]] = {
      def step(it: Iteratee[E, A]): Future[Iteratee[E, A]] = it.fold {
        case Step.Done(a, e) => Future.successful(Done(a, e))
        case Step.Cont(k) => inner[A](step, k)
        case Step.Error(msg, e) => Future.successful(Error(msg, e))
      }(dec)
      step(it)
    }
  }

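From the line case Step.Cont(k) => inner[A](step, k) in checkContinue0 we can infer the mode of operation: as long as the downstream Iteratee is in the Cont state, step and inner keep calling each other recursively, and each round pushes one element e downstream through the Cont function k. Let us look at the signature of generateM again: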

 def generateM[E](e: => Future[Option[E]])(implicit ec: ExecutionContext): Enumerator[E]

In effect, the operation just described evaluates the by-name argument e: => Future[Option[E]] over and over. Now back to the fromStream code:

  def fromStream(input: java.io.InputStream, chunkSize: Int = 1024 * 8)(implicit ec: ExecutionContext): Enumerator[Array[Byte]] = {
    implicit val pec = ec.prepare()
    generateM({
      val buffer = new Array[Byte](chunkSize)
      val bytesRead = blocking { input.read(buffer) }
      val chunk = bytesRead match {
        case -1 => None
        case `chunkSize` => Some(buffer)
        case read =>
          val input = new Array[Byte](read)
          System.arraycopy(buffer, 0, input, 0, read)
          Some(input)
      }
      Future.successful(chunk)
    })(pec).onDoneEnumerating(input.close)(pec)
  }
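The read step inside fromStream can be exercised on its own. In the sketch below, readChunk repeats the three cases of the callback (end of stream, full buffer, partial buffer), and Iterator.continually stands in for generateM re-running the callback until it yields None. ChunkedRead, readChunk and chunks are hypothetical names; the iterator is pull-based and knows nothing about consumer state, so it only illustrates the chunking, not the Enumerator protocol.

```scala
import java.io.InputStream

object ChunkedRead {
  // One read step, mirroring the callback passed to generateM.
  // Assumes chunkSize > 0.
  def readChunk(input: InputStream, chunkSize: Int): Option[Array[Byte]] = {
    val buffer = new Array[Byte](chunkSize)
    input.read(buffer) match {
      case -1          => None                                        // end of stream
      case `chunkSize` => Some(buffer)                                // buffer completely filled
      case read        => Some(java.util.Arrays.copyOf(buffer, read)) // short read: trim the buffer
    }
  }

  // Re-run the read step until it reports end of stream.
  def chunks(input: InputStream, chunkSize: Int): Iterator[Array[Byte]] =
    Iterator
      .continually(readChunk(input, chunkSize))
      .takeWhile(_.isDefined)
      .map(_.get)
}
```

For a 10-byte java.io.ByteArrayInputStream and chunkSize = 4, ChunkedRead.chunks(...).map(_.length).toList gives List(4, 4, 2): at most chunkSize bytes are held in memory per step, and the final chunk is trimmed to the bytes actually read.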

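The argument passed to generateM is a block of code that is re-run for as long as the Iteratee stays in the Cont state, so each run reads up to chunkSize bytes from the input source. This is the typical streaming approach: it avoids loading all the data at once. Here is an example of a file-reading Enumerator: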

 import java.io._
 val fileEnum: Enumerator[Array[Byte]] = {
  Enumerator.fromFile(new File("/users/tiger/lines.txt"))
 }
 val futFile = Iteratee.flatten { fileEnum |>> consume }.run.map(println)

Note that fileEnum |>> consume does not compile: fileEnum is an Enumerator[Array[Byte]] while consume is an Iteratee[String,String], and Array[Byte] does not match String. An Enumeratee can do the conversion, so let us look at what an Enumeratee does.

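An Enumeratee is essentially a transformer. It converts the data produced by an Enumerator into a type that fits the Iteratee, or into just the data the Iteratee needs. For example, to sum a series of numbers given as strings, the strings must first be converted to a numeric type before the Iteratee can add them up: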

 val strNums = Enumerator("1","2","3")            //> strNums  : play.api.libs.iteratee.Enumerator[String] = play.api.libs.iteratee.Enumerator$$anon$19@36b4cef0
 val sumIteratee: Iteratee[Int,Int] = Iteratee.fold(0)((s,i) => s+i)
                                                 //> sumIteratee  : play.api.libs.iteratee.Iteratee[Int,Int] = Cont(<function1>)
 val strToInt: Enumeratee[String,Int] = Enumeratee.map {s => s.toInt}
                                                 //> strToInt  : play.api.libs.iteratee.Enumeratee[String,Int] = play.api.libs.iteratee.Enumeratee$$anon$38$$anon$1@371a67ec
 strNums |>> strToInt.transform(sumIteratee)     //> res4: scala.concurrent.Future[play.api.libs.iteratee.Iteratee[String,Int]] = List()
 strNums |>> strToInt &>> sumIteratee            //> res5: scala.concurrent.Future[play.api.libs.iteratee.Iteratee[String,Int]] = List()
 strNums.through(strToInt) |>> sumIteratee       //> res6: scala.concurrent.Future[play.api.libs.iteratee.Iteratee[Int,Int]] = List()
 val futsum = Iteratee.flatten(strNums &> strToInt |>> sumIteratee).run.map(println)
                                                //> futsum  : scala.concurrent.Future[Unit] = List()
 Await.ready(futsum,Duration.Inf)               //> 6
                                                //| res7: demo.worksheet.enumerator.futsum.type = Success(())

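In this example the Enumerator's elements are Strings, the Iteratee works on Ints, and strToInt is an Enumeratee that converts String to Int. We wrote the conversion in several equivalent forms; they all give the same result, 6. Enumerator.through or Enumeratee.transform can be used to connect an Enumerator to an Iteratee. We can also limit which data reaches the Iteratee: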

 val sum2 = strNums &> Enumeratee.take(2) &> strToInt |>> sumIteratee
                 //> sum2  : scala.concurrent.Future[play.api.libs.iteratee.Iteratee[Int,Int]] =List()
 val futsum2 = Iteratee.flatten(sum2).run.map(println)
                                                  //> futsum2  : scala.concurrent.Future[Unit] = List()
 Await.ready(futsum2,Duration.Inf)                //> 3
                                                  //| res8: demo.worksheet.enumerator.futsum2.type = Success(())

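Enumeratee.take(2) in the example above is an Enumeratee that processes the data: only the first two elements get through, which is why the sum is 3.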
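One way to picture an Enumeratee is as an adapter wrapped around the consumer: it accepts the upstream element type and forwards converted (or fewer) elements to the inner consumer. The sketch below models that with the standard library only. MiniEnumeratee, map, take, feed and result are hypothetical names, and take here finishes the inner consumer with EOF as soon as its quota is used up, which is a simplification rather than a description of how Play's Enumeratee.take behaves.

```scala
object MiniEnumeratee {
  sealed trait Input[+E]
  final case class El[E](e: E) extends Input[E]
  case object EOF extends Input[Nothing]

  // Simplified consumer states (assumption: no Error state, no Future).
  sealed trait Step[E, A]
  final case class Done[E, A](a: A) extends Step[E, A]
  final case class Cont[E, A](k: Input[E] => Step[E, A]) extends Step[E, A]

  // Sums Ints until EOF.
  def sum(acc: Int = 0): Step[Int, Int] = Cont[Int, Int] {
    case El(n) => sum(acc + n)
    case EOF   => Done[Int, Int](acc)
  }

  // Adapter: presents a consumer of To as a consumer of From by
  // converting every element with f before passing it on.
  def map[From, To, A](f: From => To)(inner: Step[To, A]): Step[From, A] = inner match {
    case Done(a) => Done[From, A](a)
    case Cont(k) => Cont[From, A] {
      case El(e) => map(f)(k(El(f(e))))
      case EOF   => map(f)(k(EOF))
    }
  }

  // Adapter: forwards at most n elements, then ends the inner consumer.
  def take[E, A](n: Int)(inner: Step[E, A]): Step[E, A] = inner match {
    case Cont(k) if n > 0 => Cont[E, A] {
      case El(e) => take(n - 1)(k(El(e)))
      case EOF   => k(EOF)
    }
    case Cont(k) => k(EOF) // quota used up
    case done    => done
  }

  // Data source: pushes elements while the consumer is in Cont, then EOF.
  def feed[E, A](xs: List[E], it: Step[E, A]): Step[E, A] = it match {
    case Cont(k) if xs.nonEmpty => feed(xs.tail, k(El(xs.head)))
    case Cont(k)                => k(EOF)
    case done                   => done
  }

  def result[E, A](s: Step[E, A]): Option[A] = s match {
    case Done(a) => Some(a)
    case Cont(_) => None
  }
}
```

With toInt = (s: String) => s.toInt, feeding List("1", "2", "3") into map(toInt)(sum()) yields Some(6), and feeding the same list into take(2)(map(toInt)(sum())) yields Some(3), matching the two worksheet results above.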

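Functionally, Enumerator + Enumeratee + Iteratee now looks more and more like fs2, which is no surprise since Iteratee is itself a streaming library. We have already chosen fs2 because it supports flexible parallel computation, so there is little point in going any deeper into Iteratee.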
