Akka(22): Stream:实时操控:动态管道连接-MergeHub,BroadcastHub and PartitionHub
在现实中我们会经常遇到这样的场景:有一个固定的数据源Source,我们希望按照程序运行状态来接驳任意数量的下游接收方subscriber、又或者我需要在程序运行时(runtime)把多个数据流向某个固定的数据流终端Sink推送。这就涉及到动态连接合并型Merge或扩散型Broadcast的数据流连接点junction。从akka-stream的技术文档得知:一对多,多对一或多对多类型的复杂数据流组件必须用GraphDSL来设计,产生Graph类型结果。前面我们提到过:Graph就是一种运算预案,要求所有的运算环节都必须是预先明确指定的,如此应该是无法实现动态的管道连接的。但akka-stream提供了MergeHub,BroadcastHub和PartitionHub来支持这样的功能需求。
1、MergeHub:多对一合并类型。支持动态的多个上游publisher连接
2、BroadcastHub:一对多扩散类型。支持动态的多个下游subscriber连接
3、PartitionHub:实际上是一对多扩散类型。通过一个函数来选择数据派送目的地
MergeHub对象中有个source函数:
/**
* Creates a [[Source]] that emits elements merged from a dynamic set of producers. After the [[Source]] returned
* by this method is materialized, it returns a [[Sink]] as a materialized value. This [[Sink]] can be materialized
* arbitrary many times and each of the materializations will feed the elements into the original [[Source]].
*
* Every new materialization of the [[Source]] results in a new, independent hub, which materializes to its own
* [[Sink]] for feeding that materialization.
*
* If one of the inputs fails the [[Sink]], the [[Source]] is failed in turn (possibly jumping over already buffered
* elements). Completed [[Sink]]s are simply removed. Once the [[Source]] is cancelled, the Hub is considered closed
* and any new producers using the [[Sink]] will be cancelled.
*
* @param perProducerBufferSize Buffer space used per producer. Default value is 16.
*/
def source[T](perProducerBufferSize: Int): Source[T, Sink[T, NotUsed]] =
Source.fromGraph(new MergeHub[T](perProducerBufferSize))
MergeHub.source函数的返回结果类型是Source[T,Sink[T,NotUsed]],本质上MergeHub就是一个共用的Sink,如下所示:
val fixedSink = Sink.foreach(println)
val sinkGraph: RunnableGraph[Sink[Any,NotUsed]] = MergeHub.source(perProducerBufferSize = ).to(fixedSink)
val inGate: Sink[Any,NotUsed] = sinkGraph.run() //common input //now connect any number of source
val (killSwitch,_) = (Source(Stream.from()).delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(KillSwitches.single)(Keep.right).toMat(inGate)(Keep.both)).run() val (killSwitch2,_) = (Source(List("a","b","c","d","e")).delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(KillSwitches.single)(Keep.right).toMat(inGate)(Keep.both)).run() val (killSwitch3,_) = (Source(List("AA","BB","CC","DD","EE")).delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(KillSwitches.single)(Keep.right).toMat(inGate)(Keep.both)).run() scala.io.StdIn.readLine()
killSwitch.shutdown()
killSwitch2.shutdown()
killSwitch3.shutdown()
actorSys.terminate()
同样,BroadcastHub就是一种共用的Source,可以连接任何数量的下游subscriber。下面是BroadcastHub.sink的定义:
/**
* Creates a [[Sink]] that receives elements from its upstream producer and broadcasts them to a dynamic set
* of consumers. After the [[Sink]] returned by this method is materialized, it returns a [[Source]] as materialized
* value. This [[Source]] can be materialized an arbitrary number of times and each materialization will receive the
* broadcast elements from the original [[Sink]].
*
* Every new materialization of the [[Sink]] results in a new, independent hub, which materializes to its own
* [[Source]] for consuming the [[Sink]] of that materialization.
*
* If the original [[Sink]] is failed, then the failure is immediately propagated to all of its materialized
* [[Source]]s (possibly jumping over already buffered elements). If the original [[Sink]] is completed, then
* all corresponding [[Source]]s are completed. Both failure and normal completion is "remembered" and later
* materializations of the [[Source]] will see the same (failure or completion) state. [[Source]]s that are
* cancelled are simply removed from the dynamic set of consumers.
*
* @param bufferSize Buffer size used by the producer. Gives an upper bound on how "far" from each other two
* concurrent consumers can be in terms of element. If this buffer is full, the producer
* is backpressured. Must be a power of two and less than 4096.
*/
def sink[T](bufferSize: Int): Sink[T, Source[T, NotUsed]] = Sink.fromGraph(new BroadcastHub[T](bufferSize))
BroadcastHub.sink返回结果类型:Sink[T,Source[T,NotUsed]],就是个可连接任何数量下游的共用Source:
val killAll = KillSwitches.shared("terminator")
val fixedSource=Source(Stream.from()).delay(.second,DelayOverflowStrategy.backpressure)
val sourceGraph = fixedSource.via(killAll.flow).toMat(BroadcastHub.sink(bufferSize = ))(Keep.right).async
val outPort = sourceGraph.run() //shared source
//now connect any number of sink to outPort
outPort.to(Sink.foreach{c =>println(s"A: $c")}).run()
outPort.to(Sink.foreach{c =>println(s"B: $c")}).run()
outPort.to(Sink.foreach{c =>println(s"C: $c")}).run()
还有一种做法是把MergeHub和BroadcastHub背对背连接起来形成一种多对多的形状。理论上应该能作为一种集散中心容许连接任何数量的上游publisher和下游subscriber。我们先把它们连接起来获得一个Sink和一个Source:
val (sink, source) = MergeHub.source[Int](perProducerBufferSize = )
.toMat(BroadcastHub.sink(bufferSize = ))(Keep.both).run()
理论上我们现在可以对sink和source进行任意连接了。但有个特殊情况是:当下游没有任何subscriber时上游所有producer都无法发送任何数据。这是由于backpressure造成的:作为一个合成的节点,下游速率跟不上则通过backpressure制约上游数据发布。我们可以安装一个泄洪机制来保证上游publisher数据推送的正常进行:
source.runWith(Sink.ignore)
这样在没有任何下游subscriber的情况下,上游producer还是能够正常运作。
现在我们可以用Flow.fromSinkAndSource(sink, source)来构建一个Flow[I,O,?]:
def fromSinkAndSource[I, O](sink: Graph[SinkShape[I], _], source: Graph[SourceShape[O], _]): Flow[I, O, NotUsed] =
fromSinkAndSourceMat(sink, source)(Keep.none)
我们还可以把上篇提到的KillSwitches.singleBidi用上:
val channel: Flow[Int, Int, UniqueKillSwitch] =
Flow.fromSinkAndSource(sink, source)
.joinMat(KillSwitches.singleBidi[Int, Int])(Keep.right)
.backpressureTimeout(.seconds)
上面backpressureTimeout保证了任何下游subscriber阻塞超时的话都会被强力终止。如下:
/**
* If the time between the emission of an element and the following downstream demand exceeds the provided timeout,
* the stream is failed with a [[scala.concurrent.TimeoutException]]. The timeout is checked periodically,
* so the resolution of the check is one period (equals to timeout value).
*
* '''Emits when''' upstream emits an element
*
* '''Backpressures when''' downstream backpressures
*
* '''Completes when''' upstream completes or fails if timeout elapses between element emission and downstream demand.
*
* '''Cancels when''' downstream cancels
*/
def backpressureTimeout(timeout: FiniteDuration): Repr[Out] = via(new Timers.BackpressureTimeout[Out](timeout))
好了,下面我们可以把channel当作Flow来使用了:
val killChannel1 = fixedSource.viaMat(channel)(Keep.right).to(fixedSink).run()
val killChannel2 = Source.repeat()
.delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(channel)(Keep.right).to(fixedSink).run()
上面我们提到:PartitionHub就是一种特殊的BroadcastHub。功能是扩散型的。不过PartitionHub用了一个函数来选择下游的subscriber。从PartitionHub.sink函数款式可以看出:
def sink[T](partitioner: (Int, T) ⇒ Int, startAfterNrOfConsumers: Int,
bufferSize: Int = defaultBufferSize): Sink[T, Source[T, NotUsed]] =
statefulSink(() ⇒ (info, elem) ⇒ info.consumerIdByIdx(partitioner(info.size, elem)), startAfterNrOfConsumers, bufferSize)
可以看出:partitioner函数就是一种典型的状态转换函数款式,实际上sink调用了statefulSink方法并固定了partitioner函数:
* This `statefulSink` should be used when there is a need to keep mutable state in the partition function,
* e.g. for implemening round-robin or sticky session kind of routing. If state is not needed the [[#sink]] can
* be more convenient to use.
*
* @param partitioner Function that decides where to route an element. It is a factory of a function to
* to be able to hold stateful variables that are unique for each materialization. The function
* takes two parameters; the first is information about active consumers, including an array of consumer
* identifiers and the second is the stream element. The function should return the selected consumer
* identifier for the given element. The function will never be called when there are no active consumers,
* i.e. there is always at least one element in the array of identifiers.
* @param startAfterNrOfConsumers Elements are buffered until this number of consumers have been connected.
* This is only used initially when the stage is starting up, i.e. it is not honored when consumers have
* been removed (canceled).
* @param bufferSize Total number of elements that can be buffered. If this buffer is full, the producer
* is backpressured.
*/
@ApiMayChange def statefulSink[T](partitioner: () ⇒ (ConsumerInfo, T) ⇒ Long, startAfterNrOfConsumers: Int,
bufferSize: Int = defaultBufferSize): Sink[T, Source[T, NotUsed]] =
Sink.fromGraph(new PartitionHub[T](partitioner, startAfterNrOfConsumers, bufferSize))
与BroadcastHub相同,我们首先构建一个共用的数据源producer,然后连接PartitionHub形成一个通往下游终端的通道让任何下游subscriber可以连接这个通道:
//interupted temination
val killAll = KillSwitches.shared("terminator")
//fix a producer
val fixedSource = Source.tick(.second, .second, "message")
.zipWith(Source( to ))((a, b) => s"$a-$b")
//connect to PartitionHub which uses function to select sink
val sourceGraph = fixedSource.via(killAll.flow).toMat(PartitionHub.sink(
(size, elem) => math.abs(elem.hashCode) % size,
startAfterNrOfConsumers = , bufferSize = ))(Keep.right)
//materialize the source
val fromSource = sourceGraph.run()
//connect to fixedSource freely
fromSource.runForeach(msg => println("subs1: " + msg))
fromSource.runForeach(msg => println("subs2: " + msg)) scala.io.StdIn.readLine()
killAll.shutdown()
actorSys.terminate()
可以看到:上游数据流向多个下游中哪个subscriber是通过partitioner函数选定的。从这项功能来讲:PartitionHub又是某种路由Router。下面的例子实现了仿Router的RoundRobin推送策略:
//partitioner function
def roundRobin(): (PartitionHub.ConsumerInfo, String) ⇒ Long = {
var i = -1L (info, elem) => {
i +=
info.consumerIdByIdx((i % info.size).toInt)
}
}
val roundRobinGraph = fixedSource.via(killAll.flow).toMat(PartitionHub.statefulSink(
() => roundRobin(),startAfterNrOfConsumers = ,bufferSize = )
)(Keep.right)
val roundRobinSource = roundRobinGraph.run() roundRobinSource.runForeach(msg => println("roundRobin1: " + msg))
roundRobinSource.runForeach(msg => println("roundRobin2: " + msg))
上面例子里数据源流动方向是由roundRobin函数确定的。
而在下面这个例子里数据流向速率最快的subscriber:
val producer = Source( until ) // ConsumerInfo.queueSize is the approximate number of buffered elements for a consumer.
// Note that this is a moving target since the elements are consumed concurrently.
val runnableGraph: RunnableGraph[Source[Int, NotUsed]] =
producer.via(killAll.flow).toMat(PartitionHub.statefulSink(
() => (info, elem) ⇒ info.consumerIds.minBy(id ⇒ info.queueSize(id)),
startAfterNrOfConsumers = , bufferSize = ))(Keep.right) val fromProducer: Source[Int, NotUsed] = runnableGraph.run() fromProducer.runForeach(msg => println("fast1: " + msg))
fromProducer.throttle(, .millis, , ThrottleMode.Shaping)
.runForeach(msg => println("fast2: " + msg))
上面这个例子里partitioner函数是根据众下游的缓冲数量(queueSize)来确定数据应该流向哪个subscriber,queueSize数值越大则表示速率越慢。
下面是以上示范中MergeHub及BroadcastHub示范的源代码:
import akka.NotUsed
import akka.stream.scaladsl._
import akka.stream._
import akka.actor._ import scala.concurrent.duration._
object HubsDemo extends App {
implicit val actorSys = ActorSystem("sys")
implicit val ec = actorSys.dispatcher
implicit val mat = ActorMaterializer(
ActorMaterializerSettings(actorSys)
.withInputBuffer(,)
) val fixedSink = Sink.foreach(println)
val sinkGraph: RunnableGraph[Sink[Any,NotUsed]] = MergeHub.source(perProducerBufferSize = ).to(fixedSink).async
val inGate: Sink[Any,NotUsed] = sinkGraph.run() //common input //now connect any number of source
val (killSwitch,_) = (Source(Stream.from()).delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(KillSwitches.single)(Keep.right).toMat(inGate)(Keep.both)).run() val (killSwitch2,_) = (Source(List("a","b","c","d","e")).delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(KillSwitches.single)(Keep.right).toMat(inGate)(Keep.both)).run() val (killSwitch3,_) = (Source(List("AA","BB","CC","DD","EE")).delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(KillSwitches.single)(Keep.right).toMat(inGate)(Keep.both)).run() val killAll = KillSwitches.shared("terminator")
val fixedSource=Source(Stream.from()).delay(.second,DelayOverflowStrategy.backpressure)
val sourceGraph = fixedSource.via(killAll.flow).toMat(BroadcastHub.sink(bufferSize = ))(Keep.right).async
val outPort = sourceGraph.run() //shared source
//now connect any number of sink to outPort
outPort.to(Sink.foreach{c =>println(s"A: $c")}).run()
outPort.to(Sink.foreach{c =>println(s"B: $c")}).run()
outPort.to(Sink.foreach{c =>println(s"C: $c")}).run() val (sink, source) = MergeHub.source[Int](perProducerBufferSize = )
.toMat(BroadcastHub.sink(bufferSize = ))(Keep.both).run() source.runWith(Sink.ignore) val channel: Flow[Int, Int, UniqueKillSwitch] =
Flow.fromSinkAndSource(sink, source)
.joinMat(KillSwitches.singleBidi[Int, Int])(Keep.right)
.backpressureTimeout(.seconds) val killChannel1 = fixedSource.viaMat(channel)(Keep.right).to(fixedSink).run()
val killChannel2 = Source.repeat()
.delay(.second,DelayOverflowStrategy.backpressure)
.viaMat(channel)(Keep.right).to(fixedSink).run() scala.io.StdIn.readLine()
killSwitch.shutdown()
killSwitch2.shutdown()
killSwitch3.shutdown()
killAll.shutdown()
killChannel1.shutdown()
killChannel2.shutdown()
scala.io.StdIn.readLine()
actorSys.terminate() }
下面是PartitionHub示范源代码:
import akka.NotUsed
import akka.stream.scaladsl._
import akka.stream._
import akka.actor._ import scala.concurrent.duration._
object PartitionHubDemo extends App {
implicit val actorSys = ActorSystem("sys")
implicit val ec = actorSys.dispatcher
implicit val mat = ActorMaterializer(
ActorMaterializerSettings(actorSys)
.withInputBuffer(,)
) //interupted temination
val killAll = KillSwitches.shared("terminator")
//fix a producer
val fixedSource = Source.tick(.second, .second, "message")
.zipWith(Source( to ))((a, b) => s"$a-$b")
//connect to PartitionHub which uses function to select sink
val sourceGraph = fixedSource.via(killAll.flow).toMat(PartitionHub.sink(
(size, elem) => math.abs(elem.hashCode) % size,
startAfterNrOfConsumers = , bufferSize = ))(Keep.right)
//materialize the source
val fromSource = sourceGraph.run()
//connect to fixedSource freely
fromSource.runForeach(msg => println("subs1: " + msg))
fromSource.runForeach(msg => println("subs2: " + msg)) //partitioner function
def roundRobin(): (PartitionHub.ConsumerInfo, String) ⇒ Long = {
var i = -1L (info, elem) => {
i +=
info.consumerIdByIdx((i % info.size).toInt)
}
}
val roundRobinGraph = fixedSource.via(killAll.flow).toMat(PartitionHub.statefulSink(
() => roundRobin(),startAfterNrOfConsumers = ,bufferSize = )
)(Keep.right)
val roundRobinSource = roundRobinGraph.run() roundRobinSource.runForeach(msg => println("roundRobin1: " + msg))
roundRobinSource.runForeach(msg => println("roundRobin2: " + msg)) val producer = Source( until ) // ConsumerInfo.queueSize is the approximate number of buffered elements for a consumer.
// Note that this is a moving target since the elements are consumed concurrently.
val runnableGraph: RunnableGraph[Source[Int, NotUsed]] =
producer.via(killAll.flow).toMat(PartitionHub.statefulSink(
() => (info, elem) ⇒ info.consumerIds.minBy(id ⇒ info.queueSize(id)),
startAfterNrOfConsumers = , bufferSize = ))(Keep.right) val fromProducer: Source[Int, NotUsed] = runnableGraph.run() fromProducer.runForeach(msg => println("fast1: " + msg))
fromProducer.throttle(, .millis, , ThrottleMode.Shaping)
.runForeach(msg => println("fast2: " + msg)) scala.io.StdIn.readLine()
killAll.shutdown()
actorSys.terminate() }
Akka(22): Stream:实时操控:动态管道连接-MergeHub,BroadcastHub and PartitionHub的更多相关文章
- 《Entity Framework 6 Recipes》中文翻译系列 (38) ------ 第七章 使用对象服务之动态创建连接字符串和从数据库读取模型
翻译的初衷以及为什么选择<Entity Framework 6 Recipes>来学习,请看本系列开篇 第七章 使用对象服务 本章篇幅适中,对真实应用中的常见问题提供了切实可行的解决方案. ...
- [杂]SQL Server 之命名管道连接
命名管道是通过进程间通信(IPC)机制实现通信.具体来说,命名管道建立在服务器的IPC$共享基础上,通过IPC$共享来进行通信. SQL Server命名管道 SQL Server 首先在服务器上创建 ...
- BZOJ4006 JLOI2015 管道连接(斯坦纳树生成森林)
4006: [JLOI2015]管道连接 Time Limit: 30 Sec Memory Limit: 128 MB Description 小铭铭最近进入了某情报部门,该部门正在被如何建立安全的 ...
- BZOJ_4006_[JLOI2015]管道连接_斯坦纳树
BZOJ_4006_[JLOI2015]管道连接_斯坦纳树 题意: 小铭铭最近进入了某情报部门,该部门正在被如何建立安全的通道连接困扰. 该部门有 n 个情报站,用 1 到 n 的整数编号.给出 m ...
- 「JLOI2015」管道连接 解题报告
「JLOI2015」管道连接 先按照斯坦纳树求一个 然后合并成斯坦纳森林 直接枚举树的集合再dp一下就好了 Code: #include <cstdio> #include <cct ...
- [BZOJ4006][JLOI2015]管道连接 状压dp+斯坦纳树
4006: [JLOI2015]管道连接 Time Limit: 30 Sec Memory Limit: 128 MBSubmit: 1020 Solved: 552[Submit][Statu ...
- [bzoj4006][JLOI2015]管道连接_斯坦纳树_状压dp
管道连接 bzoj-4006 JLOI-2015 题目大意:给定一张$n$个节点$m$条边的带边权无向图.并且给定$p$个重要节点,每个重要节点都有一个颜色.求一个边权和最小的边集使得颜色相同的重要节 ...
- Redis05——Redis高级运用(管道连接,发布订阅,布隆过滤器)
Redis高级运用 一.管道连接redis(一次发送多个命令,节省往返时间) 1.安装nc yum install nc -y 2.通过nc连接redis nc localhost 6379 3.通过 ...
- VMware:未能将管道连接到虚拟机, 所有的管道范例都在使用中
问题描述:虚拟机下的Ubuntu系统长时间死机无法正常关机,用Windows任务管理器关闭VMware也关不掉,没办法,只能直接关电脑了...重新打开电脑,启动VMware,发现提示客户机已经处于打开 ...
随机推荐
- 对称加密详解,以及JAVA简单实现
(原) 常用的加密有3种 1.正向加密,如MD5,加密后密文固定,目前还没办法破解,但是可以能过数据库撞库有一定概率找到,不过现在一般用这种方式加密都会加上盐值. 2.对称加密,通过一个固定的对称密钥 ...
- Java并发编程笔记——技术点汇总
目录 · 线程安全 · 线程安全的实现方法 · 互斥同步 · 非阻塞同步 · 无同步 · volatile关键字 · 线程间通信 · Object.wait()方法 · Object.notify() ...
- jmeter之beanshell提取json数据
Jmeter BeanShell PostProcessor提取json数据 假设现有需求: 提取sample返回json数据中所有name字段对应的值,返回的json格式如下: {“body”:{“ ...
- LuaFramework热更新过程(及可更新的loading界面实现)
1.名词解释: 资源包:点击 LuaFramework | Build XXX(平台名) Resource,框架会自动将自定义指定的资源打包到StreamingAssets文件夹,这个 ...
- 初始jvm(一)---jvm内存区域与溢出
jvm内存区域与溢出 为什么学习jvm 木板原理,最短的一块板决定一个水的深度,当一个系统垃圾收集成为瓶颈的时候,那么就需要你对jvm的了解掌握. 当一个系统出现内存溢出,内存泄露的时候,因为你懂jv ...
- (转)Linux下查看文件和文件夹大小 删除日志
场景:在sts中执行自动部署时候maven提示No space left on device错误,后来经检查发现是磁盘空间满了,用下面的方法分析发现tomcat下面的logs目录占用了很大的空间,删除 ...
- java 线程之concurrent中的常用工具 CyclicBarrier
一.CyclicBarrier CyclicBarrier是一个同步辅助类,它允许一组线程互相等待,直到到达某个公共屏障点 (common barrier point).在涉及一组固定大小的线程的程序 ...
- TypeScript 异步代码类型技巧
在typescript下编写异步代码,会遇到难以自动识别异步返回值类型的情况,本文介绍一些技巧,以辅助编写更健全的异步代码. callback 以读取文件为例: readFile是一个异步函数,包含p ...
- java Socket(详解)转载
在客户/服务器通信模式中, 客户端需要主动创建与服务器连接的 Socket(套接字), 服务器端收到了客户端的连接请求, 也会创建与客户连接的 Socket. Socket可看做是通信连接两端的收发器 ...
- Alluxio 1.5集群搭建
一.依赖文件安装 1.1 JDK 参见博文:http://www.cnblogs.com/liugh/p/6623530.html 二.文件准备 2.1 文件名称 alluxio-1.5.0-hado ...