Anyone who has used Akka knows that, by default, messages are delivered at most once: `tell` makes a best effort to send the message, and if delivery fails the message is not resent. Some business scenarios, however, require at-least-once semantics, i.e. a message must be successfully delivered at least once. On top of Persistence, Akka provides at-least-once delivery for exactly this.

  In short, Akka's at-least-once mechanism waits a configurable amount of time for a confirmation that a message was received. If the confirmation arrives, delivery succeeded; otherwise the message is resent, and once a retry limit is exceeded it stops resending — or so I assumed; we will verify this against the source below.

  Actually, if we ignore Akka's source and implement at-least-once ourselves, the basic functionality is quite simple. We keep an in-memory map from message ID to message; if a confirmation arrives within the allotted time we remove the entry, otherwise we resend, and after a maximum number of attempts we remove the still-unconfirmed message from the map. Resending is driven by a timer: it fires a heartbeat at a fixed interval, and on each tick we resend every message that has been sent but not confirmed in time. Of course this map cannot grow without bound or memory will blow up, so it should have a fixed maximum size. We could also push unconfirmed messages into a third-party cache such as Redis, both to avoid OOM and to avoid losing the unconfirmed list when the actor restarts.
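  To make this concrete, here is a minimal hand-rolled sketch of the idea. Everything in it (NaiveAtLeastOnce, the Send/Ack/Payload protocol, the 2-second tick, maxAttempts) is invented for illustration; it resends on every tick instead of tracking per-message deadlines, and it is of course not Akka's implementation.

import scala.collection.immutable.SortedMap
import scala.concurrent.duration._
import akka.actor.{ Actor, ActorPath }

object NaiveAtLeastOnce {
  case class Send(destination: ActorPath, payload: Any)   // ask this actor to deliver something
  case class Payload(id: Long, payload: Any)               // what the destination actually receives
  case class Ack(id: Long)                                  // confirmation from the destination
  private case class Entry(destination: ActorPath, payload: Any, attempt: Int)
  private case object RedeliverTick
}

class NaiveAtLeastOnce(maxUnconfirmed: Int, maxAttempts: Int) extends Actor {
  import NaiveAtLeastOnce._
  import context.dispatcher

  private var nextId = 0L
  private var unconfirmed = SortedMap.empty[Long, Entry]   // id -> message still waiting for an Ack

  // fixed-interval heartbeat that drives redelivery
  private val tick = context.system.scheduler.schedule(2.seconds, 2.seconds, self, RedeliverTick)
  override def postStop(): Unit = tick.cancel()

  def receive: Receive = {
    case Send(dest, payload) ⇒
      if (unconfirmed.size >= maxUnconfirmed)
        throw new IllegalStateException("too many unconfirmed messages")   // bounded list
      nextId += 1
      unconfirmed += nextId -> Entry(dest, payload, attempt = 1)
      context.actorSelection(dest) ! Payload(nextId, payload)

    case Ack(id) ⇒
      unconfirmed -= id                                     // confirmed: stop tracking it

    case RedeliverTick ⇒
      unconfirmed.foreach {
        case (id, e) if e.attempt >= maxAttempts ⇒
          unconfirmed -= id                                 // give up after maxAttempts
        case (id, e) ⇒
          context.actorSelection(e.destination) ! Payload(id, e.payload)
          unconfirmed += id -> e.copy(attempt = e.attempt + 1)
      }
  }
}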

  So how does Akka implement it? Since at-least-once shows up in the Persistence chapter, surely the unconfirmed messages are persisted through the Persistence mechanism, right?

  Before reading the at-least-once source, a few concepts and conclusions need to be spelled out. They are all in the official docs, but to make the source easier to follow, let me repeat them here.

  • The sending actor keeps the list of unconfirmed messages in memory. To survive an actor restart, you must call the persistence-related functions yourself to save that in-memory state; AtLeastOnceDelivery only provides the hooks and does not persist automatically.
  • AtLeastOnceDelivery sends messages via the deliver method, and the developer must explicitly call confirmDelivery once the destination has acknowledged the message.
  • Every message carries a message ID (deliveryId), generated by AtLeastOnceDelivery.
  • AtLeastOnceDelivery sends to an ActorSelection, which means messages are addressed by ActorPath. If an actor is stopped by the developer and then recreated with actorOf, the new instance may receive messages intended for the previous one. Developers need to watch out for this.
  • Unconfirmed messages live in memory, so they can cause OOM; once the list exceeds its maximum size, delivering a message throws an exception that the developer must handle.
  • Message ordering is no longer guaranteed, because resends can reorder messages.

  Let's start with the official demo.

case class Msg(deliveryId: Long, s: String)
case class Confirm(deliveryId: Long)

sealed trait Evt
case class MsgSent(s: String) extends Evt
case class MsgConfirmed(deliveryId: Long) extends Evt

class MyPersistentActor(destination: ActorSelection)
  extends PersistentActor with AtLeastOnceDelivery {

  override def persistenceId: String = "persistence-id"

  override def receiveCommand: Receive = {
    case s: String           ⇒ persist(MsgSent(s))(updateState)
    case Confirm(deliveryId) ⇒ persist(MsgConfirmed(deliveryId))(updateState)
  }

  override def receiveRecover: Receive = {
    case evt: Evt ⇒ updateState(evt)
  }

  def updateState(evt: Evt): Unit = evt match {
    case MsgSent(s) ⇒
      deliver(destination)(deliveryId ⇒ Msg(deliveryId, s))
    case MsgConfirmed(deliveryId) ⇒ confirmDelivery(deliveryId)
  }
}

class MyDestination extends Actor {
  def receive = {
    case Msg(deliveryId, s) ⇒
      // ...
      sender() ! Confirm(deliveryId)
  }
}

  The demo is simple. Note which traits it mixes in: PersistentActor and AtLeastOnceDelivery. We said earlier that at-least-once is built on persistence, so can we extend only AtLeastOnceDelivery? Yes — AtLeastOnceDelivery itself already extends PersistentActor.

/**
* Scala API: Mix-in this trait with your `PersistentActor` to send messages with at-least-once
* delivery semantics to destinations. It takes care of re-sending messages when they
* have not been confirmed within a configurable timeout. Use the [[AtLeastOnceDeliveryLike#deliver]] method to
* send a message to a destination. Call the [[AtLeastOnceDeliveryLike#confirmDelivery]] method when the destination
* has replied with a confirmation message.
*
* At-least-once delivery implies that original message send order is not always retained
* and the destination may receive duplicate messages due to possible resends.
*
* The interval between redelivery attempts can be defined by [[AtLeastOnceDeliveryLike#redeliverInterval]].
* After a number of delivery attempts a [[AtLeastOnceDelivery.UnconfirmedWarning]] message
* will be sent to `self`. The re-sending will still continue, but you can choose to call
* [[AtLeastOnceDeliveryLike#confirmDelivery]] to cancel the re-sending.
*
* The `AtLeastOnceDelivery` trait has a state consisting of unconfirmed messages and a
* sequence number. It does not store this state itself. You must persist events corresponding
* to the `deliver` and `confirmDelivery` invocations from your `PersistentActor` so that the
* state can be restored by calling the same methods during the recovery phase of the
* `PersistentActor`. Sometimes these events can be derived from other business level events,
* and sometimes you must create separate events. During recovery calls to `deliver`
* will not send out the message, but it will be sent later if no matching `confirmDelivery`
* was performed.
*
* Support for snapshots is provided by [[AtLeastOnceDeliveryLike#getDeliverySnapshot]] and [[AtLeastOnceDeliveryLike#setDeliverySnapshot]].
* The `AtLeastOnceDeliverySnapshot` contains the full delivery state, including unconfirmed messages.
* If you need a custom snapshot for other parts of the actor state you must also include the
* `AtLeastOnceDeliverySnapshot`. It is serialized using protobuf with the ordinary Akka
* serialization mechanism. It is easiest to include the bytes of the `AtLeastOnceDeliverySnapshot`
* as a blob in your custom snapshot.
*
* @see [[AtLeastOnceDeliveryLike]]
* @see [[AbstractPersistentActorWithAtLeastOnceDelivery]] for Java API
*/
trait AtLeastOnceDelivery extends PersistentActor with AtLeastOnceDeliveryLike

  Please read the official scaladoc above carefully and thoroughly; it spells out several very important concepts and details of the AtLeastOnceDelivery trait — though, as noted, we already previewed most of them above.

  There is a lot of source behind AtLeastOnceDelivery, and working through it from the top is tedious. To keep it simple, let's again start from the official demo: it calls deliver when sending and confirmDelivery when a confirmation arrives, so those two functions are our entry points.

 /**
* Scala API: Send the message created by the `deliveryIdToMessage` function to
* the `destination` actor. It will retry sending the message until
* the delivery is confirmed with [[#confirmDelivery]]. Correlation
* between `deliver` and `confirmDelivery` is performed with the
* `deliveryId` that is provided as parameter to the `deliveryIdToMessage`
* function. The `deliveryId` is typically passed in the message to the
* destination, which replies with a message containing the same `deliveryId`.
*
* The `deliveryId` is a strictly monotonically increasing sequence number without
* gaps. The same sequence is used for all destinations of the actor, i.e. when sending
* to multiple destinations the destinations will see gaps in the sequence if no
* translation is performed.
*
* During recovery this method will not send out the message, but it will be sent
* later if no matching `confirmDelivery` was performed.
*
* This method will throw [[AtLeastOnceDelivery.MaxUnconfirmedMessagesExceededException]]
* if [[#numberOfUnconfirmed]] is greater than or equal to [[#maxUnconfirmedMessages]].
*/
def deliver(destination: ActorSelection)(deliveryIdToMessage: Long ⇒ Any): Unit = {
internalDeliver(destination)(deliveryIdToMessage)
}

  The scaladoc says deliveryIdToMessage is a function that builds a message of type Any from the deliveryId. deliver keeps resending the message until confirmDelivery confirms it was received. deliveryId is a strictly monotonically increasing sequence number without gaps (step 1). During recovery this method does not actually send; unconfirmed messages are resent afterwards. If the number of unconfirmed messages reaches the maxUnconfirmedMessages threshold, it throws AtLeastOnceDelivery.MaxUnconfirmedMessagesExceededException.

 private[akka] final def internalDeliver(destination: ActorSelection)(deliveryIdToMessage: Long ⇒ Any): Unit = {
val isWildcardSelection = destination.pathString.contains("*")
require(!isWildcardSelection, "Delivering to wildcard actor selections is not supported by AtLeastOnceDelivery. " +
"Introduce an mediator Actor which this AtLeastOnceDelivery Actor will deliver the messages to," +
"and will handle the logic of fan-out and collecting individual confirmations, until it can signal confirmation back to this Actor.")
internalDeliver(ActorPath.fromString(destination.toSerializationFormat))(deliveryIdToMessage)
}

  internalDeliver first checks that the ActorSelection contains no wildcard (*). The error message tells you to implement fan-out yourself: introduce a mediator actor that forwards to the individual destinations, collects their confirmations, and only then confirms back to this actor. We won't dig into why wildcards are unsupported; just keep the restriction in mind.
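  A rough sketch of such a mediator, based only on the hint in the require() message: it fans a delivered Msg out to several destinations, collects their Confirm replies, and confirms back to the AtLeastOnceDelivery actor once every destination has acknowledged. Msg and Confirm are the case classes from the official demo; everything else here (FanOutMediator, the pending bookkeeping) is an assumption for illustration.

import akka.actor.{ Actor, ActorRef, ActorSelection }

class FanOutMediator(destinations: Seq[ActorSelection]) extends Actor {

  // deliveryId -> (number of outstanding destination acks, actor to confirm back to)
  private var pending = Map.empty[Long, (Int, ActorRef)]

  def receive: Receive = {
    case m @ Msg(deliveryId, _) ⇒
      pending = pending.updated(deliveryId, (destinations.size, sender()))
      destinations.foreach(_ ! m)                   // fan out, keeping the same deliveryId

    case Confirm(deliveryId) ⇒                      // ack from one of the destinations
      pending.get(deliveryId).foreach {
        case (1, origin) ⇒
          origin ! Confirm(deliveryId)              // last outstanding ack: confirm upstream
          pending -= deliveryId
        case (left, origin) ⇒
          pending = pending.updated(deliveryId, (left - 1, origin))
      }
  }
}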

private[akka] final def internalDeliver(destination: ActorPath)(deliveryIdToMessage: Long ⇒ Any): Unit = {
  if (unconfirmed.size >= maxUnconfirmedMessages)
    throw new MaxUnconfirmedMessagesExceededException(
      s"Too many unconfirmed messages, maximum allowed is [$maxUnconfirmedMessages]")

  val deliveryId = nextDeliverySequenceNr()
  val now = if (recoveryRunning) { System.nanoTime() - redeliverInterval.toNanos } else System.nanoTime()
  val d = Delivery(destination, deliveryIdToMessage(deliveryId), now, attempt = 0)

  if (recoveryRunning)
    unconfirmed = unconfirmed.updated(deliveryId, d)
  else
    send(deliveryId, d, now)
}

  The first if needs no explanation: it just checks the size of the in-memory map, and unconfirmed messages are indeed kept in unconfirmed. nextDeliverySequenceNr generates the message ID. Then a Delivery is created to wrap the parameters — pay attention to the value of each field of this case class. Finally send is called to deliver it (during recovery the Delivery is only recorded, not sent).
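  For reference, judging from the fields used here and in send below, the (private) Delivery case class looks roughly like this — reconstructed from its usages, not quoted from the Akka source:

// Reconstructed; field names and types are inferred from the snippets above, not copied from Akka.
final case class Delivery(destination: ActorPath, message: Any, timestamp: Long, attempt: Int)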

private def send(deliveryId: Long, d: Delivery, timestamp: Long): Unit = {
context.actorSelection(d.destination) ! d.message
unconfirmed = unconfirmed.updated(deliveryId, d.copy(timestamp = timestamp, attempt = d.attempt + 1))
}

  send is very simple: it sends the original message to the destination and stores the corresponding Delivery in unconfirmed, with the timestamp refreshed and the attempt counter incremented.

private var unconfirmed = immutable.SortedMap.empty[Long, Delivery]

  unconfirmed is a SortedMap, ordered by the delivery sequence number.

/**
* Call this method when a message has been confirmed by the destination,
* or to abort re-sending.
* @see [[#deliver]]
* @return `true` the first time the `deliveryId` is confirmed, i.e. `false` for duplicate confirm
*/
def confirmDelivery(deliveryId: Long): Boolean = {
if (unconfirmed.contains(deliveryId)) {
unconfirmed -= deliveryId
true
} else false
}

  And how is confirmDelivery implemented? It just removes the entry for that deliveryId from unconfirmed; if no confirmation for a deliveryId ever arrives, the entry simply stays in the map. Isn't that a bit too simple? You probably want to say f**k. Ha, me too. But then how is redelivery implemented? As discussed earlier, it is driven by a heartbeat timer, and a timer is usually started in preStart or when a field is initialized. Yet searching through AtLeastOnceDeliveryLike turns up no such code — only the definition of the timer itself.

// will be started after recovery completed
private var redeliverTask: Option[Cancellable] = None
private def startRedeliverTask(): Unit = {
val interval = redeliverInterval / 2
redeliverTask = Some(
context.system.scheduler.schedule(interval, interval, self, RedeliveryTick)(context.dispatcher))
}

  The comment says the timer is started after recovery completes. If you have read the earlier posts on persistence, you know that when a PersistentActor starts, recovery runs first, whether this is the first start or a restart. After all messages have been replayed, a recovery-success message is sent and onReplaySuccess is invoked — and AtLeastOnceDeliveryLike overrides onReplaySuccess. You can think of onReplaySuccess as playing the role of preStart in an ordinary actor.

override private[akka] def onReplaySuccess(): Unit = {
redeliverOverdue()
startRedeliverTask()
super.onReplaySuccess()
}

  Let's set redeliverOverdue aside for a moment; what matters here is the call to startRedeliverTask, i.e. the timer is started.

/**
* Interval between redelivery attempts.
*
* The default value can be configured with the
* `akka.persistence.at-least-once-delivery.redeliver-interval`
* configuration key. This method can be overridden by implementation classes to return
* non-default values.
*/
def redeliverInterval: FiniteDuration = defaultRedeliverInterval

private val defaultRedeliverInterval: FiniteDuration =
  Persistence(context.system).settings.atLeastOnceDelivery.redeliverInterval

  From startRedeliverTask's source, the timer sends a RedeliveryTick message to self at an interval equal to half of akka.persistence.at-least-once-delivery.redeliver-interval.
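  So there are two knobs, both stated in the scaladoc above: the akka.persistence.at-least-once-delivery.redeliver-interval configuration key, or overriding redeliverInterval in your actor. A minimal sketch of the latter (the 10-second value and the actor name are just examples, and the behaviors are left empty):

import scala.concurrent.duration._
import akka.actor.Actor
import akka.persistence.{ AtLeastOnceDelivery, PersistentActor }

class TunedDeliveryActor extends PersistentActor with AtLeastOnceDelivery {

  // redelivery checks (ticks) will now fire every 5 seconds, i.e. half of this value
  override def redeliverInterval: FiniteDuration = 10.seconds

  override def persistenceId: String = "tuned-delivery"
  override def receiveCommand: Receive = Actor.emptyBehavior
  override def receiveRecover: Receive = Actor.emptyBehavior
}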

override protected[akka] def aroundReceive(receive: Receive, message: Any): Unit =
  message match {
    case RedeliveryTick ⇒
      redeliverOverdue()
    case x ⇒
      super.aroundReceive(receive, message)
  }

  Clearly, when a RedeliveryTick arrives, redeliverOverdue is called — presumably this is where unconfirmed messages get redelivered.

private def redeliverOverdue(): Unit = {
  val now = System.nanoTime()
  val deadline = now - redeliverInterval.toNanos
  var warnings = Vector.empty[UnconfirmedDelivery]

  unconfirmed
    .iterator
    .filter { case (_, delivery) ⇒ delivery.timestamp <= deadline }
    .take(redeliveryBurstLimit)
    .foreach {
      case (deliveryId, delivery) ⇒
        send(deliveryId, delivery, now)
        if (delivery.attempt == warnAfterNumberOfUnconfirmedAttempts)
          warnings :+= UnconfirmedDelivery(deliveryId, delivery.destination, delivery.message)
    }

  if (warnings.nonEmpty)
    self ! UnconfirmedWarning(warnings)
}

  This source is fairly simple: take the current time minus the redelivery interval as a deadline, and resend every message whose timestamp is at or before that deadline. In other words, messages sent within the last interval are not resent yet — they haven't timed out, and will be picked up on a later tick. For example, with a 5-second redeliver-interval the tick fires every 2.5 seconds, so a message sent just after a tick is resent somewhere between 5 and 7.5 seconds after it was first sent.

/**
* Maximum number of unconfirmed messages that will be sent at each redelivery burst
* (burst frequency is half of the redelivery interval).
* If there's a lot of unconfirmed messages (e.g. if the destination is not available for a long time),
* this helps to prevent an overwhelming amount of messages to be sent at once.
*
* The default value can be configured with the
* `akka.persistence.at-least-once-delivery.redelivery-burst-limit`
* configuration key. This method can be overridden by implementation classes to return
* non-default values.
*/
def redeliveryBurstLimit: Int = defaultRedeliveryBurstLimit

  The redeliveryBurstLimit parameter also deserves attention: overdue messages are not all resent at once, each redelivery burst is capped. Don't ask me why — if it were me I'd just blast them all out; why keep the backlog around to blow up memory. But Akka is a general-purpose, stable framework, and the scaladoc gives the reason: if the destination has been unreachable for a long time, resending everything in one go would flood it the moment it comes back, so a bit of caution here is not a bad thing.

  Note that when a message's attempt count reaches the warnAfterNumberOfUnconfirmedAttempts threshold, an UnconfirmedWarning message is sent to self — so the message is not simply dropped as I expected, and the warning is sent only when the attempt count exactly equals the threshold, while resending continues regardless. Embarrassing, that's not what I predicted. Presumably this is for flexibility: if the retry threshold is reached without a confirmation, the developer decides what to do upon receiving UnconfirmedWarning. How? Roughly three options: call confirmDelivery to drop the message, treating the delivery as failed; call confirmDelivery to remove it from memory first and then run the deliver logic again as a fresh attempt (taking care to distinguish such re-deliveries); or simply ignore the warning and let the resends continue. A sketch of the first option follows.
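  Here is that sketch — handling UnconfirmedWarning by cancelling further redelivery. WarningAwareActor is a made-up name and the rest of the protocol is elided; in practice you would just add the UnconfirmedWarning case to the demo actor's receiveCommand.

import akka.actor.Actor
import akka.persistence.{ AtLeastOnceDelivery, PersistentActor }
import akka.persistence.AtLeastOnceDelivery.UnconfirmedWarning

class WarningAwareActor extends PersistentActor with AtLeastOnceDelivery {
  override def persistenceId: String = "warning-aware"

  override def receiveCommand: Receive = {
    case UnconfirmedWarning(deliveries) ⇒
      deliveries.foreach { d ⇒
        confirmDelivery(d.deliveryId)   // cancel further resends, i.e. accept the failure
        // optionally: log d.destination / d.message, or persist a "delivery failed" event
      }
    case _ ⇒ // the rest of the protocol (deliver, Confirm, ...) as in the official demo
  }

  override def receiveRecover: Receive = Actor.emptyBehavior
}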

  By this point the AtLeastOnceDelivery mechanism is basically clear. A reader may ask: so it just keeps unconfirmed messages in memory and removes them when a confirmation arrives — isn't that too simple? If memory blows up, or the actor fails and restarts, aren't the messages lost? And since this sits on top of persistence, why not persist the unconfirmed messages? A few reasons, I think: keeping them in memory is all about speed — persisting every single delivery would hurt throughput, and it would cost two I/O operations, a sequential write plus a random read. That said, AtLeastOnceDelivery does provide persistence hooks.

/**
* Full state of the `AtLeastOnceDelivery`. It can be saved with [[PersistentActor#saveSnapshot]].
* During recovery the snapshot received in [[SnapshotOffer]] should be set
* with [[#setDeliverySnapshot]].
*
* The `AtLeastOnceDeliverySnapshot` contains the full delivery state, including unconfirmed messages.
* If you need a custom snapshot for other parts of the actor state you must also include the
* `AtLeastOnceDeliverySnapshot`. It is serialized using protobuf with the ordinary Akka
* serialization mechanism. It is easiest to include the bytes of the `AtLeastOnceDeliverySnapshot`
* as a blob in your custom snapshot.
*/
def getDeliverySnapshot: AtLeastOnceDeliverySnapshot =
AtLeastOnceDeliverySnapshot(
deliverySequenceNr,
unconfirmed.map { case (deliveryId, d) ⇒ UnconfirmedDelivery(deliveryId, d.destination, d.message) }(breakOut))

  What is this? It builds an AtLeastOnceDeliverySnapshot for you, containing the current delivery sequence number and the list of unconfirmed messages.

/**
* If snapshot from [[#getDeliverySnapshot]] was saved it will be received during recovery
* in a [[SnapshotOffer]] message and should be set with this method.
*/
def setDeliverySnapshot(snapshot: AtLeastOnceDeliverySnapshot): Unit = {
deliverySequenceNr = snapshot.currentDeliveryId
val now = System.nanoTime()
unconfirmed = snapshot.unconfirmedDeliveries.map(d ⇒
d.deliveryId → Delivery(d.destination, d.message, now, 0))(breakOut)
}

  The other one is setDeliverySnapshot, which restores the current delivery sequence number and the unconfirmed message list from an AtLeastOnceDeliverySnapshot.

  And that's it — these two methods are all AtLeastOnceDelivery offers for persistence! So when is setDeliverySnapshot called, and where does its AtLeastOnceDeliverySnapshot argument come from? Ha, guess.

  This is actually where AtLeastOnceDelivery leaves the flexibility to you: you implement it yourself. How? If you have used the Akka persistence API, you know that besides persist there is a saveSnapshot function that saves a snapshot of the current state. So it is straightforward: send yourself a periodic message, or wait until the number of unconfirmed messages reaches some threshold, then call getDeliverySnapshot to capture the unconfirmed state and pass it to saveSnapshot. In receiveRecover, when a SnapshotOffer arrives, call setDeliverySnapshot to restore the unconfirmed messages. Simple, isn't it? A sketch follows.
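  A minimal sketch of that round-trip. SnapshotTick, the 1-minute interval and the actor name are assumptions for illustration; a real actor would also persist and replay its business events as in the official demo.

import scala.concurrent.duration._
import akka.actor.Actor
import akka.persistence.{ AtLeastOnceDelivery, PersistentActor, SnapshotOffer }
import akka.persistence.AtLeastOnceDelivery.AtLeastOnceDeliverySnapshot

class SnapshottingDeliveryActor extends PersistentActor with AtLeastOnceDelivery {

  private case object SnapshotTick

  override def persistenceId: String = "snapshotting-delivery"

  import context.dispatcher
  context.system.scheduler.schedule(1.minute, 1.minute, self, SnapshotTick)

  override def receiveCommand: Receive = {
    case SnapshotTick ⇒
      saveSnapshot(getDeliverySnapshot)          // capture sequence nr + unconfirmed messages
    case _ ⇒ // deliver / Confirm handling as in the demo
  }

  override def receiveRecover: Receive = {
    case SnapshotOffer(_, snapshot: AtLeastOnceDeliverySnapshot) ⇒
      setDeliverySnapshot(snapshot)              // restore unconfirmed messages after a restart
    case _ ⇒ // replay of business events as in the demo
  }
}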

  Because AtLeastOnceDelivery is so simple, it has some rough edges. For example, the delivered message only carries a delivery sequence number; there is no separate message ID and no retry count, so how does the receiver tell a resend apart from a first send? It can't — I'd call this a pretty big flaw, because on the receiving side there is no way to know whether a message is a redelivery! So when using AtLeastOnceDelivery, keep a few things in mind.

  • The message itself should carry a message ID, i.e. a unique value.
  • Message processing must be idempotent — receiving a duplicate must not affect the business logic (see the sketch below).
  • If you have strict ordering requirements, study the logic above very carefully first.
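  A sketch of an idempotent destination: it remembers which deliveryIds it has already processed and only confirms duplicates without re-running the business logic. Msg and Confirm are the demo's case classes; the unbounded seen set is a simplification — in practice you would bound it or deduplicate on a business-level unique ID.

import akka.actor.Actor

class IdempotentDestination extends Actor {
  private var seen = Set.empty[Long]

  def receive: Receive = {
    case Msg(deliveryId, s) ⇒
      if (!seen.contains(deliveryId)) {
        seen += deliveryId
        // ... actual business processing of `s` happens only once ...
      }
      sender() ! Confirm(deliveryId)   // always confirm, even for duplicates
  }
}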
