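Akka's DistributedData is somewhat like a cache: it is very useful when you need to share data between the nodes of a cluster. Data is read and written through an API similar to that of a key/value cache, but the values stored in DistributedData are Conflict Free Replicated Data Types (CRDTs). I am not very familiar with CRDTs, so I will not cover them here; interested readers can look them up. For now, just think of them as a way to make replicated data eventually consistent.
All data entries in Akka Distributed Data are spread to all nodes, or to a group of nodes, through direct replication and gossip-based dissemination. The consistency level of reads and writes can be controlled at a fine-grained level. CRDTs can be updated from any node without a coordinator; concurrent updates from different nodes are resolved by a monotonic merge function, and the state of the data eventually converges.
The data types supported by Akka Distributed Data must be convergent CRDTs that implement the ReplicatedData trait, which means they must provide a monotonic merge function and their state changes must always converge. The built-in data types are: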
- Counters: GCounter, PNCounter
- Sets: GSet, ORSet
- Maps: ORMap, ORMultiMap, LWWMap, PNCounterMap
- Registers: LWWRegister, Flag
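GCounter is a grow-only counter: it can only be incremented, never decremented. It works in a similar way to a vector clock: it keeps track of one counter per node, and a merge takes the maximum count for each node. If you need both increments and decrements, use PNCounter (positive/negative counter). A PNCounter tracks increments and decrements separately; both are represented internally as GCounters, and merging is done by merging those GCounters.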
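The merge rule described above can be illustrated without Akka. The following is only a minimal sketch under stated assumptions: `GCounterSketch` and `PNCounterSketch` are hypothetical names, node identifiers are plain strings, and none of this is Akka's actual implementation.

```scala
// Hypothetical, simplified model of a grow-only counter (not Akka's GCounter).
final case class GCounterSketch(counts: Map[String, Long] = Map.empty[String, Long]) {

  // Each node only ever increases its own slot.
  def increment(node: String, n: Long = 1L): GCounterSketch = {
    require(n >= 0, "a grow-only counter cannot be decremented")
    copy(counts = counts.updated(node, counts.getOrElse(node, 0L) + n))
  }

  // The counter value is the sum over all node slots.
  def value: Long = counts.values.sum

  // Merge takes the per-node maximum, which makes it commutative,
  // associative and idempotent.
  def merge(that: GCounterSketch): GCounterSketch =
    GCounterSketch((counts.keySet ++ that.counts.keySet).map { k =>
      k -> math.max(counts.getOrElse(k, 0L), that.counts.getOrElse(k, 0L))
    }.toMap)
}

// Hypothetical positive/negative counter built from two grow-only counters.
final case class PNCounterSketch(
    increments: GCounterSketch = GCounterSketch(),
    decrements: GCounterSketch = GCounterSketch()) {

  def increment(node: String, n: Long = 1L): PNCounterSketch =
    copy(increments = increments.increment(node, n))

  // A decrement is recorded as an increment of the "decrements" counter.
  def decrement(node: String, n: Long = 1L): PNCounterSketch =
    copy(decrements = decrements.increment(node, n))

  def value: Long = increments.value - decrements.value

  def merge(that: PNCounterSketch): PNCounterSketch =
    PNCounterSketch(increments.merge(that.increments), decrements.merge(that.decrements))
}
```

Because the merge is a per-node maximum, two replicas can apply it in either order, or repeatedly, and still end up with the same value.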
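GSet is a set to which elements can only be added. ORSet (observed-remove set) supports both adding and removing elements. An ORSet has a version vector that is incremented when an element is added; for each element, the version of the node that added it is also tracked in a so-called "birth dot".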
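To make "observed-remove" more concrete, here is a rough sketch of the idea: a remove only cancels the adds it has actually observed, so an add that happens concurrently on another node survives the merge. This is an assumption-laden simplification, not Akka's ORSet: it uses unique tags plus tombstones instead of a version vector with dots, and `ORSetSketch` and `Tag` are hypothetical names.

```scala
// Hypothetical unique identifier of one add operation: (node, per-node sequence number).
final case class Tag(node: String, seq: Long)

// Simplified observed-remove set: every add gets a fresh tag, a remove
// tombstones only the tags that this replica has observed for the element.
final case class ORSetSketch[A](adds: Map[A, Set[Tag]], tombstones: Set[Tag]) {

  // An element is present while at least one of its tags is not tombstoned.
  def elements: Set[A] =
    adds.filter { case (_, tags) => (tags -- tombstones).nonEmpty }.keySet

  def add(node: String, elem: A): ORSetSketch[A] = {
    // Next sequence number for this node; adds never shrink, so tags stay unique.
    val lastSeq = adds.values.flatten.collect { case Tag(`node`, seq) => seq }.foldLeft(0L)(_ max _)
    val tag = Tag(node, lastSeq + 1)
    copy(adds = adds.updated(elem, adds.getOrElse(elem, Set.empty[Tag]) + tag))
  }

  def remove(elem: A): ORSetSketch[A] =
    copy(tombstones = tombstones ++ adds.getOrElse(elem, Set.empty[Tag]))

  // Merge is a plain union of both parts, so it always converges.
  def merge(that: ORSetSketch[A]): ORSetSketch[A] = {
    val mergedAdds = (adds.keySet ++ that.adds.keySet).map { k =>
      k -> (adds.getOrElse(k, Set.empty[Tag]) ++ that.adds.getOrElse(k, Set.empty[Tag]))
    }.toMap
    ORSetSketch(mergedAdds, tombstones ++ that.tombstones)
  }
}

object ORSetSketch {
  def empty[A]: ORSetSketch[A] = ORSetSketch[A](Map.empty[A, Set[Tag]], Set.empty[Tag])
}
```

If one replica removes "a" while another replica adds "a" again, merging the two in either order leaves "a" in the set, because the second add carries a tag that the remove never saw.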
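ORMap (observed-remove map) is a map whose keys can be of any type and whose values must be ReplicatedData types. It supports adding, updating and removing entries. If an add and a remove are executed concurrently, the add wins. If several updates are executed concurrently, the values are merged.
ORMultiMap (observed-remove multi-map) is a multi-valued map whose values are of type ORSet.
PNCounterMap (positive negative counter map) is a map of named counters whose values are of type PNCounter.
LWWMap (last writer wins map) is a map whose values are of type LWWRegister (last writer wins register).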
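LWWRegister (last writer wins register) can hold any serializable value. It keeps the value that was updated last, but "last" is hard to determine, because in a distributed environment the clocks of the nodes are rarely perfectly synchronized. If two timestamps are exactly equal, the value from the node with the lowest address wins. This means the value of an LWWRegister is not necessarily the physically most recent one. The replicas still converge to the same value, but a write can be silently overwritten by a concurrent one, so it should not be mistaken for a strictly consistent update.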
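The "last writer wins" rule and its tie-break can be sketched as follows. This is only an illustration under assumptions: `LWWRegisterSketch` is a hypothetical name, the node address is modelled as a plain string compared lexicographically, and the timestamp is passed in explicitly instead of being read from a clock.

```scala
// Hypothetical, simplified last-writer-wins register (not Akka's LWWRegister).
final case class LWWRegisterSketch[A](node: String, value: A, timestamp: Long) {

  def merge(that: LWWRegisterSketch[A]): LWWRegisterSketch[A] =
    if (that.timestamp > timestamp) that      // the later timestamp wins
    else if (that.timestamp < timestamp) this
    else if (that.node < node) that           // tie: the lowest node address wins
    else this
}
```

With equal timestamps, a register written on "10.0.0.1:2551" beats one written on "10.0.0.2:2551" regardless of merge order. The sketch also shows the weakness mentioned above: if a node whose clock lags behind writes physically later but with a smaller timestamp, its value loses the merge.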
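Having said all that, Akka Distributed Data is not really a cache system. It is not suited to every kind of problem, and eventual consistency does not fit every scenario. It is also not intended for big data: the number of top-level entries should not exceed 100,000. When a new node joins the cluster, all data is transferred to it. All data is held in memory, which is another reason why it is not suitable for big data. When a data entry changes, its full state may be replicated to all other nodes; if the type supports delta-CRDTs, only the delta is replicated.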
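import java.util.concurrent.ThreadLocalRandom
import scala.concurrent.duration._
import akka.actor.{ Actor, ActorLogging }
import akka.cluster.Cluster
import akka.cluster.ddata.{ DistributedData, ORSet, ORSetKey }
import akka.cluster.ddata.Replicator._

object DataBot {
  // message the actor sends to itself on every scheduler tick
  private case object Tick
}

class DataBot extends Actor with ActorLogging {
  import DataBot._

  val replicator = DistributedData(context.system).replicator
  implicit val node = Cluster(context.system)

  import context.dispatcher
  val tickTask = context.system.scheduler.schedule(5.seconds, 5.seconds, self, Tick)

  val DataKey = ORSetKey[String]("key")

  // be notified with Changed(DataKey) whenever the replicated set changes
  replicator ! Subscribe(DataKey, self)

  def receive = {
    case Tick ⇒
      // random lower-case letter 'a' to 'z'
      val s = ThreadLocalRandom.current().nextInt(97, 123).toChar.toString
      if (ThreadLocalRandom.current().nextBoolean()) {
        // add
        log.info("Adding: {}", s)
        replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ + s)
      } else {
        // remove
        log.info("Removing: {}", s)
        replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ - s)
      }

    case _: UpdateResponse[_] ⇒ // ignore

    case c @ Changed(DataKey) ⇒
      val data = c.get(DataKey)
      log.info("Current elements: {}", data.elements)
  }

  override def postStop(): Unit = tickTask.cancel()
}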
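In the DataBot example, `Update(DataKey, ORSet.empty[String], WriteLocal)(_ + s)` passes both an initial value and a modify function. The following Akka-free sketch models how such an update is applied locally, as far as I understand it: the modify function receives the current value, or the initial value if the key has no entry yet, and the result is merged into the stored entry. `LocalStoreSketch` is a hypothetical name and the merge function is supplied by the caller; it is not part of Akka.

```scala
// Hypothetical local key/value store that mimics the "initial + modify" update style.
final class LocalStoreSketch[A](merge: (A, A) => A) {
  private var entries = Map.empty[String, A]

  def update(key: String, initial: A)(modify: A => A): A = {
    val updated = entries.get(key) match {
      case Some(existing) => merge(existing, modify(existing)) // merge the result into the existing entry
      case None           => modify(initial)                   // first update: start from the initial value
    }
    entries = entries.updated(key, updated)
    updated
  }

  def get(key: String): Option[A] = entries.get(key)
}
```

For example, with a grow-only set modelled as `Set[String]` and union as the merge function, `update("key", Set.empty[String])(_ + "a")` followed by the same call with `"b"` leaves `Set("a", "b")` under `"key"`.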
/**
 * Akka extension for convenient configuration and use of the
 * [[Replicator]]. Configuration settings are defined in the
 * `akka.cluster.distributed-data` section, see `reference.conf`.
 */
class DistributedData(system: ExtendedActorSystem) extends Extension {

  private val config = system.settings.config.getConfig("akka.cluster.distributed-data")
  private val settings = ReplicatorSettings(config)

  /**
   * Returns true if this member is not tagged with the role configured for the
   * replicas.
   */
  def isTerminated: Boolean = Cluster(system).isTerminated || !settings.roles.subsetOf(Cluster(system).selfRoles)

  /**
   * `ActorRef` of the [[Replicator]].
   */
  val replicator: ActorRef =
    if (isTerminated) {
      system.log.warning("Replicator points to dead letters: Make sure the cluster node is not terminated and has the proper role!")
      system.deadLetters
    } else {
      val name = config.getString("name")
      system.systemActorOf(Replicator.props(settings), name)
    }
}
/**
 * A replicated in-memory data store supporting low latency and high availability
 * requirements.
 *
 * The `Replicator` actor takes care of direct replication and gossip based
 * dissemination of Conflict Free Replicated Data Types (CRDTs) to replicas in the
 * cluster.
 * The data types must be convergent CRDTs and implement [[ReplicatedData]], i.e.
 * they provide a monotonic merge function and the state changes always converge.
 *
 * You can use your own custom [[ReplicatedData]] or [[DeltaReplicatedData]] types,
 * and several types are provided by this package, such as:
 *
 * <ul>
 * <li>Counters: [[GCounter]], [[PNCounter]]</li>
 * <li>Registers: [[LWWRegister]], [[Flag]]</li>
 * <li>Sets: [[GSet]], [[ORSet]]</li>
 * <li>Maps: [[ORMap]], [[ORMultiMap]], [[LWWMap]], [[PNCounterMap]]</li>
 * </ul>
 *
 * The `Replicator` actor must be started on each node in the cluster, or group of
 * nodes tagged with a specific role. It communicates with other `Replicator` instances
 * with the same path (without address) that are running on other nodes. For convenience it
 * can be used with the [[DistributedData]] extension but it can also be started as an ordinary
 * actor using the `Replicator.props`. If it is started as an ordinary actor it is important
 * that it is given the same name, started on same path, on all nodes.
 *
 * The protocol for replicating the deltas supports causal consistency if the data type
 * is marked with [[RequiresCausalDeliveryOfDeltas]]. Otherwise it is only eventually
 * consistent. Without causal consistency it means that if elements 'c' and 'd' are
 * added in two separate `Update` operations these deltas may occasionally be propagated
 * to nodes in different order than the causal order of the updates. For this example it
 * can result in that set {'a', 'b', 'd'} can be seen before element 'c' is seen. Eventually
 * it will be {'a', 'b', 'c', 'd'}.
 *
 * == CRDT Garbage ==
 *
 * One thing that can be problematic with CRDTs is that some data types accumulate history (garbage).
 * For example a `GCounter` keeps track of one counter per node. If a `GCounter` has been updated
 * from one node it will associate the identifier of that node forever. That can become a problem
 * for long running systems with many cluster nodes being added and removed. To solve this problem
 * the `Replicator` performs pruning of data associated with nodes that have been removed from the
 * cluster. Data types that need pruning have to implement [[RemovedNodePruning]]. The pruning consists
 * of several steps:
 * <ol>
 * <li>When a node is removed from the cluster it is first important that all updates that were
 * done by that node are disseminated to all other nodes. The pruning will not start before the
 * `maxPruningDissemination` duration has elapsed. The time measurement is stopped when any
 * replica is unreachable, but it's still recommended to configure this with certain margin.
 * It should be in the magnitude of minutes.</li>
 * <li>The nodes are ordered by their address and the node ordered first is called leader.
 * The leader initiates the pruning by adding a `PruningInitialized` marker in the data envelope.
 * This is gossiped to all other nodes and they mark it as seen when they receive it.</li>
 * <li>When the leader sees that all other nodes have seen the `PruningInitialized` marker
 * the leader performs the pruning and changes the marker to `PruningPerformed` so that nobody
 * else will redo the pruning. The data envelope with this pruning state is a CRDT itself.
 * The pruning is typically performed by "moving" the part of the data associated with
 * the removed node to the leader node. For example, a `GCounter` is a `Map` with the node as key
 * and the counts done by that node as value. When pruning the value of the removed node is
 * moved to the entry owned by the leader node. See [[RemovedNodePruning#prune]].</li>
 * <li>Thereafter the data is always cleared from parts associated with the removed node so that
 * it does not come back when merging. See [[RemovedNodePruning#pruningCleanup]]</li>
 * <li>After another `maxPruningDissemination` duration after pruning the last entry from the
 * removed node the `PruningPerformed` markers in the data envelope are collapsed into a
 * single tombstone entry, for efficiency. Clients may continue to use old data and therefore
 * all data are always cleared from parts associated with tombstoned nodes.</li>
 * </ol>
 */
final class Replicator(settings: ReplicatorSettings) extends Actor with ActorLogging
// the actual data
var dataEntries = Map.empty[KeyId, (DataEnvelope, Digest)]
type KeyId = String
// Gossip Status message contains SHA-1 digests of the data to determine when
// to send the full data
type Digest = ByteString
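The comment above says that the gossip Status message carries SHA-1 digests so that the full data only has to be sent when needed. A rough, Akka-free sketch of that idea follows. It rests on assumptions: `DigestSketch` and `keysToSend` are hypothetical names, digests are held as `Vector[Byte]` rather than `ByteString`, and the real Replicator protocol has more cases than this.

```scala
import java.security.MessageDigest

object DigestSketch {

  // SHA-1 digest of the serialized form of a data entry.
  // Vector[Byte] is used so that digests compare by content.
  def digest(serialized: Array[Byte]): Vector[Byte] =
    MessageDigest.getInstance("SHA-1").digest(serialized).toVector

  // Keys whose full data should be sent to the other node: those for which
  // the other node reported a different digest, or no digest at all.
  def keysToSend(
      mine: Map[String, Vector[Byte]],
      others: Map[String, Vector[Byte]]): Set[String] =
    mine.filter { case (key, d) => !others.get(key).contains(d) }.keySet
}
```

Comparing 20-byte digests per key is much cheaper than shipping every entry on each gossip round; the full data is exchanged only for the keys that differ.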
/**
 * The `DataEnvelope` wraps a data entry and carries state of the pruning process for the entry.
 */
final case class DataEnvelope(
  data: ReplicatedData,
  pruning: Map[UniqueAddress, PruningState] = Map.empty,
  deltaVersions: VersionVector = VersionVector.empty)
  extends ReplicatorMessage
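The pruning state carried by the envelope belongs to the pruning process described in the Replicator scaladoc: the leader is the node ordered first by address, and pruning "moves" the removed node's share of the data to the leader. For a counter stored as a map from node to count, that could look like the sketch below. `PruningSketch` is a hypothetical helper with plain string node names; it only mirrors the roles of `RemovedNodePruning#prune` and `RemovedNodePruning#pruningCleanup` and is not their real implementation.

```scala
object PruningSketch {
  type Counts = Map[String, Long]

  // The node ordered first by its address acts as leader (nodes must be non-empty).
  def leader(nodes: Set[String]): String = nodes.min

  // Move the count of the removed node to the entry owned by the leader,
  // so that the total value of the counter stays the same.
  def prune(counts: Counts, removedNode: String, collapseInto: String): Counts =
    counts.get(removedNode) match {
      case Some(n) =>
        val remaining = counts - removedNode
        remaining.updated(collapseInto, remaining.getOrElse(collapseInto, 0L) + n)
      case None => counts
    }

  // Drop whatever is still associated with the removed node, so that stale
  // data from other replicas does not bring it back when merging.
  def pruningCleanup(counts: Counts, removedNode: String): Counts =
    counts - removedNode
}
```

Pruning `Map("n1" -> 5, "n2" -> 3, "n3" -> 2)` for removed node "n3" with leader "n1" gives `Map("n1" -> 7, "n2" -> 3)`: the per-node history of "n3" is gone, but the counter still adds up to 10.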
def receive =
  if (hasDurableKeys) load
  else normalReceive

val normalReceive: Receive = {
  case Get(key, consistency, req) ⇒ receiveGet(key, consistency, req)
  case u @ Update(key, writeC, req) ⇒ receiveUpdate(key, u.modify, writeC, req)
  case Read(key) ⇒ receiveRead(key)
  case Write(key, envelope) ⇒ receiveWrite(key, envelope)
  case ReadRepair(key, envelope) ⇒ receiveReadRepair(key, envelope)
  case DeltaPropagation(from, reply, deltas) ⇒ receiveDeltaPropagation(from, reply, deltas)
  case FlushChanges ⇒ receiveFlushChanges()
  case DeltaPropagationTick ⇒ receiveDeltaPropagationTick()
  case GossipTick ⇒ receiveGossipTick()
  case ClockTick ⇒ receiveClockTick()
  case Status(otherDigests, chunk, totChunks) ⇒ receiveStatus(otherDigests, chunk, totChunks)
  case Gossip(updatedData, sendBack) ⇒ receiveGossip(updatedData, sendBack)
  case Subscribe(key, subscriber) ⇒ receiveSubscribe(key, subscriber)
  case Unsubscribe(key, subscriber) ⇒ receiveUnsubscribe(key, subscriber)
  case Terminated(ref) ⇒ receiveTerminated(ref)
  case MemberWeaklyUp(m) ⇒ receiveWeaklyUpMemberUp(m)
  case MemberUp(m) ⇒ receiveMemberUp(m)
  case MemberRemoved(m, _) ⇒ receiveMemberRemoved(m)
  case evt: MemberEvent ⇒ receiveOtherMemberEvent(evt.member)
  case UnreachableMember(m) ⇒ receiveUnreachable(m)
  case ReachableMember(m) ⇒ receiveReachable(m)
  case GetKeyIds ⇒ receiveGetKeyIds()
  case Delete(key, consistency, req) ⇒ receiveDelete(key, consistency, req)
  case RemovedNodePruningTick ⇒ receiveRemovedNodePruningTick()
  case GetReplicaCount ⇒ receiveGetReplicaCount()
  case TestFullStateGossip(enabled) ⇒ fullStateGossipEnabled = enabled
}
def receiveSubscribe(key: KeyR, subscriber: ActorRef): Unit = {
  newSubscribers.addBinding(key.id, subscriber)
  if (!subscriptionKeys.contains(key.id))
    subscriptionKeys = subscriptionKeys.updated(key.id, key)
}
private[akka] type KeyR = Key[ReplicatedData]
/**
 * Key for the key-value data in [[Replicator]]. The type of the data value
 * is defined in the key. Keys are compared equal if the `id` strings are equal,
 * i.e. use unique identifiers.
 *
 * Specific classes are provided for the built in data types, e.g. [[ORSetKey]],
 * and you can create your own keys.
 */
abstract class Key[+T <: ReplicatedData](val id: Key.KeyId) extends Serializable
def receiveUpdate(key: KeyR, modify: Option[ReplicatedData] ⇒ ReplicatedData,
                  writeConsistency: WriteConsistency, req: Option[Any]): Unit = {
  val localValue = getData(key.id)

  def deltaOrPlaceholder(d: DeltaReplicatedData): Option[ReplicatedDelta] = {
    d.delta match {
      case s @ Some(_) ⇒ s
      case None ⇒ Some(NoDeltaPlaceholder)
    }
  }

  Try {
    localValue match {
      case Some(DataEnvelope(DeletedData, _, _)) ⇒ throw new DataDeleted(key, req)
      case Some(envelope @ DataEnvelope(existing, _, _)) ⇒
        modify(Some(existing)) match {
          case d: DeltaReplicatedData if deltaCrdtEnabled ⇒
            (envelope.merge(d.resetDelta.asInstanceOf[existing.T]), deltaOrPlaceholder(d))
          case d ⇒
            (envelope.merge(d.asInstanceOf[existing.T]), None)
        }
      case None ⇒ modify(None) match {
        case d: DeltaReplicatedData if deltaCrdtEnabled ⇒
          (DataEnvelope(d.resetDelta), deltaOrPlaceholder(d))
        case d ⇒ (DataEnvelope(d), None)
      }
    }
  } match {
    case Success((envelope, delta)) ⇒
      log.debug("Received Update for key [{}]", key)

      // handle the delta
      delta match {
        case Some(d) ⇒ deltaPropagationSelector.update(key.id, d)
        case None ⇒ // not DeltaReplicatedData
      }

      // note that it's important to do deltaPropagationSelector.update before setData,
      // so that the latest delta version is used
      val newEnvelope = setData(key.id, envelope)

      val durable = isDurable(key.id)
      if (isLocalUpdate(writeConsistency)) {
        if (durable)
          durableStore ! Store(key.id, new DurableDataEnvelope(newEnvelope),
            Some(StoreReply(UpdateSuccess(key, req), StoreFailure(key, req), replyTo)))
        else
          replyTo ! UpdateSuccess(key, req)
      } else {
        val (writeEnvelope, writeDelta) = delta match {
          case Some(NoDeltaPlaceholder) ⇒ (newEnvelope, None)
          case Some(d: RequiresCausalDeliveryOfDeltas) ⇒
            val v = deltaPropagationSelector.currentVersion(key.id)
            (newEnvelope, Some(Delta(newEnvelope.copy(data = d), v, v)))
          case Some(d) ⇒ (newEnvelope.copy(data = d), None)
          case None ⇒ (newEnvelope, None)
        }
        val writeAggregator =
          context.actorOf(WriteAggregator.props(key, writeEnvelope, writeDelta, writeConsistency,
            req, nodes, unreachable, replyTo, durable))
        if (durable) {
          durableStore ! Store(key.id, new DurableDataEnvelope(newEnvelope),
            Some(StoreReply(UpdateSuccess(key, req), StoreFailure(key, req), writeAggregator)))
        }
      }
    case Failure(e: DataDeleted[_]) ⇒
      log.debug("Received Update for deleted key [{}]", key)
      replyTo ! e
    case Failure(e) ⇒
      log.debug("Received Update for key [{}], failed: {}", key, e.getMessage)
      replyTo ! ModifyFailure(key, "Update failed: " + e.getMessage, e, req)
  }
}
def getData(key: KeyId): Option[DataEnvelope] = dataEntries.get(key).map { case (envelope, _) ⇒ envelope }
def write(key: KeyId, writeEnvelope: DataEnvelope): Option[DataEnvelope] = {
  getData(key) match {
    case someEnvelope @ Some(envelope) if envelope eq writeEnvelope ⇒ someEnvelope
    case Some(DataEnvelope(DeletedData, _, _)) ⇒ Some(DeletedEnvelope) // already deleted
    case Some(envelope @ DataEnvelope(existing, _, _)) ⇒
      try {
        // DataEnvelope will mergeDelta when needed
        val merged = envelope.merge(writeEnvelope).addSeen(selfAddress)
        Some(setData(key, merged))
      } catch {
        case e: IllegalArgumentException ⇒
          log.warning(
            "Couldn't merge [{}], due to: {}", key, e.getMessage)
          None
      }
    case None ⇒
      // no existing data for the key
      val writeEnvelope2 =
        writeEnvelope.data match {
          case d: ReplicatedDelta ⇒
            val z = d.zero
            writeEnvelope.copy(data = z.mergeDelta(d.asInstanceOf[z.D]))
          case _ ⇒
            writeEnvelope
        }

      val writeEnvelope3 = writeEnvelope2.addSeen(selfAddress)
      Some(setData(key, writeEnvelope3))
  }
}
