Two-Phase-Commit for Distributed In-Memory Caches (reference)
Part I
reference from: http://gridgain.blogspot.kr/2014/09/two-phase-commit-for-distributed-in.html
2-Phase-Commit is probably one of the oldest consensus protocols and is known for its deficiencies when it comes to handling failures, as it may indefinitely block servers waiting in the prepare state. To mitigate this, the 3-Phase-Commit protocol was introduced, which adds better fault tolerance at the expense of an extra network round trip and higher latency.
I would like to extend these traditional quorum concepts to distributed in-memory caching, as it is particularly relevant to what we do at GridGain. GridGain also uses 2-phase-commit transactions, but unlike disk-based persistent systems, it does not need to add a 3rd phase to the commit protocol in order to preserve data consistency during failures.
We want to avoid the 3-Phase-Commit protocol because it adds an additional network round-trip and has a negative impact on latencies and performance.
In GridGain, the data is partitioned in memory, which in the purest form means that every key-value pair is assigned to a specific node within the cluster. If, for example, we have 100 keys and 4 nodes, then every node will cache 100 / 4 = 25 keys. This way, the more nodes we have, the more data we can cache. Of course, in real-life scenarios we also have to worry about failures and redundancy, so for every key-value pair we will have 1 primary copy and 1 or more backup copies.
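As a rough illustration of this partitioning idea, here is a minimal sketch (not GridGain's actual affinity function) that hashes each key to a primary node and treats the next node in the ring as its backup:

```java
// Minimal partitioning sketch: hash a key to a primary node, use the next
// node(s) in the ring as backups. Node names and policy are illustrative only.
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
    // Hypothetical static node list; in a real cluster this comes from membership.
    static final List<String> NODES = List.of("node-1", "node-2", "node-3", "node-4");
    static final int BACKUPS = 1;

    // Returns the primary node plus BACKUPS backup nodes for the given key.
    static List<String> ownersOf(Object key) {
        int primary = Math.floorMod(key.hashCode(), NODES.size());
        List<String> owners = new ArrayList<>();
        for (int i = 0; i <= BACKUPS; i++) {
            owners.add(NODES.get((primary + i) % NODES.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        // With 100 keys and 4 nodes, each node ends up primary for roughly 25 keys.
        for (int k = 0; k < 5; k++) {
            System.out.println("key-" + k + " -> " + ownersOf("key-" + k));
        }
    }
}
```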
Let us now look at the 2-Phase-Commit protocol in more detail to see why we can avoid the 3rd commit phase in GridGain.
Two-Phase-Commit Protocol
Generally, the 2-Phase-Commit protocol is initiated whenever an application has already made a decision to commit the transaction. The Coordinator node sends a "Prepare" message to all the participating nodes holding primary copies (primary nodes), and the primary nodes, after acquiring the necessary locks, synchronously send the "Prepare" message to the nodes holding backup copies (backup nodes). Once every node votes "Yes", the "Commit" message is sent and the transaction gets committed.
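A minimal, single-threaded sketch of the coordinator side of this flow is shown below; the Node interface and its send methods are hypothetical placeholders for illustration, not GridGain's API:

```java
// Simplified coordinator-side 2PC flow as described above.
import java.util.List;

interface Node {
    boolean sendPrepare(long txId);   // primary acquires locks, forwards Prepare to its backups
    void sendCommit(long txId);
    void sendRollback(long txId);
}

class Coordinator {
    boolean commitTransaction(long txId, List<Node> primaries) {
        // Phase 1: Prepare. Each primary forwards the message to its backups and
        // replies "Yes" once locks are acquired on all copies.
        for (Node primary : primaries) {
            if (!primary.sendPrepare(txId)) {
                // In the in-memory-only case a "No" vote is not expected, but the
                // protocol still rolls back if Prepare cannot be completed everywhere.
                primaries.forEach(n -> n.sendRollback(txId));
                return false;
            }
        }
        // Phase 2: Commit. Safe because all participants are prepared.
        primaries.forEach(n -> n.sendCommit(txId));
        return true;
    }
}
```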
Advantages of In-Memory
So, why does in-memory-only processing allow us to avoid the 3rd phase as opposed to persistent disk-based transactions? The main reason is that the in-memory cache is purely volatile and does not need to worry about leaving any state behind in case of a crash. Contrast that to the persistent architectures, where we need to worry whether the state on disk was committed or not after a failure had occurred.
Another advantage of an in-memory-only distributed cache is that the only valid vote for the "Prepare" message is "Yes". There is really no valid reason for any cluster member to vote "No". This essentially means that the only reason a rollback should happen is if the Coordinator node crashed before it was able to send the "Prepare" message to all the participating nodes.
Now that we have the ground rules settled, let's analyze what happens in case of failures:
Backup Node Failures
If a backup node fails during either the "Prepare" phase or the "Commit" phase, then no special handling is needed. The data will still be committed on the nodes that are alive. GridGain will then, in the background, designate a new backup node, and the data will be copied there outside of the transaction scope.
Primary Node Failures
If a primary node fails before or during the "Prepare" phase, then the coordinator will designate one of the backup nodes to become primary and retry the "Prepare" phase. If the failure happens before or during the "Commit" phase, then the backup nodes will detect the crash and send a message to the Coordinator node to find out whether to commit or roll back. The transaction still completes and the data within the distributed cache remains consistent.
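The "promote a backup and retry Prepare" idea can be sketched roughly as follows; the Participant interface and the promotion policy are assumptions for illustration, not GridGain's failover logic:

```java
// Rough sketch: if the primary fails before/during Prepare, promote a backup
// to primary and retry the Prepare phase against it.
import java.util.List;

class PrimaryFailover {
    interface Participant {
        boolean sendPrepare(long txId);
    }

    static boolean prepareWithFailover(long txId, Participant primary, List<Participant> backups) {
        try {
            return primary.sendPrepare(txId);
        } catch (RuntimeException primaryFailed) {
            // Simplistic promotion policy: take the first backup as the new primary.
            Participant newPrimary = backups.get(0);
            return newPrimary.sendPrepare(txId);
        }
    }
}
```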
Coordinator Node Failures
Failure of the Coordinator node is a little trickier. Without the Coordinator node, neither the primary nodes nor the backup nodes know whether to commit the transaction or to roll it back. In this case the "Recovery Protocol" is initiated, and all the nodes participating in the transaction send a message to every other participating node asking whether the "Prepare" message was received. If at least one of the nodes replies "No", then the transaction is rolled back; otherwise the transaction is committed.
Note that after all the nodes received the "Prepare" message, we can safely commit, since voting "No" is impossible during the "Prepare" phase as stated above.
It is possible that some nodes will have already committed the transaction by the time they receive the Recovery Protocol message from other nodes. For such cases, every node keeps a backlog of completed transaction IDs for a certain period of time. If there is no ongoing transaction with the given ID, then the backlog is checked. If the transaction is not found in the backlog either, then it was never started, which means that the failure occurred before the Prepare phase was completed and it is safe to roll back.
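Putting the last few paragraphs together, a simplified version of the recovery decision could look like the sketch below; the data structures and method names are illustrative, not GridGain internals:

```java
// Sketch of the Recovery Protocol decision on each surviving participant.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RecoveryParticipant {
    Set<Long> ongoingPreparedTxs = new HashSet<>(); // transactions for which Prepare was received
    Set<Long> committedBacklog = new HashSet<>();   // recently completed transaction IDs

    // Answer a peer asking "did you receive Prepare for txId?"
    boolean wasPrepareReceived(long txId) {
        // Still in progress and prepared locally.
        if (ongoingPreparedTxs.contains(txId)) return true;
        // Already committed and cleaned up: the completed-transaction backlog proves
        // Prepare was received. If it is in neither set, the transaction never started
        // here, so the failure happened before Prepare completed and the answer is "No".
        return committedBacklog.contains(txId);
    }

    // Run when the coordinator is lost; peers are the other transaction participants.
    void recover(long txId, List<RecoveryParticipant> peers) {
        boolean everyonePrepared = peers.stream().allMatch(p -> p.wasPrepareReceived(txId));
        if (everyonePrepared) {
            commitLocally(txId);    // voting "No" was impossible, so commit is safe
        } else {
            rollbackLocally(txId);
        }
    }

    void commitLocally(long txId) { /* apply prepared changes */ }
    void rollbackLocally(long txId) { /* discard prepared changes */ }
}
```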
Since the data is volatile, and does not leave any state after the crash, it does not really matter if any of the primary or backup nodes also crashed together with the coordinator node. The Recovery Protocol still works the same.
Conclusion
Note that we are able to fully recover the transaction without introducing the 3rd commit phase mainly because the data in distributed in-memory caches is volatile and node crashes do not leave any state behind.
The important advantage of the 2-Phase-Commit Recovery Protocol in GridGain over the 3-Phase-Commit is that, in the absence of failures, the recovery protocol does not introduce any additional overhead, while the 3-phase-commit adds an additional synchronous network round trip to the transaction and, therefore, has a negative impact on performance and latency.
GridGain also supports modes where it works with various persistent stores including transactional databases. In my next blog, I will cover the scenario where a cache sits on top of a transactional persistent store, and how the 2-Phase-Commit protocol works in that case.
Part II
reference from: http://java.dzone.com/articles/two-phase-commit-memory-caches
Generally, persistent disk-oriented systems require the additional 3rd phase in the commit protocol in order to ensure data consistency in case of failures. In my previous blog I covered why the 2-Phase-Commit protocol (without the 3rd phase) is sufficient to handle failures for distributed in-memory caches. The explanation was based on the open source GridGain architecture; however, it can be applied to any in-memory distributed system.
In this blog we will cover the case where an in-memory cache serves as a layer on top of a persistent database. In this case the database serves as the primary system of record, and the distributed in-memory cache is added for performance and scalability reasons, to accelerate reads and (sometimes) writes to the data. The cache must be kept consistent with the database, which means that a cache transaction must merge with the database transaction.
When we add a persistent store to an in-memory cache, our primary goal is to make sure that the cache will remain consistent with the on-disk database at all times.
In order to keep the data consistent between memory and the database, data is automatically loaded on demand whenever a read happens and the data cannot be found in the cache. This behavior is called read-through. Similarly, whenever a write operation happens, the data is stored in the cache and is automatically persisted to the database. This behavior is called write-through. Additionally, there is also a mode called write-behind, which batches up writes in memory and flushes them to the database in one bulk operation (we will not be covering this mode here).
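A minimal sketch of the read-through and write-through behaviors described above, using a plain map and a hypothetical Database interface rather than a real cache store API:

```java
// Illustrative read-through / write-through wrapper around a map-based cache.
// The Database interface is an assumed placeholder, not a real store API.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface Database {
    String load(String key);
    void save(String key, String value);
}

class ReadWriteThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Database db;

    ReadWriteThroughCache(Database db) { this.db = db; }

    // Read-through: on a cache miss, load the value from the database and cache it.
    String get(String key) {
        return cache.computeIfAbsent(key, db::load);
    }

    // Write-through: update the cache and synchronously persist to the database.
    void put(String key, String value) {
        cache.put(key, value);
        db.save(key, value);
    }
}
```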
Conclusion