CAP Confusion: Problems with ‘partition tolerance’
- by Henry Robinson, April 26, 2010
The ‘CAP’ theorem is a hot topic in the design of distributed data storage systems. However, it is often misused. In this post I hope to highlight why the common ‘consistency, availability and partition tolerance: pick two’ formulation is inadequate for
distributed systems. In fact, the lesson of the theorem is that the choice is almost always between sequential consistency and high availability.
It’s very common to invoke the ‘CAP theorem’ when designing, or talking about designing, distributed data storage systems. The theorem, as commonly stated, gives system designers a choice between three competing guarantees:
- Consistency – roughly meaning that all clients of a data store get responses to requests that ‘make sense’. For example, if Client A writes 1 then 2 to location X, Client B cannot read 2 followed by 1 (a small sketch of this check follows the list).
- Availability – all operations on a data store eventually return successfully. We say that a data store is ‘available’ for, e.g., write operations.
- Partition tolerance – if the network stops delivering messages between two sets of servers, will the system continue to work correctly?
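To make the consistency example above concrete, here is a minimal sketch – my illustration, not from any particular system – of a checker that validates one client’s reads against the order in which values were written:

```python
# A reader's observations must never go backwards relative to the write order.
# Client A writes 1 then 2 to X; a consistent store must not let Client B
# read 2 and then later read 1.

def reads_respect_write_order(write_order, observed_reads):
    """Return True if observed_reads never shows an older value
    after a newer one, relative to write_order."""
    position = {value: i for i, value in enumerate(write_order)}
    newest_seen = -1
    for value in observed_reads:
        if position[value] < newest_seen:
            return False              # saw an older value after a newer one
        newest_seen = position[value]
    return True

assert reads_respect_write_order([1, 2], [1, 1, 2, 2])   # fine
assert not reads_respect_write_order([1, 2], [2, 1])     # the forbidden history
```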
This is often summarised as a single sentence: “consistency, availability, partition tolerance. Pick two.” Short, snappy and useful.
At least, that’s the conventional wisdom. Many modern distributed data stores, including those often caught under the ‘NoSQL’ net, pride themselves on offering availability and partition tolerance over strong consistency; the reasoning being that short periods
of application misbehavior are less problematic than short periods of unavailability. Indeed, Dr. Michael Stonebraker posted an
article on the ACM’s blog bemoaning the preponderance of systems choosing the ‘AP’ data point, and arguing that consistency and availability are the two to choose. However, for the vast majority of systems, I contend that the choice is almost always between
consistency and availability, and unavoidably so.
Dr. Stonebraker’s central thesis is that, since partitions are rare, we might simply sacrifice ‘partition-tolerance’ in favour of sequential consistency and availability – a model that is well suited to traditional transactional data processing and the maintenance
of the good old ACID invariants of most relational databases. I want to illustrate why this is a misinterpretation of the CAP theorem.
We first need to get straight exactly what is meant by ‘partition tolerance’. Dr. Stonebraker asserts that a system is partition tolerant if processing can continue in both partitions in the case of a network failure.
“If there is a network failure that splits the processing nodes into two groups that cannot talk to each other, then the goal would be to allow processing to continue in both subgroups.”
This is actually a very strong partition tolerance requirement. Digging into the history of the CAP theorem reveals some divergence from this definition.
Seth Gilbert and Professor Nancy Lynch provided both a formalisation and a proof of the CAP theorem in their
2002 SIGACT paper. We should defer to their definition of partition tolerance – if we are going to invoke CAP as a mathematical truth, we should formalize our foundations; otherwise we are building on very shaky ground. Gilbert and Lynch define partition
tolerance as follows:
“The network will be allowed to lose arbitrarily many messages sent from one node to another”
Note that Gilbert and Lynch’s definition isn’t a property of a distributed application, but a property of the network in which it executes. This is often misunderstood: partition tolerance is not something we have a choice about designing into our systems.
If you have a partition in your network, you lose either consistency (because you allow updates to both sides of the partition) or you lose availability (because you detect the error and shut down the system until the error condition is resolved). Partition
tolerance means simply developing a coping strategy by choosing which of the other system properties to drop. This is the real lesson of the CAP theorem –
if you have a network that may drop messages, then you cannot have both availability and consistency; you must choose one. We should really be writing
Possibility of Network Partitions => not(availability and consistency), but that’s not nearly so snappy.
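As a hypothetical sketch of what that coping strategy looks like in practice (the class and parameter names are mine, not from any real system): a replica that cannot reach a majority of its peers must either reject writes, preserving consistency at the cost of availability, or accept them locally, preserving availability at the cost of consistency.

```python
# Illustrative only: the two ways a replica can cope with a partition.
# 'reachable_nodes' is assumed to come from some failure detector.

class Replica:
    def __init__(self, mode, total_nodes):
        self.mode = mode                    # "CP" or "AP"
        self.total_nodes = total_nodes

    def handle_write(self, value, reachable_nodes):
        has_quorum = reachable_nodes > self.total_nodes // 2
        if has_quorum:
            return "committed"              # a majority saw the write
        if self.mode == "CP":
            return "rejected"               # sacrifice availability
        return "accepted-locally"           # sacrifice consistency; divergent
                                            # replicas must be reconciled after
                                            # the partition heals

print(Replica("CP", total_nodes=5).handle_write("x=1", reachable_nodes=2))  # rejected
print(Replica("AP", total_nodes=5).handle_write("x=1", reachable_nodes=2))  # accepted-locally
```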
Dr. Stonebraker’s definition of partition tolerance is actually a measure of
availability – if a write may go to either partition, will it eventually be responded to? This is a very meaningful question for systems distributed across many geographic locations, but for the LAN case it is less common to have two partitions available
for writes. However, it is encompassed by the requirement for availability that we already gave – if your system is available for writes at all times, then it is certainly available for writes during a network partition.
So what causes partitions? Two things, really. The first is obvious – a network failure, for example due to a faulty switch, can cause the network to partition. The other is less obvious, but fits with the definition from Gilbert and Lynch: machine failures,
either hard or soft. In an asynchronous network, i.e. one where processing a message could take unbounded time, it is impossible to distinguish between machine failures and lost messages. Therefore a single machine failure partitions that machine from the rest of the
network. A correlated failure of several machines partitions them all from the network. Not being able to receive a message is the same as the network not delivering it. In the face of sufficiently many machine failures, it is still impossible to maintain
availability and consistency, not because two writes may go to separate partitions, but because the failure of an entire ‘quorum’ of servers may render some recent writes unreadable.
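A minimal sketch of that indistinguishability argument, assuming nothing beyond a timeout (the function names are illustrative):

```python
# In an asynchronous network, a timeout is the only failure signal, and it
# cannot tell "the machine crashed" from "its reply was delayed or dropped".

import time

def ping(node, timeout_seconds=1.0):
    """Returns True on a timely reply; a False result is ambiguous."""
    start = time.monotonic()
    reply = node()                 # may be None: crash OR lost message
    on_time = time.monotonic() - start <= timeout_seconds
    return reply is not None and on_time

crashed_node = lambda: None        # process is down
partitioned_node = lambda: None    # process is up; network dropped the reply
print(ping(crashed_node), ping(partitioned_node))   # False False – identical
```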
This is why defining P as ‘allowing partitioned groups to remain available’ is misleading – machine failures
are partitions, almost tautologously, and a failed machine by definition cannot be available while it is down. Yet Dr. Stonebraker says that he would suggest choosing CA rather than P. This feels rather like we are invited to have our cake and eat it too. Not
‘choosing’ P is analogous to building a network that will never experience multiple correlated failures. This is unreasonable for a distributed system – precisely for all the valid reasons that are laid out in the CACM post about correlated failures, OS bugs
and cluster disasters – so what a designer has to do is to decide between maintaining consistency and availability. Dr. Stonebraker tells us to choose consistency, in fact, because availability will unavoidably be impacted by large failure incidents. This
is a legitimate design choice, and one that the traditional RDBMS lineage of systems has explored to its fullest, but it protects us neither from availability problems stemming from smaller failure incidents, nor from the high cost of maintaining
sequential consistency.
When the scale of a system increases to many hundreds or thousands of machines, writing in such a way as to preserve consistency in the face of potential failures can become very expensive (you have to write to at least one more machine than the number of failures you are prepared to
tolerate at once). This kind of nuance is not captured by the CAP theorem: consistency is often much more expensive in terms of throughput or latency to maintain than availability.
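The arithmetic behind that parenthetical, as a small worked sketch (standard quorum reasoning, not specific to any one system): to keep a write readable after f simultaneous failures you need at least f + 1 copies, and a majority-quorum system needs a cluster of N = 2f + 1 nodes.

```python
# Worked numbers for write cost versus failures tolerated.

def replicas_needed(f):
    return f + 1            # at least one copy must survive f failures

def majority_cluster_size(f):
    return 2 * f + 1        # a majority must survive f failures

for f in (1, 2, 3):
    print(f"tolerate {f} failure(s): write to >= {replicas_needed(f)} replicas; "
          f"majority cluster of {majority_cluster_size(f)} nodes")
# tolerate 1 failure(s): write to >= 2 replicas; majority cluster of 3 nodes
# tolerate 2 failure(s): write to >= 3 replicas; majority cluster of 5 nodes
# tolerate 3 failure(s): write to >= 4 replicas; majority cluster of 7 nodes
```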
Systems such as ZooKeeper are explicitly sequentially consistent because there are few enough nodes in a cluster that the cost of writing to quorum is relatively small. The
Hadoop Distributed File System (HDFS) also chooses consistency – three failed datanodes can render a file’s blocks unavailable if you are unlucky. Both systems are designed to work in
real networks, however, where partitions and failures will occur*, and when they do both systems will become unavailable, having made their choice between consistency and availability. That choice remains the unavoidable reality for distributed data stores.
Further Reading
*For more on the inevitability of failure modes in large distributed systems, the interested reader is referred to James Hamilton’s LISA ’07 paper
On Designing and Deploying Internet-Scale Services.
Daniel Abadi has written an excellent critique of the CAP theorem.
James Hamilton also responds to Dr. Stonebraker’s blog entry, agreeing (as I do) with the problems of eventual consistency but taking issue with the notion of infrequent network partitions.
Source: http://blog.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/