Amazon’s Dynamo [9] and Facebook’s Cassandra [13], relax the consistency model,and offer only eventual consistency.

Others such as HBase [1] and BigTable [4] offer strong consistency only for operations touching a single partition, but not across the database as a whole.

Dynamo和Cassandra都是基于Vector Clocks和Gossip算法,强调高可用性,可以达到eventual consistency

而HBase,可以提供单分区的强一致性,但对于跨分区,或跨数据库,就无法保证;

for example, in a geo-replicated service, a user may first be routed to a local datacenter and perform some action such as sending an email message.
If they then reload their browser and are routed to another datacenter backend, they would still expect to see their sent message in their email outbox.

这个例子比较典型,要达到异地的一致性

 

一般要解决分布式问题的一致性问题的最简单的办法就是用synchronized clocks across all servers,但这基本是达不到的

the traditional approach to global consistency has been to use logical clocks.
Logical clocks, such as Lamport Clocks [15] or Vector Clocks [10, 20].

所以传统的做法就是用逻辑时钟

https://www.zhihu.com/question/19994133

贴个回答,写的不错

当然可以,但是不适合在分布式数据库里面。为什么呢? 首先,需要先确定你这个时间戳取自哪里。1、所有的时间戳都取自一个中心点节点,这样让时间戳得到一个全序关系,这是最简单的实现,如果你的分布式数据库只是部署在一个数据中心的话,但是这样就增加了一个中心节点,影响数据库的扩展性,而且任何一个取时间戳的动作都要增加上网络延迟的开销。如果分布式数据库部署需要跨数据中心,这种方案变的不可行,跨数据中心网络时延太大。从而影响整体性能2、所有的时间戳取自已所在的节点。这就涉及到时钟同步的问题,就像Dongdong说的,如果取的物理时间那么这些节点并没有一个一致的时间,总会有一点误差,这种解决方案就有两种思路:
一、就是现在以Leslie Lamport 提出的逻辑时钟的方法,重新定义了一种分布式的全序关系。vector clock就是以逻辑时钟为基础上发展而来的。
二、使用物理时钟,就是google spanner使用gps 加原子钟进行时间同步,不同节点之间的时间是不能精确同步的,只能把不同节点之间的时间同步到一个范围之内,spanner一个节点上获得一个时间戳,不是一个时间值,而是一个时间值加上一个误差范围,也就是一个时间窗,如果两个时间窗有重合,就无法比较大小,从而无法比较出来时间窗代表的事件的先后关系。而时间同步算法就是让分布式各节点之间的时间的误差尽量的小,这样取时间戳这个动作效率是最高的,只在本节点取一下物理时间即可,但是为了同步时间,各个节点肯定在周期与其它节点进行通讯来同步物理时间(google时间同步的算法好像没有开源)。时间同步算法只能让时间尽量的精确。分布式数据库里面是不能直接使用NTP来同步时间的,因为NTP时间同步协议里面允许节点的时间能够回退,而数据库里面要求本地的时钟必须是递增的。
三、还有一种混合逻辑时钟(Logical Physical Clocks),也是由逻辑时钟演变而来,但是时钟的值比较接近于物理时钟,而且不依赖于下面物理时钟的同步的算法,它的时间戳都可以在本地取,而且取出来是一个时间值,而不像spanner一样,取出来是一个时间窗。详细可以参见Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases这篇论文

关于Lamport或向量时钟,

Lamport时钟就是逻辑时钟,即不依赖于物理时钟,而依赖event之间的因果关系来定义的一种偏序

参考,全序, 分布式一致性的本质

而Vector时钟是,逻辑时钟的一种实现,具体的逻辑看下文,

Why Vector Clock are Easy or Hard?

摘自上面的知乎回答

“dynamo paper是七、八年前写的了,现在的amazon dynamo早已摒弃了version vector,而采用了synchronous replication(类似paxos的protocol),每一个partition会有一个leader负责写入。其实version vector并不scale,因为对于一个key来说,随着writer数量的增加,version vector数量成指数级的增长”

逻辑时钟的缺点,

first, they require that all clients must propagate clock data to achieve consistent views,
and second, the assigned timestamps have no relation to physical time.

 

Spanner introduced commit-wait, a way of ensuring physical-time based consistent global state by forcing operations to wait long enough so that all participants agree that the operation’s timestamp has passed based on worst case synchronization error bounds.

While innovative, the system performance becomes highly dependent on the quality of the time synchronization infrastructure, and thus may have unacceptable performance absent specialized hardware such as atomic clocks and GPS receivers.

Spanner利用原子钟和GPS接收器,实现了一个较为精确的时钟,这个时钟叫做TrueTime,每次调用TrueTime API返回的是一个时间区间,而不是一个具体的值,这个TrueTime保证的是真实时间(absolute time/real time)一定在这个区间内,这个区间范围通常大约14ms,甚至更小。

所以只有每个transaction的时间区间,都不重合,那么就可以比较容易的做到全局有序;

所以这里需要commit-wait,

可以看到,我在开始transaction的时候,取一次true time,[t1.earliest, t1.latest]

那么什么时候,我可以把这个transaction commit掉,即释放掉锁,即别人可以开始新的transaction

当我再取一次true time,[t2.earliest, t2.latest],当t2.earliest > t1.latest的时候,即可

因为这样两个transaction的时间区间就不可能重叠了,当然代价就是每个transaction都有等待大概2个error bound的时间

当然,spanner的实现依赖硬件的infra设备,普通用户无法复制;

 

HybridTime(HT), a hybrid between physical and logical clocks and we show how HT can be used to achieve the same semantics as Spanner, but with good performance even with commonly available time synchronization.

Like Spanner, our approach relies on physical time measurements with bounded error to assign HybridTime timestamps to events that occur in the system.
However, unlike Spanner, our approach does not usually require the error to be waited out, thus allowing for usage in common deployment scenarios where clocks are synchronized through common protocols such as the Network Time Protocol, in which clock synchronization error is often higher than with Spanner’s TrueTime.

The trade-off is that, in order to avoid commit-wait, HybridTime requires that timestamps be propagated across machines to achieve the same consistency semantics as Spanner.
Contrary to vector clocks, which can expand as the number of participants in the cluster grows, HybridTime timestamps have constant and small size.

HybridTime说是physical and logical clocks的结合,和spanner一样,也可以使用带error bound的物理时间
但由于他没有spanner的硬件设施,所以做不到低延迟的时间同步,所以他依然使用逻辑时钟来达到一致性;并且他解决了vector clocks随着client的数量增加而会size膨胀的问题

HybridTime clocks follow similar update rules to Lamport clocks, but the time values are not purely logical:
each time value has both a logical component, which helps in guaranteeing the same properties as a Lamport Clocks, and a physical component which allows the event to be associated with a physical point-in-time.

 

One of Spanner’s key benefits is that is externally consistent, which is defined as fully linearizable, even in the presence of hidden channels.

Additionally we use the term globally consistent to describe a system which provides the same linearizability semantics, provided that there are no hidden channels present.

区分为,是否有hidden channels?意思是clients间存在因果或先后关系,比如通过发message,都是这个因果关系对于系统是不可知的,比如A write,然后发送消息给B,然后B write,那么因果关系上,B write一定后于A write,但是对于数据库而言他并不知道,所以并不能保证A write先写入;

spanner是可以保证externally consistent, 由于他通过 commit wait,来保证每个transaction之间一定是全局有序的

相对的globally consistent,更简单些,因为并没有hidden channels

 

Spanner’s key innovation is that timestamps assigned by the system can be used to achieve external consistency, but also have physical meaning.

HybridTime is always globally consistent, and through selective application of commit-wait is externally consistent.

Spanner的主要创新在于实现external一致性的,同时还保留了物理时间的

而HybridTime,默认情况下是可以实现globally consistent,即偏序,因为他本身就是使用lamport时钟,并且当你选择commit-wait时,也可以保证externally consistent;

 

HybridTime Assumptions

1. HybridTime assumes that machines have a reasonably accurate physical clock, represented by the PCi(e) function, which outputs the numeric timestamp returned by the physical clock as read by process i for event e, that is able to provide absolute time measurements (usually in milli- or microseconds since 1 January 1970).

2. keeps the physical clocks across different servers synchronized with regard to a reference server, the “reference” time, represented by the PCref(e) function which outputs the numeric timestamp returned by the “reference” process for event e;

Additionally, we assume that such a substrate is able to provide an error bound along with each time measurement, denoted by the Ei(e) function, which outputs the numeric value ε error of process i at the time e occurred

3. we make no assumptions on the actual accuracy of the clocks, i.e. the physical timestamps returned by server’s clocks may have an arbitrarily large but finite error, as long as this error’s bound is known

说白了,假设

有个相对靠谱的物理时钟,一个理想的参照时钟,以及他们之间相差的,error bound

最后,我们并不假设这个error bound会很小,只要有限就可以

假设1,我们有有限的error bound

 

假设2, 进程级别的物理时钟单调递增,注意是进程级别

 

基于以上假设,提出HybridTime Clock and Protocol

 

HybridTime Clock and Protocol

HybridTime clock(HTC) is a pair (physical,logical) where the first component is a representation of the physical time at which the event occurred and the second component is a logical sequence number.

定义其实显而易见。。。

Algorithm 2 depicts the HTC algorithm.

HTC算法如上,两部分,NOW和UPDATE

now就是取当前的HybridTime clock

upate就是根据in 来更新当前的clock

Algorithm 2 implements a Lamport Clock, with the additional advantage that generated timestamps have physical meaning and are accurate representations of physical time within a bound error.

算法本身,其实就是Lamport Clock,只是增加了physical clock的部分

给出一个例子,

 

To order the events timestamped using the HybridTime Clock algorithm we use Definition 1.

Definition 1. HCT(e) < HCT( f ) is defined as the lexicographical ordering of the timestamp two-tuple (physical,logical)

Theorem 1. The HybriTime clock happened-before relation forms a total order of events

Theorem 2. For any event in a causal chain f , the physical component of a HTC timestamp approximates the “real” time the event occurred, with a error defined and bounded by

 

Implementation

No Consistency - In this mode there are no external consistency guarantees, transactions are assigned timestamps from each server’s physical clock and no guarantee is made that reads are consistent or repeatable.

直接用local的物理时间,不保证一致性

HybridTime Consistency - In this mode our implementation guarantees the global consistency as Spanner, absent hidden channels, but using HybridTime instead of commit-wait.
Clients choosing this consistency mode on writes must make sure that the timestamp that is received from the server is propagated to other servers and/or clients.
Within the same client process, timestamps are automatically propagated on behalf of the user.

这个其实和逻辑时钟,没有区别,就是保证偏序

Commit-wait Consistency - In this mode our implementation guarantees the same external consistency semantics as Spanner by also using commit-wait in the way described in the original paper.
However instead of using TrueTime, which is a proprietary and private API, we implemented commit-wait on top of the widely used Network Time Protocol (NTP). Hence, in this consistency mode we support hidden channels.

这个估计没啥用,在没有spanner硬件保证的情况下,commit-wait,性能不能忍吧

HybridTime - Accessible Global Consistency with High Clock Uncertainty的更多相关文章

  1. Clock Skew , Clock Uncertainty和 Period

    本文将介绍FPGA中和时钟有关的相关概念,阅读本文前需要对时序收敛的基本概念和建立.保持关系有一定了解,这些内容可以在时序收敛:基本概念,建立时间和保持时间(setup time 和 hold tim ...

  2. kudu

    Kudu White Paper http://www.cloudera.com/documentation/betas/kudu/0-5-0/topics/kudu_resources.html h ...

  3. Spanner: Google’s Globally-Distributed Database

    https://research.google.com/archive/spanner.html Spanner is Google’s scalable, multi-version, global ...

  4. FPGA Timing笔记

    很多FPGA工程师都会遇到timing的问题,如何让FPGA跑到更快的处理频率是永久话题.决定FPGA的timing关键是什么?如何才能跑到更快的频率呢? A. 第一步需要了解FPGA的timing路 ...

  5. A brief introduction to weakly supervised learning(简要介绍弱监督学习)

    by 南大周志华 摘要 监督学习技术通过学习大量训练数据来构建预测模型,其中每个训练样本都有其对应的真值输出.尽管现有的技术已经取得了巨大的成功,但值得注意的是,由于数据标注过程的高成本,很多任务很难 ...

  6. 推荐 的FPGA设计经验(2)-时钟策略优化

    Optimizing Clocking Schemes Avoid using internally generated clocks (other than PLLs) wherever possi ...

  7. 9.quartus_warning_altera_reserved_tck

    编译的时候没有注意,整个工程都可以在板子上跑起来.但是做Powerplay的时候,出现了这个Critical Warning:. Critical Warning: The following clo ...

  8. SLAM: SLAM的发展历程(WIKI)

    参考维基百科: https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping 你们叫他SLAM,我还是习惯叫他三维重建.... ...

  9. Timequest GUI

    Tasks界面 使用Tasks界面可以访问常用命令,例如生成网表建立报告等. 两个常用命令位于Tasks界面中:打开工程和编写SDC文件.其他命令在下面的文件夹中: Netlist Setup Rep ...

随机推荐

  1. 电赛总结(二)——AD芯片总结之AD7705

    1.特性参数 (1)16位无丢失代码性AD转化器 (2)只需要3.3V供电(即只需要单片机即可供电) (3)双通道差分输入 (4)基准电压为2.5V时,在单极性信号下,输入范围是0到2.5V,在双极性 ...

  2. JVM的粗略简述

    什么是Java虚拟机 虚拟机是一种抽象化的计算机,通过在实际的计算机上仿真模拟各种计算机功能来实现的.Java虚拟机有自己完善的硬体架构,如处理器.堆栈.寄存器等,还具有相应的指令系统.JVM屏蔽了与 ...

  3. SU Demos-04Deconvolution-02Wiener_Levinson

    本Demo共有4个脚本,我只看了第一个,后面的3个不熟悉,就不往这里贴图了,望谅解. 不足之处,欢迎批评指正. 先看readme 生成数据所需的c程序 make上述c程序时需要的makefile(本人 ...

  4. TYVJ P1069 cowtour 看不懂题意

    描述 农民John的农场里有很多牧区.有的路径连接一些特定的牧区.一片所有连通的牧区称为一个牧场.但是就目前而言,你能看到至少有两个牧区通过任何路径都不连通.这样,农民John就有多个牧场了. Joh ...

  5. TYVJ P1075 硬币游戏 Label:dp

    背景 农民John的牛喜欢玩硬币,所以John就为它们发明了一个新的两人硬币游戏,叫做Xoinc. 描述 最初地面上有一堆n个硬币(5<=n<=2000),从上面数第i个硬币的价值为C_i ...

  6. xfce4桌面自动整理脚本

    转自http://blog.chinaunix.net/uid-22101074-id-91073.html (有修改) xfce4桌面没有自动排列的功能,如果文件多了,超出了桌面的范围,就不好找了, ...

  7. POJ 1925 Spiderman(DP)

    题目链接 这个破题,好不容易思路清楚了,写的就是过不了..关键部分直接抄的别人的...终于A了,自己写的判断什么的,就是有一组数据过不了. #include <cstdio> #inclu ...

  8. 李洪强-C语言6-控制结构

    C语言流程控制 一.流程控制结构 (1)顺序结构:按书写顺序执行每一条语句. (2)选择结构:对给定的条件进行判断,根据判断结果决定执行哪一段代码. (3)循环结构:在给定条件成立的情况下,反复执行某 ...

  9. hdu A计划

    这道题是一道bfs的题目,因为题目中给的数据很小,所以可以采用优先队列的方式来简化处理.这道题在搜索的过程中要注意map1的不同层次的转换,即对'#'的理解.之前wa了两次是因为我考虑了如果上下两层对 ...

  10. javascript判断非空

    /* *判断非空 * */ function isEmpty(val){ if(val == null)return true; if(val == undefined || val == 'unde ...