A Distributed Configuration Center Based on etcd

etcd docs | etcd versus other key-value stores https://etcd.io/docs/v3.4.0/learning/why/

The name “etcd” originated from two ideas, the unix “/etc” folder and “d”istributed systems. The “/etc” folder is a place to store configuration data for a single system whereas etcd stores configuration information for large scale distributed systems. Hence, a “d”istributed “/etc” is “etcd”.

etcd is designed as a general substrate for large scale distributed systems. These are systems that will never tolerate split-brain operation and are willing to sacrifice availability to achieve this end. etcd stores metadata in a consistent and fault-tolerant way. An etcd cluster is meant to provide key-value storage with best of class stability, reliability, scalability and performance.

Distributed systems use etcd as a consistent key-value store for configuration management, service discovery, and coordinating distributed work. Many organizations use etcd to implement production systems such as container schedulers, service discovery services, and distributed data storage. Common distributed patterns using etcd include leader election, distributed locks, and monitoring machine liveness.
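The coordination patterns just mentioned all reduce to conditional updates against a consistent store. As a rough illustration — a toy in-memory stand-in, not etcd's client API; `KVStore`, `try_acquire`, and `delete_if_owner` are names invented for this sketch — a lock or leader election can be built from a put-if-absent primitive:

```python
# Toy in-memory stand-in for a consistent KV store such as etcd.
# These names are invented for the sketch, not part of etcd's API.

class KVStore:
    def __init__(self):
        self.data = {}

    def put_if_absent(self, key, value):
        # etcd expresses this as a transaction:
        # If(create_revision(key) == 0) Then(Put(key, value)).
        if key not in self.data:
            self.data[key] = value
            return True
        return False

    def delete_if_owner(self, key, owner):
        # Release only while we still hold the key.
        if self.data.get(key) == owner:
            del self.data[key]
            return True
        return False

def try_acquire(store, key, owner):
    """Attempt to take the lock / win the election for `owner`."""
    return store.put_if_absent(key, owner)

store = KVStore()
assert try_acquire(store, "/leader", "node-1")       # node-1 wins
assert not try_acquire(store, "/leader", "node-2")   # node-2 must wait
store.delete_if_owner("/leader", "node-1")           # node-1 steps down
assert try_acquire(store, "/leader", "node-2")       # node-2 takes over
```

In real etcd the acquire step is a single atomic transaction and the key carries a lease, so a crashed holder's lock expires automatically.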

Use cases

  • Container Linux by CoreOS: Applications running on Container Linux get automatic, zero-downtime Linux kernel updates. Container Linux uses locksmith to coordinate updates. Locksmith implements a distributed semaphore over etcd to ensure only a subset of a cluster is rebooting at any given time.
  • Kubernetes stores configuration data into etcd for service discovery and cluster management; etcd’s consistency is crucial for correctly scheduling and operating services. The Kubernetes API server persists cluster state into etcd. It uses etcd’s watch API to monitor the cluster and roll out critical configuration changes.
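The watch pattern the API server relies on can be sketched as a revisioned event log: every change is recorded under a monotonically increasing revision, and a watcher can replay changes from any past revision before streaming new ones. This is a simulation of the idea, not etcd's actual watch API; `WatchableStore` is an invented name:

```python
# Toy simulation of revision-based watches (not etcd's real API).

class WatchableStore:
    def __init__(self):
        self.revision = 0
        self.events = []  # log of (revision, key, value) for every change

    def put(self, key, value):
        self.revision += 1
        self.events.append((self.revision, key, value))
        return self.revision

    def watch(self, key, since_revision=0):
        # Replay historical events after `since_revision`; a real client
        # would then keep streaming new events as they arrive.
        return [e for e in self.events
                if e[1] == key and e[0] > since_revision]

store = WatchableStore()
store.put("/config/replicas", "3")
rev = store.put("/config/replicas", "5")
store.put("/config/replicas", "7")
# A watcher that last saw revision `rev` catches up on what it missed:
missed = store.watch("/config/replicas", since_revision=rev)
assert missed == [(3, "/config/replicas", "7")]
```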

Comparison chart

Perhaps etcd already seems like a good fit, but as with all technological decisions, proceed with caution. Please note this documentation is written by the etcd team. Although the ideal is a disinterested comparison of technology and features, the authors’ expertise and biases obviously favor etcd. Use only as directed.

The table below is a handy quick reference for spotting the differences among etcd and its most popular alternatives at a glance. Further commentary and details for each column are in the sections following the table.

|                                    | etcd | ZooKeeper | Consul | NewSQL (Cloud Spanner, CockroachDB, TiDB) |
|------------------------------------|------|-----------|--------|-------------------------------------------|
| Concurrency Primitives             | Lock RPCs, Election RPCs, command line locks, command line elections, recipes in go | External curator recipes in Java | Native lock API | Rare, if any |
| Linearizable Reads                 | Yes | No | Yes | Sometimes |
| Multi-version Concurrency Control  | Yes | No | No | Sometimes |
| Transactions                       | Field compares, Read, Write | Version checks, Write | Field compare, Lock, Read, Write | SQL-style |
| Change Notification                | Historical and current key intervals | Current keys and directories | Current keys and prefixes | Triggers (sometimes) |
| User permissions                   | Role based | ACLs | ACLs | Varies (per-table GRANT, per-database roles) |
| HTTP/JSON API                      | Yes | No | Yes | Rarely |
| Membership Reconfiguration         | Yes | >3.5.0 | Yes | Yes |
| Maximum reliable database size     | Several gigabytes | Hundreds of megabytes (sometimes several gigabytes) | Hundreds of MBs | Terabytes+ |
| Minimum read linearization latency | Network RTT | No read linearization | RTT + fsync | Clock barriers (atomic, NTP) |
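The Transactions row can be made concrete: etcd offers conditional If/Then/Else mini-transactions over field compares rather than SQL. A minimal sketch of the idea against a toy single-node store (`Store` and `txn` are invented for illustration, not etcd's API):

```python
# Toy sketch of etcd-style If/Then/Else transactions
# ("field compares, Read, Write" in the table above).

class Store:
    def __init__(self):
        self.data = {}

def txn(store, compare, then_ops, else_ops):
    """Atomically: if every compare holds, apply then_ops, else else_ops.

    `compare` is a list of predicates over the store's data, standing in
    for etcd's Compare(Value(key), "=", ...) field compares.
    """
    succeeded = all(check(store.data) for check in compare)
    for key, value in (then_ops if succeeded else else_ops):
        store.data[key] = value
    return succeeded

s = Store()
s.data["config"] = "v1"
# Update only if nobody changed the key since we read it:
ok = txn(s,
         compare=[lambda d: d.get("config") == "v1"],
         then_ops=[("config", "v2")],
         else_ops=[])
assert ok and s.data["config"] == "v2"
# The same compare now fails, so the write is not applied:
ok = txn(s,
         compare=[lambda d: d.get("config") == "v1"],
         then_ops=[("config", "v3")],
         else_ops=[])
assert not ok and s.data["config"] == "v2"
```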

ZooKeeper

ZooKeeper solves the same problem as etcd: distributed system coordination and metadata storage. However, etcd has the luxury of hindsight taken from engineering and operational experience with ZooKeeper’s design and implementation. The lessons learned from ZooKeeper certainly informed etcd’s design, helping it support large scale systems like Kubernetes. The improvements etcd made over ZooKeeper include:

  • Dynamic cluster membership reconfiguration
  • Stable read/write under high load
  • A multi-version concurrency control data model
  • Reliable key monitoring which never silently drops events
  • Lease primitives decoupling connections from sessions
  • APIs for safe distributed shared locks

Furthermore, etcd supports a wide range of languages and frameworks out of the box. Whereas ZooKeeper has its own custom Jute RPC protocol, which is totally unique to ZooKeeper and limits its supported language bindings, etcd’s client protocol is built from gRPC, a popular RPC framework with language bindings for go, C++, Java, and more. Likewise, gRPC can be serialized into JSON over HTTP, so even general command line utilities like curl can talk to it. Since systems can select from a variety of choices, they are built on etcd with native tooling rather than around etcd with a single fixed set of technologies.
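As a small illustration of the JSON-over-HTTP path: etcd's v3 JSON gateway expects keys and values as base64-encoded strings. The sketch below only builds such a payload locally; the `/v3/kv/put` route shown in the comment matches the etcd v3.4 gateway but should be verified against your server's version:

```python
# Build a JSON payload for etcd's gRPC-JSON gateway. Keys and values
# travel as base64 strings in the v3 HTTP API.

import base64
import json

def put_payload(key: str, value: str) -> str:
    """JSON body for a KV put request against the gRPC-JSON gateway."""
    return json.dumps({
        "key": base64.b64encode(key.encode()).decode(),
        "value": base64.b64encode(value.encode()).decode(),
    })

body = put_payload("foo", "bar")
assert json.loads(body) == {"key": "Zm9v", "value": "YmFy"}
# Usable with plain curl against a v3.4 server (path may differ by version):
#   curl -L http://localhost:2379/v3/kv/put -X POST -d '<body>'
```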

When considering features, support, and stability, new applications planning to use ZooKeeper for a consistent key value store would do well to choose etcd instead.

Consul

Consul is an end-to-end service discovery framework. It provides built-in health checking, failure detection, and DNS services. In addition, Consul exposes a key value store with RESTful HTTP APIs. As it stands in Consul 1.0, the storage system does not scale as well as other systems like etcd or ZooKeeper in key-value operations; systems requiring millions of keys will suffer from high latencies and memory pressure. The key value API is missing, most notably, multi-version keys, conditional transactions, and reliable streaming watches.

etcd and Consul solve different problems. If looking for a distributed consistent key value store, etcd is a better choice over Consul. If looking for end-to-end cluster service discovery, etcd will not have enough features; choose Kubernetes, Consul, or SmartStack.

NewSQL (Cloud Spanner, CockroachDB, TiDB)

Both etcd and NewSQL databases (e.g., Cockroach, TiDB, Google Spanner) provide strong data consistency guarantees with high availability. However, the significantly different system design parameters lead to significantly different client APIs and performance characteristics.

NewSQL databases are meant to horizontally scale across data centers. These systems typically partition data across multiple consistent replication groups (shards), potentially distant, storing data sets on the order of terabytes and above. This sort of scaling makes them poor candidates for distributed coordination as they have long latencies from waiting on clocks and expect updates with mostly localized dependency graphs. The data is organized into tables, including SQL-style query facilities with richer semantics than etcd, but at the cost of additional complexity for processing, planning, and optimizing queries.

In short, choose etcd for storing metadata or coordinating distributed applications. If storing more than a few GB of data or if full SQL queries are needed, choose a NewSQL database.

Using etcd for metadata

etcd replicates all data within a single consistent replication group. For storing up to a few GB of data with consistent ordering, this is the most efficient approach. Each modification of cluster state, which may change multiple keys, is assigned a global unique ID, called a revision in etcd, from a monotonically increasing counter for reasoning over ordering. Since there’s only a single replication group, the modification request only needs to go through the raft protocol to commit. By limiting consensus to one replication group, etcd gets distributed consistency with a simple protocol while achieving low latency and high throughput.
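The revision scheme described above can be sketched as follows — a toy simulation, with `RevisionedStore` an invented name, not etcd code. Every modification, even one touching several keys, draws one ID from a single monotonically increasing counter, and past values stay readable, MVCC-style:

```python
# Toy simulation of etcd's global revision counter and MVCC reads.

class RevisionedStore:
    def __init__(self):
        self.revision = 0
        self.history = {}  # key -> list of (revision, value), the MVCC trail

    def modify(self, changes):
        """Apply a multi-key modification under one new revision."""
        self.revision += 1
        for key, value in changes.items():
            self.history.setdefault(key, []).append((self.revision, value))
        return self.revision

    def get(self, key, at_revision=None):
        """Read the latest value, or the value as of a past revision."""
        versions = self.history.get(key, [])
        if at_revision is not None:
            versions = [v for v in versions if v[0] <= at_revision]
        return versions[-1][1] if versions else None

s = RevisionedStore()
r1 = s.modify({"a": "1", "b": "1"})   # both keys share revision 1
r2 = s.modify({"a": "2"})
assert (r1, r2) == (1, 2)             # revisions are globally ordered
assert s.get("a") == "2"
assert s.get("a", at_revision=r1) == "1"   # MVCC: read the past
```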

The replication behind etcd cannot horizontally scale because it lacks data sharding. In contrast, NewSQL databases usually shard data across multiple consistent replication groups, storing data sets on the order of terabytes and above. However, to assign each modification a global unique and increasing ID, each request must go through an additional coordination protocol among replication groups. This extra coordination step may potentially conflict on the global ID, forcing ordered requests to retry. The result is a more complicated approach with typically worse performance than etcd for strict ordering.

If an application reasons primarily about metadata or metadata ordering, such as to coordinate processes, choose etcd. If the application needs a large data store spanning multiple data centers and does not heavily depend on strong global ordering properties, choose a NewSQL database.

Using etcd for distributed coordination

etcd has distributed coordination primitives such as event watches, leases, elections, and distributed shared locks out of the box. These primitives are both maintained and supported by the etcd developers; leaving these primitives to external libraries shirks the responsibility of developing foundational distributed software, essentially leaving the system incomplete. NewSQL databases usually expect these distributed coordination primitives to be authored by third parties. Likewise, ZooKeeper famously has a separate and independent library of coordination recipes. Consul, which provides a native locking API, goes so far as to apologize that it’s “not a bulletproof method”.

In theory, it’s possible to build these primitives atop any storage system providing strong consistency. However, the algorithms tend to be subtle; it is easy to develop a locking algorithm that appears to work, only to suddenly break due to thundering herd and timing skew. Furthermore, other primitives supported by etcd, such as transactional memory, depend on etcd’s MVCC data model; simple strong consistency is not enough.
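One classic way such a lock "appears to work, only to suddenly break": a holder pauses (GC, clock skew), its lease expires, and it later issues a stale write. A common remedy — sketched below as a toy, not etcd's built-in lock, although etcd's monotonic revisions can serve this role — is to attach an increasing fencing token to every write so the protected resource can reject stale holders:

```python
# Toy illustration of fencing tokens guarding against expired lock holders.

class Resource:
    def __init__(self):
        self.highest_token = 0
        self.writes = []

    def write(self, token, data):
        # Reject writes carrying a token older than one already seen:
        # a lower token means an older (possibly expired) lock holder.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.writes.append(data)
        return True

res = Resource()
# Client A held the lock with token 1, then stalled until its lease
# expired. Client B acquired the lock with token 2 and writes first:
assert res.write(2, "from-B")
# A wakes up, unaware it lost the lock; its stale write is rejected:
assert not res.write(1, "from-A")
assert res.writes == ["from-B"]
```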

For distributed coordination, choosing etcd can help prevent operational headaches and save engineering effort.

https://mp.weixin.qq.com/s/86LN9l1hdviquFT8gwy0oA

etcd compared with ZooKeeper, Consul, and other KV components

Original article by aoho (aoho求索), 2020-05-05

About etcd

The subject of this article is etcd. The name “etcd” comes from two ideas: the unix “/etc” folder and “d” for distributed systems. The “/etc” folder is where configuration data for a single system is stored, while etcd stores configuration information for large-scale distributed systems. Hence, an “/etc” given a “d” is “etcd”.

etcd is designed as a general substrate for large-scale distributed systems. These large systems must avoid split-brain operation and are willing to sacrifice availability to that end. etcd stores metadata in a consistent and fault-tolerant way. An etcd cluster aims to provide key-value storage with stability, reliability, scalability, and performance.

Distributed systems use etcd as a consistent key-value store for configuration management, service discovery, and coordinating distributed work. Many organizations use etcd in production systems such as container schedulers, service discovery services, and distributed data stores. Common distributed patterns built on etcd include leader election, distributed locks, and monitoring machine liveness.

Use cases

  1. CoreOS's Container Linux: applications running on Container Linux get zero-downtime automatic Linux kernel updates. Container Linux uses Locksmith to coordinate updates. Locksmith implements a distributed semaphore on etcd to ensure that only a subset of the cluster is rebooting at any given time.

  2. Kubernetes stores configuration data in etcd for service discovery and cluster management; etcd's consistency is crucial for orchestrating containers correctly. The Kubernetes API server persists cluster state into etcd, and uses etcd's watch API to monitor the cluster and roll out critical configuration changes.

Comparison across dimensions

Perhaps etcd already looks like a good fit, but as with all technology choices, we should proceed with caution. Although the ideal would be an objective comparison of technology and features, the authors' expertise and biases obviously favor etcd (the experiments and documentation were written by the etcd authors).

The table below is a quick at-a-glance reference for the differences between etcd and its most popular alternatives. Further commentary and details on each column are given in the sections that follow.

Versus ZooKeeper

ZooKeeper solves the same problem as etcd: distributed system coordination and metadata storage. etcd, however, stands on the shoulders of its predecessor, drawing on ZooKeeper's design and implementation experience. The lessons learned from ZooKeeper clearly underpinned etcd's design, helping it support large systems such as Kubernetes. etcd's improvements over ZooKeeper include:

  • Dynamic reconfiguration of cluster membership

  • Stable reads and writes under high load

  • A multi-version concurrency control data model

  • Reliable key monitoring

  • Lease primitives that decouple connections from sessions

  • APIs for distributed shared locks

In addition, etcd supports a wide range of languages and frameworks out of the box. ZooKeeper has its own custom Jute RPC protocol, which is entirely unique to ZooKeeper and limits its supported language bindings, whereas etcd's client protocol is built on gRPC, a popular RPC framework with bindings for Go, C++, Java, and more. Likewise, gRPC can be serialized into JSON over HTTP, so even general command-line utilities such as curl can talk to it. Since systems can pick from many options, they are built on etcd with native tooling rather than around etcd with a single fixed set of technologies.

Considering features, support, and stability, etcd is better suited than ZooKeeper as a consistent key-value store component.

Consul

Consul is an end-to-end service discovery framework. It provides built-in health checking, failure detection, and DNS services. In addition, Consul exposes a key-value store with a RESTful HTTP API. As of Consul 1.0, its storage system does not scale in key-value operations as well as components like etcd or ZooKeeper; systems with millions of keys will suffer high latencies and memory pressure. Most notably, Consul lacks multi-version keys, conditional transactions, and reliable streaming watches.

etcd and Consul solve different problems. If you need a distributed, consistent key-value store, etcd is a better choice than Consul. If you need end-to-end cluster service discovery, etcd does not have enough features; choose Kubernetes, Consul, or SmartStack instead.

NewSQL (Cloud Spanner, CockroachDB, TiDB)

Both etcd and NewSQL databases (e.g., Cockroach, TiDB, Google Spanner) provide strong data-consistency guarantees with high availability. However, their different system designs lead to significantly different client APIs and performance characteristics.

NewSQL databases are meant to scale horizontally across data centers. These systems typically partition data across multiple consistent replication groups (shards), which may be far apart, storing data sets on the order of terabytes and above. This kind of scaling makes them poor candidates for distributed coordination, since they incur long wait times and expect updates with mostly localized dependency graphs. NewSQL data is organized into tables, with SQL-style query facilities semantically richer than etcd's, but at the cost of extra complexity for processing and optimizing queries.

In short, choose etcd for storing metadata or coordinating distributed applications. If you are storing more than a few GB of data, or need full SQL queries, choose a NewSQL database.

Using etcd to store metadata

etcd replicates all data within a single replication group. For storing up to a few GB of data in a consistent order, this is the most efficient approach. Each modification of cluster state (which may change multiple keys) is assigned a globally unique ID, called a revision in etcd, from a monotonically increasing counter, for reasoning about ordering. Since there is only one replication group, a modification request only needs to go through the raft protocol to commit. By limiting consensus to a single replication group, etcd achieves distributed consistency with a simple protocol while delivering low latency and high throughput.

The replication behind etcd cannot scale horizontally because it lacks data sharding. In contrast, NewSQL databases usually shard data across multiple consistent replication groups, storing data sets on the order of terabytes and above. However, to assign each modification a globally unique and increasing ID, each request must go through an additional coordination protocol among replication groups. This extra coordination step can conflict on the global ID, forcing ordered requests to retry. The result is that for strict ordering, the NewSQL approach is more complicated and typically performs worse than etcd.

If an application mainly needs metadata or metadata ordering (for example, to coordinate processes), choose etcd. If the application needs a large data store spanning multiple data centers and does not depend heavily on strong global ordering, choose a NewSQL database.

Using etcd as a distributed coordination component

etcd provides distributed coordination primitives such as event watches, leases, elections, and distributed locks out of the box. These primitives are maintained and supported by the etcd developers; leaving them to external libraries shirks the responsibility of developing foundational distributed software and essentially leaves the system incomplete. NewSQL databases usually expect these distributed coordination primitives to be written by third parties. Likewise, ZooKeeper has a separate, independent library of coordination recipes. Consul, which provides a native locking API, even apologizes that it is “not a bulletproof method” (after one client releases a lock, other clients may not be able to acquire it immediately, possibly because of the lock-delay setting).

In theory, these primitives can be built on any storage system that offers strong consistency. In practice, however, the algorithms tend to be subtle; it is easy to develop a locking algorithm that appears to work, only to break because of thundering herds and timing skew. Moreover, other primitives supported by etcd, such as transactional memory, depend on etcd's MVCC data model; strong consistency alone is not enough.

For distributed coordination, choosing etcd helps avoid operational headaches and saves engineering effort.
