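ZooKeeper Principles and Using Curator
I am planning to implement ZooKeeper-based distributed control of cluster state consistency. I do not know ZooKeeper's internals well, so this is a good chance to learn. I found a few articles online and am posting them here for now; once I have read the official documentation closely I will come back and add my own notes.
-----------------------------divider-------------------------------------
I recently used ZooKeeper (zk) to build rule management and cluster management for my company's risk-control system, and came away with a deeper understanding of zk and Curator. Below are the pitfalls I ran into.
1. Curator has two notification mechanisms: one wraps zk's own watcher, the other is Curator's own listener. Here are the pitfalls:
a. In my testing, a listener only receives events from the client on the same thread; it does not work across threads or across processes. The operation must be issued in inBackground() mode for the listener to fire.
b. A watcher wraps zk's native watcher and works across processes, but note that in my testing it was not triggered when the operation was issued with inBackground().
2. The zk watcher defines four node event types (plus None):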
public enum EventType {
None (-1),
NodeCreated (1),
NodeDeleted (2),
NodeDataChanged (3),
NodeChildrenChanged (4);
}
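Here are the pitfalls.
How do you get the event you actually want?
a. To watch for NodeCreated, NodeDeleted, or NodeDataChanged, use checkExists or getData. checkExists is recommended, because getData fails with an error (NoNodeException) if the node has not been created yet.
b. To watch for NodeChildrenChanged, the only option is getChildren. Note that it does not see nested descendants: a watch on /test/1 is not notified of changes to /test/1/2/3, only of changes to /test/1/2. Also, the path in the event is always the path you set the watch on, so do not expect to get the child's path from it.
This is a good article: http://blog.csdn.net/lzx1104/article/details/6968802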
http://liuqunying.blog.51cto.com/3984207/1407455
http://www.ibm.com/developerworks/cn/opensource/os-cn-zookeeper/
https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_advancedConfiguration
https://github.com/Netflix/curator
The ZooKeeper Data Model
ZooKeeper has a hierarchical namespace, much like a distributed file system. The only difference is that each node in the namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory. Paths to nodes are always expressed as canonical, absolute, slash-separated paths; there are no relative references. Any Unicode character can be used in a path subject to the following constraints:
A zk node can hold data as well as child nodes. The Unicode characters not allowed in paths are as follows:
The null character (\u0000) cannot be part of a path name. (This causes problems with the C binding.)
The following characters can't be used because they don't display well, or render in confusing ways: \u0001 - \u001F and \u007F - \u009F.
The following characters are not allowed: \ud800 - \uF8FF, \uFFF0 - \uFFFF.
The "." character can be used as part of another name, but "." and ".." cannot alone be used to indicate a node along a path, because ZooKeeper doesn't use relative paths. The following would be invalid: "/a/b/./c" or "/a/b/../c".
The token "zookeeper" is reserved.
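The constraints above can be sketched as a small check. This is an illustration only, not ZooKeeper's own validation code: the class name ZkPathCheck and the method isValid are hypothetical, the "canonical, absolute" requirement is interpreted here as "starts with a slash and has no empty segments", and the reserved "zookeeper" token is not checked.

```java
// Hypothetical helper that mirrors the path rules listed above.
// It is a sketch for illustration, not the validation ZooKeeper ships.
final class ZkPathCheck {

    static boolean isValid(String path) {
        // Paths must be absolute.
        if (path == null || !path.startsWith("/")) {
            return false;
        }
        // The root path is valid on its own.
        if (path.equals("/")) {
            return true;
        }
        // No empty segments (assumed meaning of "canonical"), and "." or ".."
        // may not stand alone as a segment because there are no relative paths.
        for (String segment : path.substring(1).split("/", -1)) {
            if (segment.isEmpty() || segment.equals(".") || segment.equals("..")) {
                return false;
            }
        }
        // Character ranges excluded by the list above (written as hex code units).
        for (int i = 0; i < path.length(); i++) {
            int c = path.charAt(i);
            boolean forbidden =
                    c == 0x0000
                    || (c >= 0x0001 && c <= 0x001F)
                    || (c >= 0x007F && c <= 0x009F)
                    || (c >= 0xD800 && c <= 0xF8FF)
                    || (c >= 0xFFF0 && c <= 0xFFFF);
            if (forbidden) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("/a/b/c"));
        System.out.println(isValid("/a/b/./c"));
    }
}
```

A name such as "/a/.hidden" passes, because "." is only rejected when it is the entire segment.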
ZNodes
Every node in a ZooKeeper tree is referred to as a znode. Znodes maintain a stat structure that includes version numbers for data changes, acl changes. The stat structure also has timestamps. The version number, together with the timestamp, allows ZooKeeper to validate the cache and to coordinate updates. Each time a znode's data changes, the version number increases. For instance, whenever a client retrieves data, it also receives the version of the data. And when a client performs an update or a delete, it must supply the version of the data of the znode it is changing. If the version it supplies doesn't match the actual version of the data, the update will fail. (This behavior can be overridden. For more information see... )[tbd...]
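Every node in the zk tree is a znode. A znode maintains a stat structure holding version numbers (for data and for the ACL, the Access Control List) and timestamps. The version number and timestamp together are used to validate zk's cache and to keep data consistent on update.
Each time a znode's data changes, its version number is incremented. For example, whenever a client reads data it also gets the data's version number, and when the client tries to update or delete it must supply that version number. If the supplied version does not match zk's, the update fails. [This is similar to optimistic locking in a database.]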
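The version check described above can be modelled in a few lines. This is a toy in-memory sketch, not ZooKeeper client code: VersionedNode and its methods are hypothetical names, and a failed update is reported with a boolean where a real client would raise an error. Treating version -1 as "match any version" reflects how the ZooKeeper API lets a caller skip the check.

```java
// Toy model of a znode's data version check (optimistic locking).
// Hypothetical class for illustration; it does not talk to ZooKeeper.
final class VersionedNode {
    private byte[] data;
    private int version = 0;

    VersionedNode(byte[] initial) {
        this.data = initial.clone();
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized byte[] getData() {
        return data.clone();
    }

    // Succeeds only if the caller's version matches the current one;
    // -1 means "skip the check". Each successful write bumps the version.
    synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            return false; // stale version: somebody else updated first
        }
        data = newData.clone();
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("v0".getBytes());
        int seen = node.getVersion();
        System.out.println(node.setData("v1".getBytes(), seen)); // first writer wins
        System.out.println(node.setData("v2".getBytes(), seen)); // same stale version loses
    }
}
```

A client that loses the race would normally re-read the data and version, then retry.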
Note
In distributed application engineering, the word node can refer to a generic host machine, a server, a member of an ensemble, a client process, etc. In the ZooKeeper documentation, znodes refer to the data nodes. Servers refer to machines that make up the ZooKeeper service; quorum peers refer to the servers that make up an ensemble; client refers to any host or process which uses a ZooKeeper service.
Znodes are the main entity that a programmer accesses. They have several characteristics that are worth mentioning here.
Watches
Clients can set watches on znodes. Changes to that znode trigger the watch and then clear the watch. When a watch triggers, ZooKeeper sends the client a notification. More information about watches can be found in the section ZooKeeper Watches.
Data Access
The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.
ZooKeeper was not designed to be a general database or large object store. Instead, it manages coordination data. This data can come in the form of configuration, status information, rendezvous, etc. A common property of the various forms of coordination data is that they are relatively small: measured in kilobytes. The ZooKeeper client and the server implementations have sanity checks to ensure that znodes have less than 1M of data, but the data should be much less than that on average. Operating on relatively large data sizes will cause some operations to take much more time than others and will affect the latencies of some operations because of the extra time needed to move more data over the network and onto storage media. If large data storage is needed, the usual pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper.
Ephemeral Nodes
ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends the znode is deleted. Because of this behavior ephemeral znodes are not allowed to have children.
Sequence Nodes -- Unique Naming
When creating a znode you can also request that ZooKeeper append a monotonically increasing counter to the end of path. This counter is unique to the parent znode. The counter has a format of %010d -- that is 10 digits with 0 (zero) padding (the counter is formatted in this way to simplify sorting), i.e. "<path>0000000001". See Queue Recipe for an example use of this feature. Note: the counter used to store the next sequence number is a signed int (4 bytes) maintained by the parent node, the counter will overflow when incremented beyond 2147483647 (resulting in a name "<path>-2147483648").
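The naming scheme for sequence nodes is easy to reproduce locally. The sketch below only formats names the way the paragraph describes; SequenceNames, sequentialName, and the path "/queue/n_" are made-up names for illustration, and no server is involved.

```java
import java.util.Locale;

// Illustrates the %010d suffix and what signed-int overflow does to it.
// Hypothetical helper; the real counter lives on the server, in the parent znode.
final class SequenceNames {

    static String sequentialName(String path, int counter) {
        // Ten digits, zero padded, so lexical order matches numeric order.
        return path + String.format(Locale.ROOT, "%010d", counter);
    }

    public static void main(String[] args) {
        System.out.println(sequentialName("/queue/n_", 1));
        System.out.println(sequentialName("/queue/n_", 42));

        // A signed 32-bit counter wraps to a negative value past 2147483647.
        int overflowed = Integer.MAX_VALUE + 1;
        System.out.println(sequentialName("/queue/n_", overflowed));
    }
}
```

Because of the zero padding, sorting the child names as plain strings yields creation order, which is what queue and lock recipes rely on; that property no longer holds once the counter wraps to a negative value.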
Time in ZooKeeper
ZooKeeper tracks time multiple ways:
Zxid
Every change to the ZooKeeper state receives a stamp in the form of a zxid (ZooKeeper Transaction Id). This exposes the total ordering of all changes to ZooKeeper. Each change will have a unique zxid and if zxid1 is smaller than zxid2 then zxid1 happened before zxid2.
Version numbers
Every change to a node will cause an increase to one of the version numbers of that node. The three version numbers are version (number of changes to the data of a znode), cversion (number of changes to the children of a znode), and aversion (number of changes to the ACL of a znode).
Ticks
When using multi-server ZooKeeper, servers use ticks to define timing of events such as status uploads, session timeouts, connection timeouts between peers, etc. The tick time is only indirectly exposed through the minimum session timeout (2 times the tick time); if a client requests a session timeout less than the minimum session timeout, the server will tell the client that the session timeout is actually the minimum session timeout.
Real time
ZooKeeper doesn't use real time, or clock time, at all except to put timestamps into the stat structure on znode creation and znode modification.
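The minimum-session-timeout rule from the Ticks paragraph amounts to a clamp. A minimal sketch, assuming only the lower bound described there (real servers also enforce an upper bound, which is left out); SessionTimeouts and negotiate are hypothetical names.

```java
// Sketch of the lower bound on session timeouts: at least 2 ticks.
// Hypothetical helper; the upper bound applied by real servers is omitted.
final class SessionTimeouts {

    static int negotiate(int requestedMs, int tickTimeMs) {
        int minimumMs = 2 * tickTimeMs;
        // A request below the minimum is raised to the minimum.
        return Math.max(requestedMs, minimumMs);
    }

    public static void main(String[] args) {
        // With a 2000 ms tick, a 1000 ms request becomes 4000 ms.
        System.out.println(negotiate(1000, 2000));
        // A request above the minimum is kept as asked.
        System.out.println(negotiate(10000, 2000));
    }
}
```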
ZooKeeper Stat Structure
The Stat structure for each znode in ZooKeeper is made up of the following fields:
czxid (creation zxid)
The zxid of the change that caused this znode to be created.
mzxid (modification zxid)
The zxid of the change that last modified this znode.
ctime (creation time)
The time in milliseconds from epoch when this znode was created.
mtime (modification time)
The time in milliseconds from epoch when this znode was last modified.
version
The number of changes to the data of this znode.
cversion
The number of changes to the children of this znode.
aversion
The number of changes to the ACL of this znode.
ephemeralOwner
The session id of the owner of this znode if the znode is an ephemeral node. If it is not an ephemeral node, it will be zero.
dataLength
The length of the data field of this znode.
numChildren
The number of children of this znode.
ZooKeeper Watches
All of the read operations in ZooKeeper - getData(), getChildren(), and exists() - have the option of setting a watch as a side effect. Here is ZooKeeper's definition of a watch: a watch event is one-time trigger, sent to the client that set the watch, which occurs when the data for which the watch was set changes. There are three key points to consider in this definition of a watch:
Every zookeeper read operation, getData(), getChildren(), and exists(), can set a watch. zookeeper's definition is: a watch event is a one-time trigger, sent to the client that set it, fired when the data the watch was set on changes.
One-time trigger
One watch event will be sent to the client when the data has changed. For example, if a client does a getData("/znode1", true) and later the data for /znode1 is changed or deleted, the client will get a watch event for /znode1. If /znode1 changes again, no watch event will be sent unless the client has done another read that sets a new watch.
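One watch event is sent to the client when the data changes. For example, a client calls getData("/znode1", true); when the data of /znode1 changes, the client gets a watch event. If the znode changes again, no event is sent unless the client reads the data again and sets a new watch.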
Sent to the client
This implies that an event is on the way to the client, but may not reach the client before the successful return code to the change operation reaches the client that initiated the change. Watches are sent asynchronously to watchers. ZooKeeper provides an ordering guarantee: a client will never see a change for which it has set a watch until it first sees the watch event. Network delays or other factors may cause different clients to see watches and return codes from updates at different times. The key point is that everything seen by the different clients will have a consistent order.
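zk guarantees the ordering of watch events, so network delays and other factors do not cause clients to see events and changes in an inconsistent order.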
The data for which the watch was set
This refers to the different ways a node can change. It helps to think of ZooKeeper as maintaining two lists of watches: data watches and child watches. getData() and exists() set data watches. getChildren() sets child watches. Alternatively, it may help to think of watches being set according to the kind of data returned. getData() and exists() return information about the data of the node, whereas getChildren() returns a list of children. Thus, setData() will trigger data watches for the znode being set (assuming the set is successful). A successful create() will trigger a data watch for the znode being created and a child watch for the parent znode. A successful delete() will trigger both a data watch and a child watch (since there can be no more children) for a znode being deleted as well as a child watch for the parent znode.
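zk maintains two watch lists, data watches and child watches. getData() and exists() set data watches; getChildren() sets child watches.
It helps to think in terms of the kind of data returned: getData() and exists() return information about the node, while getChildren() returns the list of children. So setData() triggers the data watch of the znode (provided the set succeeds). A successful create() triggers the data watch of the znode and the child watch of its parent. A successful delete() triggers the data watch and child watch of the znode, plus the child watch of its parent.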
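The three key points can be made concrete with a toy in-memory model. This is a sketch under stated assumptions, not ZooKeeper client or server code: ToyWatchTree and its methods are hypothetical, a watcher is just a callback taking (eventType, path), delivery is synchronous, and there are no sessions, ACLs, or parent-existence checks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Toy model of one-shot data watches and child watches. NOT ZooKeeper code.
final class ToyWatchTree {
    private final Map<String, String> nodes = new TreeMap<>();
    private final Map<String, List<BiConsumer<String, String>>> dataWatches = new HashMap<>();
    private final Map<String, List<BiConsumer<String, String>>> childWatches = new HashMap<>();

    ToyWatchTree() {
        nodes.put("/", "");
    }

    // exists() sets a data watch even when the node is not there yet.
    boolean exists(String path, BiConsumer<String, String> watcher) {
        register(dataWatches, path, watcher);
        return nodes.containsKey(path);
    }

    // getData() sets a data watch, but fails if the node is missing.
    String getData(String path, BiConsumer<String, String> watcher) {
        if (!nodes.containsKey(path)) {
            throw new IllegalStateException("no such node: " + path);
        }
        register(dataWatches, path, watcher);
        return nodes.get(path);
    }

    // getChildren() sets a child watch and returns direct children only.
    List<String> getChildren(String path, BiConsumer<String, String> watcher) {
        register(childWatches, path, watcher);
        List<String> children = new ArrayList<>();
        for (String p : nodes.keySet()) {
            if (!p.equals("/") && parentOf(p).equals(path)) {
                children.add(p.substring(p.lastIndexOf('/') + 1));
            }
        }
        return children;
    }

    void create(String path, String data) {
        nodes.put(path, data);
        fire(dataWatches, path, "NodeCreated");
        fire(childWatches, parentOf(path), "NodeChildrenChanged");
    }

    void setData(String path, String data) {
        nodes.put(path, data);
        fire(dataWatches, path, "NodeDataChanged");
    }

    void delete(String path) {
        nodes.remove(path);
        fire(dataWatches, path, "NodeDeleted");
        fire(childWatches, path, "NodeDeleted");
        fire(childWatches, parentOf(path), "NodeChildrenChanged");
    }

    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    private static void register(Map<String, List<BiConsumer<String, String>>> table,
                                 String path, BiConsumer<String, String> watcher) {
        if (watcher != null) {
            table.computeIfAbsent(path, k -> new ArrayList<>()).add(watcher);
        }
    }

    // One-shot: the watchers are removed from the table before they are called.
    private static void fire(Map<String, List<BiConsumer<String, String>>> table,
                             String path, String type) {
        List<BiConsumer<String, String>> fired = table.remove(path);
        if (fired != null) {
            for (BiConsumer<String, String> w : fired) {
                w.accept(type, path);
            }
        }
    }
}
```

In this model a child watch always reports the path it was set on, and creating /test/1/2/3 only fires the child watch on /test/1/2, never one on /test/1.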
Watches are maintained locally at the ZooKeeper server to which the client is connected. This allows watches to be lightweight to set, maintain, and dispatch. When a client connects to a new server, the watch will be triggered for any session events. Watches will not be received while disconnected from a server. When a client reconnects, any previously registered watches will be reregistered and triggered if needed. In general this all occurs transparently. There is one case where a watch may be missed: a watch for the existence of a znode not yet created will be missed if the znode is created and deleted while disconnected.
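Watches are maintained on the zookeeper server the client is connected to. A client reconnect does not invalidate its watches, and all of this is transparent to the client, with one exception: if a znode is created and deleted while the connection is lost, a watch for that znode's existence misses the event.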
Semantics of Watches
We can set watches with the three calls that read the state of ZooKeeper: exists, getData, and getChildren. The following list details the events that a watch can trigger and the calls that enable them:
Created event:
Enabled with a call to exists.
Deleted event:
Enabled with a call to exists, getData, and getChildren.
Changed event:
Enabled with a call to exists and getData.
Child event:
Enabled with a call to getChildren.
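The list above is effectively a lookup table from event to the read calls that can arm it. A minimal sketch of that table follows; the WatchSemantics class and its enum names are hypothetical and exist only to restate the list in checkable form.

```java
import java.util.EnumSet;
import java.util.Set;

// Restates the event-to-call table above. Hypothetical names, no ZooKeeper dependency.
final class WatchSemantics {
    enum Call { EXISTS, GET_DATA, GET_CHILDREN }
    enum Event { CREATED, DELETED, CHANGED, CHILD }

    // Returns the read calls whose watch can deliver the given event.
    static Set<Call> enabledBy(Event event) {
        switch (event) {
            case CREATED:
                return EnumSet.of(Call.EXISTS);
            case DELETED:
                return EnumSet.allOf(Call.class);
            case CHANGED:
                return EnumSet.of(Call.EXISTS, Call.GET_DATA);
            case CHILD:
                return EnumSet.of(Call.GET_CHILDREN);
            default:
                throw new IllegalArgumentException("unknown event: " + event);
        }
    }

    public static void main(String[] args) {
        for (Event e : Event.values()) {
            System.out.println(e + " <- " + enabledBy(e));
        }
    }
}
```

Read the other way round, only a watch set through exists can report a creation, since the other two reads fail when the node is missing.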
Remove Watches
We can remove the watches registered on a znode with a call to removeWatches. Also, a ZooKeeper client can remove watches locally even if there is no server connection by setting the local flag to true. The following list details the events which will be triggered after the successful watch removal.
Child Remove event:
Watcher which was added with a call to getChildren.
Data Remove event:
Watcher which was added with a call to exists or getData.
What ZooKeeper Guarantees about Watches
With regard to watches, ZooKeeper maintains these guarantees:
Watches are ordered with respect to other events, other watches, and asynchronous replies. The ZooKeeper client libraries ensure that everything is dispatched in order.
A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode.
The order of watch events from ZooKeeper corresponds to the order of the updates as seen by the ZooKeeper service.
Things to Remember about Watches
Watches are one time triggers; if you get a watch event and you want to get notified of future changes, you must set another watch.
Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper. Be prepared to handle the case where the znode changes multiple times between getting the event and setting the watch again. (You may not care, but at least realize it may happen.)
A watch object, or function/context pair, will only be triggered once for a given notification. For example, if the same watch object is registered for an exists and a getData call for the same file and that file is then deleted, the watch object would only be invoked once with the deletion notification for the file.
When you disconnect from a server (for example, when the server fails), you will not get any watches until the connection is reestablished. For this reason session events are sent to all outstanding watch handlers. Use session events to go into a safe mode: you will not be receiving events while disconnected, so your process should act conservatively in that mode.
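1. Watches are delivered in order, and a watch is one-shot.
2. If the same watch object is registered several times on the same znode (for example through both exists and getData), it fires only once per notification.
3. When the connection between zk and the client is lost, the client gets no watch events until the connection is re-established; this is why session events are sent to all outstanding watch handlers.
4. Because watches are one-shot, there is a window between receiving an event and sending the request that sets a new watch, so you are not guaranteed to see every change to the node. Be ready to handle that case, or at least be aware of it.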
Gotchas: Common Problems and Troubleshooting
So now you know ZooKeeper. It's fast, simple, your application works, but wait ... something's wrong. Here are some pitfalls that ZooKeeper users fall into:
If you are using watches, you must look for the connected watch event. When a ZooKeeper client disconnects from a server, you will not receive notification of changes until reconnected. If you are watching for a znode to come into existence, you will miss the event if the znode is created and deleted while you are disconnected.
When using watches you must pay attention to the connection state. If the client's connection drops you receive no events until it reconnects, and if you are watching for a znode to come into existence you miss the event when the znode is created and deleted during the disconnect.
You must test ZooKeeper server failures. The ZooKeeper service can survive failures as long as a majority of servers are active. The question to ask is: can your application handle it? In the real world a client's connection to ZooKeeper can break. (ZooKeeper server failures and network partitions are common reasons for connection loss.) The ZooKeeper client library takes care of recovering your connection and letting you know what happened, but you must make sure that you recover your state and any outstanding requests that failed. Find out if you got it right in the test lab, not in production - test with a ZooKeeper service made up of several servers and subject them to reboots.
You must test zk server failures and check whether your application still works correctly.
The list of ZooKeeper servers used by the client must match the list of ZooKeeper servers that each ZooKeeper server has. Things can work, although not optimally, if the client list is a subset of the real list of ZooKeeper servers, but not if the client lists ZooKeeper servers not in the ZooKeeper cluster.
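The list of zk servers used by the client must match the list configured on the zk servers themselves; otherwise the client may end up listing a server that is not in the zk cluster.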
Be careful where you put that transaction log. The most performance-critical part of ZooKeeper is the transaction log. ZooKeeper must sync transactions to media before it returns a response. A dedicated transaction log device is key to consistent good performance. Putting the log on a busy device will adversely affect performance. If you only have one storage device, put trace files on NFS and increase the snapshotCount; it doesn't eliminate the problem, but it can mitigate it.
Set your Java max heap size correctly. It is very important to avoid swapping. Going to disk unnecessarily will almost certainly degrade your performance unacceptably. Remember, in ZooKeeper, everything is ordered, so if one request hits the disk, all other queued requests hit the disk.
To avoid swapping, try to set the heapsize to the amount of physical memory you have, minus the amount needed by the OS and cache. The best way to determine an optimal heap size for your configurations is to run load tests. If for some reason you can't, be conservative in your estimates and choose a number well below the limit that would cause your machine to swap. For example, on a 4G machine, a 3G heap is a conservative estimate to start with.
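Setting the Java max heap size correctly is important for avoiding swapping. Frequent swapping to disk degrades performance badly; because zk processes everything in order, if one request hits the disk, all the requests queued behind it hit the disk too.
To avoid swapping, set the heap size to the amount of physical memory minus some room for the OS and cache. The best approach is to run load tests; if you cannot, a suggested starting point is a 3G heap on a 4G machine, roughly three quarters.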
Aeroplane chess Time Limit:1000MS Memory Limit:32768KB 64bit IO Format:%I64d & %I64u Sub ...