Dubbo and ZooKeeper heartbeat parameters: analysis and testing
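A note up front: zk clients and servers are sensitive to load. For workloads such as big-data processing, setting and monitoring the zk heartbeat timing is critical; otherwise the system easily becomes unstable. For non-OLTP applications where sustained high load may cause long GC pauses, it is advisable to avoid zk or RPC where possible and use MQ or HTTP instead.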
Heartbeat mechanism between the Dubbo consumer and provider
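A heartbeat runs between the Dubbo client and the Dubbo server in order to keep the long-lived connection between provider and consumer alive. In the code shown below, both sides schedule it (see HeartBeatTask in the Dubbo source). The heartbeat interval (heartbeat) defaults to 60 s. If no message has been exchanged for longer than heartbeat, a heartbeat message is sent (the same on provider and consumer). If nothing is received within heartbeatTimeout (by default heartbeat * 3, i.e. roughly three missed heartbeats), the provider closes the channel and the consumer reconnects. On both provider and consumer, heartbeat detection is implemented by starting a scheduled task.
This means batch-style workloads can easily go unresponsive and get kicked, for example when a GC pause exceeds one minute, or when concurrency is so high that the CPU stays at 100% for a long time and the heartbeat thread cannot be scheduled. In that case the timeout multiple or the heartbeat value has to be increased. (Our company uses a modified version whose default timeout is 15 seconds, so I patched the source to read the value from application.properties.) Fundamentally, long-lived RPC connections are a poor fit for services that take a long time to complete; this is a legacy issue, and such services should use polling or messaging instead.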
- Entry points for provider bind and consumer connect:
public class HeaderExchanger implements Exchanger {

    public static final String NAME = "header";

    @Override
    public ExchangeClient connect(URL url, ExchangeHandler handler) throws RemotingException {
        return new HeaderExchangeClient(Transporters.connect(url, new DecodeHandler(new HeaderExchangeHandler(handler))), true);
    }

    @Override
    public ExchangeServer bind(URL url, ExchangeHandler handler) throws RemotingException {
        return new HeaderExchangeServer(Transporters.bind(url, new DecodeHandler(new HeaderExchangeHandler(handler))));
    }
}
- The provider starts heartbeat detection
public HeaderExchangeServer(Server server) {
    if (server == null) {
        throw new IllegalArgumentException("server == null");
    }
    this.server = server;
    this.heartbeat = server.getUrl().getParameter(Constants.HEARTBEAT_KEY, 0);
    // The heartbeat timeout defaults to 3 times the heartbeat interval
    this.heartbeatTimeout = server.getUrl().getParameter(Constants.HEARTBEAT_TIMEOUT_KEY, heartbeat * 3);
    // Throw if the heartbeat timeout is less than twice the heartbeat interval
    if (heartbeatTimeout < heartbeat * 2) {
        throw new IllegalStateException("heartbeatTimeout < heartbeatInterval * 2");
    }
    startHeatbeatTimer();
}
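The defaulting and validation rule in the constructor above can be isolated into a small sketch. The names `HeartbeatSettings` and `resolveTimeout` are hypothetical and not part of Dubbo, and a negative `configuredTimeoutMs` is this sketch's own convention for "no explicit timeout on the URL".

```java
// Hypothetical helper mirroring the defaulting/validation shown above; not Dubbo API.
final class HeartbeatSettings {
    private HeartbeatSettings() {}

    /**
     * @param heartbeatMs         heartbeat interval in ms (0 disables heartbeats)
     * @param configuredTimeoutMs explicit timeout in ms, or a negative value if unset
     *                            (a convention of this sketch only)
     * @return the effective heartbeat timeout in ms
     */
    static int resolveTimeout(int heartbeatMs, int configuredTimeoutMs) {
        // Default: three heartbeat intervals.
        int timeout = configuredTimeoutMs < 0 ? heartbeatMs * 3 : configuredTimeoutMs;
        // Same guard as the constructor: at least two intervals.
        if (timeout < heartbeatMs * 2) {
            throw new IllegalStateException("heartbeatTimeout < heartbeatInterval * 2");
        }
        return timeout;
    }

    public static void main(String[] args) {
        System.out.println(resolveTimeout(60_000, -1));      // default: 3 * heartbeat
        System.out.println(resolveTimeout(60_000, 300_000)); // explicit, longer timeout
    }
}
```

For a service that may stall for a long time, the second call illustrates the option discussed earlier: keep the interval and lengthen the timeout.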
- Implementation of startHeatbeatTimer
- Stop the existing scheduled task first, then start a new one
private void startHeatbeatTimer() {
    // Stop the existing scheduled task
    stopHeartbeatTimer();
    // Start a new scheduled task
    if (heartbeat > 0) {
        heatbeatTimer = scheduled.scheduleWithFixedDelay(
                new HeartBeatTask(new HeartBeatTask.ChannelProvider() {
                    public Collection<Channel> getChannels() {
                        return Collections.unmodifiableCollection(HeaderExchangeServer.this.getChannels());
                    }
                }, heartbeat, heartbeatTimeout),
                heartbeat, heartbeat, TimeUnit.MILLISECONDS);
    }
}
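- Implementation of HeartBeatTask
- Iterate over all channels and check how long each has been idle. If a channel has had no read or no write for longer than the heartbeat interval, send a heartbeat message that requires a reply. Finally, check whether the heartbeat has timed out (heartbeatTimeout); if so, the provider closes the channel and the consumer reconnects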
public void run() {
    try {
        long now = System.currentTimeMillis();
        for (Channel channel : channelProvider.getChannels()) {
            if (channel.isClosed()) {
                continue;
            }
            try {
                Long lastRead = (Long) channel.getAttribute(HeaderExchangeHandler.KEY_READ_TIMESTAMP);
                Long lastWrite = (Long) channel.getAttribute(HeaderExchangeHandler.KEY_WRITE_TIMESTAMP);
                // If either the last read or the last write is older than the heartbeat interval, send a heartbeat
                if ((lastRead != null && now - lastRead > heartbeat)
                        || (lastWrite != null && now - lastWrite > heartbeat)) {
                    Request req = new Request();
                    req.setVersion("2.0.0");
                    req.setTwoWay(true); // heartbeat event that requires a response
                    req.setEvent(Request.HEARTBEAT_EVENT);
                    channel.send(req);
                    if (logger.isDebugEnabled()) {
                        logger.debug("Send heartbeat to remote channel " + channel.getRemoteAddress()
                                + ", cause: The channel has no data-transmission exceeds a heartbeat period: " + heartbeat + "ms");
                    }
                }
                // The last read is older than the heartbeat timeout
                if (lastRead != null && now - lastRead > heartbeatTimeout) {
                    logger.warn("Close channel " + channel
                            + ", because heartbeat read idle time out: " + heartbeatTimeout + "ms");
                    // Client side: reconnect to the server
                    if (channel instanceof Client) {
                        try {
                            ((Client) channel).reconnect();
                        } catch (Exception e) {
                            //do nothing
                        }
                    // Server side: close the client connection
                    } else {
                        channel.close();
                    }
                }
            } catch (Throwable t) {
                logger.warn("Exception when heartbeat to remote channel " + channel.getRemoteAddress(), t);
            }
        }
    } catch (Throwable t) {
        logger.warn("Unhandled exception when heartbeat, cause: " + t.getMessage(), t);
    }
}
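To see how the two checks in run() behave over time, here is a simplified, hypothetical model of the per-channel decision. `HeartbeatCheck` and its method names are illustrative only and do not exist in Dubbo; the timestamps are plain millisecond values rather than channel attributes.

```java
// Hypothetical, simplified model of the per-channel checks in run(); not Dubbo API.
final class HeartbeatCheck {
    private HeartbeatCheck() {}

    /** True if a heartbeat would be sent: read or write idle for longer than heartbeatMs. */
    static boolean shouldSendHeartbeat(Long lastRead, Long lastWrite, long now, long heartbeatMs) {
        return (lastRead != null && now - lastRead > heartbeatMs)
                || (lastWrite != null && now - lastWrite > heartbeatMs);
    }

    /** True if the channel counts as timed out: nothing read for longer than heartbeatTimeoutMs. */
    static boolean isReadTimedOut(Long lastRead, long now, long heartbeatTimeoutMs) {
        return lastRead != null && now - lastRead > heartbeatTimeoutMs;
    }

    public static void main(String[] args) {
        long heartbeat = 60_000;   // 60 s interval
        long timeout = 180_000;    // 3 * heartbeat
        long lastActivity = 0L;    // last read and last write both at t = 0
        for (long now : new long[] {30_000, 61_000, 181_000}) {
            System.out.println("t=" + now + "ms"
                    + " send=" + shouldSendHeartbeat(lastActivity, lastActivity, now, heartbeat)
                    + " timedOut=" + isReadTimedOut(lastActivity, now, timeout));
        }
    }
}
```

With these values a peer that stays silent (for instance because of a long GC pause) first triggers heartbeats after 60 s of idleness and is only treated as dead once nothing has been read for more than 180 s.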
- Consumer-side implementation
- Heartbeat detection is required by default
public HeaderExchangeClient(Client client, boolean needHeartbeat) {
    if (client == null) {
        throw new IllegalArgumentException("client == null");
    }
    this.client = client;
    // Create the HeaderExchangeChannel object
    this.channel = new HeaderExchangeChannel(client);
    // Read the heartbeat-related configuration
    String dubbo = client.getUrl().getParameter(Constants.DUBBO_VERSION_KEY);
    this.heartbeat = client.getUrl().getParameter(Constants.HEARTBEAT_KEY, dubbo != null && dubbo.startsWith("1.0.") ? Constants.DEFAULT_HEARTBEAT : 0);
    this.heartbeatTimeout = client.getUrl().getParameter(Constants.HEARTBEAT_TIMEOUT_KEY, heartbeat * 3);
    if (heartbeatTimeout < heartbeat * 2) { // guard against a timeout that is too short
        throw new IllegalStateException("heartbeatTimeout < heartbeatInterval * 2");
    }
    // Start the heartbeat timer
    if (needHeartbeat) {
        startHeatbeatTimer();
    }
}
Heartbeat between the Dubbo client/server and the registry (zk)
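This heartbeat is initiated by the Dubbo client or server, and it relies on the heartbeat mechanism between the zk ensemble and its zk clients. The interval is governed by the zk server parameter tickTime (the interval used to maintain heartbeats between ZooKeeper servers, or between a client and a server: a heartbeat is sent every tickTime, and the minimum session expiry time is 2 * tickTime). In practice, however, we observed a heartbeat interval of half the tickTime. In the example below the machine was so busy that the zk client failed to send heartbeats to the server in time: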
[] 2019-08-07 14:51:46 [5189311] [ERROR] Curator-Framework-0 org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:200) Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (27053)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.10.0.jar!/:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_211]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_211]
[] 2019-08-07 14:51:46 [5190082] [WARN] Curator-Framework-0-SendThread(0:0:0:0:0:0:0:1:2181) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1102) Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_211]
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) ~[?:1.8.0_211]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar!/:3.4.6-1569965]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [zookeeper-3.4.6.jar!/:3.4.6-1569965]
[] 2019-08-07 14:51:47 [5190312] [ERROR] Curator-Framework-0 org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:200) Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (28054)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.10.0.jar!/:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_211]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_211]
[] 2019-08-07 14:51:47 [5191182] [INFO] Curator-Framework-0-SendThread(k3ctest.yidooo.com:2181) org.apache.zookeeper.ClientCnxn$SendThread.logStartConnect(ClientCnxn.java:975) Opening socket connection to server k3ctest.yidooo.com/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
[] 2019-08-07 14:51:48 [5191314] [ERROR] Curator-Framework-0 org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:200) Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (29056)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.10.0.jar!/:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_211]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_211]
At startup, the log shows both the timeout requested by the client and the timeout finally negotiated with the server:
21:28:57.442 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@5afd2f4e
21:28:57.475 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
21:28:57.477 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
21:28:57.508 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x10025cfdf590000, negotiated timeout = 60000
21:28:57.515 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
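The negotiated timeout in the log above also drives the client-side timers. The sketch below reflects my reading of `ClientCnxn` in the ZooKeeper 3.4.x Java client, so treat the constants as assumptions rather than documented guarantees: the read timeout is taken as 2/3 of the negotiated session timeout, a ping is sent after roughly half of that read timeout of send-idleness, and that idle time is capped at about 10 s. `ZkClientTimers` and its methods are hypothetical names.

```java
// Approximate model of timers the ZooKeeper 3.4.x Java client derives from the
// negotiated session timeout. Based on a reading of ClientCnxn; an assumption,
// not an official contract.
final class ZkClientTimers {
    private ZkClientTimers() {}

    // Assumed upper bound on send-idle time before a ping is forced (10 s).
    static final int MAX_SEND_PING_INTERVAL_MS = 10_000;

    /** If nothing is read for this long, the client gives up on the connection. */
    static int readTimeoutMs(int negotiatedSessionTimeoutMs) {
        return negotiatedSessionTimeoutMs * 2 / 3;
    }

    /** Approximate send-idle time after which a ping goes out. */
    static int pingIdleMs(int negotiatedSessionTimeoutMs) {
        return Math.min(readTimeoutMs(negotiatedSessionTimeoutMs) / 2, MAX_SEND_PING_INTERVAL_MS);
    }

    public static void main(String[] args) {
        for (int negotiated : new int[] {4_000, 40_000, 60_000}) {
            System.out.println("negotiated=" + negotiated + "ms"
                    + " readTimeout=" + readTimeoutMs(negotiated) + "ms"
                    + " pingIdle~=" + pingIdleMs(negotiated) + "ms");
        }
    }
}
```

Under these assumptions the client-side ping cadence follows the negotiated session timeout, so a process that is stalled for longer than the read timeout will see connection loss regardless of how tickTime is set.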
Synchronization time limits between zk ensemble nodes
(1)initLimit
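This setting is the time allowed for a follower (a "client" from the Leader's point of view) to connect to and sync with the Leader initially, expressed in units of tickTime (the most basic parameter, which serves as the base unit for all time-related settings). If the initial connection takes longer than this, the connection is considered failed.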
(2)syncLimit
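This setting is the maximum time allowed between a request and its response when the Leader and a Follower exchange messages, also expressed in units of tickTime. If a follower cannot communicate with the leader within this time, the follower is dropped.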
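Because both limits are counted in ticks, the effective wall-clock limit is the product with tickTime. A minimal sketch follows; `ZkQuorumLimits` is a hypothetical helper, and the numbers (tickTime = 2000 ms, initLimit = 10, syncLimit = 5) are illustrative values, not settings taken from this article.

```java
// Hypothetical helper: converts tick-based ZooKeeper limits to milliseconds.
final class ZkQuorumLimits {
    private ZkQuorumLimits() {}

    /** initLimit and syncLimit are expressed in ticks; multiply by tickTime to get ms. */
    static int toMillis(int tickTimeMs, int limitTicks) {
        return tickTimeMs * limitTicks;
    }

    public static void main(String[] args) {
        int tickTimeMs = 2_000; // illustrative value
        System.out.println("initLimit=10 -> " + toMillis(tickTimeMs, 10) + " ms");
        System.out.println("syncLimit=5  -> " + toMillis(tickTimeMs, 5) + " ms");
    }
}
```

Raising tickTime therefore lengthens both limits at once, which is worth remembering when tickTime is changed to adjust session timeouts.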
Client session timeout
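When a client passes its sessionTimeout to zk, zk adjusts the final value using two parameters: minSessionTimeout (default 2 * tickTime) and maxSessionTimeout (default 20 * tickTime). With tickTime = 2000 ms (the value implied here), a session therefore times out after at most 40 seconds by default. For very long stalls, such as a GC with a long stop-the-world (STW) pause, make sure a single STW pause is no longer than tickTime, which greatly reduces zk reconnects; raising maxSessionTimeout and minSessionTimeout is another option.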
public int getMinSessionTimeout() {
    return minSessionTimeout == -1 ? tickTime * 2 : minSessionTimeout;
}

public int getMaxSessionTimeout() {
    return maxSessionTimeout == -1 ? tickTime * 20 : maxSessionTimeout;
}

int minSessionTimeout = zk.getMinSessionTimeout();
if (sessionTimeout < minSessionTimeout) {
    sessionTimeout = minSessionTimeout;
}
int maxSessionTimeout = zk.getMaxSessionTimeout();
if (sessionTimeout > maxSessionTimeout) {
    sessionTimeout = maxSessionTimeout;
}
These two parameters are documented as follows:
minSessionTimeout
(No Java system property)
New in 3.3.0: the minimum session timeout in milliseconds that the server will allow the client to negotiate. Defaults to 2 times the tickTime.
maxSessionTimeout
(No Java system property)
New in 3.3.0: the maximum session timeout in milliseconds that the server will allow the client to negotiate. Defaults to 20 times the tickTime.
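The clamping rule can be tried out in isolation. The following is a self-contained sketch that reuses the same arithmetic as the server snippet above; `SessionTimeoutNegotiation` and `negotiate` are hypothetical names, and `-1` means "not configured", as in the getters shown earlier.

```java
// Hypothetical stand-alone version of the server-side session timeout clamping.
final class SessionTimeoutNegotiation {
    private SessionTimeoutNegotiation() {}

    /**
     * @param requestedMs   sessionTimeout requested by the client
     * @param tickTimeMs    server tickTime
     * @param minSessionCfg configured minSessionTimeout, or -1 if unset
     * @param maxSessionCfg configured maxSessionTimeout, or -1 if unset
     * @return the timeout the server would grant
     */
    static int negotiate(int requestedMs, int tickTimeMs, int minSessionCfg, int maxSessionCfg) {
        int min = minSessionCfg == -1 ? tickTimeMs * 2 : minSessionCfg;
        int max = maxSessionCfg == -1 ? tickTimeMs * 20 : maxSessionCfg;
        int granted = requestedMs;
        if (granted < min) {
            granted = min;
        }
        if (granted > max) {
            granted = max;
        }
        return granted;
    }

    public static void main(String[] args) {
        // Client asks for 60 s, server uses tickTime = 2000 ms with defaults.
        System.out.println(negotiate(60_000, 2_000, -1, -1));
        // Same request after raising maxSessionTimeout to 120 s.
        System.out.println(negotiate(60_000, 2_000, -1, 120_000));
    }
}
```

The first call shows why a client that requests 60 s may end up with less, and the second shows the effect of raising maxSessionTimeout on the server.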
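Finally, keep in mind that the Leader is a single node responsible for coordinating all transactions. If the leader goes down, it matters how a new one gets elected; see "zookeeper核心原理详解" (ZooKeeper core principles explained).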