dubbo、zookeeper心跳相关参数解析与测试
写在开头,zk客户端、服务器对负载比较敏感,对于类似大数据处理的应用,zk心跳时间设置和监测很关键,否则非常容易系统不稳定,建议可能长时间高负载导致GC时间过长的非OLTP的尽量不使用zk或rpc,而是使用MQ或HTTP。
dubbo consumer和provider的心跳机制
dubbo客户端和dubbo服务端之间存在心跳,目的是维持provider和consumer之间的长连接。由dubbo客户端主动发起,可参见dubbo源码 HeartbeatTask。dubbo心跳时间heartbeat默认是60s,超过heartbeat时间没有收到消息,就发送心跳消息(provider,consumer一样),如果连着3次(heartbeatTimeout为heartbeat*3)没有收到心跳响应(所以如果是批处理的话,很可能就会无响应导致被踢掉(例如gc时间超过1分钟,亦或是并发过高,cpu长时间100%使得心跳线程无法被调度),此时就需要加长超时次数或心跳值(我司用的是改过的版本,超时时间默认15秒,所以LZ改源码读application.properties配置自己实现了)【本质上,rpc长连接不适合于服务需要长时间完成的场景,只不过历史问题,应该轮询或发消息】),provider会关闭channel,而consumer会进行重连;不论是provider还是consumer的心跳检测都是通过启动定时任务的方式实现。
- provider绑定和consumer连接的入口:
public class HeaderExchanger implements Exchanger { public static final String NAME = "header"; @Override
public ExchangeClient connect(URL url, ExchangeHandler handler) throws RemotingException {
return new HeaderExchangeClient(Transporters.connect(url, new DecodeHandler(new HeaderExchangeHandler(handler))), true);
} @Override
public ExchangeServer bind(URL url, ExchangeHandler handler) throws RemotingException {
return new HeaderExchangeServer(Transporters.bind(url, new DecodeHandler(new HeaderExchangeHandler(handler))));
} }
- provider启动心跳检测
public HeaderExchangeServer(Server server) {
if (server == null) {
throw new IllegalArgumentException("server == null");
}
this.server = server;
this.heartbeat = server.getUrl().getParameter(Constants.HEARTBEAT_KEY, 0);
//心跳超时时间默认为心跳时间的3倍
this.heartbeatTimeout = server.getUrl().getParameter(Constants.HEARTBEAT_TIMEOUT_KEY, heartbeat * 3);
//如果心跳超时时间小于心跳时间的两倍则抛异常
if (heartbeatTimeout < heartbeat * 2) {
throw new IllegalStateException("heartbeatTimeout < heartbeatInterval * 2");
}
startHeatbeatTimer();
}
- startHeatbeatTimer的实现
- 先停止已有的定时任务,启动新的定时任务
private void startHeatbeatTimer() {
// 停止原有定时任务
stopHeartbeatTimer();
// 发起新的定时任务
if (heartbeat > 0) {
heatbeatTimer = scheduled.scheduleWithFixedDelay(
new HeartBeatTask(new HeartBeatTask.ChannelProvider() {
public Collection<Channel> getChannels() {
return Collections.unmodifiableCollection(HeaderExchangeServer.this.getChannels());
}
}, heartbeat, heartbeatTimeout),
heartbeat, heartbeat, TimeUnit.MILLISECONDS);
}
}
- HeartBeatTask的实现
- 遍历所有的channel,检测心跳间隔,如果超过心跳间隔没有读或写,则发送需要回复的心跳消息,最有判断是否心跳超时(heartbeatTimeout),如果超时,provider关闭channel,consumer进行重连
public void run() {
try {
long now = System.currentTimeMillis();
for (Channel channel : channelProvider.getChannels()) {
if (channel.isClosed()) {
continue;
}
try {
Long lastRead = (Long) channel.getAttribute(HeaderExchangeHandler.KEY_READ_TIMESTAMP);
Long lastWrite = (Long) channel.getAttribute(HeaderExchangeHandler.KEY_WRITE_TIMESTAMP);
// 读写的时间,任一超过心跳间隔,发送心跳
if ((lastRead != null && now - lastRead > heartbeat)
|| (lastWrite != null && now - lastWrite > heartbeat)) {
Request req = new Request();
req.setVersion("2.0.0");
req.setTwoWay(true); // 需要响应的心跳事件
req.setEvent(Request.HEARTBEAT_EVENT);
channel.send(req);
if (logger.isDebugEnabled()) {
logger.debug("Send heartbeat to remote channel " + channel.getRemoteAddress()
+ ", cause: The channel has no data-transmission exceeds a heartbeat period: " + heartbeat + "ms");
}
}
// 最后读的时间,超过心跳超时时间
if (lastRead != null && now - lastRead > heartbeatTimeout) {
logger.warn("Close channel " + channel
+ ", because heartbeat read idle time out: " + heartbeatTimeout + "ms");
// 客户端侧,重新连接服务端
if (channel instanceof Client) {
try {
((Client) channel).reconnect();
} catch (Exception e) {
//do nothing
}
// 服务端侧,关闭客户端连接
} else {
channel.close();
}
}
} catch (Throwable t) {
logger.warn("Exception when heartbeat to remote channel " + channel.getRemoteAddress(), t);
}
}
} catch (Throwable t) {
logger.warn("Unhandled exception when heartbeat, cause: " + t.getMessage(), t);
}
}
- consumer端的实现
- 默认需要心跳检测
public HeaderExchangeClient(Client client, boolean needHeartbeat) {
if (client == null) {
throw new IllegalArgumentException("client == null");
}
this.client = client;
// 创建 HeaderExchangeChannel 对象
this.channel = new HeaderExchangeChannel(client);
// 读取心跳相关配置
String dubbo = client.getUrl().getParameter(Constants.DUBBO_VERSION_KEY);
this.heartbeat = client.getUrl().getParameter(Constants.HEARTBEAT_KEY, dubbo != null && dubbo.startsWith("1.0.") ? Constants.DEFAULT_HEARTBEAT : 0);
this.heartbeatTimeout = client.getUrl().getParameter(Constants.HEARTBEAT_TIMEOUT_KEY, heartbeat * 3);
if (heartbeatTimeout < heartbeat * 2) { // 避免间隔太短
throw new IllegalStateException("heartbeatTimeout < heartbeatInterval * 2");
}
// 发起心跳定时器
if (needHeartbeat) {
startHeatbeatTimer();
}
dubbo客户端/服务端和注册中心(zk)存在心跳
由dubbo客户端或服务端发起,这是基于zk集群和zk客户端之间的心跳机制。由zk服务器参数tickTime(这个时间是作为Zookeeper服务器之间或客户端与服务器之间维持心跳的时间间隔,每隔tickTime时间就会发送一个心跳;最小的session过期时间为2倍tickTime)控制间隔,但是实际情况是我们发现心跳间隔是tickTime的1/2(此例中服务器太忙,以至于zk客户端没有及时给服务器发心跳),如下:
[] 2019-08-07 14:51:46 [5189311] [ERROR] Curator-Framework-0 org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:200) Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (27053)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.10.0.jar!/:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_211]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_211]
[] 2019-08-07 14:51:46 [5190082] [WARN] Curator-Framework-0-SendThread(0:0:0:0:0:0:0:1:2181) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1102) Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_211]
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source) ~[?:1.8.0_211]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[zookeeper-3.4.6.jar!/:3.4.6-1569965]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [zookeeper-3.4.6.jar!/:3.4.6-1569965]
[] 2019-08-07 14:51:47 [5190312] [ERROR] Curator-Framework-0 org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:200) Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (28054)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.10.0.jar!/:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_211]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_211]
[] 2019-08-07 14:51:47 [5191182] [INFO] Curator-Framework-0-SendThread(k3ctest.yidooo.com:2181) org.apache.zookeeper.ClientCnxn$SendThread.logStartConnect(ClientCnxn.java:975) Opening socket connection to server k3ctest.yidooo.com/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
[] 2019-08-07 14:51:48 [5191314] [ERROR] Curator-Framework-0 org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:200) Connection timed out for connection string (localhost:2181) and timeout (5000) / elapsed (29056)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:116) [curator-client-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:835) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:809) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:64) [curator-framework-2.10.0.jar!/:?]
at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:267) [curator-framework-2.10.0.jar!/:?]
at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_211]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_211]
在启动的时候,是可以看到客户端设置的超时时间及和服务端协商后确定的超时时间。如下:
21:28:57.442 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@5afd2f4e
21:28:57.475 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
21:28:57.477 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
21:28:57.508 [main-SendThread(localhost:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid = 0x10025cfdf590000, negotiated timeout = 60000
21:28:57.515 [main-EventThread] INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
zk集群节点间的同步时间控制
(1)initLimit
此配置表示,允许follower(相对于Leader言的“客户端”)连接并同步到Leader的初始化连接时间,以tickTime(这是最基础的参数,设置了所有时间相关的参数的基本单位)为单位。当初始化连接时间超过该值,则表示连接失败。
(2)syncLimit
此配置项表示Leader与Follower之间发送消息时,请求和应答时间长度。如果follower在设置时间内不能与leader通信,那么此follower将会被丢弃。
客户端的会话超时时间
对于会话的超时时间,客户端将sessionTimeout的值传给zk时,zk还会根据minSessionTimeout(默认为tickTime的2倍)与maxSessionTimeout(默认为tickTime的20倍)两个参数重新调整最后的超时值,所以默认40秒就会超时了,所以对于超长的问题例如gc导致服务器STW时间较长,应确保一次gc的STW时间小于等于tickTime,可以极大的缓解zk重连,也可以考虑加大maxSessionTimeout和minSessionTimeout。
public int getMinSessionTimeout() {
return minSessionTimeout == -1 ? tickTime * 2 : minSessionTimeout;
} public int getMaxSessionTimeout() {
return maxSessionTimeout == -1 ? tickTime * 20 : maxSessionTimeout;
}
int minSessionTimeout = zk.getMinSessionTimeout();
if (sessionTimeout < minSessionTimeout) {
sessionTimeout= minSessionTimeout;
}
int maxSessionTimeout = zk.getMaxSessionTimeout();
if (sessionTimeout > maxSessionTimeout) {
sessionTimeout = maxSessionTimeout;
}
这俩参数的含义如下:
minSessionTimeout
(No Java system property)
New in 3.3.0: the minimum session timeout in milliseconds that the server will allow the client to negotiate. Defaults to 2 times the tickTime.
maxSessionTimeout
(No Java system property)
New in 3.3.0: the maximum session timeout in milliseconds that the server will allow the client to negotiate. Defaults to 20 times the tickTime.
最后要知道Leader节点是单点的,负责所有事务的协调,如果leader挂掉,需要知道它如何被重新选举出,可以参考:zookeeper核心原理详解。
dubbo、zookeeper心跳相关参数解析与测试的更多相关文章
- Dubbo 泛化调用的参数解析问题及一个强大的参数解析工具 PojoUtils
排查了3个多小时,因为一个简单的错误,发现一个强大的参数解析工具,记录一下. 背景 Nodejs 通过 tether 调用 Java Dubbo 服务.请求类的某个参数对象 EsCondition 有 ...
- Dubbo+zookeeper构建高可用分布式集群(二)-集群部署
在Dubbo+zookeeper构建高可用分布式集群(一)-单机部署中我们讲了如何单机部署.但没有将如何配置微服务.下面分别介绍单机与集群微服务如何配置注册中心. Zookeeper单机配置:方式一. ...
- Zookeeper + Hadoop2.6 集群HA + spark1.6完整搭建与所有参数解析
废话就不多说了,直接开始啦~ 安装环境变量: 使用linx下的解压软件,解压找到里面的install 或者 ls 运行这个进行安装 yum install gcc yum install gcc-c+ ...
- mysql之 binlog维护详细解析(开启、binlog相关参数作用、mysqlbinlog解读、binlog删除)
binary log 作用:主要实现三个重要的功能:用于复制,用于恢复,用于审计.binary log 相关参数:log_bin设置此参数表示启用binlog功能,并指定路径名称log_bin_ind ...
- 电机噪声之谐波分析(内附simulink中FFT分析的相关参数配置与解析)
电机噪声之谐波分析(内附simulink中FFT分析的相关参数配置与解析) 目录 电机噪声之谐波分析(内附simulink中FFT分析的相关参数配置与解析) 写在前面 正文 电机噪声 谐波的产生 什么 ...
- SpringBoot分布式:Dubbo+zookeeper
西部开源-秦疆老师:SpringBoot + Dubbo + zookeeper 秦老师交流Q群号: 664386224 未授权禁止转载!编辑不易 , 转发请注明出处!防君子不防小人,共勉! 基础知识 ...
- Dubbo服务引用源码解析③
上一章分析了服务暴露的源码,这一章继续分析服务引用的源码.在Dubbo中有两种引用方式:第一种是服务直连,第二种是基于注册中心进行引用.服务直连一般用在测试的场景下,线上更多的是基于注册中心的方式 ...
- dubbo+zookeeper+springBoot框架整合与dubbo泛型调用演示
dubbo + zookeeper + spring Boot框架整合与dubbo泛型调用演示 By:客 授客 QQ:1033553122 欢迎加入全国软件测试交流 QQ 群:7156436 ...
- SpringBoot分布式 - Dubbo+ZooKeeper
一:介绍 ZooKeeper是一个分布式的,开放源码的分布式应用程序协调服务.它是一个为分布式应用提供一致性服务的软件,提供的功能包括:配置维护.域名服务.分布式同步.组服务等. Dubbo是Alib ...
随机推荐
- Vue+SpringBoot后端接收包含单属性和List数组的json对象
这次主要是针对springboot后台接收的json中包含多对象(如List数组/单属性)所写的一篇文章.虽然网上类似情况很多,尝试了一个晚上,都没有解决问题,最后还是在师兄的帮助下完美解决. vue ...
- Solr基础知识二(导入数据)
上一篇讲述了solr的安装启动过程,这一篇讲述如何导入数据到solr里. 一.准备数据 1.1 学生相关表 创建学生表.学生专业关联表.专业表.学生行业关联表.行业表.基础信息表,并创建一条小白的信息 ...
- MySQL创建用户和加限权
目录 1.权限管理 1.1对新用户增删改 1.2对当前的用户授权管理 1.权限管理 我们知道我们的最高权限管理者是root用户,它拥有着最高的权限操作.包括select.update.delete ...
- Python并发编程-concurrent包
Python并发编程-concurrent包 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.concurrent.futures包概述 3.2版本引入的模块. 异步并行任务编程 ...
- Gerrit服务器权限管理
Gerrit服务器权限管理 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.Gerrit权限概述 1>.对象 Gerrit识别单个或多个人员集合. Gerrit不允许使用单 ...
- 百度编译器ueditor插入视频的时候。在预览的窗口提示 “输入的视频地址有误,请检查后再试!
ueditor.all.js: 搜索me.commands["insertvideo"] 把html.push(creatInsertStr( vi.url, vi.width | ...
- sq分页
create proc RowNumber @pageindex int,@pagesize int AS BEGIN select * from (select ROW_NUMBER() OVER( ...
- CentOS7安装Chrome
1. 进入官网:https://www.google.cn/intl/zh-CN/chrome/2. 点击下载3. 直接安装:sudo yum localinstall google-chrome-s ...
- poi读写doc和docx
https://www.cnblogs.com/always-online/p/4800131.html POI是 Apache 旗下一款读写计算机中的 word 以及 excel 文件的工具. po ...
- danci5
foss community 自由软体社区 可理解为开源 program 英 ['prəʊɡræm] 美 ['proɡræm] n. 程序:计划:大纲 vt. 用程序指令:为…制订计划:为…安排节目 ...