KingbaseES V8R6 Cluster Operations Series -- trusted_server
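Case description:
In KingbaseES V8R3 and early V8R6 releases, a read/write-splitting cluster shut down entirely when the gateway address became unreachable, leaving the database service inaccessible. Later releases reduced this dependency on the gateway: when the gateway address is unreachable, some high-availability features such as failover are affected, but the cluster continues to serve database connections normally.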

Applicable version:
KingbaseES V8R6
Cluster gateway configuration:
[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep trust
trusted_servers='192.168.1.1'
running_under_failure_trusted_servers='on'
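When auditing several nodes, it can be handy to read these two settings with a script instead of grep. The following is a minimal Python sketch, assuming the file uses the simple key='value' format shown above. The function names `parse_repmgr_settings` and `trusted_servers` are made up for this example and are not part of KingbaseES or repmgr.

```python
import re

# Hypothetical helper: parses simple key='value' lines as seen in the grep output.
# It does not handle every syntax a real repmgr.conf may contain.
SETTING = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*'?([^'#]*)'?")


def parse_repmgr_settings(text):
    """Return a dict of setting name -> value, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def trusted_servers(settings):
    """Split the comma-separated trusted_servers value into a list."""
    raw = settings.get("trusted_servers", "")
    return [addr.strip() for addr in raw.split(",") if addr.strip()]


sample = """trusted_servers='192.168.1.1'
running_under_failure_trusted_servers='on'
"""
conf = parse_repmgr_settings(sample)
print(trusted_servers(conf))
print(conf["running_under_failure_trusted_servers"])
```

In practice the `sample` string would be replaced by the contents of ../etc/repmgr.conf read from each node.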
I. Check the cluster node status
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 4 | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | primary | * running | | default | 100 | 4 | | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
II. Simulate a gateway failure
[kingbase@node101 ~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 192.168.1.101 icmp_seq=10 Destination Host Unreachable
From 192.168.1.101 icmp_seq=11 Destination Host Unreachable
From 192.168.1.101 icmp_seq=12 Destination Host Unreachable
.....
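---As shown above, none of the cluster nodes can ping the gateway address any longer.
III. Check the cluster status after the gateway failure
1. Cluster node status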
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 4 | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | primary | * running | | default | 100 | 4 | | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2. Database connection test
[kingbase@node102 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C007B0012 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)
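---As shown above, after the gateway became unreachable, the cluster node status and the database service both remain normal.
3. Check the kbha.log log
Tips:
In a KingbaseES V8R6 cluster, the kbha process tests gateway connectivity every three seconds.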
[2023-04-10 15:57:30] [WARNING] ping host"192.168.1.1" failed
[2023-04-10 15:57:31] [NOTICE] PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms
pipe 2
[2023-04-10 15:57:31] [WARNING] ping host"192.168.1.1" failed
[2023-04-10 15:57:31] [DETAIL] average RTT value is not greater than zero
[2023-04-10 15:57:31] [DEBUG] ping process end early. usleep(994400)
----As shown above, kbha.log records the failed attempts to reach the gateway address.
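To spot gateway failures in a long kbha.log without reading it line by line, the ping statistics lines can be extracted with a short script. This is only a sketch, assuming the log embeds the standard Linux ping summary line as in the excerpt above. The name `packet_loss` is invented for this example.

```python
import re

# Matches the summary line printed by Linux ping, e.g.
# "2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms"
PING_STATS = re.compile(
    r"(\d+) packets transmitted, (\d+) received.*?(\d+(?:\.\d+)?)% packet loss"
)


def packet_loss(log_text):
    """Return (transmitted, received, loss_percent) for each summary line found."""
    return [
        (int(tx), int(rx), float(loss))
        for tx, rx, loss in PING_STATS.findall(log_text)
    ]


excerpt = """--- 192.168.1.1 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms
"""
results = packet_loss(excerpt)
# Treat the gateway as unreachable when no probe in the excerpt got a reply.
gateway_unreachable = bool(results) and all(rx == 0 for _, rx, _ in results)
print(results, gateway_unreachable)
```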
IV. Cluster failover test
1. Stop the database service on the primary
[kingbase@node101 bin]$ ./sys_ctl stop -D ../../data
2. Check the hamgr.log log on the standby
[2023-04-10 16:13:41] [DEBUG] monitoring node in degraded state for 640 seconds
[2023-04-10 16:13:43] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[2023-04-10 16:13:43] [DEBUG] monitoring node in degraded state for 642 seconds
[2023-04-10 16:13:45] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[2023-04-10 16:13:45] [DEBUG] monitoring node in degraded state for 644 seconds
[2023-04-10 16:13:47] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[2023-04-10 16:13:47] [DEBUG] monitoring node in degraded state for 646 seconds
[2023-04-10 16:13:49] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="
[2023-04-10 16:13:49] [DEBUG] monitoring node in degraded state for 648 seconds
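---As shown above, the standby detected that the connection to the primary failed, but no primary/standby switchover was triggered.
3. Check the cluster node status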
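The "degraded state for N seconds" messages also show how long the standby has been waiting without promoting itself. The sketch below estimates when degraded monitoring began, assuming the hamgr.log line format shown in the excerpt. The helper names `degraded_samples` and `degraded_since` are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# [2023-04-10 16:13:49] [DEBUG] monitoring node in degraded state for 648 seconds
DEGRADED = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[DEBUG\] "
    r"monitoring node in degraded state for (\d+) seconds"
)


def degraded_samples(log_text):
    """Return (timestamp, seconds_degraded) for each matching log line."""
    samples = []
    for line in log_text.splitlines():
        match = DEGRADED.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((stamp, int(match.group(2))))
    return samples


def degraded_since(samples):
    """Estimate the start of degraded monitoring from the latest sample."""
    stamp, seconds = samples[-1]
    return stamp - timedelta(seconds=seconds)


excerpt = """[2023-04-10 16:13:47] [DEBUG] monitoring node in degraded state for 646 seconds
[2023-04-10 16:13:49] [DEBUG] monitoring node in degraded state for 648 seconds
"""
samples = degraded_samples(excerpt)
print(degraded_since(samples))
```

For the excerpt above, subtracting 648 seconds from 16:13:49 places the start of degraded monitoring at about 16:03:01.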
[kingbase@node102 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | standby | running | ? node2 | default | 100 | 4 | ? | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | primary | ? unreachable | ? | default | 100 | | | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
[WARNING] following issues were detected
- unable to connect to node "node1" (ID: 1)'s upstream node "node2" (ID: 2)
- unable to determine if node "node1" (ID: 1) is attached to its upstream node "node2" (ID: 2)
- unable to connect to node "node2" (ID: 2)
- node "node2" (ID: 2) is registered as an active primary but is unreachable
[HINT] execute with --verbose option to see connection error messages
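A monitoring script can flag this condition by scanning the `repmgr cluster show` output for a primary whose status contains "unreachable". The following is a rough sketch that splits the table on the "|" separators. The function names are invented for this example, and the connection strings in the sample are abbreviated by hand to keep it short.

```python
def parse_cluster_show(output):
    """Parse data rows of `repmgr cluster show` into dicts (first four columns)."""
    nodes = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows start with a numeric node ID; header, separator and
        # warning lines do not.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        nodes.append(
            {
                "id": int(fields[0]),
                "name": fields[1],
                "role": fields[2],
                "status": fields[3],
            }
        )
    return nodes


def unreachable_primaries(nodes):
    """Names of nodes registered as primary whose status reports unreachable."""
    return [
        node["name"]
        for node in nodes
        if node["role"] == "primary" and "unreachable" in node["status"]
    ]


# Sample rows taken from the output above; connection strings abbreviated.
sample = """ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+---------+------
1 | node1 | standby | running | ? node2 | default | 100 | 4 | ? | host=192.168.1.102 port=54321
2 | node2 | primary | ? unreachable | ? | default | 100 | | | host=192.168.1.101 port=54321
"""
nodes = parse_cluster_show(sample)
print(unreachable_primaries(nodes))
```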
The primary is unreachable and no failover has taken place.

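V. Summary
KingbaseES cluster nodes ping the gateway address to test network connectivity between the cluster nodes. A gateway failure affects normal cluster operation, so multiple gateway addresses can be configured in the cluster to keep the gateway check highly available.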
[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep trust
trusted_servers='192.168.1.1,192.168.1.254'
running_under_failure_trusted_servers='on'
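The benefit of listing two addresses can be illustrated with a small reachability check. This sketch rests on an assumption that is not confirmed by the case above: that the gateway check passes as long as at least one trusted server answers. The names `ping_once` and `any_trusted_server_reachable` are hypothetical, and the ping flags assume the Linux iputils ping.

```python
import subprocess


def ping_once(address, timeout_s=2):
    """Send one ICMP echo with the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def any_trusted_server_reachable(servers, probe=ping_once):
    # ASSUMPTION: one reachable address is enough for the check to pass.
    # This mirrors the intent of configuring several gateways, but it is
    # not taken from the KingbaseES source or documentation.
    return any(probe(address) for address in servers)


servers = "192.168.1.1,192.168.1.254".split(",")

# Stub probe standing in for real pings: the first gateway is down,
# the second one still answers.
fake_results = {"192.168.1.1": False, "192.168.1.254": True}
print(any_trusted_server_reachable(servers, probe=fake_results.get))
```

Passing the probe as a parameter keeps the decision logic testable without touching the network; on a real node the default `ping_once` would be used.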