KingbaseES V8R6 Cluster Operations Series -- trusted_server

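Case description:

In KingbaseES V8R3 and early V8R6 releases, if the gateway address of a read/write-splitting cluster became unreachable, the whole cluster shut down and the database service became inaccessible. Later releases reduced this dependence on the gateway: when the gateway address is unreachable, some high-availability functions such as failover are affected, but the cluster continues to serve database connections normally.

Applicable version:

KingbaseES V8R6

Cluster gateway configuration: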

[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep trust

trusted_servers='192.168.1.1'

running_under_failure_trusted_servers='on'
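To check these two settings from a script instead of with grep, something like the following Python sketch could be used. It assumes the simple `key='value'` layout shown above; the function name `read_trust_settings` is hypothetical and not part of any KingbaseES tooling.

```python
import re

def read_trust_settings(conf_text):
    """Return (list of trusted servers, running_under_failure value or None).

    Assumes one key='value' pair per line, as in the repmgr.conf excerpt above.
    """
    settings = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"^(\w+)\s*=\s*'?([^']*)'?$", line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    raw = settings.get("trusted_servers", "")
    servers = [s.strip() for s in raw.split(",") if s.strip()]
    return servers, settings.get("running_under_failure_trusted_servers")


if __name__ == "__main__":
    sample = "trusted_servers='192.168.1.1'\nrunning_under_failure_trusted_servers='on'\n"
    print(read_trust_settings(sample))
```

The comma split also handles the multi-gateway form of `trusted_servers` used later in this post.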

I. Check the cluster node status

[kingbase@node101 bin]$ ./repmgr cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------

 1  | node1 | standby |   running | node2    | default  | 100      | 4        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node2 | primary | * running |          | default  | 100      | 4        |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

II. Simulate a gateway failure

[kingbase@node101 ~]$ ping 192.168.1.1

PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.

From 192.168.1.101 icmp_seq=10 Destination Host Unreachable

From 192.168.1.101 icmp_seq=11 Destination Host Unreachable

From 192.168.1.101 icmp_seq=12 Destination Host Unreachable

.....

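--- As shown above, the cluster nodes can no longer ping the gateway address.

III. Check the cluster status after the gateway failure

1. Cluster node status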

[kingbase@node101 bin]$ ./repmgr cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string

----+-------+---------+-----------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------

 1  | node1 | standby |   running | node2    | default  | 100      | 4        | 0 bytes | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node2 | primary | * running |          | default  | 100      | 4        |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. Database connection test

[kingbase@node102 bin]$ ./ksql -U system test

ksql (V8.0)

Type "help" for help.

                                                       version

----------------------------------------------------------------------------------------------------------------------

 KingbaseES V008R006C007B0012 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

(1 row)

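--- As shown above, after the gateway becomes unreachable, both the cluster node status and the database service remain normal.

3. Check the kbha.log log

Tip:

In a KingbaseES V8R6 cluster, the kbha process tests gateway connectivity every three seconds.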

[2023-04-10 15:57:30] [WARNING] ping host"192.168.1.1" failed

[2023-04-10 15:57:31] [NOTICE] PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.

--- 192.168.1.1 ping statistics ---

2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 999ms

pipe 2

[2023-04-10 15:57:31] [WARNING] ping host"192.168.1.1" failed

[2023-04-10 15:57:31] [DETAIL] average RTT value is not greater than zero

[2023-04-10 15:57:31] [DEBUG] ping process end early. usleep(994400)

--- As shown above, kbha.log records the failed connection attempts to the gateway address.

IV. Cluster failover test

1. Stop the database service on the primary
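To summarize such failures across a long kbha.log, a small Python sketch like the one below can count the `ping host"..." failed` warnings per address. The regular expression is based only on the log lines shown above, and the function name `count_ping_failures` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: [2023-04-10 15:57:30] [WARNING] ping host"192.168.1.1" failed
PING_FAILED = re.compile(r'\[WARNING\] ping host\s*"([^"]+)" failed')

def count_ping_failures(log_text):
    """Return a dict mapping each gateway address to its number of failed-ping warnings."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PING_FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = (
        '[2023-04-10 15:57:30] [WARNING] ping host"192.168.1.1" failed\n'
        '[2023-04-10 15:57:31] [DETAIL] average RTT value is not greater than zero\n'
        '[2023-04-10 15:57:31] [WARNING] ping host"192.168.1.1" failed\n'
    )
    print(count_ping_failures(sample))
```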

[kingbase@node101 bin]$ ./sys_ctl stop -D ../../data

2. Check the hamgr.log log on the standby

[2023-04-10 16:13:41] [DEBUG] monitoring node in degraded state for 640 seconds

[2023-04-10 16:13:43] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="

[2023-04-10 16:13:43] [DEBUG] monitoring node in degraded state for 642 seconds

[2023-04-10 16:13:45] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="

[2023-04-10 16:13:45] [DEBUG] monitoring node in degraded state for 644 seconds

[2023-04-10 16:13:47] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="

[2023-04-10 16:13:47] [DEBUG] monitoring node in degraded state for 646 seconds

[2023-04-10 16:13:49] [DEBUG] connecting to: "user=esrep connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr options=-csearch_path="

[2023-04-10 16:13:49] [DEBUG] monitoring node in degraded state for 648 seconds

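--- As shown above, the standby detects that the connection to the primary has failed, but no primary/standby switchover is triggered.

3. Check the cluster node status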

[kingbase@node102 bin]$ ./repmgr cluster show

 ID | Name  | Role    | Status        | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string

----+-------+---------+---------------+----------+----------+----------+----------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------

 1  | node1 | standby |   running     | ? node2  | default  | 100      | 4        | ?       | host=192.168.1.102 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node2 | primary | ? unreachable | ?        | default  | 100      |          |         | host=192.168.1.101 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

[WARNING] following issues were detected

  - unable to connect to node "node1" (ID: 1)'s upstream node "node2" (ID: 2)

  - unable to determine if node "node1" (ID: 1) is attached to its upstream node "node2" (ID: 2)

  - unable to connect to node "node2" (ID: 2)

  - node "node2" (ID: 2) is registered as an active primary but is unreachable

[HINT] execute with --verbose option to see connection error messages

The primary is in an unreachable state, and no failover has taken place.
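A monitoring script could detect this condition from the `repmgr cluster show` table itself. The Python sketch below assumes the pipe-separated column layout shown above (ID, Name, Role, Status, ...); the function name `unreachable_primaries` is hypothetical.

```python
def unreachable_primaries(cluster_show_output):
    """Return names of nodes registered as primary whose status contains 'unreachable'."""
    names = []
    for line in cluster_show_output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows start with a numeric node ID; header, separator and warning lines do not.
        if len(cols) < 4 or not cols[0].isdigit():
            continue
        name, role, status = cols[1], cols[2], cols[3]
        if role == "primary" and "unreachable" in status:
            names.append(name)
    return names


if __name__ == "__main__":
    sample = (
        " ID | Name  | Role    | Status        | Upstream\n"
        "----+-------+---------+---------------+----------\n"
        " 1  | node1 | standby |   running     | ? node2\n"
        " 2  | node2 | primary | ? unreachable | ?\n"
    )
    print(unreachable_primaries(sample))
```

If the list stays non-empty while a standby is still running, failover has not happened, which is the situation in this test.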

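V. Summary

KingbaseES cluster nodes ping the gateway address to test network connectivity between the nodes. If the gateway fails, normal cluster operation is affected: in this test the database service stayed available, but failover was not triggered. Configuring multiple gateway addresses in the cluster keeps the gateway check itself highly available.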

[kingbase@node101 bin]$ cat ../etc/repmgr.conf |grep trust

trusted_servers='192.168.1.1,192.168.1.254'

running_under_failure_trusted_servers='on'
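The idea behind listing several addresses can be sketched in Python as follows. Treating the check as passed when any one address responds is an assumption made for illustration and is not confirmed by this post; the names `ping_once` and `any_trusted_server_reachable` are hypothetical, and the `ping` flags are those of Linux iputils.

```python
import subprocess

def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to host succeeds (Linux iputils ping flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def any_trusted_server_reachable(servers, ping=ping_once):
    """Assumed semantics: the check passes if at least one address answers."""
    return any(ping(host) for host in servers)


if __name__ == "__main__":
    print(any_trusted_server_reachable(["192.168.1.1", "192.168.1.254"]))
```

The `ping` parameter is injectable so the logic can be exercised without sending real ICMP packets.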
