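KingbaseES V8R6 Cluster Operations Case Study -- Cluster Failover Fails Due to a Misconfigured VIP
Case description:
In a KingbaseES V8R6 cluster, the VIP is configured in repmgr.conf. This case tests how manually removing and adding the VIP on the primary affects the cluster's VIP unload and load operations during failover.
Applicable version:
KingbaseES V8R6
I. Cluster node status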
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
II. Cluster VIP configuration
1. Check the VIP loaded on the host
[kingbase@node101 bin]$ ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
---As shown above, the primary host has loaded the VIP 192.168.1.254/24.
2. Check the cluster VIP configuration
[kingbase@node101 bin]$ cat ../etc/repmgr.conf|grep -i vir
virtual_ip='192.168.1.254/24'
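The two checks above (the addresses on the interface and the value in repmgr.conf) can be read side by side with a small script. The following is a minimal sketch, not part of the KingbaseES or repmgr tooling: the function names get_configured_vip and list_ipv4_cidrs are hypothetical, and the parsing assumes a single uncommented virtual_ip line of the form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration (not part of KingbaseES or repmgr).

# Print the virtual_ip value from a repmgr.conf-style file, with quotes
# and spaces stripped. Assumes the line looks like: virtual_ip='a.b.c.d/NN'
get_configured_vip() {
    sed -n "s/^[[:space:]]*virtual_ip[[:space:]]*=[[:space:]]*//p" "$1" \
        | tail -n 1 | tr -d "'\" "
}

# Read `ip addr show` output on stdin and print every IPv4 address in
# CIDR form, i.e. the token that follows each "inet" keyword.
list_ipv4_cidrs() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "inet") print $(i + 1) }'
}

# Example (run on the primary):
#   get_configured_vip ../etc/repmgr.conf
#   ip addr show enp0s3 | list_ipv4_cidrs
```

If the configured value does not appear verbatim, including the prefix length, in the list printed for the interface, the host and repmgr.conf disagree about the VIP.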
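III. Manual VIP removal test
1. Remove the VIP from the primary
# As shown below, the prefix length (netmask) must be specified when removing the VIP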
[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3
[root@node101 cron.d]# ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
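2. Check the cluster node status
Tips:
As shown below, removing the VIP from the primary does not affect the cluster state; the cluster remains healthy.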
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
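3. Automatic VIP reload
As shown below, when the cluster detects that the VIP is missing on the primary, it loads the VIP again automatically.
1) Check the host VIP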
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
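---As shown above, after the VIP was removed manually, the cluster loaded it again automatically.
2) Check the cluster log
As shown below, when a ping of the VIP shows that it has been lost, the cluster tries to load the VIP automatically.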
[2023-03-09 17:47:05] [NOTICE] found primary node lost virtual_ip, try to acquire virtual_ip
[2023-03-09 17:47:07] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
--- 192.168.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms
[2023-03-09 17:47:07] [WARNING] ping host"192.168.1.254" failed
[2023-03-09 17:47:07] [DETAIL] average RTT value is not greater than zero
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A loadvip
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A arping
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [INFO] loadvip result: 1, arping result: 1
[2023-03-09 17:47:07] [NOTICE] acquire the virtual ip 192.168.1.254/24 success on localhost
IV. Manual VIP load test (changed subnet mask)
1. Load the VIP with a different subnet mask
[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3
[root@node101 cron.d]# ip add add 192.168.1.254/32 dev enp0s3:3
[root@node101 cron.d]# ip add sh
......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3
valid_lft forever preferred_lft forever
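---As shown above, the VIP was removed manually and then added back with a different subnet mask (192.168.1.254/32).
2. Run the failover test
1) Stop the database service on the primary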
[kingbase@node101 bin]$ ./sys_ctl stop -D /data/kingbase/r6ha/data/
waiting for server to shut down.... done
server stopped
2) Check the IP configuration on the primary
As shown below, the VIP on the primary has not been unloaded.
[root@node101 cron.d]# ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3
3) Check hamgr.log on the standby
[2023-03-09 17:52:28] [INFO] try to ping the trusted_servers "192.168.1.1" before execute promote_command
[2023-03-09 17:52:30] [NOTICE] PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.231/0.287/0.343/0.056 ms
[2023-03-09 17:52:30] [NOTICE] successfully ping one or more of the trusted_servers "192.168.1.1"
[2023-03-09 17:52:30] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null
[2023-03-09 17:52:30] [NOTICE] try to stop old primary db (host: "192.168.1.101")
[2023-03-09 17:52:30] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A stopdb
[2023-03-09 17:52:30] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
--- 192.168.1.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.472/0.505/0.538/0.033 ms
[2023-03-09 17:52:32] [WARNING] the virtual ip is already on other host, try to release it on old primary node (host: "192.168.1.101")
[2023-03-09 17:52:32] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null
[2023-03-09 17:52:32] [INFO] ES connection to host "192.168.1.101" succeeded, ready to release vip on it
[2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A check_ip --ip 192.168.1.254
[2023-03-09 17:52:32] [DEBUG] remote_command(): output returned was:
1
[2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A unloadvip
RTNETLINK answers: Cannot assign requested address
[2023-03-09 17:52:32] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [WARNING] old primary node (host: "192.168.1.101") release the virtual ip 192.168.1.254/24 failed
[2023-03-09 17:52:32] [NOTICE] the time from the first failure to acquire VIP is 2 seconds (max 60 seconds), try agian
[2023-03-09 17:52:32] [NOTICE] will acquire the virtual ip again
[2023-03-09 17:52:34] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
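As the log above shows, during failover the standby connects to the old primary over ssh and tries to unload the VIP. The VIP that the standby reads from repmgr.conf is 192.168.1.254/24, whereas the address actually loaded on the primary is 192.168.1.254/32. Because the two do not match, the unload fails with "RTNETLINK answers: Cannot assign requested address", the VIP stays on the old primary, and the failover fails.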
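This kind of mismatch can be detected before a failover by comparing the configured VIP with what the interface actually carries. The following is a minimal sketch under stated assumptions: classify_vip is a hypothetical helper, not a kbha or repmgr command, and it only parses the text printed by ip addr show.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic for illustration (not part of KingbaseES or repmgr).
# Usage: ip addr show | classify_vip 192.168.1.254/24
# Prints one of:
#   match          the VIP is present with the configured prefix length
#   mismatch /NN   the address is present, but with prefix length NN
#   absent         the address is not on this host
classify_vip() {
    local want_addr="${1%/*}" want_plen="${1#*/}"
    awk -v addr="$want_addr" -v plen="$want_plen" '
        {
            for (i = 1; i < NF; i++) {
                if ($i != "inet") continue
                split($(i + 1), parts, "/")      # parts[1]=address, parts[2]=prefix length
                if (parts[1] != addr) continue
                if (parts[2] == plen) { exact = 1 } else { other = parts[2] }
            }
        }
        END {
            if (exact)            print "match"
            else if (other != "") print "mismatch /" other
            else                  print "absent"
        }
    '
}
```

Fed the interface state from step 2) above together with the configured value 192.168.1.254/24, the helper reports a /32 mismatch, which is the condition that made the unload fail in this case.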
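V. Summary
1. If the VIP is removed manually from the primary, no failover occurs; the cluster detects the missing VIP and loads it on the primary again automatically.
2. If the VIP address on the primary does not match the one configured in repmgr.conf (for example, a different prefix length), the VIP cannot be unloaded during failover, and the failover fails.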
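To recover from the second situation, the mismatched address has to be removed with the prefix length it actually has on the interface, not the one in repmgr.conf. The following dry-run sketch only prints the ip addr del commands and does not execute them. print_vip_cleanup is a hypothetical helper, and it assumes the one-line-per-address format produced by ip -o -4 addr show.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper for illustration (not part of KingbaseES or repmgr).
# Reads `ip -o -4 addr show` output on stdin and prints, without running
# them, the commands that would remove every copy of the VIP address whose
# CIDR differs from the configured one.
# Usage: ip -o -4 addr show | print_vip_cleanup 192.168.1.254/24
print_vip_cleanup() {
    awk -v want="$1" '
        BEGIN { split(want, w, "/") }            # w[1] = configured address
        $3 == "inet" {
            split($4, have, "/")                 # $4 = address/prefix on the host
            if (have[1] == w[1] && $4 != want)
                print "ip addr del " $4 " dev " $2
        }
    '
}
```

Review the printed commands and run them as root. Based on the behavior observed in section III, a running primary would then be expected to load the VIP again with the configured prefix length.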