KingbaseES V8R6集群管理运维案例之---repmgr standby switchover故障
案例说明:
在KingbaseES V8R6集群备库执行“repmgr standby switchover”时,切换失败,并且在执行过程中,伴随着“repmr standby follow”操作,本案例详细记录了解决此问题的过程。
适用版本:
KingbaseES V8R6
集群节点信息:
一、备库执行switchover操作
1、执行switchover切换
[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
WARNING: following problems with command line parameters detected:
database connection parameters not required when executing UNKNOWN ACTION
NOTICE: executing switchover on node "node101" (ID: 1)
ERROR: local node "node101" (ID: 1) is not a downstream of demotion candidate primary "node102" (ID: 2)
DETAIL: local node has no registered upstream node
HINT: execute "repmgr standby register --force" to update the local node's metadata
2、切换失败信息
3、查看集群节点状态
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | standby | running | | default | 100 | 6 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | primary | * running | | default | 100 | 6 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
=如下所示,standby节点的upstream为空,无法执行switchover。=
二、配置standby节点的upstream(repmgr standby follow)
1、执行“repmgr standby follow”
[kingbase@node101 bin]$ ./repmgr standby follow -h 192.168.1.102 -U esrep -d esrep
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 1/CE004F50, follow target lsn is 1/CE004F50
ERROR: slot "repmgr_slot_1" already exists as an active slot
NOTICE: STANDBY FOLLOW failed
DETAIL: slot "repmgr_slot_1" already exists as an active slot
# standby的replication slot是active状态
test=# select * from sys_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confir
med_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+-------
--------------
repmgr_slot_1 | | physical | | | f | t | 8596 | 1917 | | 1/CE005038 |
(1 row)
2、停止数据库删除standby的replication slot
# 关闭备库数据库服务
[kingbase@node101 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down.... done
server stopped
# 注释kingbase.auto.conf中slot参数
[kingbase@node101 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
enable_upper_colname = 'on'
client_idle_timeout = '0'
synchronous_standby_names = ''
wal_retrieve_retry_interval = '5000'
primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
recovery_target_timeline = 'latest'
# primary_slot_name = 'repmgr_slot_1'
# 查看slot状态
test=# select * from sys_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confir
med_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+-------
--------------
repmgr_slot_1 | | physical | | | f | f | | 1922 | | 1/CE005038 |
(1 row)
# 删除备库replication slot
test=# select sys_drop_replication_slot('repmgr_slot_1');
sys_drop_replication_slot
---------------------------
(1 row)
3、启动数据库服务执行"repmgr standby follow"
[kingbase@node101 bin]$ ./sys_ctl start -D ../data
waiting for server to start....2022-08-09 10:39:50.600 CST [6829] WARNING: enable_upper_colname can only be opened
......
server started
[kingbase@node101 bin]$ ./repmgr standby follow -h 192.168.1.102 -U esrep -d esrep
NOTICE: attempting to find and follow current primary
INFO: timelines are same, this server is not ahead
DETAIL: local node lsn is 1/CE0052E0, follow target lsn is 1/CE0052E0
NOTICE: setting node 1's upstream to node 2
NOTICE: begin to stopp server at 2022-08-09 10:39:55.101228
NOTICE: stopping server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile -w -t 90 -m fast stop"
NOTICE: stopp server finish at 2022-08-09 10:39:55.205646
NOTICE: begin to start server at 2022-08-09 10:39:55.205705
NOTICE: starting server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile start"
NOTICE: start server finish at 2022-08-09 10:39:55.316793
NOTICE: STANDBY FOLLOW successful
DETAIL: standby attached to upstream node "node102" (ID: 2)
# 集群节点状态正常
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | standby | running | node102 | default | 100 | 6 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | primary | * running | | default | 100 | 6 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
三、执行‘repmgr standby switchover’
[kingbase@node101 bin]$ ./repmgr standby switchover -h 192.168.1.102 -U esrep -d esrep
WARNING: following problems with command line parameters detected:
database connection parameters not required when executing UNKNOWN ACTION
NOTICE: executing switchover on node "node101" (ID: 1)
INFO: The output from primary check cmd "repmgr node check --terse -LERROR --archive-ready --optformat" is: "--status=OK --files=0
"
INFO: pausing repmgrd on node "node101" (ID 1)
INFO: pausing repmgrd on node "node102" (ID 2)
NOTICE: local node "node101" (ID: 1) will be promoted to primary; current primary "node102" (ID: 2) will be demoted to standby
NOTICE: stopping current primary node "node102" (ID: 2)
NOTICE: issuing CHECKPOINT
DETAIL: executing server command "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile -W -m fast stop"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
INFO: checking for primary shutdown; 2 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 1/D0000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node101" (ID: 1) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node101" (ID: 1) was successfully promoted to primary
NOTICE: issuing CHECKPOINT
INFO: local node 2 can attach to rejoin target node 1
DETAIL: local node's recovery point: 1/D0000028; rejoin target node's fork point: 1/D00000A0
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: begin to start server at 2022-08-09 10:46:36.382548
NOTICE: starting server using "/home/kingbase/cluster/R6HA/kha/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6HA/kha/kingbase/data' -l /home/kingbase/cluster/R6HA/kha/kingbase/bin/logfile start"
NOTICE: start server finish at 2022-08-09 10:46:36.488870
NOTICE: replication slot "repmgr_slot_1" deleted on node 2
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
NOTICE: switchover was successful
DETAIL: node "node101" is now primary and node "node102" is attached as standby
INFO: unpausing repmgrd on node "node101" (ID 1)
INFO: unpause node "node101" (ID 1) successfully
INFO: unpausing repmgrd on node "node102" (ID 2)
INFO: unpause node "node102" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully
# 集群节点状态信息
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 7 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 7 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
=如上所示,switchover切换完成。=
KingbaseES V8R6集群管理运维案例之---repmgr standby switchover故障的更多相关文章
- KingbaseES V8R3集群管理和维护案例之---failover切换wal日志变化分析
案例说明: 本案例通过对KingbaseES V8R3集群failover切换过程进行观察,分析了主备库切换后wal日志的变化,对应用者了解KingbaseES V8R3(R6) failover ...
- KingbaseES V8R6集群维护案例之---停用集群node_export进程
案例说明: 在KingbaseES V8R6集群启动时,会启动node_exporter进程,此进程主要用于向kmonitor监控服务输出节点状态信息.在系统安全漏洞扫描中,提示出现以下安全漏洞: 对 ...
- KingbaseES V8R6集群外部备份案例
案例说明: 本案例采用sys_backup.sh执行物理备份,备份使用如下逻辑架构:集群采用CentOS 7系统,repo采用kylin V10 Server. 一主一备+外部备份 此场景为主备双机常 ...
- kingbaseES V8R6集群备份恢复案例之---备库作为repo主机执行物理备份
案例说明: 此案例是在KingbaseES V8R6集群环境下,当主库磁盘空间不足时,执行sys_rman备份,将集群的备库节点作为repo主机,执行备份,并将备份存储在备库的磁盘空间. 集群架构 ...
- KingbaseES V8R6集群维护之--修改数据库服务端口案例
案例说明: 对于KingbaseES数据库单实例环境,只需要修改kingbase.conf文件的'port'参数即可,但是对于KingbaseES V8R6集群中涉及到多个配置文件的修改,并且在应 ...
- vivo大规模 Kubernetes 集群自动化运维实践
作者:vivo 互联网服务器团队-Zhang Rong 一.背景 随着vivo业务迁移到K8s的增长,我们需要将K8s部署到多个数据中心.如何高效.可靠的在数据中心管理多个大规模的K8s集群是我们面临 ...
- hadoop记录-hadoop集群日常运维命令
hadoop集群日常运维命令 #1.namenode hadoop namenode -format #格式化,慎用 su hdfs hadoop-daemon.sh start namenode h ...
- KingbaseES V8R6集群维护案例之--单实例数据迁移到集群案例
案例说明: 生产环境是单实例,测试环境是集群,现需要将生产环境的数据迁移到集群中运行,本文档详细介绍了从单实例环境恢复数据到集群环境的操作步骤,可以作为生产环境迁移数据的参考. 适用版本: Kingb ...
- KingbaseES V8R6集群维护案例之---将securecmdd通讯改为ssh案例
案例说明: 在KingbaseES V8R6的后期版本中,为了解决有的主机之间不允许root用户ssh登录的问题,使用了securecmdd作为集群部署分发和通讯的服务,有生产环境通过漏洞扫描,在88 ...
随机推荐
- ACM-01背包问题-Python
日后完善 二维数组实现 if __name__ == '__main__': # 背包空间 space = 10 # 默认第一个元素为 0, 仅仅是为了方便理解 weights = [0, 2, 2, ...
- HDLBits->Circuits->Arithmetic Circuitd->3-bit binary adder
Verilog实例数组 对于一个定义好的简单module,例如加法器之类,如果我们要对其进行几十次几百次的例化,并且这些例化基本都是相同的形式,那么我们肯定不能一个个的单独对其进行例化,此时我们就可以 ...
- Spring Security自定义认证器
在了解过Security的认证器后,如果想自定义登陆,只要实现AuthenticationProvider还有对应的Authentication就可以了 Authentication 首先要创建一个自 ...
- SAP 实例 12 List Box with Value List from PBO Module
REPORT demo_dynpro_dropdown_listbox. DATA: name TYPE vrm_id, list TYPE vrm_values, value LIKE LINE O ...
- React技巧之中断map循环
正文从这开始~ 总览 在React中,中断map()循环: 在数组上调用slice()方法,来得到数组的一部分. 在部分数组上调用map()方法. 遍历部分数组. export default fun ...
- NC20439 [SHOI2017]期末考试
NC20439 [SHOI2017]期末考试 题目 题目描述 有 \(n\) 位同学,每位同学都参加了全部的 \(m\) 门课程的期末考试,都在焦急的等待成绩的公布.第 \(i\) 位同学希望在第 \ ...
- 北京市行政村边界shp数据/北京市乡镇边界/北京市土地利用分类数据/北京市气象数据/降雨量分布数据/太阳辐射数据
数据下载链接:数据下载链接 北京是一座有着三千多年历史的古都,在不同的朝代有着不同的称谓,大致算起来有二十多个别称.北京地势西北高.东南低.西部.北部和东北部三面环山,东南部是一片缓缓向渤海倾斜的 ...
- @Async注解的坑,小心
大家好,我是三友. 背景 前段时间,一个同事小姐姐跟我说她的项目起不来了,让我帮忙看一下,本着助人为乐的精神,这个忙肯定要去帮. 于是,我在她的控制台发现了如下的异常信息: Exception in ...
- 串口通信:接受数据(仿真task写法)
1.功能描述 设计一个串口数据接收模块.能够以设定的波特率(与发射端口速率匹配)接收数据,并输出保存到一个寄存器中. 2.过程描述 ①边沿检测器,识别出起始位时让接收使能端有效.这里需要排除边沿脉冲的 ...
- Object类中wait代餐方法和notifyAll方法和线程间通信
Object类中wait代餐方法和notifyAll方法 package com.yang.Test.ThreadStudy; import lombok.SneakyThrows; /** * 进入 ...