Using a witness server:

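A witness server is a normal KingbaseES instance that is not part of the streaming replication cluster. Its purpose, if a failover situation occurs, is to provide evidence that the primary server itself is unavailable, rather than, for example, a network split between different physical locations. The typical use case is a two-node streaming replication setup in which the primary and the standby are in different locations (data centers). With a witness server in the same location (data center) as the primary, the standby can decide, when the primary becomes unavailable, whether it can promote itself without risking a "split brain": if it can see neither the witness nor the primary, there is probably a network-level interruption and it should not promote itself. If it can see the witness but not the primary, there is no network interruption and the primary itself is unavailable, so it can promote itself.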
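The decision rule described above can be sketched as a small shell function. This is only an illustration of the reasoning, not repmgr's actual election code (which also checks sibling nodes, locations and priorities); the function name `decide_failover` and its "yes"/"no" arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the witness rule, from the point of view of a standby.
# $1 = "yes"/"no": can this standby reach the primary?
# $2 = "yes"/"no": can this standby reach the witness?
decide_failover() {
    local primary_visible="$1" witness_visible="$2"
    if [ "$primary_visible" = "yes" ]; then
        echo "stay-standby"   # primary is reachable, nothing to do
    elif [ "$witness_visible" = "yes" ]; then
        echo "promote"        # witness reachable, primary not: the primary itself is down
    else
        echo "wait"           # neither reachable: probably a network split, do not promote
    fi
}
```

For example, `decide_failover no yes` prints `promote`, while `decide_failover no no` prints `wait`.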

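For more complex replication setups, for example with multiple data centers, it is better to use location-based failover, which ensures that only nodes in the same location as the primary can become the new primary.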

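To create a witness server, set up an ordinary KingbaseES instance on a server in the same physical location as the cluster's primary server. Do not create the witness server on the same physical host as the primary; otherwise, if the primary fails because of a hardware problem, the witness server is lost as well.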

Database version:

test=# select version();
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

Original repmgr cluster architecture:

[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | primary | * running | | default | 100 | 18 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
5 | node243B | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

I. Create the witness server

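=Note: the witness server should be a standalone host and must not share a host with the primary or a standby. There is no streaming replication between the witness and the other nodes, so the witness is an independent primary instance. Its database system identifier (systemID) must differ from that of the other databases, so initialize a separate instance with initdb; do not create it by clone or copy.=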

1) Initialize the instance (on node2)

=Copy the software installation files from another cluster node to the witness node, then initialize a new instance=

 [kingbase@node2 bin]$ ./initdb -D /home/kingbase/cluster/R6HA/KHA/kingbase/data -E utf8 -U system -W
......

Configure the repmgr extension:

Start the database service:

[kingbase@node2 bin]$ ./sys_ctl -D /home/kingbase/cluster/R6HA/KHA/kingbase/data start

......

server started

2) Create the repmgr metadata database and schema

[kingbase@node2 bin]$ ./ksql -U system test
ksql (V8.0)
Type "help" for help.

# Create the esrep user
test=# create user esrep with superuser;
CREATE ROLE
test=# alter user esrep with password 'Kingbaseha110';
ALTER ROLE

# Create the esrep database
test=# create database esrep owner esrep;
CREATE DATABASE
test=# \c esrep esrep
You are now connected to database "esrep" as user "esrep".
esrep=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------------+------+--------
public | sys_stat_statements | view | system
(1 row)

# Create the repmgr schema
esrep=# create schema repmgr;
CREATE SCHEMA

II. Add the witness to the repmgr cluster

1) Configure the repmgr.conf file

[kingbase@node2 etc]$ cat repmgr.conf
on_bmj=off
node_id=2
node_name=node249
promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2'
log_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log'
data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6HA/KHA/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=2
reconnect_interval=3
failover='automatic'
recovery='automatic'
monitoring_history='no'
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.240/24'
net_device='enp0s3'
ipaddr_path='/sbin'
arping_path='/sbin'
synchronous='quorum'
repmgrd_pid_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgrd.pid'
ping_path='/usr/bin'
#priority=0
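Before registering, it can help to confirm that the witness's repmgr.conf carries its own `node_id`, `node_name` and `conninfo`, not values copied from another node. The helper below is a minimal sketch for reading one `key=value` setting from such a file; `conf_get` is a hypothetical name, and the parsing assumes one setting per line with optional single quotes, as in the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key from a repmgr.conf-style file.
# $1 = key name, $2 = path to the configuration file.
# Commented-out lines (starting with '#') are ignored; if a key appears more
# than once, the last occurrence is printed. Surrounding single quotes are stripped.
conf_get() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" \
        | tail -n 1 \
        | sed "s/^'\(.*\)'\$/\1/"
}
```

For example, `conf_get node_name repmgr.conf` would be expected to print `node249` for the file shown above.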

2) Register the witness with the repmgr cluster

[kingbase@node2 bin]$ ./repmgr witness register -h 192.168.7.248
# -h points to the IP address of the primary node
INFO: connecting to witness node "node249" (ID: 2)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "node249" (ID: 2) successfully registered

[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | primary | * running | | default | 100 | 18 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2 | node249 | witness | * running | node248 | default | 0 | 1 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
5 | node243B | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

3) Inspect the repmgr metadata on the witness

=After the witness is registered with the repmgr cluster, the repmgr metadata objects are created automatically under the repmgr schema of the esrep database=

 [kingbase@node2 bin]$ ./ksql -U esrep esrep
ksql (V8.0)
Type "help" for help.

esrep=# \d repmgr.*
Table "repmgr.events"
Column | Type | Collation | Nullable | Default
-----------------+--------------------------+-----------+----------+-------------------
node_id | integer | | not null |
event | text | | not null |
successful | boolean | | not null | true
event_timestamp | timestamp with time zone | | not null | CURRENT_TIMESTAMP
details | text | | |

Index "repmgr.idx_monitoring_history_time"
Column | Type | Key? | Definition
-------------------+--------------------------+------+-------------------
last_monitor_time | timestamp with time zone | yes | last_monitor_time
standby_node_id | integer | yes | standby_node_id
btree, for table "repmgr.monitoring_history"

Table "repmgr.monitoring_history"
Column | Type | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
primary_node_id | integer | | not null |
standby_node_id | integer | | not null |
last_monitor_time | timestamp with time zone | | not null |
last_apply_time | timestamp with time zone | | |
last_wal_primary_location | pg_lsn | | not null |
last_wal_standby_location | pg_lsn | | |
replication_lag | bigint | | not null |
apply_lag | bigint | | not null |
Indexes:
"idx_monitoring_history_time" btree (last_monitor_time, standby_node_id)

Table "repmgr.nodes"
Column | Type | Collation | Nullable | Default
------------------+----------------------------+-----------+----------+-----------------
node_id | integer | | not null |
upstream_node_id | integer | | |
active | boolean | | not null | true
node_name | text | | not null |
type | text | | not null |
location | text | | not null | 'default'::text
priority | integer | | not null | 100
conninfo | text | | not null |
repluser | character varying(63 char) | | not null |
slot_name | text | | |
config_file | text | | not null |
Indexes:
"nodes_pkey" PRIMARY KEY, btree (node_id)
Check constraints:
"nodes_type_check" CHECK (type = ANY (ARRAY['primary'::text, 'standby'::text, 'witness'::text, 'bdr'::text]))
Foreign-key constraints:
"nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE
Referenced by:
TABLE "repmgr.nodes" CONSTRAINT "nodes_upstream_node_id_fkey" FOREIGN KEY (upstream_node_id) REFERENCES repmgr.nodes(node_id) DEFERRABLE

Index "repmgr.nodes_pkey"
Column | Type | Key? | Definition
---------+---------+------+------------
node_id | integer | yes | node_id
primary key, btree, for table "repmgr.nodes"

View "repmgr.replication_status"
Column | Type | Collation | Nullable | Default
---------------------------+--------------------------+-----------+----------+---------
primary_node_id | integer | | |
standby_node_id | integer | | |
standby_name | text | | |
node_type | text | | |
active | boolean | | |
last_monitor_time | timestamp with time zone | | |
last_wal_primary_location | pg_lsn | | |
last_wal_standby_location | pg_lsn | | |
replication_lag | text | | |
replication_time_lag | interval | | |
apply_lag | text | | |
communication_time_lag | interval | | |

View "repmgr.show_nodes"
Column | Type | Collation | Nullable | Default
--------------------+---------+-----------+----------+---------
node_id | integer | | |
node_name | text | | |
active | boolean | | |
upstream_node_id | integer | | |
upstream_node_name | text | | |
type | text | | |
priority | integer | | |
conninfo | text | | |

Table "repmgr.voting_term"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
term | integer | | not null |
Indexes:
"voting_term_restrict" UNIQUE, btree ((true))
Rules:
voting_term_delete AS
ON DELETE TO repmgr.voting_term DO INSTEAD NOTHING

Index "repmgr.voting_term_restrict"
Column | Type | Key? | Definition
--------+---------+------+------------
bool | boolean | yes | (true)
unique, btree, for table "repmgr.voting_term"

III. Troubleshooting a witness registration problem

=As shown below, the other nodes report the witness status as "? unreachable".=

[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+---------------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | primary | * running | | default | 100 | 18 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2 | node249 | witness | ? unreachable | node248 | default | 0 | ? | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
5 | node243B | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

WARNING: following issues were detected
- unable to connect to node "node249" (ID: 2)
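When checking several nodes, the unreachable entries can be picked out of the `repmgr cluster show` output mechanically. The filter below is a rough sketch that assumes the pipe-separated column layout shown above, with the node name in the second column and the status in the fourth; `unreachable_nodes` is a hypothetical helper, not a repmgr command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `repmgr cluster show` output on stdin and print the
# name of every node whose Status column contains "unreachable".
unreachable_nodes() {
    awk -F'|' '$4 ~ /unreachable/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the node name
        print $2
    }'
}
```

A possible use is `./repmgr cluster show | unreachable_nodes`, which for the output above would print `node249`.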

1) Test the ksql connection to the witness node (the connection fails)

[kingbase@node1 bin]$ ./ksql -h 192.168.7.249 -U esrep esrep
ksql: error: could not connect to server: could not connect to server: No route to host
Is the server running on host "192.168.7.249" and accepting
TCP/IP connections on port 54321?
# Ping the node
[kingbase@node1 bin]$ ping 192.168.7.249
PING 192.168.7.249 (192.168.7.249) 56(84) bytes of data.
64 bytes from 192.168.7.249: icmp_seq=1 ttl=64 time=0.513 ms
64 bytes from 192.168.7.249: icmp_seq=2 ttl=64 time=0.390 ms
64 bytes from 192.168.7.249: icmp_seq=3 ttl=64 time=0.478 ms
^C
--- 192.168.7.249 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.390/0.460/0.513/0.054 ms
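A successful ping together with "No route to host" on the database port usually points at packet filtering on the target rather than a routing problem. The probe below is a minimal sketch for testing a single TCP port without a database client; `port_open` is a hypothetical helper that relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, both assumed to be available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a TCP connection to host:port can be opened
# within 3 seconds, non-zero otherwise.
# $1 = host name or IP address, $2 = TCP port.
port_open() {
    local host="$1" port="$2"
    # The inner bash receives host as $0 and port as $1.
    timeout 3 bash -c ': < "/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For instance, `port_open 192.168.7.249 54321 || echo "port 54321 is blocked or the database is down"` could be run from node1 before and after changing the firewall.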

2) Check the firewall configuration on the witness server

[root@node2 shell]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- anywhere anywhere udp dpt:domain
ACCEPT tcp -- anywhere anywhere tcp dpt:domain
ACCEPT udp -- anywhere anywhere udp dpt:bootps
ACCEPT tcp -- anywhere anywhere tcp dpt:bootps
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
INPUT_direct all -- anywhere anywhere
INPUT_ZONES_SOURCE all -- anywhere anywhere
INPUT_ZONES all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere bogon/24 ctstate RELATED,ESTABLISHED
ACCEPT all -- 192.168.122.0/24 anywhere
ACCEPT all -- anywhere anywhere
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
FORWARD_direct all -- anywhere anywhere
FORWARD_IN_ZONES_SOURCE all -- anywhere anywhere
FORWARD_IN_ZONES all -- anywhere anywhere
FORWARD_OUT_ZONES_SOURCE all -- anywhere anywhere
FORWARD_OUT_ZONES all -- anywhere anywhere
ACCEPT icmp -- anywhere anywhere
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
......

=== The output above shows that the firewall is enabled on the witness server node ===

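3) Flush the firewall rules on the witness host (note that iptables -F removes every rule in the filter table, not only the one blocking the database port; on a production system, opening just port 54321 is the safer choice)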

[root@node2 shell]# iptables -F

4) Test the database connection to the witness host

[kingbase@node1 bin]$ ./ksql -h 192.168.7.249 -U system test
ksql (V8.0)
Type "help" for help.

5) Check the cluster node status

[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | primary | * running | | default | 100 | 18 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2 | node249 | witness | * running | node248 | default | 0 | 1 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
5 | node243B | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

IV. After a cluster failover

1) Check the cluster node status

[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | primary | * running | | default | 100 | 18 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2 | node249 | witness | * running | node248 | default | 0 | 1 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
5 | node243B | standby | running | node248 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2

2) After the primary/standby switch, re-register the witness so that it connects to the new primary

[kingbase@node2 bin]$ ./repmgr witness register --force -h 192.168.7.243
INFO: connecting to witness node "node249" (ID: 2)
INFO: connecting to primary node
INFO: "repmgr" extension is already installed
INFO: witness registration complete
NOTICE: witness node "node249" (ID: 2) successfully registered

[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+----------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node248 | standby | running | node243 | default | 100 | 18 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
2 | node249 | witness | * running | node243 | default | 0 | 1 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
3 | node243 | primary | * running | | default | 100 | 19 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
5 | node243B | standby | running | node243 | default | 100 | 18 | host=192.168.7.243 user=esrep dbname=esrep port=54322 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2
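The re-registration step needs the address of the new primary. As a rough sketch, it can be extracted from `repmgr cluster show` output, assuming the column layout shown above (role in the third column, status in the fourth, connection string in the ninth); `primary_host` is a hypothetical helper. The output should come from a node whose metadata already reflects the failover, since the witness's own copy may still list the old primary until it is re-registered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `repmgr cluster show` output on stdin and print the
# host= value from the connection string of each running primary.
primary_host() {
    awk -F'|' '$3 ~ /primary/ && $4 ~ /running/ {
        # Locate "host=<value>" in the connection string and print only <value>.
        if (match($9, /host=[^ ]+/)) print substr($9, RSTART + 5, RLENGTH - 5)
    }'
}

# Possible usage, with cluster_show.txt holding output captured on a standby:
#   ./repmgr witness register --force -h "$(primary_host < cluster_show.txt)"
```

If more than one line is printed, two nodes claim to be the primary, and that should be resolved before the witness is re-registered.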

hamgr.log on the new primary:

 [2021-03-01 12:49:05] [WARNING] unable to ping "host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2"
[2021-03-01 12:49:05] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-03-01 12:49:05] [WARNING] unable to connect to upstream node "node248" (ID: 1)
[2021-03-01 12:49:05] [INFO] sleeping 3 seconds until next reconnection attempt
[2021-03-01 12:49:08] [INFO] checking state of node 1, 1 of 2 attempts
[2021-03-01 12:49:08] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2 fallback_application_name=repmgr"
[2021-03-01 12:49:08] [DETAIL] PQping() returned "PQPING_REJECT"
[2021-03-01 12:49:08] [INFO] sleeping 3 seconds until next reconnection attempt
[2021-03-01 12:49:11] [INFO] checking state of node 1, 2 of 2 attempts
[2021-03-01 12:49:11] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2 fallback_application_name=repmgr"
[2021-03-01 12:49:11] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2021-03-01 12:49:11] [WARNING] unable to reconnect to node 1 after 2 attempts
[2021-03-01 12:49:11] [NOTICE] setting "wal_retrieve_retry_interval" to 86405000 milliseconds
[2021-03-01 12:49:12] [WARNING] wal receiver not running
[2021-03-01 12:49:12] [NOTICE] WAL receiver disconnected on all sibling nodes
[2021-03-01 12:49:12] [INFO] WAL receiver disconnected on all 2 sibling nodes
[2021-03-01 12:49:12] [INFO] 2 active sibling nodes registered
[2021-03-01 12:49:12] [INFO] primary and this node have the same location ("default")
[2021-03-01 12:49:12] [INFO] local node's last receive lsn: 5/640000A0
[2021-03-01 12:49:12] [INFO] checking state of sibling node "node249" (ID: 2)
[2021-03-01 12:49:12] [INFO] node "node249" (ID: 2) reports its upstream is node 1, last seen 7 second(s) ago
[2021-03-01 12:49:12] [INFO] node 2 last saw primary node 7 second(s) ago
[2021-03-01 12:49:12] [INFO] checking state of sibling node "node243B" (ID: 5)
[2021-03-01 12:49:12] [WARNING] repmgrd not running on node "node243B" (ID: 5), skipping
[2021-03-01 12:49:12] [INFO] visible nodes: 3; total nodes: 3; no nodes have seen the primary within the last 4 seconds
[2021-03-01 12:49:12] [NOTICE] promotion candidate is "node243" (ID: 3)
[2021-03-01 12:49:12] [NOTICE] setting "wal_retrieve_retry_interval" to 5000 ms
[2021-03-01 12:49:12] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2021-03-01 12:49:12] [INFO] try to ping the trusted_servers "192.168.7.1" before execute promote_command
[2021-03-01 12:49:14] [NOTICE] PING 192.168.7.1 (192.168.7.1) 56(84) bytes of data.
--- 192.168.7.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 2.450/2.460/2.471/0.050 ms
A
[2021-03-01 12:49:14] [NOTICE] successfully ping one or more of the trusted_servers "192.168.7.1"
[2021-03-01 12:49:14] [NOTICE] try to stop old primary db (host: "192.168.7.248")
ERROR: connection to database failed
DETAIL:
could not connect to server: Connection refused
Is the server running on host "192.168.7.248" and accepting
TCP/IP connections on port 54321?
DETAIL: attempted to connect using:
user=esrep connect_timeout=10 dbname=esrep host=192.168.7.248 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=2 fallback_application_name=repmgr
[2021-03-01 12:49:16] [NOTICE] PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data.
--- 192.168.7.240 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.357/0.365/0.374/0.020 ms
[2021-03-01 12:49:16] [WARNING] the virtual ip is already on other host, try to release it on old primary node (host: "192.168.7.248")
[2021-03-01 12:49:16] [INFO] SSH connection to host "192.168.7.248" succeeded, ready to release vip on it
[2021-03-01 12:49:17] [NOTICE] old primary node (host: "192.168.7.248") release the virtual ip 192.168.7.240/24 success
[2021-03-01 12:49:17] [NOTICE] will acquire the virtual ip again
[2021-03-01 12:49:18] [NOTICE] PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data.
--- 192.168.7.240 ping statistics ---
2 packets transmitted, 0 received, +1 errors, 100% packet loss, time 999ms
[2021-03-01 12:49:18] [WARNING] ping host"192.168.7.240" failed
[2021-03-01 12:49:18] [DETAIL] average RTT value is not greater than zero
[2021-03-01 12:49:19] [NOTICE] new primary node (ID: 3) acquire the virtual ip 192.168.7.240/24 success
[2021-03-01 12:49:19] [INFO] promote_command is:
"/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf"
WARNING: 2 sibling nodes found, but option "--siblings-follow" not specified
DETAIL: these nodes will remain attached to the current primary:
node249 (node ID: 2, witness server)
node243B (node ID: 5)
NOTICE: promoting standby to primary
DETAIL: promoting server "node243" (ID: 3) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
INFO: SET synchronous TO "async" on primary host
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node243" (ID: 3) was successfully promoted to primary
