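Case description:

When a KingbaseES R6 cluster was started, it reported the error "incorrect command permissions for the virtual ip". This case walks through how the fault was analyzed and resolved.

Database version: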

test=# select version();
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C005B0023 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

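Cluster architecture:

Two nodes, as the startup output in this case shows: node243 (192.168.7.243, primary) and node248 (192.168.7.248, standby), with the virtual IP 192.168.7.241/24.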

I. Cluster startup fails

[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 13:27:26 Ready to start all DB ...
2021-03-01 13:27:26 begin to start DB on "[192.168.7.243]".
incorrect command permissions for the virtual ip.
waiting for server to start..... done
server started
2021-03-01 13:27:30 execute to start DB on "[192.168.7.243]" success, connect to check it.
2021-03-01 13:27:31 DB on "[192.168.7.243]" start success.
2021-03-01 13:27:32 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 13:27:34 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 13:27:37 begin to start DB on "[192.168.7.248]".
incorrect command permissions for the virtual ip.
waiting for server to start..... done
server started
2021-03-01 13:27:40 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 13:27:41 DB on "[192.168.7.248]" start success.
ERROR: No execute permission for "/home/kingbase/cluster/R6C5/R6C5R//kingbase/bin/arping"
incorrect command permissions for the virtual ip.
2021-03-01 13:27:41 There is no primary DB running, will do nothing and exit.

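= The error messages above show that a permission problem occurred when arping was invoked while loading the VIP. =

II. Fault analysis

1. Check the repmgr configuration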

[kingbase@node3 bin]$ cat ../etc/repmgr.conf
on_bmj=off
node_id=1
node_name='node243'
promote_command='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/repmgr.conf'
follow_command='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/repmgr.conf -W --upstream-node-id=%n'
conninfo='host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'
log_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log'
kbha_log_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log'
data_directory='/home/kingbase/cluster/R6C5/R6C5R/kingbase/data'
sys_bindir='/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin'
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'
reconnect_attempts=10
reconnect_interval=6
failover='automatic'
recovery='standby'
monitoring_history='no'
trusted_servers='192.168.7.1'
virtual_ip='192.168.7.241/24'
net_device='enp0s3'
net_device_ip='192.168.7.243'
ipaddr_path='/sbin'
arping_path='/home/kingbase/cluster/R6C5/R6C5R//kingbase/bin'
synchronous='sync'
repmgrd_pid_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/hamgrd.pid'
kbha_pid_file='/home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/kbha.pid'
ping_path='/usr/bin'
auto_cluster_recovery_level=1
use_check_disk=off

= In this version, the arping in use is the tool bundled with the database software package (see arping_path above). =
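
To confirm which arping binary the cluster actually calls, the arping_path value can be read straight from repmgr.conf. The following is a minimal sketch; conf_value is a hypothetical helper written for illustration, not a KingbaseES or repmgr command, and it assumes the simple key='value' layout shown above.

```shell
#!/bin/sh
# Sketch: read one key from a repmgr.conf-style file (lines of the form key='value').
# conf_value is a hypothetical helper, not part of KingbaseES or repmgr.
conf_value() {
    key=$1
    file=$2
    # Strip the "key=" prefix and optional surrounding quotes; keep the last match.
    sed -n "s/^${key}=['\"]\{0,1\}\([^'\"]*\)['\"]\{0,1\}[[:space:]]*\$/\1/p" "$file" | tail -n 1
}
```

For example, ls -l "$(conf_value arping_path ../etc/repmgr.conf)/arping" lists the binary that the cluster scripts would use.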

2. Check the arping version

3. Check the arping permissions

[kingbase@node1 bin]$ ls -lh arping
-rwxr-xr-x 1 kingbase root 11K Nov 5 2021 arping
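
The listing above shows the file owned by kingbase with no setuid bit (mode -rwxr-xr-x). The same two facts can be checked in a script. This is a sketch only: perm_status is a hypothetical helper name, and stat -c assumes GNU coreutils, as found on typical Linux systems.

```shell
#!/bin/sh
# Sketch: summarize who owns a binary and whether its setuid bit is set.
# perm_status is a hypothetical helper; stat -c assumes GNU coreutils (Linux).
perm_status() {
    file=$1
    # Owner name of the file; bail out if the file cannot be inspected.
    owner=$(stat -c '%U' "$file") || return 1
    # test -u is true when the setuid bit is set on the file.
    if [ -u "$file" ]; then
        setuid=yes
    else
        setuid=no
    fi
    echo "owner=$owner setuid=$setuid"
}
```

Run as perm_status ./arping in the bin directory. For the file listed above it would report owner=kingbase setuid=no.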

III. Resolution steps

1. Set the owner of arping to the kingbase user

1) Set ownership

[kingbase@node1 bin]$ chown -R kingbase.kingbase arping
[kingbase@node1 bin]$ ls -lh arping
-rwxr-xr-x 1 kingbase kingbase 11K Nov 5 2021 arping

2) Start the cluster (the fault persists)

2. Set the owner of arping to root and assign the setuid permission

1) Set ownership and permissions

[root@node3 ~]# cd /home/kingbase/cluster/R6C5/R6C5R//kingbase/bin
[root@node3 bin]# chown -R root.root arping
[root@node3 bin]# chmod u+s arping
[root@node3 bin]# ls -lh arping
-rwsr-xr-x 1 root root 11K Nov 5 2021 arping
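
The failed startup log reported the permission error for both 192.168.7.243 and 192.168.7.248, so the same change is presumably needed on every node; that is an inference from the log, not something the transcript above shows. The sketch below only prints the two root commands for a given bin directory so they can be reviewed before being run on each node. print_arping_fix is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print (not run) the root commands that apply the fix to one bin directory.
# print_arping_fix is a hypothetical helper name.
print_arping_fix() {
    bindir=$1
    # chown comes first: changing the owner can clear an existing setuid bit.
    echo "chown root:root $bindir/arping"
    echo "chmod u+s $bindir/arping"
}
```

The order matters: on Linux, changing a file's owner can clear its setuid bit, so chmod u+s goes after chown.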

2) Start the cluster

[kingbase@node3 bin]$ ./sys_monitor.sh start
2021-03-01 13:38:04 Ready to start all DB ...
2021-03-01 13:38:04 begin to start DB on "[192.168.7.243]".
2021-03-01 13:38:05 DB on "[192.168.7.243]" already started, connect to check it.
2021-03-01 13:38:06 DB on "[192.168.7.243]" start success.
2021-03-01 13:38:06 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 13:38:08 Try to ping trusted_servers on host 192.168.7.243 ...
2021-03-01 13:38:11 begin to start DB on "[192.168.7.248]".
2021-03-01 13:38:12 DB on "[192.168.7.248]" already started, connect to check it.
2021-03-01 13:38:13 DB on "[192.168.7.248]" start success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node243 | primary | * running | | default | 100 | 3 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node248 | standby | running | node243 | default | 100 | 3 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2021-03-01 13:38:13 The primary DB is started.
2021-03-01 13:38:13 check synchronous_standby_names ...
t
2021-03-01 13:38:24 Success to load virtual ip [192.168.7.241/24] on primary host [192.168.7.243].
2021-03-01 13:38:24 Try to ping vip on host 192.168.7.248 ...
2021-03-01 13:38:26 Try to ping vip on host 192.168.7.243 ...
2021-03-01 13:38:29 begin to start repmgrd on "[192.168.7.248]".
[2021-03-01 13:40:52] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 13:40:52] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log"
2021-03-01 13:38:30 execute to start repmgrd on "[192.168.7.248]" failed.
2021-03-01 13:38:30 begin to start repmgrd on "[192.168.7.243]".
[2021-03-01 13:38:30] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 13:38:30] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/hamgr.log"
2021-03-01 13:38:32 repmgrd on "[192.168.7.243]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+-------------+-------+---------+--------------------
1 | node243 | primary | * running | | running | 12552 | no | n/a
2 | node248 | standby | running | node243 | not running | n/a | n/a | n/a
[2021-03-01 13:40:56] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log"
[2021-03-01 13:38:37] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6C5/R6C5R/kingbase/log/kbha.log"
2021-03-01 13:38:39 Done.

[kingbase@node3 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------------------------------------------------------------------
1 | node243 | primary | * running | | default | 100 | 3 | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node248 | standby | running | node243 | default | 100 | 3 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

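=== The output above shows that the cluster started successfully: both database nodes are running and the virtual IP was loaded on the primary. The same output also reports that repmgrd failed to start on 192.168.7.248 (listed as "not running"); this case does not analyze that. ===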
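
Besides reading the startup log, the VIP can be checked directly on the primary's interface. The sketch below filters the one-line output of ip -o -4 addr show for the address. has_vip is a hypothetical helper name; the interface and address come from net_device and virtual_ip in the repmgr.conf shown earlier.

```shell
#!/bin/sh
# Sketch: succeed if the given address/prefix appears in `ip -o -4 addr show` output on stdin.
# has_vip is a hypothetical helper name.
has_vip() {
    vip=$1
    # -F: treat the address as a fixed string, so dots are not regex wildcards.
    # The trailing space keeps 192.168.7.24/24 from matching a longer address.
    grep -qF "inet $vip "
}
```

For example, on the primary: ip -o -4 addr show dev enp0s3 | has_vip 192.168.7.241/24 && echo "VIP present".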

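IV. Summary

A KingbaseES R6 cluster uses the arping bundled with the database software, so version mismatches generally do not occur. The arping binary should be owned by root rather than by the kingbase user, and its setuid bit must be set so that the kingbase user can also run it; arping sends raw ARP packets, which normally requires root privileges.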
