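Case description:

1. In a cluster, WAL logs are needed not only for recovery on the standby. During a role switch (switchover or failover), sys_rewind also reads the WAL logs to bring the database back to a consistent state.

2. When cleaning WAL logs on the primary and standby of a cluster, testing shows that, in principle, every WAL file older than the one containing the checkpoint can be removed. That is the ideal case, however. In production, keep 3 days to one week of WAL logs, so that replication lag between primary and standby does not cause a switch to fail for lack of WAL.

3. For a KingbaseES V8R6 cluster, if backups have been set up on the primary and standby with the sys_backup.sh tool, archived logs are backed up automatically and should also be removed automatically when historical backups are purged. If a node has no sys_backup.sh backup, the sys_archivecleanup tool can be used to clean them up. The same rule applies: in production, keep 3 days to one week of archived logs.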
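The retention rule in points 2 and 3 can be sketched as a small selection function. This is only an illustration, not a KingbaseES tool: the function name `removable_segments` and its parameters are hypothetical, and it assumes WAL segment names are the 24-hex-digit names seen in the listings in this article. It only computes a list and deletes nothing.

```python
import re
from datetime import datetime, timedelta

# WAL segment file names are 24 hex digits: timeline, log number, segment number.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def removable_segments(files, redo_wal_file, now, keep_days=3):
    """Return segment names that are both older than the checkpoint's REDO WAL
    file and last modified more than keep_days ago.

    files: iterable of (name, mtime) pairs, where mtime is a datetime.
    redo_wal_file: the "Latest checkpoint's REDO WAL file" value.
    """
    cutoff = now - timedelta(days=keep_days)
    # Compare log + segment number only, ignoring the timeline prefix, so a
    # segment at or after the REDO position is kept on every timeline.
    redo_position = redo_wal_file[8:]
    result = []
    for name, mtime in files:
        if not WAL_NAME.match(name):
            continue  # skip .history files, archive_status, and so on
        if name[8:] < redo_position and mtime < cutoff:
            result.append(name)
    return sorted(result)
```

Requiring both conditions is deliberately conservative: a segment survives if it is either recent or still needed by the latest checkpoint.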

Database version:

test=# select version();
version
------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0023 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

Cluster node information:

[kingbase@node1 bin]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.8.200 node1 # cluster node node200
192.168.8.201 node2 # cluster node node201

ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node200 | primary | * running | | running | 29303 | no | n/a
2 | node201 | standby | running | node200 | running | 29748 | no | 1 second(s) ago

I. Cluster switchover test

1. Check the control file information on the primary and standby

1) Primary control file

[kingbase@node1 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in production
sys_control last modified: Tue 10 May 2022 12:33:09 PM CST
Latest checkpoint location: 1/29001768
Latest checkpoint's REDO location: 1/29001738
Latest checkpoint's REDO WAL file: 000000030000000100000029
Latest checkpoint's TimeLineID: 3

2) Standby control file

[kingbase@node2 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in archive recovery
sys_control last modified: Thu 19 May 2022 12:05:06 PM CST
Latest checkpoint location: 1/29001768
Latest checkpoint's REDO location: 1/29001738
Latest checkpoint's REDO WAL file: 000000030000000100000029
Latest checkpoint's TimeLineID: 3
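The relationship between the REDO location and the REDO WAL file in the output above can be reproduced with a short calculation. The sketch below assumes the PostgreSQL-style naming scheme (timeline, log number, segment number, each as 8 hex digits) and a 16 MB segment size, which matches the 16M files in the listings in this article. The function name `wal_file_name` is hypothetical.

```python
def wal_file_name(timeline, lsn, segment_size=16 * 1024 * 1024):
    """Map a timeline ID and an LSN such as '1/29001738' to a WAL segment name.

    Assumes PostgreSQL-style naming and, by default, 16 MB segments.
    """
    high_text, low_text = lsn.split("/")
    # An LSN is a 64-bit byte position written as two 32-bit hex halves.
    position = (int(high_text, 16) << 32) | int(low_text, 16)
    segment_number = position // segment_size
    # Number of segments that fit into one 4 GB "log" unit.
    segments_per_log = 0x100000000 // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_log,
        segment_number % segments_per_log,
    )
```

With timeline 3 and REDO location 1/29001738 this yields 000000030000000100000029, the same file name that sys_controldata reports.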

2. Clean up WAL logs (on both primary and standby, keep only the WAL file containing the checkpoint, inclusive, and all later files)

# WAL files retained on the primary

[kingbase@node1 sys_wal]$ ls -lh
total 49M
-rw-------. 1 kingbase kingbase 16M May 10 13:19 000000030000000100000029
-rw-------. 1 kingbase kingbase 16M May 10 13:19 00000003000000010000002A
-rw-------. 1 kingbase kingbase 16M May 10 13:23 00000003000000010000002B
-rw-------. 1 kingbase kingbase 85 May 18 11:28 00000003.history
drwx------. 2 kingbase kingbase 24K May 10 13:19 archive_status
drwxrwxr-x. 2 kingbase kingbase 4.0K May 19 12:58 log_bk

# WAL files retained on the standby
[kingbase@node2 sys_wal]$ ls -lh
total 49M
-rw------- 1 kingbase kingbase 16M May 19 12:51 000000030000000100000029
-rw------- 1 kingbase kingbase 16M May 19 12:51 00000003000000010000002A
-rw------- 1 kingbase kingbase 16M May 19 12:55 00000003000000010000002B
-rw------- 1 kingbase kingbase 85 May 18 11:28 00000003.history
drwx------ 2 kingbase kingbase 12K May 19 12:51 archive_status
drwxrwxr-x 2 kingbase kingbase 4.0K May 19 13:00 log_bk

3. Run repmgr standby switchover

1) Check the current cluster status

[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | primary | * running | | default | 100 | 3 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | standby | running | node200 | default | 100 | 3 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2) Run the switchover

[kingbase@node2 bin]$ ./repmgr standby switchover -h 192.168.8.200 -U esrep -d esrep
WARNING: following problems with command line parameters detected:
......
INFO: unpause node "node201" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully

[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | standby | running | node201 | default | 100 | 3 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | primary | * running | | default | 100 | 4 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

Switch-back test:

[kingbase@node1 bin]$ ./repmgr standby switchover -h 192.168.8.201 -U esrep -d esrep
WARNING: following problems with command line parameters detected:
INFO: unpause node "node201" (ID 2) successfully
NOTICE: STANDBY SWITCHOVER has completed successfully

[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | primary | * running | | default | 100 | 5 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | standby | running | node200 | default | 100 | 4 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

=== As shown above, the switchover succeeded! ===

II. Cluster failover test

1. Check the current cluster status
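Reading the role columns by eye works for two nodes, but the same check can be scripted. The following sketch parses the text printed by `repmgr cluster show` and reports which nodes are a running primary. It assumes the column layout shown in this article; the helper names `parse_cluster_show` and `primary_names` are hypothetical and not part of repmgr.

```python
def parse_cluster_show(text):
    """Parse the table printed by 'repmgr cluster show' into a list of dicts.

    Only lines containing '|' are used, so the '----+----' separator line
    and any surrounding log messages are ignored.
    """
    lines = [line for line in text.splitlines() if "|" in line]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("|")]
    nodes = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("|")]
        nodes.append(dict(zip(header, cells)))
    return nodes


def primary_names(nodes):
    """Names of the nodes reported as a running primary."""
    return [
        node["Name"]
        for node in nodes
        if node["Role"] == "primary" and node["Status"] == "* running"
    ]
```

After a successful switchover, `primary_names` should return exactly one name, and it should differ from the name returned before the switch.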

[kingbase@node2 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
1 | node200 | standby | running | node201 | default | 100 | 5 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | primary | * running | | default | 100 | 6 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. Check the control file information on the primary and standby

# Primary:
[kingbase@node2 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in production
sys_control last modified: Thu 19 May 2022 01:26:08 PM CST
Latest checkpoint location: 1/409BA150
Latest checkpoint's REDO location: 1/3EADD130
Latest checkpoint's REDO WAL file: 00000006000000010000003E

# Standby:
[kingbase@node1 bin]$ ./sys_controldata -D ../data
sys_control version number: 1201
Catalog version number: 202110271
Database system identifier: 7094057752387829054
Database cluster state: in archive recovery
sys_control last modified: Thu 19 May 2022 01:22:19 PM CST
Latest checkpoint location: 1/37000028
Latest checkpoint's REDO location: 1/37000028
Latest checkpoint's REDO WAL file: 000000050000000100000037

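3. Clean up WAL logs on the primary and standby (on both, keep only the WAL file containing the checkpoint, inclusive, and all later files)

# WAL files retained on the primary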
[kingbase@node2 sys_wal]$ ls -lh
total 65M
-rw------- 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003E
-rw------- 1 kingbase kingbase 16M May 19 13:26 00000006000000010000003F
-rw------- 1 kingbase kingbase 16M May 19 13:26 000000060000000100000040
-rw------- 1 kingbase kingbase 16M May 19 13:26 000000060000000100000041
-rw------- 1 kingbase kingbase 214 May 19 13:18 00000006.history
drwx------ 2 kingbase kingbase 16K May 19 13:26 archive_status
drwxrwxr-x 2 kingbase kingbase 4.0K May 19 13:30 log_bk

# WAL files retained on the standby
[kingbase@node1 sys_wal]$ ls -lh
total 193M
-rw-------. 1 kingbase kingbase 16M May 19 13:17 000000050000000100000037
-rw-------. 1 kingbase kingbase 171 May 19 13:03 00000005.history
-rw-------. 1 kingbase kingbase 16M May 19 13:23 000000060000000100000037
-rw-------. 1 kingbase kingbase 16M May 19 13:24 000000060000000100000038
-rw-------. 1 kingbase kingbase 16M May 19 13:24 000000060000000100000039
-rw-------. 1 kingbase kingbase 16M May 19 13:24 00000006000000010000003A
-rw-------. 1 kingbase kingbase 16M May 19 13:24 00000006000000010000003B
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003C
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003D
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003E
-rw-------. 1 kingbase kingbase 16M May 19 13:25 00000006000000010000003F
-rw-------. 1 kingbase kingbase 16M May 19 13:26 000000060000000100000040
-rw-------. 1 kingbase kingbase 16M May 19 13:26 000000060000000100000041
-rw-------. 1 kingbase kingbase 214 May 19 13:21 00000006.history
drwx------. 2 kingbase kingbase 24K May 19 13:26 archive_status
drwxrwxr-x. 2 kingbase kingbase 4.0K May 19 13:28 log_bk
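The standby listing above holds segments from two timelines, including 000000050000000100000037 and 000000060000000100000037, which cover the same WAL position on timelines 5 and 6. A small decoder makes this visible. It assumes the PostgreSQL-style 24-hex-digit naming, and `parse_wal_name` is a hypothetical helper name.

```python
from collections import namedtuple

WalSegment = namedtuple("WalSegment", "timeline log segment")


def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment name into its three numeric parts."""
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    return WalSegment(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        segment=int(name[16:24], 16),
    )
```

Two names with equal `log` and `segment` but different `timeline` values describe the same position in different histories, which is why a cleanup that compares whole file names as plain strings can misjudge segments after a switch.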

4. Run the failover test

1) Stop the database service on the primary

[kingbase@node2 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down....... done
server stopped

2) Check the result of the switch

[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
1 | node200 | primary | * running | | default | 100 | 7 | host=192.168.8.200 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node201 | standby | running | | default | 100 | 6 | host=192.168.8.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

=== As shown above, the failover succeeded! ===

III. Summary

To clean up WAL logs manually, see "KingbaseES single-instance WAL (xlog) log cleanup case":

https://www.cnblogs.com/kingbase/p/16263467.html
