案例说明:

生产环境是集群环境,测试环境是集群,现需要将生产环境的数据迁移到测试集群中运行,本文档详细介绍了从集群环境迁移数据的操作步骤,可以作为生产环境迁移数据的参考。

适用版本:

KingbaseES V8R6

本案例数据库版本(集群使用相同的版本):

test=# select version();
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

生产集群节点信息:

[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

测试集群节点信息:

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 13 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 13 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

一、生产环境迁移数据前的准备

1、生产环境数据信息

prod=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+----------+-------------+-------------------
esrep | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod2 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
security | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
template0 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
template1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
test | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
(8 rows) prod=# select count(*) from t2;
count
--------
100000
(1 row)

2、关闭生产集群

[kingbase@node1 bin]$ ./sys_monitor.sh stop
2022-06-20 16:00:46 Ready to stop all DB ...
.......
2022-06-20 16:01:02 DB on "[192.168.1.202]" stop success.
2022-06-20 16:01:02 begin to stop DB on "[192.168.1.101]".
waiting for server to shut down....... done
server stopped
2022-06-20 16:01:06 DB on "[192.168.1.201]" stop success.
2022-06-20 16:01:06 Done.

二、迁移生产数据到测试环境

Tips:

1)将生产数据迁移到集群,需要停止生产数据库服务,根据data目录数据的大小,要估算停机窗口时间。

2)在生产数据库前,建议手工创建检查点,如果wal日志比较大,建议备份后,清理wal日志,只需要保留最近一天的日志到最近检查点后即可。

3)需要跨主机将生产库主库data目录拷贝到集群的主备库节点,需 根据网络带宽和节点数,估算整个拷贝时间。

1、关闭测试集群

[kingbase@node101 bin]$ ./sys_monitor.sh stop
2022-06-20 16:10:46 Ready to stop all DB ... 2022-06-20 16:11:02 DB on "[192.168.1.102]" stop success.
2022-06-20 16:11:02 begin to stop DB on "[192.168.1.101]".
waiting for server to shut down....... done
server stopped
2022-06-20 16:11:06 DB on "[192.168.1.101]" stop success.
2022-06-20 16:11:06 Done.

2、将测试库数据备份

[kingbase@node101 kingbase]$ mv data data.bk

3、拷贝生产集群主库data到测试集群主备库(所有节点)

1)拷贝生产数据到测试库

[kingbase@node1 kingbase]$ scp -r data node101:/home/kingbase/cluster/R6HA/kha/kingbase/
[kingbase@node1 kingbase]$ scp -r data node102:/home/kingbase/cluster/R6HA/kha/kingbase/

2)备库创建standby.signal

[kingbase@node102 data]$ touch standby.signal

3)复制测试集群的数据库配置文件:(所有节点)

[kingbase@node101 data]$ cp ../data.bk/kingbase.auto.conf ./
[kingbase@node101 data]$ cp ../data.bk/kingbase.conf ./
[kingbase@node101 data]$ cp ../data.bk/es_rep.conf ./ # 查看kingbase.auto.conf
[kingbase@node101 data]$ cat kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
enable_upper_colname = 'on'
wal_retrieve_retry_interval = '5000'
primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
recovery_target_timeline = 'latest'
primary_slot_name = 'repmgr_slot_1'
synchronous_standby_names = ''

三、重新注册集群节点

Tips:

因为data数据中存储的是原生产集群的节点信息(esrep库),所以要根据测试库的repmgr.conf文件重新注册节点。

1、启动主备库数据库服务

[kingbase@node101 bin]$ ./sys_ctl restart -D ../data
waiting for server to shut down.... done
.......
server started

2、查看节点状态信息

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | ? unreachable | | default | 100 | ? | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | - failed | node1 | default | 100 | ? | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 WARNING: following issues were detected
- unable to connect to node "node1" (ID: 1)
- node "node1" (ID: 1) is registered as an active primary but is unreachable
- unable to connect to node "node2" (ID: 2) # 如上所示,因为repmgr.conf和esrep库的注册信息不一致,现在集群节点处于非正常状态。

3、重新注册主备库

1)注册主库

[kingbase@node101 bin]$ ./repmgr primary register --force
INFO: connecting to primary database...
INFO: "repmgr" extension is already installed
NOTICE: primary node record (ID: 1) updated [kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | - failed | node101 | default | 100 | ? | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 WARNING: following issues were detected
- unable to connect to node "node2" (ID: 2)

2)注册备库

[kingbase@node102 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node102" (ID: 2)
INFO: connecting to primary database
INFO: standby registration complete NOTICE: standby node "node102" (ID: 2) successfully registered
[kingbase@node102 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 1 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 # 如上所示,节点注册完成后,集群节点状态正常。

四、查看流复制状态

Tips:

如果生产集群和测试集群的流复制复制槽名称不一致,可能需要重建复制槽。

# 复制槽信息
test=# select * from sys_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
repmgr_slot_2 | | physical | | | f | t | 26620 | 963 | | 0/4E000FB8 |
repmgr_slot_1 | | physical | | | f | f | | | | |
(2 rows) # 流复制状态信息
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+---- 26620 | 10 | system | node102 | 192.168.1.102 | | 49454 | 2022-06-20 17:08:15.347515+08 | | streaming | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 | | | | 1 | sync | 2022-06-20 17:11:44.277666+08
(1 row)

五、验证数据

1、查看迁移后数据

test=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+----------+-------------+-------------------
esrep | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod2 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
security | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
template0 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
template1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
test | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
(8 rows) prod=# select count(*) from t2;
count
--------
100000
(1 row)

2、重启集群

[kingbase@node101 bin]$ ./sys_monitor.sh restart
2022-06-20 17:12:39 Ready to stop all DB ...
.......
2022-06-20 17:13:14 repmgrd on "[192.168.1.102]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node101 | primary | * running | | running | 28907 | no | n/a
2 | node102 | standby | running | node101 | running | 26280 | no | 2 second(s) ago
[2022-06-20 17:13:19] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log" [2022-06-20 17:13:22] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log" 2022-06-20 17:13:29 Done.

六、总结

 1、从集群环境迁移数据,如果需要保证数据一致,必须要将集群停库,对于生产环境,要考虑停机窗口。
2、如果需要将目标集群数据重新加载到新的集群,需要将目标集群数据做逻辑备份,但是在导入时如果有重复数据需注意处理。(如将测试数据再导入到迁移后的集群中,可能有许多数据会重复)。
3、申请停机窗口,要考虑源集群数据量的大小、主机间的网络带宽、集群节点数、集群配置时间、集群启动故障的处理时间等。

KingbbaseES V8R6集群维护案例之---集群之间数据迁移的更多相关文章

  1. KingbaseES V8R3集群维护案例之---在线添加备库管理节点

    案例说明: 在KingbaseES V8R3主备流复制的集群中 ,一般有两个节点是集群的管理节点,分为master和standby:如对于一主二备的架构,其中有两个节点是管理节点,三个数据节点:管理节 ...

  2. KingbaseES V8R6集群维护案例之--单实例数据迁移到集群案例

    案例说明: 生产环境是单实例,测试环境是集群,现需要将生产环境的数据迁移到集群中运行,本文档详细介绍了从单实例环境恢复数据到集群环境的操作步骤,可以作为生产环境迁移数据的参考. 适用版本: Kingb ...

  3. KingbaseES V8R3集群管理维护案例之---集群迁移单实例架构

    案例说明: 在生产中,需要将KingbaseES V8R3集群转换为单实例架构,可以采用以下方式快速完成集群架构的迁移. 适用版本: KingbaseES V8R3 当前数据库版本: TEST=# s ...

  4. KingbaseES V8R6集群维护案例之---停用集群node_export进程

    案例说明: 在KingbaseES V8R6集群启动时,会启动node_exporter进程,此进程主要用于向kmonitor监控服务输出节点状态信息.在系统安全漏洞扫描中,提示出现以下安全漏洞: 对 ...

  5. KingbaseES V8R6集群维护案例之---将securecmdd通讯改为ssh案例

    案例说明: 在KingbaseES V8R6的后期版本中,为了解决有的主机之间不允许root用户ssh登录的问题,使用了securecmdd作为集群部署分发和通讯的服务,有生产环境通过漏洞扫描,在88 ...

  6. KingbaseES V8R6集群维护案例之--修改securecmdd工具服务端口

    案例说明: 在一些生产环境,为了系统安全,不支持ssh互信,或限制root用户使用ssh登录,KingbaseES V8R6可以使用securecmdd工具支持主机之间的通讯.securecmdd工具 ...

  7. 手把手教你通过Ambari新建Hadoop集群图解案例

    手把手教你通过Ambari新建Hadoop集群图解案例 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 登陆系统之后,会看到Ambari空空如也的欢迎界面,接下来我们就需要介绍如何通 ...

  8. Redis集群维护、运营的相关命令与工具介绍

    Redis集群的搭建.维护.运营的相关命令与工具介绍 一.概述 此教程主要介绍redis集群的搭建(Linux),集群命令的使用,redis-trib.rb工具的使用,此工具是ruby语言写的,用于集 ...

  9. 基于Ambari的WebUI实现集群扩容案例

    基于Ambari的WebUI实现集群扩容案例 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.将HDP的服务托管给Ambari服务 1>.点击“Service Auto S ...

随机推荐

  1. Vue几行代码实现搜索功能

    <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8&quo ...

  2. 毕设之Python爬取天气数据及可视化分析

    写在前面的一些P话:(https://jq.qq.com/?_wv=1027&k=RFkfeU8j) 天气预报我们每天都会关注,我们可以根据未来的天气增减衣物.安排出行,每天的气温.风速风向. ...

  3. multiset 用法复习

    前言 看题解上有 \(\text{multiset}\) 然后发现没脑子的我又忘了... 正文 \(\text{multiset}\) 可以看成一个序列,插入一个数,删除一个数都能够在 \(O(\lo ...

  4. Java 常用Set集合和常用Map集合

    目录 常用Set集合 Set集合的特点 HashSet 创建对象 常用方法 遍历 常用Map集合 Map集合的概述 HashMap 创建对象 常用方法 遍历 HashMap的key去重原理 常用Set ...

  5. 利用噪声构建美妙的 CSS 图形

    在平时,我非常喜欢利用 CSS 去构建一些有意思的图形. 我们首先来看一个简单的例子.首先,假设我们实现一个 10x10 的格子: 此时,我们可以利用一些随机效果,优化这个图案.譬如,我们给它随机添加 ...

  6. cnetOS使用Docker

    设置DHCP vi /etc/sysconfig/network-scripts/ifcfg-ens32 (1)bootproto=dhcp (2)onboot=yes 重启网卡:systemctl ...

  7. 边缘计算 KubeEdge+EdgeMash

    简介 KubeEdge是面向边缘计算场景.专为边云协同设计的业界首个云原生边缘计算框架,在 Kubernetes 原生的容器编排调度能力之上实现了边云之间的应用协同.资源协同.数据协同和设备协同等能力 ...

  8. SQLZOO练习三--SELECT within SELECT Tutorial

    This tutorial looks at how we can use SELECT statements within SELECT statements to perform more com ...

  9. Test_day01月_总结

    1)Object是所有类的超类,在java.lang包中 2)标识符命名规则 3)八种基本数据类型有哪些?每种类型所占的字节数? 整数直接量默认为int类型 浮点数直接量默认为double类型 4)字 ...

  10. 【FAQ】接入HMS Core推送服务,服务端下发消息常见错误码原因分析及解决方法

    HMS Core推送服务支持开发者使用HTTPS协议接入Push服务端,可以从服务器发送下行消息给终端设备.这篇文章汇总了服务端下发消息最常见的6个错误码,并提供了原因分析和解决方法,有遇到类似问题的 ...