Oracle Data Guard Failover Test and Post-Failover Recovery Report

1. Overview

2. Verification Process

3. Conclusion

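1. Overview

This report covers a failover in a Data Guard (DG) disaster-recovery environment and the recovery work that follows it.
**Test environment:**
Database version: Oracle 11.2.0.4
Site A: primary, db_unique_name=jyzhao
Site B: standby (real-time apply), db_unique_name=mynas
Site C: standby (apply delayed by 1 hour), db_unique_name=jyzhao_s

Abbreviations used in the sections below:

Database A => Site A: primary

Database B => Site B: standby (real-time apply)

Database C => Site C: standby (apply delayed by 1 hour)

Question to verify:

After database A crashes and B is failed over to become the new primary, what must be done to A and C so that they can continue to serve as standbys of the new primary? Do they need to be rebuilt? If so, is a fresh backup required, or can an existing backup be used to create the standby?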

2. Verification Process

2.1 Abnormal shutdown of database A

On A:

    SQL> shutdown abort

2.2 Failing over to B as the new primary

The standard failover steps are:

    -- Stop redo apply on the standby
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
    -- Restart the database (recommended)
    shutdown immediate;
    startup
    -- Irreversible: make sure the situation really calls for a failover
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH FORCE;
    SELECT OPEN_MODE, DATABASE_ROLE, SWITCHOVER_STATUS, FORCE_LOGGING, DATAGUARD_BROKER, GUARD_STATUS FROM V$DATABASE;
    -- Try the regular conversion to primary
    ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
    -- If the regular conversion fails and reports that media recovery is needed:
    --   1) recover the standby:  recover standby database until cancel;
    --   2) activate the standby: alter database activate standby database;
    -- Finally, restart the database
    shutdown immediate;
    startup
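
Because the FINISH step cannot be undone, it can help to check on B how much redo has been received and applied before running it. The queries below are a minimal sketch, assuming B is still a mounted physical standby. The views are standard Data Guard views, but what counts as an acceptable gap or lag is a judgment the original test does not specify, and the lag figures may be stale once the primary is down.

```sql
-- Run on the standby (B) before the irreversible FINISH step.

-- Any row returned here is a range of archived logs missing on the standby.
SELECT thread#, low_sequence#, high_sequence#
  FROM v$archive_gap;

-- How far the standby is behind in receiving and applying redo.
SELECT name, value, time_computed
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');

-- Highest sequence received versus highest sequence applied, per thread.
SELECT thread#,
       MAX(sequence#) AS last_received,
       MAX(CASE WHEN applied = 'YES' THEN sequence# END) AS last_applied
  FROM v$archived_log
 GROUP BY thread#;
```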

Database B now reports:

    SQL> select name, database_role, open_mode from gv$database;

    NAME      DATABASE_ROLE    OPEN_MODE
    --------- ---------------- --------------------
    JYZHAO    PRIMARY          READ WRITE

    SQL> select * from v$log;

        GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIME   NEXT_CHANGE# NEXT_TIME
    ---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- ------------ ------------ ------------
             1          1          3   52428800        512          2 NO  CURRENT               17882720 03-SEP-17      2.8147E+14
             2          1          2   52428800        512          2 YES ACTIVE                17882488 03-SEP-17        17882720 03-SEP-17
             3          2          0   52428800        512          2 YES UNUSED                       0                         0
             4          2          0   52428800        512          2 YES UNUSED                       0                         0

B is now the new primary, and its redo log sequence numbers have restarted from the beginning.
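
The sequence numbers restart because the failover opens the database on a new incarnation. As a hedged illustration, the incarnation history can be listed on the new primary; the row with status CURRENT should carry the resetlogs SCN that the standbys later have to recover through.

```sql
-- Run on the new primary (B).
-- Lists every incarnation known to the control file; the CURRENT row
-- is the one created by the failover.
SELECT incarnation#,
       TO_CHAR(resetlogs_change#) AS resetlogs_scn,
       resetlogs_time,
       status
  FROM v$database_incarnation
 ORDER BY incarnation#;
```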

2.3 Making C a standby of the new primary

C must now become a standby of the new primary. Does C need to be rebuilt? No. The verification follows.
**Alert log on C:**

    Sun Sep 03 14:09:58 2017
    Archived Log entry 9 added for thread 1 sequence 1227 ID 0x97764e10 dest 1:
    ARC2: Archive log thread 1 sequence 1227 available in 60 minute(s)
    Sun Sep 03 14:09:58 2017
    RFS[3]: Selected log 11 for thread 1 sequence 1228 dbid -1785877518 branch 919999037
    Sun Sep 03 14:19:24 2017
    RFS[3]: Possible network disconnect with primary database
    Sun Sep 03 14:19:24 2017
    RFS[2]: Possible network disconnect with primary database
    Sun Sep 03 14:19:24 2017
    RFS[4]: Assigned to RFS process 10190
    RFS[4]: Possible network disconnect with primary database

After A crashed, C logged warnings that the network connection to the primary (A) may have been lost, which means C is no longer receiving new redo.

For C to become a standby of B (the new primary), configure the DG parameters on B so that B ships its redo to C.

    show parameter log_archive_config
    show parameter log_archive_dest_3
    alter system set log_archive_config = 'DG_CONFIG=(jyzhao,mynas,jyzhao_s)';
    alter system set log_archive_dest_3 = 'SERVICE=jyzhao_s LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=jyzhao_s';

Also add a connect descriptor for C to the tnsnames.ora file on B:

    #Standby Single Instance
    JYZHAO_S =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.111)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = jyzhao_s)
        )
      )
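
Once the parameter and the TNS entry are in place, the state of the new destination can be checked from B. This is a sketch that assumes destination 3 as configured above; an empty ERROR column and a VALID status are what one would hope to see, but the original report relies on the alert log instead of these views.

```sql
-- Run on the new primary (B) after setting log_archive_dest_3.

-- Make sure the destination is not deferred.
ALTER SYSTEM SET log_archive_dest_state_3 = ENABLE;

-- Configuration-level status and the last error, if any.
SELECT dest_id, status, error
  FROM v$archive_dest
 WHERE dest_id = 3;

-- Runtime status as seen from the primary.
SELECT dest_id, status, database_mode, recovery_mode, gap_status
  FROM v$archive_dest_status
 WHERE dest_id = 3;
```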

Once B is configured, check B's alert log:

    Sun Sep 03 14:37:41 2017
    ALTER SYSTEM SET log_archive_dest_3='SERVICE=jyzhao_s LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=jyzhao_s' SCOPE=BOTH;
    ARC3: Standby redo logfile selected for thread 1 sequence 3 for destination LOG_ARCHIVE_DEST_3
    Thread 1 cannot allocate new log, sequence 5
    Checkpoint not complete
    Current log# 2 seq# 4 mem# 0: +DATA/mynas/onlinelog/group_2.278.953484865
    Current log# 2 seq# 4 mem# 1: +FRA/mynas/onlinelog/group_2.714.953484869
    Thread 1 advanced to log sequence 5 (LGWR switch)
    Current log# 1 seq# 5 mem# 0: +DATA/mynas/onlinelog/group_1.277.953484853
    Current log# 1 seq# 5 mem# 1: +FRA/mynas/onlinelog/group_1.720.953484859
    ******************************************************************
    LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_3
    ******************************************************************
    LNS: Standby redo logfile selected for thread 1 sequence 5 for destination LOG_ARCHIVE_DEST_3
    Archived Log entry 382 added for thread 1 sequence 4 ID 0x978ff56d dest 1:
    Sun Sep 03 14:37:55 2017
    ARC3: Standby redo logfile selected for thread 1 sequence 4 for destination LOG_ARCHIVE_DEST_3
    Sun Sep 03 14:39:49 2017
    Thread 1 advanced to log sequence 6 (LGWR switch)
    Current log# 2 seq# 6 mem# 0: +DATA/mynas/onlinelog/group_2.278.953484865
    Current log# 2 seq# 6 mem# 1: +FRA/mynas/onlinelog/group_2.714.953484869
    Sun Sep 03 14:39:49 2017
    LNS: Standby redo logfile selected for thread 1 sequence 6 for destination LOG_ARCHIVE_DEST_3
    Sun Sep 03 14:39:49 2017
    Archived Log entry 388 added for thread 1 sequence 5 ID 0x978ff56d dest 1:

Then go back to C and enable real-time apply:

    SQL> alter database recover managed standby database cancel;
    Database altered.
    SQL> alter database recover managed standby database using current logfile disconnect from session;
    Database altered.

Now look at C's alert log again:

    Sun Sep 03 14:38:10 2017
    Clearing online log 11 of thread 1 sequence number 0
    Sun Sep 03 14:38:19 2017
    MRP0: Incarnation has changed! Retry recovery...
    RFS[7]: Selected log 11 for thread 1 sequence 4 dbid -1785877518 branch 953735009
    Errors in file /u01/app/oracle/diag/rdbms/jyzhao_s/jyzhao/trace/jyzhao_pr00_10161.trc:
    ORA-19906: recovery target incarnation changed during recovery
    Recovery interrupted!
    Sun Sep 03 14:38:19 2017
    Archived Log entry 11 added for thread 1 sequence 4 ID 0x978ff56d dest 1:
    Sun Sep 03 14:38:20 2017
    started logmerger process
    Sun Sep 03 14:38:20 2017
    Managed Standby Recovery not using Real Time Apply
    Parallel Media Recovery started with 2 slaves
    Media Recovery start incarnation depth : 1, target inc# : 3, irscn : 17882484
    Waiting for all non-current ORLs to be archived...
    All non-current ORLs have been archived.
    Media Recovery Delayed for 60 minute(s) (thread 1 sequence 1226)
    Sun Sep 03 14:38:31 2017
    RFS[8]: Assigned to RFS process 10704
    RFS[8]: Opened log for thread 1 sequence 1228 dbid -1785877518 branch 919999037
    Archived Log entry 12 added for thread 1 sequence 1228 rlc 919999037 ID 0x97764e10 dest 3:
    RFS[8]: Opened log for thread 1 sequence 1 dbid -1785877518 branch 953735009
    Sun Sep 03 14:38:36 2017
    RFS[7]: Opened log for thread 1 sequence 2 dbid -1785877518 branch 953735009
    Archived Log entry 13 added for thread 1 sequence 1 rlc 953735009 ID 0x978ff56d dest 3:
    Archived Log entry 14 added for thread 1 sequence 2 rlc 953735009 ID 0x978ff56d dest 3:
    Sun Sep 03 14:40:13 2017
    Archived Log entry 15 added for thread 1 sequence 5 ID 0x978ff56d dest 1:
    Sun Sep 03 14:40:13 2017
    RFS[6]: Selected log 11 for thread 1 sequence 6 dbid -1785877518 branch 953735009
    Sun Sep 03 14:40:37 2017
    alter database recover managed standby database cancel
    Sun Sep 03 14:40:38 2017
    MRP0: Background Media Recovery cancelled with status 16037
    Errors in file /u01/app/oracle/diag/rdbms/jyzhao_s/jyzhao/trace/jyzhao_pr00_10688.trc:
    ORA-16037: user requested cancel of managed recovery operation
    Recovery interrupted!
    Sun Sep 03 14:40:38 2017
    MRP0: Background Media Recovery process shutdown (jyzhao)
    Managed Standby Recovery Canceled (jyzhao)
    Completed: alter database recover managed standby database cancel
    Sun Sep 03 14:40:53 2017
    alter database recover managed standby database using current logfile disconnect from session
    Attempt to start background Managed Standby Recovery process (jyzhao)
    Sun Sep 03 14:40:53 2017
    MRP0 started with pid=20, OS id=10747
    MRP0: Background Managed Standby Recovery process started (jyzhao)
    started logmerger process
    Sun Sep 03 14:40:58 2017
    Managed Standby Recovery starting Real Time Apply
    Parallel Media Recovery started with 2 slaves
    Media Recovery start incarnation depth : 1, target inc# : 3, irscn : 17882484
    Waiting for all non-current ORLs to be archived...
    All non-current ORLs have been archived.
    Managed Standby Recovery started with USING CURRENT LOGFILE
    Ignoring previously specified DELAY 60 minutes for thread 1 sequence 1226
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1226_dtq74r91_.arc
    Managed Standby Recovery started with USING CURRENT LOGFILE
    Ignoring previously specified DELAY 60 minutes for thread 1 sequence 1227
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1227_dtq75p6j_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1228_dtq8v7c7_.arc
    Identified End-Of-Redo (failover) for thread 1 sequence 1228 at SCN 0x0.110dd74
    Completed: alter database recover managed standby database using current logfile disconnect from session
    Resetting standby activation ID 2541112848 (0x97764e10)
    Media Recovery End-Of-Redo indicator encountered
    Media Recovery Continuing
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_1_dtq8vdpr_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_2_dtq8vdpy_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_3_dtq8tgvl_.arc
    Sun Sep 03 14:41:11 2017
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_4_dtq8tvy5_.arc
    Media Recovery Log /u01/oradata/JYZHAO_S/archivelog/2017_09_03/o1_mf_1_5_dtq8yfr5_.arc
    Media Recovery Waiting for thread 1 sequence 6 (in transit)
    Recovery of Online Redo Log: Thread 1 Group 11 Seq 6 Reading mem 0
    Mem# 0: /u01/oradata/standbylog/standby_group_11.log

C is applying redo normally: it recovers through the End-Of-Redo marker of the old branch (sequence 1228) and continues with sequence 1 of the new branch. This shows that C does not need to be rebuilt; with a small amount of configuration it becomes a standby of the new primary B.
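
The same result can be cross-checked from C's dictionary views instead of the alert log. This is a sketch, not part of the original test; it lists archived logs from both redo branches (the `resetlogs_id` values should correspond to the branch numbers 919999037 and 953735009 seen in the alert log) together with their apply status.

```sql
-- Run on the standby (C).
-- Logs from the old branch and the new branch appear side by side;
-- APPLIED shows YES, NO, or IN-MEMORY for each.
SELECT resetlogs_id, thread#, sequence#, applied
  FROM v$archived_log
 ORDER BY resetlogs_id, thread#, sequence#;
```

Note that the alert log reports "Ignoring previously specified DELAY 60 minutes": with USING CURRENT LOGFILE, C no longer lags the primary by one hour. If the delayed standby is still wanted, one option is to add the DELAY=60 attribute to log_archive_dest_3 on B and restart apply on C without USING CURRENT LOGFILE.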

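2.4 Making A a standby of the new primary

If A is started at this point, it comes up as a standalone database. Rebuilding A as a standby from a fresh backup of the new primary certainly works. But when the data volume is large and A already has its own earlier backups, can A be restored from those instead of spending time backing up the new primary? The fact that C became a standby of the new primary without a rebuild suggests that it can, provided the backup was completed before the failover. The experiment below verifies this.
On B, use RMAN to create a new standby control file, then copy it to the same path on A:

    backup current controlfile for standby format '/tmp/std_control02.ctl';

Start A in nomount and restore the new standby control file with RMAN:

    restore standby controlfile from '/tmp/std_control02.ctl';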

On A, with the database mounted, check the checkpoints in the datafile headers to confirm that they predate the failover:

    SYS@jyzhao1 >select name from v$datafile;

    NAME
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    +DATA1/jyzhao/datafile/system.258.951608183
    +DATA1/jyzhao/datafile/sysaux.257.951608183
    +DATA1/jyzhao/datafile/undotbs1.259.951608185
    +DATA1/jyzhao/datafile/users.265.951608205
    +DATA1/jyzhao/datafile/undotbs2.261.951608185
    +DATA1/jyzhao/datafile/dbs_d_jingyu.262.951608185
    +DATA1/jyzhao/datafile/dbs_i_jingyu.263.951608185
    +DATA1/jyzhao/datafile/test.264.951608185
    +DATA1/jyzhao/datafile/test2.260.951608185
    +DATA1/jyzhao/datafile/dbadata.276.952933931

    10 rows selected.

    SYS@jyzhao1 >select checkpoint_change# from v$database;

    CHECKPOINT_CHANGE#
    ------------------
              17892035

    SYS@jyzhao1 >select checkpoint_change# from v$datafile;

    CHECKPOINT_CHANGE#
    ------------------
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035

    10 rows selected.

    SYS@jyzhao1 >select checkpoint_change# from v$datafile_header;

    CHECKPOINT_CHANGE#
    ------------------
                     0
                     0
                     0
                     0
                     0
                     0
                     0
                     0
                     0
                     0

    10 rows selected.

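The datafile header checkpoints are all 0, which means the datafiles could not be read through the restored control file. The cause is that the OMF-generated file names differ, so the names recorded in the control file do not match the files on disk. In RMAN, catalog A's datafile directory and then switch the database to the cataloged copies:

    catalog start with '+DATA1/jyzhao/datafile/';
    switch database to copy;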

Query again:

    SYS@jyzhao1 >select current_scn||'' from v$database;

    CURRENT_SCN||''
    ----------------------------------------
    17892467

    SYS@jyzhao1 >select checkpoint_change# from v$database;

    CHECKPOINT_CHANGE#
    ------------------
              17892035

    SYS@jyzhao1 >select checkpoint_change# from v$datafile;

    CHECKPOINT_CHANGE#
    ------------------
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035
              17892035

    10 rows selected.

    SYS@jyzhao1 >select checkpoint_change# from v$datafile_header;

    CHECKPOINT_CHANGE#
    ------------------
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484
              17882484

    10 rows selected.
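
One way to make the "before the failover" check explicit is to compare A's datafile header checkpoints with the resetlogs SCN of the new primary. The sketch below reads the article's requirement as "no header checkpoint is later than B's resetlogs SCN"; that reading is an assumption, not something the original states in these terms. In this test the headers show 17882484, the same value as the `irscn` reported in C's alert log.

```sql
-- On the new primary (B): SCN at which the current incarnation began.
SELECT TO_CHAR(resetlogs_change#) AS resetlogs_scn
  FROM v$database;

-- On A (mounted with the restored standby control file):
-- header checkpoint of every datafile, plus any header read error.
-- A non-empty ERROR column points to a file the control file cannot locate.
SELECT file#,
       status,
       error,
       TO_CHAR(checkpoint_change#) AS header_scn
  FROM v$datafile_header
 ORDER BY file#;
```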

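Now, with the database mounted, start redo apply:

    alter database recover managed standby database disconnect from session;

Watch the alert log; once apply has caught up with the latest redo, cancel it:

    alter database recover managed standby database cancel;

Open the database and start real-time apply:

    alter database recover managed standby database USING CURRENT LOGFILE disconnect from session;

A final check confirms that real-time apply is working normally.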
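
The report does not show how B ships redo to A or how the final check was done. A possible version is sketched below. The destination number 2 and the TNS alias `jyzhao` are illustrative assumptions modeled on the destination that was configured for C; adjust them to the actual environment.

```sql
-- On the new primary (B).
-- ASSUMPTION: destination 2 and the TNS alias "jyzhao" are hypothetical;
-- the original report does not show this setting.
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=jyzhao LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=jyzhao';

-- Force a log switch so there is fresh redo to ship.
ALTER SYSTEM ARCHIVE LOG CURRENT;

-- On A (now a standby): MRP0 should report APPLYING_LOG or WAIT_FOR_LOG,
-- and its sequence# should advance after each log switch on B.
SELECT process, status, thread#, sequence#, block#
  FROM v$managed_standby
 WHERE process LIKE 'MRP%' OR process LIKE 'RFS%';
```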

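3. Conclusion

In general, after A crashes and B is failed over to become the new primary, C (originally configured with a 1-hour apply delay) can be configured directly as a standby of the new primary. Once A is repaired, it can also be restored to its pre-failover state from an existing backup set taken before the failover, without taking a new backup of the new primary.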
