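Diagnosis: Recovering a Database That Would Not Open After a Storage Crash

The database crashed because of a storage failure. The first symptom was a problem with the control file: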

ORA-: ?????  ????

ORA-: ???? : '/oracledata/oradata/orc11rac/orc11rac/system01.dbf'

ORA-: ????????? - ??????

/home/oracle>oerr ora

, , "file is more recent than control file - old control file"

// *Cause: The control file change sequence number in the data file is

// greater than the number in the control file. This implies that

// the wrong control file is being used. Note that repeatedly causing

// this error can make it stop happening without correcting the real

// problem. Every attempt to open the database will advance the

// control file change sequence number until it is great enough.

// *Action: Use the current control file or do backup control file recovery to

// make the control file current. Be sure to follow all restrictions

// on doing a backup control file recovery.

/home/oracle>oerr ora

, , "data file %s: '%s'"

// *Cause: Reporting file name for details of another error

// *Action: See associated error message

/home/oracle>oerr ora

, , "database file %s failed verification check"

// *Cause: The information in this file is inconsistent with information

// from the control file. See accompanying message for reason.

// *Action: Make certain that the db files and control files are the correct

// files for this database.

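This problem can be resolved by recreating the control file and then running recover database.
Note that in a RAC environment cluster_database must be turned off first,
so that the work is done with a single instance (single thread).
Otherwise you may hit the following errors: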

ORA-: CREATE CONTROLFILE failed

ORA-: operation requires database is in EXCLUSIVE mode
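The control file rebuild described above can be sketched roughly as follows. This is a sketch under assumptions, not the exact sequence used in this case: it assumes the database can still be mounted so that a CREATE CONTROLFILE script can be dumped, and the trace file path and pfile path are hypothetical.

```sql
-- Sketch only; /tmp/create_controlfile.sql and /tmp/initorc11rac.ora are hypothetical paths.

-- 1. While the database can still be mounted, dump a CREATE CONTROLFILE script.
startup mount
alter database backup controlfile to trace as '/tmp/create_controlfile.sql';
shutdown immediate

-- 2. Shut down every other RAC instance, set *.cluster_database=false in the pfile,
--    and start a single instance without mounting.
startup nomount pfile='/tmp/initorc11rac.ora'

-- 3. Run the script after editing it down to the one CREATE CONTROLFILE statement
--    you need (the trace contains both a NORESETLOGS and a RESETLOGS variant).
@/tmp/create_controlfile.sql

-- 4. Attempt media recovery against the new control file.
recover database
```

Once recovery is complete and the database opens, cluster_database can be set back to true before the other instances are restarted.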

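That should have been the end of it, but during recovery the data files, redo logs, and archived logs all turned out to be corrupted, and the database still would not open after conventional recovery.
As a last resort, we tried the _allow_resetlogs_corruption parameter.
Add the parameter to the pfile: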

*._allow_resetlogs_corruption=true
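With the parameter in place, an open attempt might look like the sketch below. The pfile path is hypothetical, and the `using backup controlfile` clause is an assumption that applies when the control file was recreated with RESETLOGS; the original post does not show the exact commands used.

```sql
-- Sketch only; assumes the pfile already contains *._allow_resetlogs_corruption=true.
startup mount pfile='/tmp/initorc11rac.ora'

-- Incomplete recovery: answer CANCEL at the prompt once no usable
-- archived or online redo log is left to apply.
recover database using backup controlfile until cancel

-- Open with RESETLOGS despite the inconsistent data files.
alter database open resetlogs;
```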

When the database is opened with resetlogs under this parameter, inconsistent SCNs may cause an ORA-00600 [2662] error.

ORA-: internal error code, arguments: [], [], [], [], [], [], [], []

 -  =

The arguments of the ORA-600 error change with every restart attempt.

ORA-: internal error code, arguments: [], [], [], [], [], [], [], []

 -  =

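The gap is shrinking: it went from 19980 to 19972. When the gap for this error is small, restarting a few more times can be enough to clear it.
Here, however, it would take 2497 restarts, which is not acceptable in any reasonable time frame.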
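The figure of 2497 appears to follow from the two gap values quoted above: the gap shrank by 8 in one restart, and 19972 remains. A minimal check of that arithmetic, assuming the gap keeps shrinking by the same amount on every restart:

```sql
-- The gap fell from 19980 to 19972, i.e. by 8 per restart (inferred from the two values).
select ceil(19972 / (19980 - 19972)) as restarts_needed from dual;
-- restarts_needed = 2497
```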

At this point the SCN can be adjusted through Oracle internal events.

There are two common ways to advance the SCN:

1. The immediate trace name method (with the database open)

alter session set events 'IMMEDIATE trace name ADJUST_SCN level x';

2. Event 10015 (when the database cannot be opened, in mount state)

alter session set events '10015 trace name adjust_scn level x';

Note: level 1 advances the SCN by roughly one billion (1024*1024*1024 = 1,073,741,824). Level 1 is usually enough, but the level can be adjusted to fit the situation.
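To pick a level, it can help to compare the increment with the SCNs currently recorded in the control file and data file headers. The queries below are a sketch of that check; they rely only on the rule stated in the note (level x corresponds to x * 1024*1024*1024), and both views can be read in mount state.

```sql
-- Widen numeric output so large SCNs are not shown in scientific notation.
set numwidth 20

-- Increment for level 10, using the rule from the note above.
select to_char(10 * power(2, 30)) as scn_increment from dual;
-- scn_increment = 10737418240

-- Checkpoint SCN recorded in the control file and in each data file header.
select checkpoint_change# from v$database;
select file#, checkpoint_change# from v$datafile_header;
```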

SQL> alter session set events 'IMMEDIATE trace name ADJUST_SCN level 10';

Session altered.

SQL> alter database open;

alter database open

*

ERROR at line 1:

ORA-01113: file 1 needs media recovery

ORA-01110: data file 1: '/oracledata/oradata/orc11rac/orc11rac/system01.dbf'

SQL> recover database

Media recovery complete.

SQL> alter database open;

alter database open

*

ERROR at line 1:

ORA-00603: ORACLE server session terminated by fatal error

Process ID: 27474

Session ID: 1105 Serial number: 5

The database still would not open, and the alert log reported:

ORA-: internal error code, arguments: [], [], [], [], [], [], [], []

The ORA-600 error has changed, which shows that the SCN adjustment took effect, but it also triggered a new error.

DESCRIPTION: 

A mismatch has been detected between Redo records and Rollback (Undo)

records.

We are validating the Undo block sequence number in the undo block against

the Redo block sequence number relating to the change being applied.

This error is reported when this validation fails.

ARGUMENTS:

Arg [a] Undo record seq number

Arg [b] Redo record seq number

FUNCTIONALITY:

KERNEL TRANSACTION UNDO

ORA- [] [a] [b] [ ] [ ] [ ]

Versions: 7.2. - 9.2. Source: ktuc.c

===========================================================================

Meaning: seq# mismatch while adding an undo record to an undo block. This

is done by the application of redo.

---------------------------------------------------------------------------

Argument Description:

a. (ktubhseq): undo record seq# - this is the seq# of the block that

this undo record WILL BE APPLIED TO.

This is from the Undo Block. It is

NOT the seq# of the undo block itself.

b. (ktudbseq): redo RECORD seq# - this is the seq# number in the block

that this redo WILL BE APPLIED TO.

This is from the Redo Record. 

---------------------------------------------------------------------------

Diagnosis:

This error is raised in kturdb which handles the adding of undo records

by the application of redo. 

When we try to apply redo to an undo block (forward changes are made by

the application of redo to a block) we check that the seq# in the undo

record matches the seq# in the redo record. These seq# should be the

same because when we apply a redo record we must apply it to the

correct version of the block. We can only apply a redo record to a

block that contains the same seq# as in the redo record. 

If the seq# do not match then this error is raised. This implies some

kind of block corruption in either the redo or the undo block. 

7.3.x - 8.1..x

ASSERT2(ubh->ktubhseq == db->ktudbseq, OERI(), KSESVSGN,

ubh->ktubhseq, db->ktudbseq);

9.2.x

ksesic2(OERI(), ksenrg(ubh->ktubhseq), ksenrg(db->ktudbseq));

struct ktubh

{

kxid ktubhxid; /* txid of tx currently using or last used this block */

ub2 ktubhseq; /* undo block sequence number */

ub1 ktubhcnt; /* high water mark record index, number of undo entries */

ub1 ktubhirb; /* rollback record index, rec index to start the rollback */

ub1 ktubhicl; /* collecting record index, rec index to start retrieving col info */

ub1 ktubhflg; /* dummy */

ub2 ktubhidx[]; /* byte offset of record in block, grows at runtime */

};

struct ktudb Kernel Transaction Undo Data operation Block (redo)

{

ub2 ktudbsiz; /* size of entry */

ub2 ktudbspc; /* verification: space left in undo block */

ub2 ktudbflg; /* flag to indicate the kind of redo operation */

kxid ktudbxid; /* current tx id */

ub2 ktudbseq; /* block sequence number */

ub1 ktudbrec; /* new record index for this change */

};

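There are two ways to handle this:
1. Create a new UNDO tablespace;
2. Switch undo management to manual.
This time we chose manual management and changed the parameter file: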

*.undo_management=manual

SQL> startup mount

ORACLE instance started.

Total System Global Area 1.3429E+10 bytes

Fixed Size 2149040 bytes

Variable Size 6845105488 bytes

Database Buffers 6576668672 bytes

Redo Buffers 4730880 bytes

Database mounted.

SQL> alter database open;

Database altered.

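The database is now open, and the required data can be exported as a backup.
On some database versions a temp file must first be added to the TEMP tablespace.
Keep watching the alert log for further errors. A new UNDO tablespace can also be created so that undo management can be switched back to auto.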
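The follow-up steps mentioned above might look like the sketch below. The tablespace name TEMP, the name UNDOTBS_NEW, the file names, and the sizes are all hypothetical; only the directory is taken from the data file path shown earlier.

```sql
-- Sketch only; names, paths and sizes are assumptions.

-- Add a temp file if the TEMP tablespace has none after the control file rebuild.
alter tablespace temp
  add tempfile '/oracledata/oradata/orc11rac/orc11rac/temp01.dbf' size 1g reuse;

-- Create a fresh undo tablespace while still running with undo_management=manual.
create undo tablespace undotbs_new
  datafile '/oracledata/oradata/orc11rac/orc11rac/undotbs_new01.dbf' size 2g;

-- Review the state of the existing undo segments before dropping anything.
select segment_name, tablespace_name, status from dba_rollback_segs;

-- Then switch back in the pfile and restart the instance:
--   *.undo_management=auto
--   *.undo_tablespace='UNDOTBS_NEW'
```

In a RAC database each instance needs its own undo tablespace, so the undo_tablespace setting would normally be given per instance once cluster_database is turned back on.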
