How root.sh uses checkpoint files to support repeated runs

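Installing Grid Infrastructure (GRID/GI) for a cluster generally involves three steps: first, run OUI/runInstaller and enter the cluster configuration; second, copy and compile the cluster files; finally, run the root.sh script as root to configure and start the cluster. Running root.sh is the most critical stage, and many of the SR cases I have handled were installations that failed at exactly this point. In earlier releases, once the problem was fixed you had to deconfigure the existing configuration before running root.sh again. Starting with 11.2.0.2, root.sh can be rerun: after fixing the problem you simply run root.sh again, and it continues from the point where it last failed (much like resuming an interrupted download). This feature was enhanced further in 12c. It works by recording the installation progress in a checkpoint file and in the OCR: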

Checkpoint file location in 11.2

$ORACLE_BASE/Clusterware/ckptGridHA_${nodename}.xml

Checkpoint file location in 12c

$ORACLE_BASE/crsdata/$hostname/crsconfig/ckptGridHA_${nodename}.xml
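The two locations above can be put into a small helper when scripting a health check. This is only a sketch: the function name and the release labels "11.2" and "12c" are chosen for illustration, and it assumes that $hostname and ${nodename} in the 12c path are the same short node name.

```python
from pathlib import PurePosixPath


def checkpoint_path(oracle_base: str, node: str, release: str) -> PurePosixPath:
    """Build the expected checkpoint file path for one node.

    release is a label invented for this sketch: "11.2" or "12c".
    """
    base = PurePosixPath(oracle_base)
    file_name = f"ckptGridHA_{node}.xml"
    if release == "11.2":
        return base / "Clusterware" / file_name
    if release == "12c":
        # Assumption: the directory name ($hostname) equals the node name.
        return base / "crsdata" / node / "crsconfig" / file_name
    raise ValueError(f"unsupported release label: {release!r}")


if __name__ == "__main__":
    print(checkpoint_path("/u01/gridbase", "rac2", "12c"))
```

The ORACLE_BASE value /u01/gridbase and the node name rac2 are taken from the log excerpts in the case below.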

Below is a case in which root.sh failed during a 12.1.0.2 Grid Infrastructure (GRID/GI) installation.

Case study

=========

While installing 12.1.0.2 Grid Infrastructure on Linux, root.sh failed on node 2 with the following error on screen:

OLR initialization - successful

2015/12/15 13:16:55 CLSRSC-507: The root script cannot proceed on this node rac2 because either the first-node operations have not completed on node rac1 or there was an error in obtaining the status of the first-node operations.

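This error means that node 2 could not confirm whether the installation on node 1 had completed. How does root.sh determine whether node 1 has finished? The answer is in the log: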

$GRID_HOME/cfgtoollogs/crsconfig/rootcrs_rac2_2015-12-18_09-41-53PM.log

2015-12-18 21:42:39: Trying to get the value of key: SYSTEM.rootcrs.checkpoints.firstnode in OCR.

2015-12-18 21:42:39: setting ORAASM_UPGRADE to 1

2015-12-18 21:42:39: Check the existence of key pair with key name: SYSTEM.rootcrs.checkpoints.firstnode in OCR.

2015-12-18 21:42:39: setting ORAASM_UPGRADE to 1

2015-12-18 21:42:39: Invoking "/u01/gridsoft/12.1.0/bin/cluutil -exec -keyexists -key checkpoints.firstnode"

2015-12-18 21:42:39: trace file=/u01/gridbase/crsdata/rac2/crsconfig/cluutil9.log

2015-12-18 21:42:39: Running as user grid: /u01/gridsoft/12.1.0/bin/cluutil -exec -keyexists -key checkpoints.firstnode

2015-12-18 21:42:39: s_run_as_user2: Running /bin/su grid -c ' echo CLSRSC_START; /u01/gridsoft/12.1.0/bin/cluutil -exec -keyexists -key checkpoints.firstnode '

2015-12-18 21:42:39: Removing file /tmp/filexr1WwO

2015-12-18 21:42:39: Successfully removed file: /tmp/filexr1WwO

2015-12-18 21:42:39: pipe exit code: 256

2015-12-18 21:42:39: /bin/su exited with rc=1

2015-12-18 21:42:39: oracle.ops.mgmt.rawdevice.OCRException: PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]

2015-12-18 21:42:39: Cannot get OCR key with CLUUTIL, try using OCRDUMP.

2015-12-18 21:42:39: Check OCR key using ocrdump

2015-12-18 21:42:54: ocrdump output: PROT-302: Failed to initialize ocrdump

2015-12-18 21:42:54: The key pair with keyname: SYSTEM.rootcrs.checkpoints.firstnode does not exist in OCR.

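The log shows that node 2 first ran cluutil -exec -keyexists -key checkpoints.firstnode to look up the OCR key SYSTEM.rootcrs.checkpoints.firstnode. When that failed, it fell back to OCRDUMP, which failed as well. The next step is to find out why OCRDUMP failed, using its trace file.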
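The check sequence described here (cluutil first, OCRDUMP as a fallback) leaves recognizable lines in the rootcrs log. A minimal Python sketch for spotting them is shown below. The marker strings are copied from the log excerpt in this case; the dictionary and function names are invented for illustration and are not part of any Oracle tool.

```python
# Substrings copied from the rootcrs log excerpt in this article.
MARKERS = {
    "cluutil_failed": "Cannot get OCR key with CLUUTIL",
    "ocrdump_failed": "PROT-302",
    "key_reported_missing": "does not exist in OCR",
}


def summarize_firstnode_check(log_text: str) -> dict:
    """Report which stages of the first-node check left a failure marker."""
    return {name: marker in log_text for name, marker in MARKERS.items()}


if __name__ == "__main__":
    sample = (
        "2015-12-18 21:42:39: Cannot get OCR key with CLUUTIL, try using OCRDUMP.\n"
        "2015-12-18 21:42:54: ocrdump output: PROT-302: Failed to initialize ocrdump\n"
    )
    print(summarize_firstnode_check(sample))
```

If all three flags are true, as in this case, "key does not exist" only means that root.sh could not read the OCR at all, not that node 1 actually failed.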

$GRID_BASE/diag/crs/<node>/crs/trace/ocrdump_13146.trc

2015-12-18 21:42:48.098879 : OCRASM: ASM Error Stack : ORA-29701: unable to connect to Cluster Synchronization Service

2015-12-18 21:42:48.098885 : OCRASM: proprasmo: ASM instance is down. Proceed to open the file in dirty mode.

CLWAL: clsw_Initialize: Error [32] from procr_init_ext

CLWAL: clsw_Initialize: Error [PROCL-32: Oracle High Availability Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]] from procr_init_ext

2015-12-18 21:42:48.101773 : GPNP: clsgpnpkww_initclswcx: [at clsgpnpkww.c:351] Result: (56) CLSGPNP_OCR_INIT. (:GPNP01201: )Failed to init CLSW-OLR context. CLSW Error (3): CLSW-3: Error in the cluster registry (OCR) layer. [32] [PROCL-32: Oracle High Availability Services on the local node is not running Messaging error [gipcretConnectionRefused] [29]]

2015-12-18 21:42:48.112746 : OCRASM: proprasmo: Error [13] in opening the GPNP profile. Try to get offline profile

2015-12-18 21:42:48.220769 : OCRRAW: kgfo_kge2slos error stack at kgfolclcpi1: AMDU-00210: No disks found in diskgroup OCR_VOTING

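The ORA-29701 (unable to connect to CSS) and PROCL-32 (OHASD not running) errors in this trace are expected, because the clusterware stack on node 2 has not been started yet. They can distract from the real problem. The key error is AMDU-00210: No disks found in diskgroup OCR_VOTING. In other words, node 2 could not find the ASM disks, so OCRDUMP failed and root.sh could not confirm whether the installation on node 1 had completed. Next, we ran kfed on both nodes to check whether the ASM disk itself was healthy.

Node 1, reading disk /dev/raw/raw1: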
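The triage step above (setting aside errors that are expected while the stack is down, then looking at what remains) can be sketched in Python. The set of "expected" codes comes from this case only, and the regular expression for message codes is a rough assumption about their shape, not an official format.

```python
import re

# Codes this article treats as normal while the clusterware stack is down.
EXPECTED_WHEN_STACK_DOWN = {"ORA-29701", "PROCL-32"}

# Rough assumption: a message code is 2-8 capitals, a hyphen, then 1-5 digits.
CODE_RE = re.compile(r"\b[A-Z]{2,8}-\d{1,5}\b")


def notable_codes(trace_text: str) -> list:
    """Return codes from lines that do not mention an expected code."""
    found = []
    for line in trace_text.splitlines():
        codes = CODE_RE.findall(line)
        if any(code in EXPECTED_WHEN_STACK_DOWN for code in codes):
            continue  # the whole line is attributed to the stack being down
        for code in codes:
            if code not in found:
                found.append(code)
    return found
```

Skipping whole lines rather than single codes means that wrapper errors reported on the same line as PROCL-32 (such as CLSW-3 in the trace above) are set aside with it. Anything the function returns still needs a human to judge it.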

$ /u01/gridsoft/12.1.0/bin/kfed read /dev/raw/raw1

kfbh.endian: 1 ; 0x000: 0x01

kfbh.hard: 130 ; 0x001: 0x82

kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD <========= raw1 has type KFBTYP_DISKHEAD, i.e. it is a normal ASM disk

kfbh.datfmt: 1 ; 0x003: 0x01

kfbh.block.blk: 0 ; 0x004: blk=0

kfbh.block.obj: 2147483648 ; 0x008: disk=0

kfbh.check: 420965027 ; 0x00c: 0x19176aa3

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

...

kfdhdb.vfstart: 128 ; 0x0ec: 0x00000080 <========= the vfstart value shows that this disk holds a voting file

kfdhdb.vfend: 160 ; 0x0f0: 0x000000a0 <========= the vfend value shows that this disk holds a voting file

Node 2, reading disk /dev/raw/raw1:

$ /u01/gridsoft/12.1.0/bin/kfed read /dev/raw/raw1

kfbh.endian: 0 ; 0x000: 0x00

kfbh.hard: 0 ; 0x001: 0x00

kfbh.type: 0 ; 0x002: KFBTYP_INVALID <========= on node 2, raw1 shows the invalid type KFBTYP_INVALID

kfbh.datfmt: 0 ; 0x003: 0x00

kfbh.block.blk: 0 ; 0x004: blk=0

kfbh.block.obj: 0 ; 0x008: file=0

kfbh.check: 0 ; 0x00c: 0x00000000

kfbh.fcn.base: 0 ; 0x010: 0x00000000

kfbh.fcn.wrap: 0 ; 0x014: 0x00000000

kfbh.spare1: 0 ; 0x018: 0x00000000

kfbh.spare2: 0 ; 0x01c: 0x00000000

000000000 00000000 00000000 00000000 00000000 [................]

Repeat 255 times

KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

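On node 1, /dev/raw/raw1 shows the disk type KFBTYP_DISKHEAD and kfdhdb.vfstart has a value, so on node 1 raw1 is a normal ASM disk that also holds a voting file. On node 2, the same device shows completely different content. A correctly configured shared device should look identical from node 1 and node 2. Here the two nodes see different data, which means the shared disk is not configured correctly.

Meanwhile, running OCRDUMP manually on node 1 confirmed that the key SYSTEM.rootcrs.checkpoints.firstnode exists and that its value is "SUCCESS":

su - root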
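The node-to-node comparison above can be automated by parsing the kfed read output from each node and listing the header fields that differ. The following Python sketch assumes the "name: value ; offset: description" line layout shown in the excerpts; the function names are illustrative and the script does not call kfed itself.

```python
import re

# Matches lines such as "kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD".
FIELD_RE = re.compile(r"^(\S+):\s+(\S+)\s*;\s*0x[0-9a-fA-F]+:\s*([\w=]*)")


def parse_kfed(text: str) -> dict:
    """Map each field name to its (value, description) pair."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            name, value, description = match.groups()
            fields[name] = (value, description)
    return fields


def differing_fields(output_a: str, output_b: str) -> list:
    """Return the sorted names of fields present in both outputs but unequal."""
    a, b = parse_kfed(output_a), parse_kfed(output_b)
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])
```

For a healthy shared disk, `differing_fields` on the two nodes' output of the same device would be expected to return an empty list. In this case it would report kfbh.type among the differences.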

ocrdump /tmp/ocrdump1.out

more /tmp/ocrdump1.out

[SYSTEM.rootcrs.checkpoints.firstnode]

ORATEXT : SUCCESS
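Instead of paging through the dump with more, the key can be looked up with a few lines of Python. This sketch assumes the text layout shown above (a bracketed key name followed by an ORATEXT line); the function name is illustrative and the script only reads a dump file that ocrdump has already written.

```python
from typing import Optional


def ocr_key_text(dump_text: str, key: str) -> Optional[str]:
    """Return the ORATEXT value stored under key, or None if not found."""
    inside_key = False
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new key section starts; remember whether it is the one we want.
            inside_key = line[1:-1] == key
        elif inside_key and line.startswith("ORATEXT"):
            _, separator, value = line.partition(":")
            return value.strip() if separator else None
    return None


if __name__ == "__main__":
    with open("/tmp/ocrdump1.out") as dump_file:
        print(ocr_key_text(dump_file.read(), "SYSTEM.rootcrs.checkpoints.firstnode"))
```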

Finally, the problem was resolved by correcting the udev configuration file (/etc/udev/rules.d/99-oracle-asmdevices.rules).
