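Database startup failure in a GI cluster caused by the DNS service not being started
1. After a sudden power outage at the building, the Grid Infrastructure cluster restarted but the RAC databases failed to start. Checking the cluster gave the result below: four database instances (one each for efmisdb and faspdb, two for ltdb) are in the Instance Shutdown state.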
[root@node1 ~]# su - grid
[grid@node1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.FLASH.dg
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
ora.GRIDDG.dg
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
ora.LISTENER.lsnr
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
ora.LTDG.dg
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
ora.ORADG.dg
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
ora.asm
ONLINE ONLINE node1 Started
ONLINE ONLINE node2 Started
ONLINE ONLINE node3 Started
ONLINE ONLINE node4 Started
ONLINE ONLINE node5 Started
ONLINE ONLINE node6 Started
ora.gsd
OFFLINE OFFLINE node1
OFFLINE OFFLINE node2
OFFLINE OFFLINE node3
OFFLINE OFFLINE node4
OFFLINE OFFLINE node5
OFFLINE OFFLINE node6
ora.net1.network
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
ora.ons
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
ora.registry.acfs
ONLINE ONLINE node1
ONLINE ONLINE node2
ONLINE ONLINE node3
ONLINE ONLINE node4
ONLINE ONLINE node5
ONLINE ONLINE node6
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE node2
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE node3
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE node1
ora.cvu
1 ONLINE ONLINE node3
ora.efmisdb.db
1 ONLINE OFFLINE Instance Shutdown
ora.efmisdb.efmissrv.svc
1 ONLINE OFFLINE
ora.faspdb.db
1 ONLINE OFFLINE Instance Shutdown
ora.faspdb.faspsvc.svc
1 ONLINE OFFLINE
ora.ltdb.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE OFFLINE Instance Shutdown
ora.node1.vip
1 ONLINE ONLINE node1
ora.node2.vip
1 ONLINE ONLINE node2
ora.node3.vip
1 ONLINE ONLINE node3
ora.node4.vip
1 ONLINE ONLINE node4
ora.node5.vip
1 ONLINE ONLINE node5
ora.node6.vip
1 ONLINE ONLINE node6
ora.oadb.db
1 ONLINE ONLINE node6 Open
ora.oc4j
1 ONLINE ONLINE node3
ora.orcl.db
1 ONLINE ONLINE node5 Open
ora.scan1.vip
1 ONLINE ONLINE node2
ora.scan2.vip
1 ONLINE ONLINE node3
ora.scan3.vip
1 ONLINE ONLINE node1
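With this many resources, the rows that matter are easy to miss. As a minimal sketch (not part of the original troubleshooting), the Python function below picks out the rows whose TARGET is ONLINE but whose STATE is OFFLINE. The function name `offline_resources` and the sample text are illustrative assumptions. The parsing assumes the 11.2 layout shown above, where each resource name sits on its own line followed by its state rows.

```python
def offline_resources(crsctl_output):
    """Return (resource, state_details) pairs for rows with TARGET=ONLINE, STATE=OFFLINE."""
    offline = []
    current = None
    for raw in crsctl_output.splitlines():
        line = raw.strip()
        if line.startswith("ora."):
            current = line  # a resource name line, e.g. "ora.ltdb.db"
            continue
        fields = line.split()
        # Cluster resources carry a leading instance number; local resources do not.
        if fields and fields[0].isdigit():
            fields = fields[1:]
        if current and fields[:2] == ["ONLINE", "OFFLINE"]:
            offline.append((current, " ".join(fields[2:])))
    return offline


# Hypothetical excerpt in the same shape as the output above.
sample = """\
ora.asm
ONLINE ONLINE node1 Started
ora.gsd
OFFLINE OFFLINE node1
ora.faspdb.db
1 ONLINE OFFLINE Instance Shutdown
ora.ltdb.db
1 ONLINE OFFLINE Instance Shutdown
2 ONLINE OFFLINE Instance Shutdown
ora.oadb.db
1 ONLINE ONLINE node6 Open
"""

for name, details in offline_resources(sample):
    print(name, details)
```

Rows such as ora.gsd, where both TARGET and STATE are OFFLINE, are skipped on purpose: that resource is not expected to be running, so it is not a fault.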
2. Use asmcmd to check whether the ASM disk groups can be accessed normally. ASM turns out to be fine:
[grid@node1 ~]$ asmcmd -p
ASMCMD [+] > ls
FLASH/
GRIDDG/
LTDG/
ORADG/
ASMCMD [+] > ls ltdg
ASMCMD [+] > cd oradg
ASMCMD [+oradg] > ls
EFMISDB/
FASPDB/
LTDB/
OADB/
ORCL/
ASMCMD [+oradg] > cd ltdb
ASMCMD [+oradg/ltdb] > ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfileltdb.ora
ASMCMD [+oradg/ltdb] > cd datafile
ASMCMD [+oradg/ltdb/datafile] > ls
BSIP_JLPT.335.923932111
EFMIS.291.920050869
EFMIS.292.920050843
EFMIS.293.920050823
EFMIS.294.920050793
EFMIS.338.926285547
EFMIS.339.926285561
EFMIS_YS.327.922529787
EFMIS_YS.DBF
FASP2.290.920053427
FASP2.325.922112969
FASP2.326.922112977
FASP2.328.923068707
FASP2.329.923571833
FASP2.330.923571847
FASP2.340.926285581
FASP2TEST.334.923932089
SYSAUX.257.919523823
SYSTEM.256.919523823
UNDOTBS1.258.919523825
UNDOTBS2.264.919523987
USERS.259.919523825
ASMCMD [+oradg/ltdb/datafile] > exit
3. Log in to the database and try to start it, keeping the alert log open in the meantime:
[oracle@node1 ~]$ export ORACLE_SID=ltdb1
[oracle@node1 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Nov 1 12:32:15 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORA-00119: invalid specification for system parameter REMOTE_LISTENER
ORA-00132: syntax error or unresolved network name 'lt-cluster:1521'
SQL> exit
The startup fails with ORA-00119. The alert log below shows that the instance was terminated because of error 119:
Tue Nov 01 12:32:18 2016
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 32 KB
Total Shared Global Region in Large Pages = 0 KB (0%)
Large Pages used by this instance: 0 (0 KB)
Large Pages unused system wide = 0 (0 KB)
Large Pages configured system wide = 0 (0 KB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 3010 MB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 1505 (page size 2048 KB, total size 3010 MB) system wide to
get 100% of the System Global Area allocated with large pages
2. Large pages are automatically locked into physical memory.
Increase the per process memlock (soft) limit to at least 3018 MB to lock
100% System Global Area's large pages into physical memory
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 4
Number of processor cores in the system is 4
Number of processor sockets in the system is 2
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.149.234, mac=00-50-56-80-52-8e, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=192.168.100.61, mac=00-50-56-80-2b-89, net=192.168.100.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
[name='eth0:1', type=1, ip=192.168.100.70, mac=00-50-56-80-2b-89, net=192.168.100.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:6' configured from GPnP for use as a public interface.
[name='eth0:6', type=1, ip=192.168.100.81, mac=00-50-56-80-2b-89, net=192.168.100.0/24, mask=255.255.255.0, use=public/1]
CELL communication is configured to use 0 interface(s):
CELL IP affinity details:
NUMA status: non-NUMA system
cellaffinity.ora status: N/A
CELL communication will use 1 IP group(s):
Grp 0:
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0.4/dbhome_1
System name: Linux
Node name: node1
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
VM name: VMWare Version: 6
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/initltdb1.ora
System parameters with non-default values:
processes = 800
sessions = 1224
spfile = "+ORADG/ltdb/spfileltdb.ora"
sga_target = 3008M
control_files = "+ORADG/ltdb/controlfile/current.260.919523907"
control_files = "+FLASH/ltdb/controlfile/current.256.919523907"
db_block_size = 8192
compatible = "11.2.0.4.0"
cluster_database = TRUE
db_create_file_dest = "+ORADG"
db_recovery_file_dest = "+FLASH"
db_recovery_file_dest_size= 4407M
thread = 1
undo_tablespace = "UNDOTBS1"
_partition_large_extents = "FALSE"
_index_partition_large_extents= "FALSE"
instance_number = 1
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=ltdbXDB)"
remote_listener = "lt-cluster:1521"
audit_file_dest = "/u01/app/oracle/admin/ltdb/adump"
audit_trail = "DB"
db_name = "ltdb"
open_cursors = 300
pga_aggregate_target = 998M
diagnostic_dest = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
169.254.149.234
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Tue Nov 01 12:33:04 2016
USER (ospid: 31080): terminating the instance due to error 119
Instance terminated by USER, pid = 31080
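The decisive line sits at the very end of a long alert log excerpt. As a rough sketch, assuming the termination message always has the wording seen above, a few lines of Python can pull the error number out of the log text. The function name `termination_errors` is an illustrative assumption, not an Oracle tool.

```python
import re

# Matches the alert log line written when an instance is terminated on an error.
TERMINATION = re.compile(r"terminating the instance due to error (\d+)")


def termination_errors(alert_log_text):
    """Return the error numbers that caused instance termination, in log order."""
    return [int(match.group(1)) for match in TERMINATION.finditer(alert_log_text)]


# The last three lines of the alert log excerpt above.
excerpt = """\
Tue Nov 01 12:33:04 2016
USER (ospid: 31080): terminating the instance due to error 119
Instance terminated by USER, pid = 31080
"""

print(termination_errors(excerpt))
```

The number returned here is the same ORA-00119 that sqlplus reported for REMOTE_LISTENER, which confirms that the instance died on the parameter error and not on a storage problem.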
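4. At this point the GI cluster itself has started normally, but the database cannot be opened and fails with ORA-00119. The parameter file shows nothing abnormal; the accompanying ORA-00132 means that the name lt-cluster in remote_listener could not be resolved. Checking SCAN name resolution with nslookup showed that DNS resolution was failing, and checking the DNS server showed that the DNS service had not been started. After starting the DNS service and restarting the database servers, the databases opened normally.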
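The name resolution check can also be run before attempting a startup. The sketch below is an illustration in Python and not the nslookup session used in this case: it splits a remote_listener value such as 'lt-cluster:1521' into host and port, then asks the system resolver for the addresses. The helper names `split_listener` and `resolve` are assumptions. Note that the system resolver also consults /etc/hosts, so the result can differ from a pure DNS query.

```python
import socket


def split_listener(value, default_port=1521):
    """Split a 'host:port' REMOTE_LISTENER value into (host, port)."""
    host, sep, port = value.strip().partition(":")
    return host, (int(port) if sep else default_port)


def resolve(host, port):
    """Return the sorted IP addresses the name resolves to, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Value taken from the remote_listener parameter in the alert log.
    host, port = split_listener("lt-cluster:1521")
    addresses = resolve(host, port)
    if addresses:
        print(f"{host} resolves to {', '.join(addresses)}")
    else:
        print(f"{host} does not resolve; check the DNS service before starting the database")
```

An empty result for the SCAN name points at DNS rather than at the database, which is the situation described in step 4.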