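At the end of every year, the system administrators organize a test and drill of the disaster recovery (DR) plan. The "production servers" are started in a DR environment that is network-isolated from production, and staff from each team then take part in testing whether the DR plan is reliable. During this year's drill, one Oracle database server ran into trouble at startup: as shown below, startup failed with ORA-03113: end-of-file on communication channel.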

[oracle@mylnx6 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Dec 21 09:42:11 2018

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup
ORA-03113: end-of-file on communication channel
SQL>

The alert log shows that the instance raised "ORA-00471: DBWR process terminated with error" during startup:

PMON started with pid=2, OS id=25005
PSP0 started with pid=3, OS id=25007
MMAN started with pid=4, OS id=25009
DBW0 started with pid=5, OS id=25011
LGWR started with pid=6, OS id=25013
CKPT started with pid=7, OS id=25016
SMON started with pid=8, OS id=25018
RECO started with pid=9, OS id=25020
CJQ0 started with pid=10, OS id=25022
MMON started with pid=11, OS id=25024
Fri Dec 21 09:44:36 CST 2018
starting up 8 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=25026
Fri Dec 21 09:45:12 CST 2018
starting up 24 shared server(s) ...
Fri Dec 21 09:46:43 CST 2018
Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_25005.trc:
ORA-00471: DBWR process terminated with error
Fri Dec 21 09:46:43 CST 2018
PMON: terminating instance due to error 471
Instance terminated by PMON, pid = 25005

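Getting "ORA-00471: DBWR process terminated with error" while starting an instance is suspicious. It is very likely that the process was killed by the operating system. The OS log confirms this: it contains oom_kill_process. In other words, memory ran short while the instance was starting, and the kernel's OOM killer chose the DBWR process as its victim. The relevant log entries are: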

Dec 21 09:46:39 mylnx6 kernel: oracle invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Dec 21 09:46:39 mylnx6 kernel: oracle cpuset=/ mems_allowed=0
Dec 21 09:46:39 mylnx6 kernel: Pid: 25026, comm: oracle Not tainted 2.6.32-200.13.1.el5uek #1
Dec 21 09:46:39 mylnx6 kernel: Call Trace:
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810a0b66>] ? cpuset_print_task_mems_allowed+0x92/0x9e
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810d9ae6>] oom_kill_process+0x85/0x25b
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810d9fbc>] ? select_bad_process+0xbc/0x102
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810da03f>] __out_of_memory+0x3d/0x86
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810da30f>] out_of_memory+0xfc/0x195
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810dd75e>] __alloc_pages_nodemask+0x487/0x595
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff811075ac>] alloc_page_vma+0xb9/0xc8
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810ff0a7>] read_swap_cache_async+0x52/0xf1
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810ff1a3>] swapin_readahead+0x5d/0x9c
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810d725a>] ? find_get_page+0x22/0x69
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff810f1ea3>] handle_mm_fault+0x44b/0x80f
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff8106d7cd>] ? getrusage+0x2b1/0x2ce
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff8101270e>] ? common_interrupt+0xe/0x13
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff81043696>] ? should_resched+0xe/0x2f
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff81456006>] do_page_fault+0x210/0x299
Dec 21 09:46:39 mylnx6 kernel:  [<ffffffff81453fd5>] page_fault+0x25/0x30
Dec 21 09:46:39 mylnx6 kernel: Mem-Info:
Dec 21 09:46:39 mylnx6 kernel: Node 0 DMA per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    4: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    5: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    6: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    7: hi:    0, btch:   1 usd:   0
Dec 21 09:46:39 mylnx6 kernel: Node 0 DMA32 per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Dec 21 09:46:39 mylnx6 kernel: Node 0 Normal per-cpu:
Dec 21 09:46:39 mylnx6 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Dec 21 09:46:40 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
Dec 21 09:46:41 mylnx6 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Dec 21 09:46:40 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
Dec 21 09:46:41 mylnx6 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Dec 21 09:46:41 mylnx6 kernel: active_anon:1764 inactive_anon:209 isolated_anon:64
Dec 21 09:46:41 mylnx6 kernel:  active_file:349 inactive_file:1710 isolated_file:0
Dec 21 09:46:41 mylnx6 kernel:  unevictable:5377 dirty:0 writeback:4 unstable:0
Dec 21 09:46:41 mylnx6 kernel:  free:29838 slab_reclaimable:2400 slab_unreclaimable:119491
Dec 21 09:46:41 mylnx6 kernel:  mapped:2703 shmem:830 pagetables:9849 bounce:0
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA free:15652kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15172kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 3000 24210 24210
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA32 free:86296kB min:2464kB low:3080kB high:3696kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072096kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 0 21210 21210
Dec 21 09:46:41 mylnx6 kernel: Node 0 Normal free:17404kB min:17440kB low:21800kB high:26160kB active_anon:7056kB inactive_anon:836kB active_file:1396kB inactive_file:6840kB unevictable:21508kB isolated(anon):256kB isolated(file):0kB present:21719040kB mlocked:21504kB dirty:0kB writeback:16kB mapped:10812kB shmem:3320kB slab_reclaimable:9600kB slab_unreclaimable:477964kB kernel_stack:2800kB pagetables:39396kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:544 all_unreclaimable? no
Dec 21 09:46:41 mylnx6 kernel: lowmem_reserve[]: 0 0 0 0
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA: 1*4kB 2*8kB 1*16kB 0*32kB 2*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15652kB
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA32: 12*4kB 13*8kB 2*16kB 5*32kB 5*64kB 11*128kB 3*256kB 7*512kB 6*1024kB 4*2048kB 16*4096kB = 86296kB
Dec 21 09:46:41 mylnx6 kernel: Node 0 Normal: 420*4kB 1917*8kB 49*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17800kB
Dec 21 09:46:41 mylnx6 kernel: 4722 total pagecache pages
Dec 21 09:46:41 mylnx6 kernel: 694 pages in swap cache
Dec 21 09:46:41 mylnx6 kernel: Swap cache stats: add 589182, delete 588488, find 343370/443306
Dec 21 09:46:41 mylnx6 kernel: Free swap  = 66723056kB
Dec 21 09:46:41 mylnx6 kernel: Total swap = 67108856kB
Dec 21 09:46:41 mylnx6 kernel: 6291440 pages RAM
Dec 21 09:46:41 mylnx6 kernel: 107316 pages reserved
Dec 21 09:46:41 mylnx6 kernel: 24060 pages shared
Dec 21 09:46:41 mylnx6 kernel: 77648 pages non-shared
Dec 21 09:46:41 mylnx6 kernel: Out of memory: kill process 25011 (oracle) score 8425150 or a child
Dec 21 09:46:41 mylnx6 kernel: Killed process 25011 (oracle)
Dec 21 09:47:20 mylnx6 lvm[4702]: Another thread is handling an event. Waiting...
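The diagnosis rests on matching the PID in the kernel's "Killed process 25011 (oracle)" line with the OS id of an Oracle background process in the alert log ("DBW0 started with pid=5, OS id=25011").

The sketch below does that correlation in Python against excerpts of the two logs copied from this post. The kernel lines are trimmed to the fields it needs. It also flags memory zones whose free memory is below the min watermark. In this log only the Normal zone qualifies (free 17404kB vs. min 17440kB), which is consistent with the kernel failing the page allocation shown in the call trace. The function names are illustrative and not part of any Oracle or Linux tool.

```python
import re

# Excerpts copied from the alert log and the kernel log shown above (kernel lines trimmed).
ALERT_LOG = """\
PMON started with pid=2, OS id=25005
PSP0 started with pid=3, OS id=25007
MMAN started with pid=4, OS id=25009
DBW0 started with pid=5, OS id=25011
LGWR started with pid=6, OS id=25013
CKPT started with pid=7, OS id=25016
SMON started with pid=8, OS id=25018
RECO started with pid=9, OS id=25020
CJQ0 started with pid=10, OS id=25022
MMON started with pid=11, OS id=25024
MMNL started with pid=12, OS id=25026
"""

KERNEL_LOG = """\
Dec 21 09:46:39 mylnx6 kernel: Pid: 25026, comm: oracle Not tainted 2.6.32-200.13.1.el5uek #1
Dec 21 09:46:41 mylnx6 kernel: Node 0 DMA32 free:86296kB min:2464kB low:3080kB high:3696kB
Dec 21 09:46:41 mylnx6 kernel: Node 0 Normal free:17404kB min:17440kB low:21800kB high:26160kB
Dec 21 09:46:41 mylnx6 kernel: Out of memory: kill process 25011 (oracle) score 8425150 or a child
Dec 21 09:46:41 mylnx6 kernel: Killed process 25011 (oracle)
"""

STARTED = re.compile(r"^(\w+) started with pid=\d+, OS id=(\d+)", re.MULTILINE)
INVOKER = re.compile(r"kernel: Pid: (\d+), comm:")
KILLED = re.compile(r"Killed process (\d+)")
ZONE = re.compile(r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB")


def background_by_os_pid(alert_log: str) -> dict[int, str]:
    """Map OS pid -> Oracle background process name (e.g. 25011 -> 'DBW0')."""
    return {int(pid): name for name, pid in STARTED.findall(alert_log)}


def invoking_pids(kernel_log: str) -> list[int]:
    """PIDs of the processes whose allocation triggered the OOM killer."""
    return [int(p) for p in INVOKER.findall(kernel_log)]


def killed_pids(kernel_log: str) -> list[int]:
    """PIDs the OOM killer actually killed."""
    return [int(p) for p in KILLED.findall(kernel_log)]


def zones_below_min(kernel_log: str) -> list[tuple[str, int, int]]:
    """Zones whose free memory (kB) is below their min watermark (kB)."""
    return [
        (f"Node {node} {zone}", int(free), int(minimum))
        for node, zone, free, minimum in ZONE.findall(kernel_log)
        if int(free) < int(minimum)
    ]


if __name__ == "__main__":
    procs = background_by_os_pid(ALERT_LOG)
    for pid in invoking_pids(KERNEL_LOG):
        print(f"OOM killer invoked by PID {pid} ({procs.get(pid, 'unknown')})")
    for pid in killed_pids(KERNEL_LOG):
        print(f"OOM killer killed PID {pid} ({procs.get(pid, 'unknown')})")
    for zone, free, minimum in zones_below_min(KERNEL_LOG):
        print(f"{zone}: free {free} kB < min {minimum} kB")
```

Run against these excerpts, the script reports three things:

- The allocation that invoked the OOM killer came from PID 25026, which is MMNL.
- The victim was PID 25011, which is DBW0.
- Only the Normal zone is below its min watermark.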

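Checking the server's memory showed that, in the DR environment, it had been given only 24 GB of RAM. The production server has 64 GB, with Linux HugePages (standard huge pages) configured and SGA_MAX_SIZE set to 32 GB. The DR server is a "clone" of the production server, but because of resource constraints the system administrators allocated it only 24 GB: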

[root@mylnx6 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         24156      24033        123          0          0          6
-/+ buffers/cache:      24026        130
Swap:        65535         41      65494
[root@mylnx6 ~]# ps -ef | grep ora_
root     11759 11490  0 16:10 pts/1    00:00:00 grep ora_
[root@mylnx6 ~]# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 3080192    root      644        80         2
0x00000000 3112961    root      644        16384      2
0x00000000 3145730    root      644        280        2
0x00000000 4096003    gdm       600        393216     0
0x2cd12178 3866628    oracle    640        34361835520 0
0x00000000 5210117    gdm       600        393216     2          dest
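The sizes in this output can be cross-checked with simple arithmetic. The oracle segment reported by ipcs, 34361835520 bytes, is 32 GiB plus 2 MiB, which matches SGA_MAX_SIZE = 32G. The kernel log's "6291440 pages RAM" (4 KiB pages) is just under 24 GiB.

The sketch below does this in Python. It assumes the x86_64 default huge page size of 2 MiB, which the post does not state; the Hugepagesize line in /proc/meminfo shows the real value.

```python
# Cross-check of the sizes quoted in this post.
# Assumption (not stated in the post): huge pages are 2 MiB, the x86_64 default;
# check the Hugepagesize line in /proc/meminfo on the real host.
GIB = 1024 ** 3
HUGEPAGE_BYTES = 2 * 1024 ** 2
PAGE_BYTES = 4096  # regular x86_64 page size, the unit of "pages RAM"

RAM_PAGES = 6291440          # "6291440 pages RAM" from the kernel log
SEGMENT_BYTES = 34361835520  # oracle segment size reported by ipcs -m


def gib(n_bytes: int) -> str:
    """Format a byte count as GiB with two decimals."""
    return f"{n_bytes / GIB:.2f} GiB"


def hugepages_for(n_bytes: int, page_bytes: int = HUGEPAGE_BYTES) -> int:
    """Number of huge pages needed to back n_bytes (ceiling division)."""
    return -(-n_bytes // page_bytes)


ram_bytes = RAM_PAGES * PAGE_BYTES

if __name__ == "__main__":
    print("Physical RAM on the DR host:", gib(ram_bytes))
    print("Oracle shared memory segment:", gib(SEGMENT_BYTES))
    print("2 MiB huge pages needed for the segment:", hugepages_for(SEGMENT_BYTES))
    print("Segment larger than RAM:", SEGMENT_BYTES > ram_bytes)
```

Under that assumption, backing the SGA with huge pages would take 16385 pages. The SGA by itself is also larger than the DR host's physical memory. A huge-page setup sized for the 64 GB production server therefore cannot simply be reused on this host.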

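The ipcs output shows a shared memory segment owned by oracle of 34361835520 bytes (the 32 GB SGA). It is still present although no process is attached to it (nattch 0) and no ora_ background processes are running.

The free -m output points the same way. With no Oracle processes running, 24033 MB of the 24156 MB is still reported as used. That is what one would expect if the huge-page pool sized for the production SGA had claimed nearly all of the DR server's RAM.

The root cause is therefore the HugePages configuration at the OS level: the memory was reduced, but the configuration was not changed to match. To resolve the problem quickly, we removed the HugePages settings, as follows: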

Edit limits.conf and comment out the soft memlock and hard memlock entries:

vi /etc/security/limits.conf

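Then edit /etc/sysctl.conf, comment out vm.nr_hugepages, and reboot the server. This is a DR test environment, so it can be rebooted at any time. After the reboot the Oracle instance started normally. The related parameters still need to be adjusted before continuing with the rest of the testing.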
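The two edits above can be expressed as a pure text transformation. The first comments out the memlock entries in limits.conf; the second comments out vm.nr_hugepages in sysctl.conf. Working on file contents as strings makes it easy to preview the result before touching anything under /etc.

This is a minimal Python sketch, not the procedure the author used. The sample file contents and all numeric values are hypothetical; the post does not show the real files from the affected server.

```python
from typing import Callable


def comment_out(text: str, should_comment: Callable[[str], bool]) -> str:
    """Prefix '# ' to every active (non-blank, non-comment) line matching should_comment."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and should_comment(stripped):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


def is_memlock(line: str) -> bool:
    """limits.conf lines look like: <domain> <type> <item> <value>."""
    fields = line.split()
    return len(fields) >= 3 and fields[1] in ("soft", "hard", "-") and fields[2] == "memlock"


def is_nr_hugepages(line: str) -> bool:
    """sysctl.conf lines look like: key = value."""
    return line.split("=", 1)[0].strip() == "vm.nr_hugepages"


# Hypothetical sample contents; the real files on the DR server were not shown in the post.
LIMITS_CONF = """\
oracle soft nproc 2047
oracle soft memlock 33554432
oracle hard memlock 33554432
"""

SYSCTL_CONF = """\
kernel.shmmax = 68719476736
vm.nr_hugepages = 16400
"""

if __name__ == "__main__":
    print(comment_out(LIMITS_CONF, is_memlock), end="")
    print(comment_out(SYSCTL_CONF, is_nr_hugepages), end="")
```

Lines that are already commented out are left alone, so running the transformation twice gives the same result. Writing the output back to disk and rebooting, as done here, is deliberately left out of the sketch.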
