案例:latch: cache buffers chains event tuning
前两天对oracle数据库(single instance)进行了迁移升级从10.2.0.4 升到11.2.0.3,有一个项目迁完后第二天,cpu负载升到了130更高(16cpus). 向用户询问后使用上没有改变,平时就几个人使用,而该用户活动会话就有100多个。最在等待CBC latch. 怀疑是执行计划发生了改变。
[root@dbserver40 ~]# free
total used free shared buffers cached
Mem: 16429016 16363308 65708 0 188084 5901364
-/+ buffers/cache: 10273860 6155156
Swap: 16378228 18660 16359568
[root@dbserver40 ~]# top
top - 14:52:02 up 7 days, 21:08, 1 user, load average: 141.69, 131.61, 127.43
Tasks: 990 total, 187 running, 803 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.9%us, 0.6%sy, 0.0%ni, 0.4%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 16429016k total, 16360432k used, 68584k free, 188140k buffers
Swap: 16378228k total, 18660k used, 16359568k free, 5901532k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20533 oracle 19 0 7416m 59m 56m R 32.0 0.4 179:24.70 oracle
11667 oracle 19 0 7416m 57m 54m R 31.6 0.4 43:20.90 oracle
2376 oracle 19 0 7402m 42m 39m R 29.7 0.3 66:22.29 oracle
3165 oracle 19 0 7416m 59m 55m R 29.7 0.4 56:28.18 oracle
...
[root@dbserver40 ~]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
177 0 18660 65432 188156 5901540 0 0 16 20 0 4 18 0 82 0 0
179 0 18660 65236 188172 5901524 0 0 0 268 1302 1723 100 0 0 0 0
88 0 18660 61580 188172 5901576 0 0 0 44 1924 2357 99 1 0 0 0
89 0 18660 61332 188172 5901576 0 0 0 64 3122 3488 98 1 1 0 0
92 0 18660 62464 188172 5901584 0 0 0 24 2165 2566 99 1 1 0 0
181 0 18660 53432 188200 5901596 0 0 0 36 1082 1434 100 0 0 0 0
182 0 18660 73676 188200 5901596 0 0 0 112 3667 4159 99 1 1 0 0
183 0 18660 152648 188208 5901588 0 0 0 188 2908 3600 98 1 1 0 0
AWR 部分信息
Top 5 Timed Foreground Events
Event Waits Time(s) Avg wait (ms) % DB time Wait Class
latch: cache buffers chains 26,429 102,652 3884 21.09 Concurrency
DB CPU 57,500 11.81
buffer busy waits 7,910 1,631 206 0.34 Concurrency
log file sync 14,725 160 11 0.03 Commit
library cache: mutex X 83 65 779 0.01 Concurrency
Host CPU (CPUs: 16 Cores: 8 Sockets: 2)
Load Average Begin Load Average End %User %System %WIO %Idle
103.12 110.21 99.5 0.3 0.0 0.2
Instance CPU
%Total CPU %Busy CPU %DB time waiting for CPU (Resource Manager)
99.5 99.7 0.0
开始排查latch: cache buffers chains,通常是hot block issue
cms@PORA40>select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
system@PORA40>show parameter optimizer
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_capture_sql_plan_baselines boolean FALSE
optimizer_dynamic_sampling integer 2
optimizer_features_enable string 11.2.0.3
optimizer_index_caching integer 0
optimizer_index_cost_adj integer 100
optimizer_mode string ALL_ROWS
optimizer_secure_view_merging boolean TRUE
optimizer_use_invisible_indexes boolean FALSE
optimizer_use_pending_statistics boolean FALSE
optimizer_use_sql_plan_baselines boolean TRUE
sys@PORA40>select addr,gets,misses,sleeps
from v$latch_children
where name='cache buffers chains'
and misses > 100000 order by 3 desc;
ADDR GETS MISSES SLEEPS
---------------- -------------------- -------------------- --------------------
0000000214573E08 125350308 8576425 48949
00000002166A4B88 51526750 2257318 239
00000002142FBCC0 30946314 2121193 125
0000000214362AD8 31811886 2098570 4171
0000000218F10E58 30812696 2065458 134
...
sys@PORA40>select file#,dbablk,state,class from x$bh where hladdr='0000000214573E08';
FILE# DBABLK STATE CLASS
-------------------- -------------------- -------------------- --------------------
18 529 1 4
18 529 3 4
18 529 3 4
18 529 3 4
18 529 3 4
18 529 3 4
18 529 3 4
18 529 3 4
7 7375 1 1
4 211350 1 1
9 31539 1 1
1 100892 1 1
2 112974 1 1
13 rows selected.
Tip:
x$bh.STATE NUMBER
KCBBHFREE 0 buffer free
KCBBHEXLCUR 1 buffer current (and if DFS locked X)
KCBBHSHRCUR 2 buffer current (and if DFS locked S)
KCBBHCR 3 buffer consistant read
KCBBHREADING 4 Being read
KCBBHMRECOVERY 5 media recovery (current & special)
KCBBHIRECOVERY 6 Instance recovery (somewhat special)
x$bh.CLASS NUMBER See Note 33434.1
1,’data block’,
2,’sort block’,
3,’save undo block’,
4,’segment header’,
5,’save undo header’,
6,’free list’,
7,’extent map’,
8,’1st level bmb’,
9,’2nd level bmb’,
10,’3rd level bmb’,
11,’bitmap block’,
12,’bitmap index block’,
13,’file header block’,
14,’unused’,
15,’system undo header’,
16,’system undo block’,
17,’undo header’,
18,’undo block’
sys@PORA40>select owner,segment_name
2
from dba_extents
where file_id=&p1
and &p2 between
5
block_id and block_id + blocks -1;
Enter value for p1: 18
old 3: where file_id=&p1
new 3: where file_id=18
Enter value for p2: 529
old 4: and &p2 between
new 4: and 529 between
OWNER
SEGMENT_NAME
------------------------------ --------------------------------
CMS
CMS_ENTITY
确认了在对象CMS.CMS_ENTITY上的一致读非常高。看TOP SQL中也与些对象相关,每次执行估算是60000 s。查看该SQL当前环境的执行计划。
cms@PORA40>explain plan for SELECT *
2 FROM (
SELECT /*+FIRST_ROWS*/
3 e.entity_desc, e.entity_url,
n.news_content
4 FROM cms_entity e, cms_news n
5 WHERE n.entity_id = e.entity_id
6 AND e.ENTITY_DESC LIKE '%%'
7 AND e.ENTITY_CATEGORY LIKE
'%310003617%'
8 ORDER BY e.entity_id DESC)
9
WHERE ROWNUM <= 100;
Explained.
cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
Plan hash value: 3482088815
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
100 | 254K| 47M
(1)|157:40:34 |
|* 1 | COUNT STOPKEY | | |
| | |
| 2 | VIEW
| |
195 | 496K| 47M
(1)|157:40:34 |
| 3 | NESTED LOOPS | |
195 | 42900 | 47M (1)|157:40:34 |
| 4 | TABLE ACCESS BY INDEX ROWID| CMS_NEWS | 69599 |
6388K| 49866 (1)| 00:09:59 |
| 5 | INDEX FULL SCAN DESCENDING| PK_CMS_NEWS |
69599 | | 166
(1)| 00:00:02 |
|* 6 | TABLE ACCESS FULL | CMS_ENTITY | 1
| 126 | 679
(1)| 00:00:09 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
6 -
filter("E"."ENTITY_DESC" LIKE '%%' AND
"E"."ENTITY_CATEGORY" LIKE
'%310003617%' AND
"E"."ENTITY_DESC" IS NOT NULL AND
"E"."ENTITY_CATEGORY" IS NOT NULL
AND
"N"."ENTITY_ID"="E"."ENTITY_ID")
把CBO optimizer 的参数改回升级前版本
cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
Plan hash value: 3142591826
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
100 | 254K| 826
(1)| 00:00:10 |
|* 1 | COUNT STOPKEY | | |
| | |
| 2 | VIEW | |
195 | 496K| 826
(1)| 00:00:10 |
|* 3 | SORT ORDER BY STOPKEY | |
195 | 42900 | 826 (1)| 00:00:10 |
| 4 | NESTED LOOPS | |
195 | 42900 | 825 (1)| 00:00:10 |
|* 5 | TABLE ACCESS FULL | CMS_ENTITY | 188
| 23688 | 680 (1)| 00:00:09 |
| 6 | TABLE ACCESS BY INDEX ROWID|
CMS_NEWS | 1 |
94 | 1 (0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | PK_CMS_NEWS | 1 |
| 0 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=100)
3 - filter(ROWNUM<=100)
5 -
filter("E"."ENTITY_DESC" LIKE '%%' AND "E"."ENTITY_CATEGORY"
LIKE '%310003617%')
7 -
access("N"."ENTITY_ID"="E"."ENTITY_ID")
果然执行计划发生了改变。在NL join 中增加了SORT ORDER BY
STOPKEY ,并发现两次关连CMS_ENTITY表都是FTS(full table
scan),些表有8W+记录,居然没任何索引。
cms@PORA40>select * from user_ind_columns where table_name='CMS_ENTITY';
none any index
cms@PORA40>alter table CMS_ENTITY add constraint pk_CMS_ENTITY primary
key (entity_id);
Table altered.
建完索引后看两个版本的参数的执行计划
alter session set optimizer_features_enable='10.2.0.4';
cms@PORA40>explain plan for ...
Explained.
cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------
Plan hash value: 819915907
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 |
254K| 24889 (1)| 00:04:59 |
|* 1 | COUNT STOPKEY | | |
| | |
| 2 | VIEW | | 195 |
496K| 24889 (1)| 00:04:59 |
| 3 | NESTED LOOPS | | 195 | 42900 | 24889 (1)| 00:04:59 |
|* 4 | TABLE ACCESS BY INDEX ROWID|
CMS_ENTITY | 188 | 23688 | 24743 (1)| 00:04:57 |
| 5 | INDEX FULL SCAN DESCENDING| PK_CMS_ENTITY
| 88523 | | 212
(1)| 00:00:03 |
| 6 | TABLE ACCESS BY INDEX ROWID| CMS_NEWS |
1 | 94 | 1
(0)| 00:00:01 |
|* 7 | INDEX UNIQUE SCAN | PK_CMS_NEWS |
1 | | 0
(0)| 00:00:01 |
------------------------------------------------------------------------------------------------
cms@PORA40>alter session set optimizer_features_enable='11.2.0.3';
Session altered.
cms@PORA40>explain plan for ...
cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 1963794189
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100 |
254K| 24889 (1)| 00:04:59 |
|* 1 | COUNT STOPKEY | | |
| | |
| 2 | VIEW | | 195 |
496K| 24889 (1)| 00:04:59 |
| 3 | NESTED LOOPS | | |
| | |
| 4 | NESTED LOOPS | | 195 | 42900 | 24889 (1)| 00:04:59 |
|* 5 | TABLE ACCESS BY INDEX ROWID|
CMS_ENTITY | 188 | 23688 | 24743 (1)| 00:04:57 |
| 6 | INDEX FULL SCAN DESCENDING|
PK_CMS_ENTITY | 88523 | | 212
(1)| 00:00:03 |
|* 7 | INDEX UNIQUE SCAN | PK_CMS_NEWS |
1 | | 0
(0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID |
CMS_NEWS | 1 |
94 | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
显然又都有了新改变。最主要的是11g现在变的可以接受了。10G 的变的没以前好了。因为这台机器上有一堆的小项目,调optimizer_features_enable是不想要的。那能不能让11g也用上10g刚才的执行计划呢? 去掉hint 试试
cms@PORA40>SELECT *
FROM ( SELECT
e.entity_desc,
e.entity_url, n.news_content
FROM cms_entity e,
cms_news n
WHERE n.entity_id = e.entity_id
AND e.ENTITY_DESC
LIKE '%%'
AND
e.ENTITY_CATEGORY LIKE '%310003617%'
ORDER BY e.entity_id DESC)
WHERE ROWNUM <= 100
Execution Plan
----------------------------------------------------------
Plan hash value: 2559076494
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
100 | 254K| 826
(1)| 00:00:10 |
|* 1 | COUNT STOPKEY | | |
| | |
| 2 | VIEW | |
195 | 496K| 826
(1)| 00:00:10 |
|* 3 | SORT ORDER BY STOPKEY | |
195 | 42900 | 826 (1)| 00:00:10 |
| 4 | NESTED LOOPS | | |
| | |
| 5 | NESTED LOOPS | |
195 | 42900 | 825 (1)| 00:00:10 |
|* 6 | TABLE ACCESS FULL | CMS_ENTITY | 188
| 23688 | 680 (1)| 00:00:09 |
|* 7 | INDEX UNIQUE SCAN | PK_CMS_NEWS | 1 |
| 0 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID|
CMS_NEWS | 1 |
94 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
可以看到这个执行计划相对刚才的更好些,随然sql
profile,outlines, baselines可以固定执行计划,可惜的是这些sql全是字面常量每次也都是hard parse, 也不希望改cursor_sharing ,看到负载降到了2,最主要的是应用程序早就N年没人接了,所以程序不能动,也就先调到上一步。如果有个参数可以忽略指定的hint就好了。有个隐藏参数_optimizer_ignore_hints
在解析时会忽略所有hint(不含递归SQL),可以在instance,session,sql
级指定。
cms@PORA40>alter session set "_optimizer_ignore_hints"=true;
Session altered.
cms@PORA40>explain plan for SELECT *
2 FROM (
SELECT /*+FIRST_ROWS*/
3 e.entity_desc, e.entity_url,
n.news_content
4 FROM cms_entity e, cms_news n
5 WHERE n.entity_id = e.entity_id
6 AND e.ENTITY_DESC LIKE '%%'
7 AND e.ENTITY_CATEGORY LIKE
'%310003617%'
8 ORDER BY e.entity_id DESC)
9 WHERE ROWNUM <= 100;
Explained.
cms@PORA40>select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2559076494
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
100 | 254K| 826
(1)| 00:00:10 |
|* 1 | COUNT STOPKEY | | |
| | |
| 2 | VIEW | |
195 | 496K| 826
(1)| 00:00:10 |
|* 3 | SORT ORDER BY STOPKEY | |
195 | 42900 | 826 (1)| 00:00:10 |
| 4 | NESTED LOOPS | | |
| | |
| 5 | NESTED LOOPS | |
195 | 42900 | 825 (1)| 00:00:10 |
|* 6 | TABLE ACCESS FULL | CMS_ENTITY | 188
| 23688 | 680 (1)| 00:00:09 |
|* 7 | INDEX UNIQUE SCAN | PK_CMS_NEWS | 1 |
| 0 (0)| 00:00:01 |
| 8 | TABLE ACCESS BY INDEX ROWID|
CMS_NEWS | 1 |
94 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Summary:
当出现了CBC latch,通常是因为热块引起,可以参考hot block tunning的文章,检查对象上是否产生了无用的一致读。
升级前有必要先测试
程序代码中不是迫不得已不要使用hint
_optimizer_ignore_hints 隐藏参数可以忽略sql hint,但一定要注意测试,会有可能导致sql profile,baselines,outlines 失效,比如awr ,或系统view sql中的hint 被ignore, 降低查询。
案例:latch: cache buffers chains event tuning的更多相关文章
- 【转载】latch: cache buffers chains
本文转自惜分飞的博客,博客原文地址:www.xifenfei.com/1109.html,支持原创,分享知识! 当一个数据块读入sga区,相应的buffer header会被放置到hash列表上,我们 ...
- latch: cache buffers chains故障处理总结(转载)
一大早就接到开发商的电话,说数据库的CPU使用率为100%,应用相应迟缓.急匆匆的赶到现场发现进行了基本的检查后发现是latch: cache buffers chains 作祟,处理过程还算顺利,当 ...
- latch: cache buffers chains故障处理总结
一大早就接到开发商的电话,说数据库的CPU使用率为100%,应用相应迟缓.急匆匆的赶到现场发现进行了基本的检查后发现是latch: cache buffers chains 作祟,处理过程还算顺利,当 ...
- [转帖]深入理解latch: cache buffers chains
深入理解latch: cache buffers chains http://blog.itpub.net/12679300/viewspace-1244578/ 原创 Oracle 作者:wzq60 ...
- Oracle索引失效问题:WHERE C1='' OR C2 IN(SubQuery),并发请求时出现大量latch: cache buffers chains等待
问题描述: 项目反馈某功能响应时间很长,高峰期时系统整体响应很慢... 获取相应的AWR,问题确实比较严重,latch: cache buffers chains等待,因为这些会话SQL执行时间太长, ...
- 关于latch: cache buffers chains的sql优化
前段时间,优化了一些耗buffer比较多的sql,但是CPU使用率还是没下来 . 查看操作系统CPU使用率 查看awr,发现又有一条超级耗性能的sql冒出来了. 该SQL每次执行耗费3e多个buffe ...
- 又是latch: cache buffers chains惹得祸
前言 一大早,客户给我打电话说: xx,应用很慢,查询数据总是超时,让我看看... 根据多年DBA经验,首当其冲的肯定是去查询数据库在这段时间都在干嘛. 分析 导出awr报告分析 1). 数据库在此时 ...
- latch:cache buffers chains的优化思路
数据块在buffer cache存放是以linked list方式存放的.当一个session想要访问/修改buffer cache的block,首先需要通过hash算法检查该block是否存在于bu ...
- 低效的SQL引发的cache buffers chains latch
1.低效的SQL 低效的SQL语句时发生cache buffers chains 锁存器争用的最重要原因.多个进程同时扫描大范围的索引或表时,可能广泛 地发生cache buffers chains ...
随机推荐
- 【Android 复习】:第01期:引导界面(一)ViewPager介绍和使用详解
一.ViewPager实现的效果图 二.ViewPager实现的功能 看到上面的效果图,想必大家已经猜出了这个类是干吗用的了,ViewPager类提供了多界面切换的新效果, 新效果有如下特征: < ...
- Delphi TdxBarDockControl 用法
1.放个TdxBarManager在窗体上2.放个TdxBarDockControl在panel上,把它的BarManager属性设置为dxBarManager13.双击dxBarManager1,新 ...
- vijosP1629 八
vijosP1629 八 链接:https://vijos.org/p/1629 [思路] 暴力容斥(看他们都这么叫=_=)+精度选择. 总体思路是先统计LR区间内满足是8倍数的数目ans,再从ans ...
- Microsoft Dynamics CRM 2011 相关-摘自网络
Microsoft Dynamics CRM Server 2011硬件需求: 组件 *最低要求 *推荐配置 处理器 x64 体系结构或兼容的双核 1.5 GHz 处理器 四核 x64 体系结构 2 ...
- 宁波Uber优步司机奖励政策(2月1日~2月7日)
滴快车单单2.5倍,注册地址:http://www.udache.com/ 如何注册Uber司机(全国版最新最详细注册流程)/月入2万/不用抢单:http://www.cnblogs.com/mfry ...
- Shell简介:什么是Shell,Shell命令的两种执行方式
Shell本身是一个用C语言编写的程序,它是用户使用Unix/Linux的桥梁,用户的大部分工作都是通过Shell完成的.Shell既是一种命令语言,又是一种程序设计语言.作为命令语言,它交互式地解释 ...
- storm的acker机制理解
转载请注明原创地址http://www.cnblogs.com/dongxiao-yang/p/6142356.html Storm 的拓扑有一些特殊的称为"acker"的任务,这 ...
- jquery 1.9里面已经删除了toggle(fn1, fn2)函数
转自:http://blog.sina.com.cn/s/blog_50042fab0101c7a9.html jquery 1.9里面已经删除了toggle(fn1, fn2)函数: 引用 Note ...
- Android Studio 环境配置优化
一.插件 .ignore: 版本控制忽略文件高亮和补齐ADB Idea: ctrl + Shift + A 查找中添加常用卸载安装app的一些操作,无需命令行Android ButterKnife Z ...
- 大型Web应用运行时 PHP负载均衡指南
如今,“大型服务器”模式的时代已经过去,我们在运行一些大的Web应用时候,可以使用各种各样的负载均衡技术,这是一种更可行的方法,将使硬件成本降至最低. 过去当运行一个大的web应用时候意味着需要运行一 ...