General Solutions for High log file sync Waits

The log file sync wait occurs while redo is being written from the log buffer to the log file.

The rest of this article explains log file sync in detail.

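When a log write happens:

1. On commit or rollback.

2. Every 3 seconds.

3. When the log buffer is 1/3 full or already holds 1 MB of redo.

More precisely: _LOG_IO_SIZE defaults to 1/3 of LOG_BUFFER, and a log write is triggered once the redo in the log buffer reaches _LOG_IO_SIZE.

4. Before DBWR writes dirty blocks, because the redo protecting those blocks must reach disk first.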

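The _log_io_size hidden parameter:

LGWR is triggered to write once the amount of redo written into LOG_BUFFER (in bytes) exceeds _LOG_IO_SIZE; the default is said to be 1/3 of LOG_BUFFER or 1 MB.

However, this cannot be confirmed by querying the parameter, which reports 0 in the output below. Avoid changing hidden parameters where possible.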

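col name for a25
col value for a20
col describ for a50
-- Run as SYS; X$ tables are not visible to ordinary users.
-- Use = rather than LIKE: in a LIKE pattern the underscore is a single-character wildcard.
SELECT x.ksppinm name, y.ksppstvl value, x.ksppdesc describ
FROM   sys.x$ksppi x, sys.x$ksppcv y
WHERE  x.inst_id = USERENV('Instance')
AND    y.inst_id = USERENV('Instance')
AND    x.indx = y.indx
AND    x.ksppinm = '_log_io_size';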

NAME                      VALUE                DESCRIB
------------------------- -------------------- --------------------------------------------------
_log_io_size              0                    automatically initiate log write if this many redo
                                               blocks in buffer
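
Since the hidden parameter reports 0 here, the threshold can only be estimated. The following is a rough sketch, assuming that the rule quoted in this article (the smaller of one third of LOG_BUFFER and 1 MB) really applies to your version; the column aliases are made up for illustration.

```sql
-- Estimate the background-write threshold from the visible log_buffer parameter.
-- Assumption: threshold = LEAST(log_buffer / 3, 1 MB), as stated in this article;
-- the _log_io_size value shown above does not confirm it.
SELECT TO_NUMBER(value)                            AS log_buffer_bytes,
       ROUND(TO_NUMBER(value) / 3)                 AS one_third_bytes,
       LEAST(ROUND(TO_NUMBER(value) / 3), 1048576) AS estimated_threshold_bytes
FROM   v$parameter
WHERE  name = 'log_buffer';
```

Treat the result as an order-of-magnitude hint rather than as the value Oracle actually uses.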

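How log file sync occurs:

This is the event a user session waits on after issuing a COMMIT or ROLLBACK, until the commit completes. The commit performs a log sync, that is, it flushes the log buffer to the log file, and the session shows this wait event until the commit has finished.

Note that it refers specifically to waits for the log buffer flush caused by commits and rollbacks. It is often accompanied by log file parallel write: the sync has to write the log buffer, so a slow redo I/O subsystem inevitably produces log file parallel write waits as well.

To decide whether slow redo I/O is behind log file sync, compare the wait times of the two events. If they are close, redo I/O is slow or too much redo is being generated; log file sync is then a consequence of log file parallel write, and the remedies for log file parallel write apply.

If the log file sync wait time is high while the log file parallel write wait time is not, the cause is not slow redo I/O but, typically, an application that commits too often.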

-----

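An easier way to understand it:

Recall the 'log file sync' wait event in a single-instance database. When a user session commits, it asks the LGWR process to write the contents of the redo buffer to the redo log file. Once LGWR finishes the write, it posts (notifies) the user session that the write is complete, and only after receiving that post does the commit finish. Until LGWR's post arrives, the user session stays in a wait state, and the wait event is 'log file sync'.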

-----

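Causes of log file sync:

1. Frequent commits or rollbacks. Check whether the application runs many short transactions; if so, batching them can relieve the problem.

2. Slow OS I/O. The remedy is to put the log files on raw devices or on RAID 0 or RAID 1+0 rather than on RAID 5.

3. An oversized log buffer (log_buffer).

An oversized log_buffer lets LGWR become lazy: the redo in the log buffer does not reach _LOG_IO_SIZE, so more redo entries pile up in the log buffer.

LGWR then writes everything to the redo log file only when a transaction commits or when it wakes up on the 3-second timeout.

Because there is a lot of data to write, the redo write takes longer to complete, and so does the commit waiting on it.

In this case _LOG_IO_SIZE can be lowered; its default is the smaller of 1/3 of LOG_BUFFER and 1 MB.

In other words, you can keep a large log buffer, but a smaller _LOG_IO_SIZE increases the number of background writes and thereby reduces log file sync wait time.

4. High CPU load. See the description further below.

5. Poor RAC private interconnect performance, which slows down LMS when it synchronizes the commit SCN.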
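
To get a first impression of whether cause 1 (frequent commits) applies, the commit-related counters can be read from v$sysstat. This is only a sketch: the values are cumulative since instance startup, so sample them twice and subtract to get a rate for the period you care about.

```sql
-- Cumulative counters since instance startup.
-- Many 'user commits' relative to 'redo size' suggests lots of small transactions.
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('user commits', 'user rollbacks',
                'redo synch writes', 'redo writes', 'redo size')
ORDER  BY name;
```

What counts as "too many" commits depends on the workload, so compare against a period when the system behaved well.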

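How to diagnose log file sync:

1. AWR: when log file sync is occurring, take a snapshot first and then generate an AWR report covering a 10 to 30 minute interval.

A log file sync problem that has already passed can still be analyzed through AWR; again keep the interval to 10 to 30 minutes.

2. LGWR trace file (from 10.2.0.4 onward): log writes that take longer than 500 ms are recorded.

If the trace file contains a line such as "Warning: log write time 1000ms, size 2KB", slow I/O is very likely.

3. Tools that analyze CPU usage: when the CPU is too busy, LGWR cannot get scheduled in time and log file sync appears.

vmstat: check whether r exceeds the number of CPU cores; if it does, the CPU is busy.

OSW (OSWatcher): same as above.

4. Alert log: confirm that the log file switches once every 15 to 20 minutes.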

5.Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql) [Document 1064487.1]
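
For step 1, a manual snapshot pair can be taken from SQL*Plus. This is a minimal sketch, assuming you are licensed for the Diagnostics Pack and are connected with enough privileges to run DBMS_WORKLOAD_REPOSITORY.

```sql
-- Take a manual AWR snapshot at the start of the problem window.
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

-- ... wait 10 to 30 minutes while log file sync is occurring ...

-- Take a second snapshot to close the window.
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

-- Generate the report interactively and pick the two snapshots just taken.
@?/rdbms/admin/awrrpt.sql
```

Keeping the window short stops a brief spike from being averaged away over a full hour.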

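Solutions:

1. If log file sync really is caused by frequent commits, reduce the number of commits.

2. If it really is caused by I/O, put the log files on raw devices or on RAID 1+0 rather than on RAID 5 (and remember: never put redo log files on SSD!).

3. Make sure there is enough CPU. When CPU is short, the user session cannot get scheduled promptly after LGWR posts it and cannot continue its work.

4. Check whether some tables can use NOLOGGING, which reduces the amount of redo generated. This only helps direct-path operations; conventional DML still generates redo.

5. Check that the redo log files are large enough that they switch only once every 15 to 20 minutes.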
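
The switch rate in item 5 can be checked without reading the alert log. This is a sketch, assuming the last 24 hours are representative; a switch every 15 to 20 minutes corresponds to roughly 3 to 4 switches per hour per thread.

```sql
-- Log switches per hour and per redo thread over the last 24 hours.
SELECT thread#,
       TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS switch_hour,
       COUNT(*)                               AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY thread#, TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY thread#, switch_hour;

-- Current size of each online redo log group.
SELECT group#, thread#, bytes / 1024 / 1024 AS size_mb, status
FROM   v$log
ORDER  BY thread#, group#;
```

If the hourly count is well above 4 during busy periods, larger redo log files are worth considering.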

A deeper analysis of log file sync:

If the analysis above has not resolved the log file sync waits, continue with the following.

The log file sync wait may be broken down into the following components:

1. Wakeup LGWR if idle

2. LGWR gathers the redo to be written and issues the I/O

3. Time for the log write I/O to complete

4. LGWR I/O post processing

5. LGWR posting the foreground/user session that the write has completed

6. Foreground/user session wakeup

Steps 2 and 3 are accumulated in the "redo write time" statistic (found under the STATISTICS section of Statspack, and under Instance Activity Stats in AWR).

Step 3 is the "log file parallel write" wait event. (Document:34583.1 "log file parallel write" Reference Note)

In addition, with Data Guard in maximum protection mode (SYNC transport), this step also includes the network write and the time RFS takes to write the redo to the standby redo log on the standby database.

Steps 5 and 6 may become very significant as the system load increases, especially when CPU usage is high (watch the r column in vmstat). This is because even after the foreground has been posted it may take some time for the OS to schedule it to run. This may require monitoring at the O/S level.
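
The breakdown can be approximated from instance statistics. This is a sketch only: both time statistics are cumulative since startup and are reported in centiseconds, so the averages are coarse, and the column aliases are made up for illustration.

```sql
-- 'redo write time' and 'redo synch time' are in centiseconds (10 ms units).
-- avg_redo_write_ms approximates steps 2 and 3; avg_redo_synch_ms approximates
-- the whole commit wait as seen by the foreground session.
SELECT ROUND(10 * MAX(DECODE(name, 'redo write time', value))
             / NULLIF(MAX(DECODE(name, 'redo writes', value)), 0), 2)       AS avg_redo_write_ms,
       ROUND(10 * MAX(DECODE(name, 'redo synch time', value))
             / NULLIF(MAX(DECODE(name, 'redo synch writes', value)), 0), 2) AS avg_redo_synch_ms
FROM   v$sysstat
WHERE  name IN ('redo write time', 'redo writes',
                'redo synch time', 'redo synch writes');
```

A large gap between the two averages hints that time is being lost outside the write itself, for example in posting and scheduling (steps 5 and 6), although this query alone cannot prove it.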

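Some rules of thumb:

A log file sync wait time under 20 ms is considered normal.

A log file parallel write wait time under 20 ms is considered normal.

If the log file parallel write and log file sync wait times are very close, the problem is I/O, because most of the time is being spent writing the log to disk.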
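
Averages can hide a small number of very slow writes, so it helps to look at the distribution as well. This is a sketch using v$event_histogram, where each wait_time_milli value is the upper bound of a bucket in milliseconds.

```sql
-- Distribution of wait durations for both events since instance startup.
-- Buckets above 16 or 32 ms exceed the 20 ms rule of thumb given above.
SELECT event, wait_time_milli, wait_count
FROM   v$event_histogram
WHERE  event IN ('log file sync', 'log file parallel write')
ORDER  BY event, wait_time_milli;
```

If most waits fall in the low buckets but a few land in the very high ones, intermittent I/O stalls are a more likely explanation than a uniformly slow device.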

Related scripts:

-- total wait time and average wait time

select EVENT,TOTAL_WAITS,TOTAL_TIMEOUTS,TIME_WAITED,AVERAGE_WAIT
from   v$system_event
where  event in ('log file sync','log file parallel write');
select value from v$parameter where name = 'log_buffer';
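
AVERAGE_WAIT in the query above is reported in hundredths of a second, which is coarse next to a 20 ms threshold. The following sketch uses TIME_WAITED_MICRO instead to express the averages in milliseconds.

```sql
-- Average wait per event in milliseconds, cumulative since instance startup.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');
```

If the two avg_wait_ms values are close, the I/O explanation is the more likely one; if log file sync is much higher, look at commit frequency and CPU scheduling.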

---------------New feature: the two log file sync methods--------------

Adaptive Log File Sync

Adaptive Log File sync was introduced in 11.2. The parameter controlling this feature, _use_adaptive_log_file_sync, is set to false by default in 11.2.0.1 and 11.2.0.2.

Starting with 11.2.0.3 the default is true, which means the adaptive log file sync mechanism is enabled.

The parameter can also be enabled on 11.2.0.1 and 11.2.0.2:

ALTER SYSTEM SET "_use_adaptive_log_file_sync"= scope=;

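Once the parameter is enabled, the log sync mechanism switches between two methods.

The parameter determines how the foreground/user session learns from LGWR that the commit has completed (that is, that the redo has been written to the log file).

Post/wait, traditional method for posting completion of writes to redo log

The traditional, passive method and the default before 11.2.0.3: the user session waits for LGWR to notify it that the redo has been written to the log file.

Advantage: with post/wait, the user session finds out almost immediately that the redo has been flushed to disk.

Polling, a new method where the foreground process checks if the LGWR has completed the write.

The new, active method: the foreground process itself checks whether LGWR has completed the write. It responds more slowly than post/wait but saves CPU.

Advantage: under post/wait, LGWR has to notify many user sessions when a commit completes, and that consumes a lot of CPU. With polling, the foreground processes actively check whether LGWR has finished writing the redo, which frees the CPU that LGWR would otherwise spend on posting.

Polling works better when the system is heavily loaded (CPU busy).

Post/wait works better when the system is lightly loaded (CPU idle), because it gives better response time than polling.

Oracle decides which method to use based on internal statistics. Switching between post/wait and polling incurs overhead, so safeguards keep the switches from happening too often.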
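
To see whether the adaptive mechanism is enabled on a given instance, the hidden-parameter query used earlier in this article can be reused. This is a sketch that must be run as SYS, since X$ tables are not visible to ordinary users.

```sql
-- Current value of the hidden parameter controlling adaptive log file sync.
SELECT x.ksppinm AS name, y.ksppstvl AS value, x.ksppdesc AS describ
FROM   sys.x$ksppi x, sys.x$ksppcv y
WHERE  x.inst_id = USERENV('Instance')
AND    y.inst_id = USERENV('Instance')
AND    x.indx = y.indx
AND    x.ksppinm = '_use_adaptive_log_file_sync';
```

A value of TRUE only means that switching is allowed; which method is in use at any moment is recorded in the LGWR trace file.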

The LGWR trace file records each switch; the keyword to search for is "Log file sync switching to ...":

Switch to polling:

*** 2015-01-21 08:19:04.077
kcrfw_update_adaptive_sync_mode: post->poll long#=2 sync#=5 sync=62 poll=1056 rw=454 ack=0 min_sleep=1056
*** 2015-01-21 08:19:04.077
Log file sync switching to polling
Current scheduling delay is 1 usec
Current approximate redo synch write rate is 1 per sec
kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=1 switch_sync_count_delta=5

Switch to post/wait:

*** 2015-01-21 08:46:09.428
Log file sync switching to post/wait
Current approximate redo synch write rate is 0 per sec
*** 2015-01-21 08:47:46.473
kcrfw_update_adaptive_sync_mode: post->poll long#=2 sync#=11 sync=228 poll=1442 rw=721 ack=0 min_sleep=1056

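Related scripts:

Check whether the current log file sync method is post/wait or polling (a growing redo synch polls value indicates that polling is being used):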

SQL> select name,value from v$sysstat where name in ('redo sync poll writes','redo synch polls');
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo synch polls                                                  325355850

Number of polls per hour, one row per AWR snapshot interval:

col begin_interval_time format a25
col instance_number format 99 heading INST
col stat_name format a25
select snap.BEGIN_INTERVAL_TIME,hist.instance_number , hist.stat_name,hist.redo_synch_polls
from ( select snap_id,instance_number,stat_name,value -lag(value,1,null) over ( order by snap_id,instance_number,stat_name) redo_synch_polls
from dba_hist_sysstat
where stat_name='redo synch polls'
and dbid=(select dbid from v$database)
and instance_number = nvl('&instance_number',1)) hist,
dba_hist_snapshot snap
where redo_synch_polls >0
and hist.snap_id=snap.snap_id
and hist.instance_number=snap.instance_number
order by 1,2
/
BEGIN_INTERVAL_TIME       INST STAT_NAME                 REDO_SYNCH_POLLS
------------------------- ---- ------------------------- ----------------
06-JAN-15 07.00.02.884 AM    2 redo synch polls                       734
06-JAN-15 08.00.08.425 AM    2 redo synch polls                     23767
06-JAN-15 09.00.13.770 AM    2 redo synch polls                     39827
06-JAN-15 10.00.19.233 AM    2 redo synch polls                     48479
06-JAN-15 11.00.24.431 AM    2 redo synch polls                     41541
06-JAN-15 12.00.29.670 PM    2 redo synch polls                     47566
06-JAN-15 01.00.35.029 PM    2 redo synch polls                     32169
06-JAN-15 02.00.04.159 PM    2 redo synch polls                     37405
06-JAN-15 02.59.04.536 PM    2 redo synch polls                     41469
06-JAN-15 04.00.08.556 PM    2 redo synch polls                     38683
06-JAN-15 05.00.12.523 PM    2 redo synch polls                     51618
06-JAN-15 06.00.16.584 PM    2 redo synch polls                     52511
06-JAN-15 07.00.03.352 PM    2 redo synch polls                     42229
06-JAN-15 08.00.08.663 PM    2 redo synch polls                     35229
06-JAN-15 09.00.13.882 PM    2 redo synch polls                     18499
