On the relationship between innodb_flush_log_at_trx_commit, innodb_flush_method, innodb_log_block_size and fsync(), O_DIRECT, IOPS and cloud disks: a summary

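I had been meaning to sort out how innodb_flush_log_at_trx_commit, innodb_flush_method, innodb_log_block_size, fsync(), O_DIRECT and IOPS relate to one another. Having spent the past two days on the problem in http://www.cnblogs.com/zhjh256/p/6519032.html, I am writing it up here so that next time I don't have to collect it all again from different places.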

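innodb_flush_log_at_trx_commit controls how the log buffer is synchronized to disk at commit time; it plays much the same role as the commit_logging and commit_wait parameters in Oracle. Whichever value is used, the log is eventually written to disk, and that is where innodb_log_block_size and innodb_flush_method come in. innodb_log_block_size (a Percona Server/XtraDB variable) sets the block size of the InnoDB redo log. The default is 512; on SSDs or ext4 you can choose 4096 (mainly because the ext4 file-system block size is 4096). I have not tested carefully how big the difference is; answers found online vary widely, but it is generally within 15%. innodb_flush_method controls how InnoDB opens and flushes its data files and log files. Depending on whether the storage is SSD, RAID, SAN or a cloud disk, different settings have different effects on I/O throughput. The possible values are: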

fdatasync (the default in Percona Server; Oracle MySQL defaults to fsync): use fsync() to flush both the data and log files.
O_DSYNC: use O_SYNC to open and flush the log files; use fsync() to flush the data files.
O_DIRECT: use O_DIRECT to open the data files and fsync() system call to flush both the data and log files.
O_DIRECT_NO_FSYNC: use O_DIRECT to open the data files but don’t use fsync() system call to flush both the data and log files. This option isn’t suitable for XFS file system.
ALL_O_DIRECT: use O_DIRECT to open both data and log files, and use fsync() to flush the data files but not the log files. This option is recommended when InnoDB log files are big (more than 8GB), otherwise there might be even a performance degradation. Note: When using this option on ext4 filesystem variable innodb_log_block_size should be set to 4096 (default log-block-size in ext4) in order to avoid the unaligned AIO/DIO warnings.
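The list above can be condensed into a lookup table. This is a minimal Python sketch that only transcribes what the list says (the value that opens the log files with O_SYNC is spelled O_DSYNC in the server configuration); the names FlushMethod, FLUSH_METHODS and uses_page_cache are hypothetical and not part of MySQL or Percona Server.

```python
from typing import NamedTuple


class FlushMethod(NamedTuple):
    """How one innodb_flush_method value treats data and log files (per the list)."""
    data_open_flag: str  # extra open() flag for data files; "" means ordinary buffered I/O
    log_open_flag: str   # extra open() flag for redo log files
    fsync_data: bool     # is fsync() called on data files?
    fsync_log: bool      # is fsync() called on log files?


# Hypothetical table transcribing the option descriptions above.
FLUSH_METHODS = {
    "fdatasync":         FlushMethod("",         "",         True,  True),
    "O_DSYNC":           FlushMethod("",         "O_SYNC",   True,  False),
    "O_DIRECT":          FlushMethod("O_DIRECT", "",         True,  True),
    "O_DIRECT_NO_FSYNC": FlushMethod("O_DIRECT", "",         False, False),
    "ALL_O_DIRECT":      FlushMethod("O_DIRECT", "O_DIRECT", True,  False),
}


def uses_page_cache(method: str, kind: str) -> bool:
    """True if files of `kind` ("data" or "log") still go through the OS page cache."""
    if kind not in ("data", "log"):
        raise ValueError("kind must be 'data' or 'log'")
    fm = FLUSH_METHODS[method]
    flag = fm.data_open_flag if kind == "data" else fm.log_open_flag
    # O_SYNC still writes through the page cache; only O_DIRECT bypasses it.
    return flag != "O_DIRECT"


if __name__ == "__main__":
    for name in FLUSH_METHODS:
        print(f"{name:18} data cached={uses_page_cache(name, 'data')!s:5} "
              f"log cached={uses_page_cache(name, 'log')}")
```

Read row by row, only ALL_O_DIRECT takes the redo log out of the page cache, which is why the ext4 block-size note applies specifically to that option.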

How each setting affects performance depends on hardware configuration and workload. Benchmark your particular configuration to decide which setting to use, or whether to keep the default setting. Examine the Innodb_data_fsyncs status variable to see the overall number of fsync() calls for each setting. The mix of read and write operations in your workload can affect how a setting performs. For example, on a system with a hardware RAID controller and battery-backed write cache, O_DIRECT can help to avoid double buffering between the InnoDB buffer pool and the operating system file system cache. On some systems where InnoDB data and log files are located on a SAN, the default value or O_DSYNC might be faster for a read-heavy workload with mostly SELECT statements. Always test this parameter with hardware and workload that reflect your production environment. For general I/O tuning advice, see Section 8.5.8, “Optimizing InnoDB Disk I/O”, in the MySQL Reference Manual.
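To compare settings by Innodb_data_fsyncs as suggested, one approach is to sample the cumulative counter twice under the same workload (for example with SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs') and turn the difference into a rate. A minimal sketch, assuming the two samples have already been collected; the function name and the sample numbers are made up for illustration.

```python
def fsyncs_per_second(before: int, after: int, interval_s: float) -> float:
    """Rate of fsync() calls between two samples of the Innodb_data_fsyncs counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if after < before:
        # The counter is cumulative since server start, so a drop means a restart.
        raise ValueError("counter decreased; was the server restarted?")
    return (after - before) / interval_s


if __name__ == "__main__":
    # Hypothetical samples taken 60 s apart under the same workload.
    print(fsyncs_per_second(120_000, 123_000, 60))  # 50.0
```

Repeating the measurement once per innodb_flush_method value, with the same workload, gives directly comparable numbers.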

Next, fsync()/fdatasync(), which are system calls provided by Linux. From the man page:

fsync() transfers ("flushes") all modified in-core data of (i.e., modified buffer cache pages for) the file referred to by the file descriptor fd to the disk device (or other permanent storage device) so that all changed information can be retrieved even after the system crashed or was rebooted. This includes writing through or flushing a disk cache if present. The call blocks until the device reports that the transfer has completed. It also flushes metadata information associated with the file.
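There is not much to add: fsync() flushes the cached modified data to the disk, including the disk drive's own write cache if it has one. fdatasync() does the same but skips metadata that is not needed to read the data back (such as the modification time), which can save a write.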
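The write-then-flush sequence described above can be sketched with Python's thin wrappers over the same system calls (os.write, os.fsync, os.fdatasync; Linux/Unix only). This is an illustrative sketch, not how InnoDB itself is written; the function name durable_write is hypothetical. For a newly created file, the containing directory would also need an fsync() to make the directory entry durable, which the sketch does not do.

```python
import os
import tempfile


def durable_write(path: str, data: bytes, data_only: bool = False) -> int:
    """Write `data` to `path` and force it to stable storage before returning.

    data_only=True uses fdatasync(), which skips metadata (e.g. mtime) that is
    not needed to read the data back; otherwise fsync() flushes data and metadata.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = os.write(fd, data)  # lands in the page cache first
        if data_only:
            os.fdatasync(fd)
        else:
            os.fsync(fd)  # blocks until the device reports completion
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(durable_write(os.path.join(tmp, "demo.log"), b"commit 1\n"))  # 9
```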
 
The last key points are the O_DIRECT and O_SYNC flags, which are passed to the open() system call (http://man7.org/linux/man-pages/man2/open.2.html).

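O_DIRECT: minimize cache effects when reading and writing the file by bypassing the operating system's page cache. It is commonly used by applications that manage their own cache, such as databases, and it is especially well suited to storage that has its own cache. Note that O_DIRECT I/O generally has to be aligned (buffer address, file offset and length must be multiples of the underlying block size), which is why the ext4 note above recommends innodb_log_block_size=4096; and O_DIRECT on its own does not guarantee durability, so an fsync() is still needed unless O_SYNC is also used.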
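To see why the log block size matters under O_DIRECT, the sketch below checks whether a redo-log write made of whole log blocks is aligned in offset and length. It assumes, as the ext4 note does, a 4096-byte alignment requirement; the real requirement depends on the device and file system, and the buffer address must also be aligned, which the sketch does not model. The function name is hypothetical.

```python
def redo_write_is_dio_aligned(start_block: int, n_blocks: int,
                              log_block_size: int, align: int = 4096) -> bool:
    """Is a write of n_blocks log blocks, starting at log block start_block,
    aligned in both file offset and length to `align` bytes?

    Assumption: `align` stands in for the direct-I/O alignment of the
    underlying storage; buffer-address alignment is not checked here.
    """
    offset = start_block * log_block_size
    length = n_blocks * log_block_size
    return offset % align == 0 and length % align == 0


if __name__ == "__main__":
    # One 512-byte block at block 3 -> offset 1536: unaligned for 4 KiB.
    print(redo_write_is_dio_aligned(3, 1, 512))   # False
    # The same logical write with 4096-byte log blocks is always aligned.
    print(redo_write_is_dio_aligned(3, 1, 4096))  # True
```

With 512-byte log blocks only some writes happen to line up with a 4 KiB boundary, which is consistent with the unaligned AIO/DIO warnings mentioned in the ALL_O_DIRECT description.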

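O_SYNC: each write() behaves as if it were followed by fsync(); it returns only after the data and the associated file metadata have been transferred to the underlying hardware.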
So for SAN and cloud disks, ALL_O_DIRECT is probably the more suitable choice.
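For comparison with the write-then-fsync() pattern, the O_SYNC behaviour can be sketched in Python (Linux/Unix only, since os.O_SYNC is not available on Windows): the flush moves from a separate call into the open() flags. The function name write_o_sync is hypothetical.

```python
import os
import tempfile


def write_o_sync(path: str, data: bytes) -> int:
    """Write `data` through a file descriptor opened with O_SYNC.

    Each os.write() returns only after the data and its metadata have been
    transferred to the storage device, so no separate fsync() call is needed.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_o_sync(os.path.join(tmp, "redo.demo"), b"redo block"))  # 10
```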
 
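As for IOPS: for non-DSS (OLTP) systems, what matters is mainly sequential writes of 4K/8K/16K, and random I/O matters little, because the bulk of the data is cached in the SGA/buffer pool. Redo writes therefore make up most of the I/O, and checkpoints contribute a small share of random I/O. The achievable figure depends on innodb_flush_log_at_trx_commit, innodb_flush_method and innodb_log_block_size (and, because MySQL keeps the InnoDB redo log and the binlog separate, also on sync_binlog). Different combinations give different TPS, so in the worst case the result may be less than half the IOPS measured with dd/iometer/orion.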
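The "less than half" figure can be reproduced with simple arithmetic. This sketch assumes that, with no group commit, each commit costs one redo fsync when innodb_flush_log_at_trx_commit=1 and one binlog fsync when sync_binlog=1; the exact count varies between MySQL versions, and other sync_binlog values are simply treated as "no fsync per commit". The function name and the 2000 IOPS device are hypothetical.

```python
from typing import Optional


def worst_case_tps(fsync_iops: float, flush_log_at_trx_commit: int = 1,
                   sync_binlog: int = 1) -> Optional[float]:
    """Upper bound on commits/s if every commit pays its own fsyncs (no group commit).

    Assumption: values other than 1 mean no fsync is paid per commit.
    """
    fsyncs_per_commit = 0
    if flush_log_at_trx_commit == 1:
        fsyncs_per_commit += 1  # redo log flushed at commit
    if sync_binlog == 1:
        fsyncs_per_commit += 1  # binlog flushed at commit
    if fsyncs_per_commit == 0:
        return None  # commits are not bounded by a per-commit fsync
    return fsync_iops / fsyncs_per_commit


if __name__ == "__main__":
    # Hypothetical device that sustains 2000 fsync-backed writes per second.
    print(worst_case_tps(2000))        # 1000.0
    print(worst_case_tps(2000, 1, 0))  # 2000.0
```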
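Beyond that there is a factor you cannot control: even with innodb_flush_log_at_trx_commit=1 and innodb_flush_method=O_DIRECT, one commit does not necessarily mean one fsync, because InnoDB can merge several concurrent commits into a single log fsync (group commit). One fsync per commit is therefore only the extreme case, although estimating on that basis is still reasonable.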
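A toy simulation shows why concurrency breaks the one-commit-one-fsync assumption. This is a simplified model of group commit, not InnoDB's actual implementation: a commit that arrives while no flush is running starts one; commits arriving during a flush wait and share the next one. The function name, arrival times and fsync latency are all hypothetical.

```python
from typing import Sequence


def count_group_commit_fsyncs(arrivals_ms: Sequence[float], fsync_ms: float) -> int:
    """Count log fsyncs in a simplified group-commit model.

    A commit that arrives while no flush is running starts one immediately.
    Commits that arrive while a flush is in progress wait, and all of them
    share the next flush, which starts as soon as the current one finishes.
    """
    times = sorted(arrivals_ms)
    i, n = 0, len(times)
    busy_until = float("-inf")  # time at which the in-flight fsync completes
    fsyncs = 0
    while i < n:
        start = max(times[i], busy_until)
        # Every commit that has arrived by `start` is covered by this fsync.
        while i < n and times[i] <= start:
            i += 1
        fsyncs += 1
        busy_until = start + fsync_ms
    return fsyncs


if __name__ == "__main__":
    print(count_group_commit_fsyncs([0, 0, 0, 0], 1.0))      # 1: four commits, one fsync
    print(count_group_commit_fsyncs([0, 0.5, 0.6, 5], 1.0))  # 3
    print(count_group_commit_fsyncs([0, 10, 20], 1.0))       # 3: worst case, one per commit
```

The more commits overlap relative to the fsync latency, the fewer fsyncs are needed per commit; serial, widely spaced commits are the worst case described above.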

What range is reasonable still depends to some extent on experience.
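Finally, cloud disks. Technically a cloud disk works like a SAN, so a database with very frequent writes that is not on SSD is in an awkward position: the system disk is only 40 GB, with perhaps 20 GB left usable. With innodb_flush_log_at_trx_commit=1, how to avoid losing data while still meeting the performance requirements really tests the skill of the DBA and the architect.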
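The data-loss side of that trade-off can be summarized per value of innodb_flush_log_at_trx_commit. This sketch encodes the documented behaviour: with 0 or 2 the log is flushed roughly once per second (innodb_flush_log_at_timeout, default 1, in MySQL 5.6.6 and later), so about a second of commits is at risk. It assumes the storage honours fsync(), which is worth verifying on a cloud disk. The names CommitMode, FLUSH_LOG_AT_TRX_COMMIT and may_lose_commits are hypothetical.

```python
from typing import NamedTuple


class CommitMode(NamedTuple):
    write_at_commit: bool        # log buffer written to the OS at each commit
    fsync_at_commit: bool        # log file fsync()ed at each commit
    survives_mysqld_crash: bool  # no committed transaction lost if mysqld crashes
    survives_os_crash: bool      # ...nor if the OS crashes or power is lost


# Assumes the storage honours fsync(); otherwise even value 1 can lose commits.
FLUSH_LOG_AT_TRX_COMMIT = {
    0: CommitMode(False, False, False, False),
    1: CommitMode(True,  True,  True,  True),
    2: CommitMode(True,  False, True,  False),
}


def may_lose_commits(value: int, os_crash: bool) -> bool:
    """Can committed transactions be lost for this setting and failure type?"""
    mode = FLUSH_LOG_AT_TRX_COMMIT[value]
    return not (mode.survives_os_crash if os_crash else mode.survives_mysqld_crash)


if __name__ == "__main__":
    for v in (0, 1, 2):
        print(v, "mysqld crash:", may_lose_commits(v, False),
              "OS crash:", may_lose_commits(v, True))
```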
 
PS: I found a comparison chart online; the Linux IOPS benchmarking tools in common use are:

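As you can see, fio is not among them, mainly because fio can destroy a file system (when its write tests are pointed at a raw block device), so it is generally not recommended.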

For how to use iometer, see: http://www.linuxidc.com/Linux/2010-09/28918.htm
