MySQL 5.6 Reference Manual-14.2 InnoDB Concepts and Architecture
14.2 InnoDB Concepts and Architecture
14.2.1 MySQL and the ACID Model
14.2.2 InnoDB Multi-Versioning
14.2.3 InnoDB Redo Log
14.2.4 InnoDB Undo Logs
14.2.5 InnoDB Table and Index Structures
14.2.6 InnoDB Mutex and Read/Write Lock Implementation
The information in this section provides background to help you get the most performance and functionality from using InnoDB tables. It is intended for:
• Anyone switching to MySQL from another database system, to explain what things might seem familiar and which might be all-new.
• Anyone moving from MyISAM tables to InnoDB, now that InnoDB is the default MySQL storage engine.
• Anyone considering their application architecture or software stack, to understand the design considerations, performance characteristics, and scalability of InnoDB tables at a detailed level.
In this section, you will learn:
• How InnoDB closely adheres to ACID principles.
• How multi-version concurrency control (MVCC) keeps transactions from viewing or modifying each other's data before the appropriate time.
• The physical layout of InnoDB-related objects on disk, such as tables, indexes, tablespaces, undo logs, and the redo log.
14.2.1 MySQL and the ACID Model
The ACID model is a set of database design principles that emphasize aspects of reliability that are important for business data and mission-critical applications. MySQL includes components such as the InnoDB storage engine that adhere closely to the ACID model, so that data is not corrupted and results are not distorted by exceptional conditions such as software crashes and hardware malfunctions. When you rely on ACID-compliant features, you do not need to reinvent the wheel of consistency checking and crash recovery mechanisms. In cases where you have additional software safeguards, ultra-reliable hardware, or an application that can tolerate a small amount of data loss or inconsistency, you can adjust MySQL settings to trade some of the ACID reliability for greater performance or throughput.
The following sections discuss how MySQL features, in particular the InnoDB storage engine, interact with the categories of the ACID model:
• A: atomicity.
• C: consistency.
• I: isolation.
• D: durability.
Atomicity
The atomicity aspect of the ACID model mainly involves InnoDB transactions. Related MySQL features include:
• Autocommit setting.
• COMMIT statement.
• ROLLBACK statement.
• Operational data from the INFORMATION_SCHEMA tables.
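As a minimal sketch of how these features combine, the following statements group two updates into one atomic unit and then discard a third. The accounts table and its values are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE accounts (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  balance DECIMAL(10,2) NOT NULL
) ENGINE=InnoDB;

INSERT INTO accounts VALUES (1, 100.00), (2, 50.00);

-- Turn off autocommit so that the following statements form one transaction.
SET autocommit = 0;

-- Both updates take effect together, or not at all.
UPDATE accounts SET balance = balance - 25.00 WHERE id = 1;
UPDATE accounts SET balance = balance + 25.00 WHERE id = 2;
COMMIT;

-- This change is undone by ROLLBACK and never becomes visible to others.
UPDATE accounts SET balance = 0 WHERE id = 1;
ROLLBACK;

SELECT id, balance FROM accounts;
```

If the statements run as written, the final SELECT should show a balance of 75.00 for both rows, because the committed transfer is kept and the rolled-back update is discarded.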
Consistency
The consistency aspect of the ACID model mainly involves internal InnoDB processing to protect data from crashes. Related MySQL features include:
• InnoDB doublewrite buffer.
• InnoDB crash recovery.
Isolation
The isolation aspect of the ACID model mainly involves InnoDB transactions, in particular the isolation level that applies to each transaction. Related MySQL features include:
• Autocommit setting.
• SET TRANSACTION ISOLATION LEVEL statement.
• The low-level details of InnoDB locking. During performance tuning, you see these details through INFORMATION_SCHEMA tables.
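The sketch below shows one way to set the isolation level and then look at lock waits through INFORMATION_SCHEMA. It assumes a MySQL 5.6 server, where the INNODB_TRX and INNODB_LOCK_WAITS tables are available; the column aliases are chosen here for readability.

```sql
-- Applies only to the next transaction started in this session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
-- ... statements that should run under READ COMMITTED ...
COMMIT;

-- Applies to all subsequent transactions in this session.
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT @@tx_isolation;

-- Show which transaction is waiting and which one is blocking it.
SELECT r.trx_id    AS waiting_trx_id,
       r.trx_query AS waiting_query,
       b.trx_id    AS blocking_trx_id,
       b.trx_query AS blocking_query
FROM INFORMATION_SCHEMA.INNODB_LOCK_WAITS w
JOIN INFORMATION_SCHEMA.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN INFORMATION_SCHEMA.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;
```

The last query returns rows only while one transaction is actually waiting on a lock held by another.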
Durability
The durability aspect of the ACID model involves MySQL software features interacting with your particular hardware configuration. Because of the many possibilities depending on the capabilities of your CPU, network, and storage devices, this aspect is the most complicated to provide concrete guidelines for. (And those guidelines might take the form of “buy new hardware”.) Related MySQL features include:
• InnoDB doublewrite buffer, turned on and off by the innodb_doublewrite configuration option.
• Configuration option innodb_flush_log_at_trx_commit.
• Configuration option sync_binlog.
• Configuration option innodb_file_per_table.
• Write buffer in a storage device, such as a disk drive, SSD, or RAID array.
• Battery-backed cache in a storage device.
• The operating system used to run MySQL, in particular its support for the fsync() system call.
• Uninterruptible power supply (UPS) protecting the electrical power to all computer servers and storage devices that run MySQL servers and store MySQL data.
• Your backup strategy, such as frequency and types of backups, and backup retention periods.
• For distributed or hosted data applications, the particular characteristics of the data centers where the hardware for the MySQL servers is located, and network connections between the data centers.
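To see how the software side of these settings is configured on a running server, a sketch such as the following may help. It assumes an account with the SUPER privilege for the SET GLOBAL statements; the values shown are illustrative choices, not recommendations.

```sql
-- Inspect the durability-related options named in the list above.
SHOW VARIABLES WHERE Variable_name IN
  ('innodb_doublewrite',
   'innodb_flush_log_at_trx_commit',
   'sync_binlog',
   'innodb_file_per_table');

-- Strictest setting: write and flush the redo log at every commit,
-- and synchronize the binary log to disk after every write.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL sync_binlog = 1;

-- A possible trade-off (left commented out): write the log at commit but
-- flush it about once per second, risking recent transactions on a crash.
-- SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```

Settings changed with SET GLOBAL last only until the server restarts unless they are also placed in the option file.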
14.2.2 InnoDB Multi-Versioning
InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.
Commit your transactions regularly, including those transactions that issue only consistent reads. Otherwise, InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
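One way to find transactions that have stayed open for a long time, and may therefore be holding back the discarding of update undo logs, is to query INFORMATION_SCHEMA.INNODB_TRX. This is a sketch for MySQL 5.6; the age_seconds alias is introduced here for illustration.

```sql
-- List open InnoDB transactions, oldest first.
SELECT trx_id,
       trx_state,
       trx_started,
       trx_mysql_thread_id,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds
FROM INFORMATION_SCHEMA.INNODB_TRX
ORDER BY trx_started;
```

The trx_mysql_thread_id value corresponds to the Id column of SHOW PROCESSLIST, which helps trace an old transaction back to the client connection that opened it.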
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
In the InnoDB multi-versioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. InnoDB only physically removes the corresponding row and its index records when it discards the update undo log record written for the deletion. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement that did the deletion.
If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow. In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable. See Section 14.12, “InnoDB Startup Options and System Variables” for more information.
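The following sketch shows where purge lag can be observed and how the variable named above might be adjusted. The threshold of 1000000 is an arbitrary illustrative value, and SET GLOBAL is assumed to be run by an account with the SUPER privilege.

```sql
-- The TRANSACTIONS section of the output contains "History list length",
-- which grows when purge falls behind.
SHOW ENGINE INNODB STATUS;

-- Current purge lag settings.
SHOW VARIABLES LIKE 'innodb_max_purge_lag%';

-- Start delaying row operations once the purge backlog exceeds this
-- threshold (illustrative value; 0, the default, means no delay).
SET GLOBAL innodb_max_purge_lag = 1000000;
```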
Multi-Versioning and Secondary Indexes
InnoDB multiversion concurrency control (MVCC) treats secondary indexes differently than clustered indexes. Records in a clustered index are updated in-place, and their hidden system columns point to undo log entries from which earlier versions of records can be reconstructed. Unlike clustered index records, secondary index records do not contain hidden system columns, nor are they updated in-place.
When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated.
If a secondary index record is marked for deletion or the secondary index page is updated by a newer transaction, the covering index technique is not used. Instead of returning values from the index structure, InnoDB looks up the record in the clustered index.
However, if the index condition pushdown (ICP) optimization is enabled, and parts of the WHERE condition can be evaluated using only fields from the index, the MySQL server still pushes this part of the WHERE condition down to the storage engine where it is evaluated using the index. If no matching records are found, the clustered index lookup is avoided. If matching records are found, even among delete-marked records, InnoDB looks up the record in the clustered index.
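To check whether index condition pushdown applies to a query, the optimizer switch and EXPLAIN can be used as sketched below. The people table and its index are hypothetical; on a populated table, the Extra column of the EXPLAIN output may show "Using index condition" when the optimizer chooses ICP.

```sql
-- ICP is controlled by a flag in optimizer_switch (on by default in 5.6).
SET optimizer_switch = 'index_condition_pushdown=on';

-- Hypothetical table with a composite secondary index.
CREATE TABLE people (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  zipcode CHAR(5) NOT NULL,
  lastname VARCHAR(50) NOT NULL,
  address VARCHAR(100),
  KEY idx_zip_last (zipcode, lastname)
) ENGINE=InnoDB;

-- The lastname condition cannot narrow the index range, but it can be
-- evaluated from index fields before the clustered index lookup.
EXPLAIN SELECT * FROM people
WHERE zipcode = '95054' AND lastname LIKE '%etrunia%';
```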
14.2.3 InnoDB Redo Log
14.2.3.1 Group Commit for Redo Log Flushing
The redo log is a disk-based data structure used during crash recovery to correct data written by incomplete transactions. During normal operations, the redo log encodes requests to change InnoDB table data, which result from SQL statements or low-level API calls. Modifications that did not finish updating the data files before an unexpected shutdown are replayed automatically during initialization, before connections are accepted. For information about the role of the redo log in crash recovery, see Section 14.16.1, “The InnoDB Recovery Process”.
By default, the redo log is physically represented on disk as a set of files, named ib_logfile0 and ib_logfile1. MySQL writes to the redo log files in a circular fashion. Data in the redo log is encoded in terms of records affected; this data is collectively referred to as redo. The passage of data through the redo log is represented by an ever-increasing LSN value.
Disk layout for the redo log is configured using the following options:
• innodb_log_file_size: Defines the size of each redo log file in bytes. By default, redo log files are 50331648 bytes (48MB) in size. The combined size of log files (innodb_log_file_size * innodb_log_files_in_group) cannot exceed a maximum value that is slightly less than 512GB.
• innodb_log_files_in_group: The number of log files in the log group. The default is to create two files named ib_logfile0 and ib_logfile1.
• innodb_log_group_home_dir: The directory path to the InnoDB log files. If you do not specify a value, the log files are created in the MySQL data directory (datadir).
To change your initial redo log configuration, refer to Section 14.5.2, “Changing the Number or Size of InnoDB Redo Log Files”. For information about optimizing redo logging, see Section 8.5.4, “Optimizing InnoDB Redo Logging”.
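The current redo log layout, including the combined size described above, can be read from the server with a query along these lines. The column aliases are introduced here for illustration.

```sql
-- Report the redo log layout and the combined size of the log group.
SELECT @@innodb_log_file_size      AS bytes_per_file,
       @@innodb_log_files_in_group AS files_in_group,
       @@innodb_log_file_size * @@innodb_log_files_in_group AS combined_bytes,
       @@innodb_log_group_home_dir AS log_directory;
```

With the default settings of two 50331648-byte files, the combined size is 100663296 bytes (96MB).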
14.2.3.1 Group Commit for Redo Log Flushing
InnoDB, like any other ACID-compliant database engine, flushes the redo log of a transaction before it is committed. InnoDB uses group commit functionality to group multiple such flush requests together to avoid one flush for each commit. With group commit, InnoDB issues a single write to the log file to perform the commit action for multiple user transactions that commit at about the same time, significantly improving throughput.
For more information about performance of COMMIT and other transactional operations, see Section 8.5.2, “Optimizing InnoDB Transaction Management”.
14.2.4 InnoDB Undo Logs
An undo log (or rollback segment) is a storage area that holds copies of data modified by active transactions. If another transaction needs to see the original data (as part of a consistent read operation), the unmodified data is retrieved from this storage area. By default, this area is physically part of the system tablespace. However, as of MySQL 5.6.3, undo logs can reside in separate undo tablespaces. For more information, see Section 14.5.7, “Storing InnoDB Undo Logs in Separate Tablespaces”. For more information about undo logs and multi-versioning, see Section 14.2.2, “InnoDB Multi-Versioning”.
InnoDB supports 128 undo logs, each supporting up to 1023 concurrent data-modifying transactions, for a total limit of approximately 128K concurrent data-modifying transactions (read-only transactions do not count against the maximum limit). Each transaction is assigned to one of the undo logs, and remains tied to that undo log for the duration of the transaction. The innodb_undo_logs option defines how many undo logs are used by InnoDB.
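The arithmetic behind the "approximately 128K" figure can be checked directly, alongside the current setting of the option. The alias is introduced here for illustration.

```sql
-- 128 undo logs, each serving up to 1023 data-modifying transactions.
SELECT 128 * 1023 AS max_concurrent_writers;  -- 130944

-- Number of undo logs that InnoDB is currently configured to use.
SHOW VARIABLES LIKE 'innodb_undo_logs';
```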
14.2.5 InnoDB Table and Index Structures
14.2.5.1 Role of the .frm File for InnoDB Tables
14.2.5.2 Clustered and Secondary Indexes
14.2.5.3 InnoDB FULLTEXT Indexes
14.2.5.4 Physical Structure of an InnoDB Index
14.2.5.6 Adaptive Hash Indexes
14.2.5.7 Physical Row Structure
This section describes how InnoDB tables, indexes, and their associated metadata are represented at the physical level. This information is primarily useful for performance tuning and troubleshooting.
14.2.5.1 Role of the .frm File for InnoDB Tables
MySQL stores its data dictionary information for tables in .frm files in database directories. Unlike other MySQL storage engines, InnoDB also encodes information about the table in its own internal data dictionary inside the tablespace. When MySQL drops a table or a database, it deletes one or more .frm files as well as the corresponding entries inside the InnoDB data dictionary. You cannot move InnoDB tables between databases simply by moving the .frm files.
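Because the InnoDB data dictionary must stay in step with the .frm file, a table is moved between databases with a SQL statement instead of a file copy. A minimal sketch, assuming two existing databases named db1 and db2 and a table t1 (all hypothetical names):

```sql
-- Moves the table and updates both the .frm file location and the
-- InnoDB internal data dictionary in one operation.
RENAME TABLE db1.t1 TO db2.t1;
```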
14.2.5.2 Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.
• When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.
• If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
• If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
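The three cases above can be sketched with three small tables and a look at the InnoDB data dictionary. The table names are hypothetical, and the query assumes they are created in a schema named test.

```sql
-- Case 1: explicit PRIMARY KEY becomes the clustered index.
CREATE TABLE t_pk (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(50),
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Case 2: no PRIMARY KEY; the first UNIQUE index on NOT NULL columns is used.
CREATE TABLE t_uk (
  code CHAR(8) NOT NULL,
  name VARCHAR(50),
  UNIQUE KEY uk_code (code)
) ENGINE=InnoDB;

-- Case 3: neither; InnoDB generates a hidden clustered index on the row ID.
CREATE TABLE t_hidden (
  name VARCHAR(50)
) ENGINE=InnoDB;

-- List the indexes that InnoDB recorded for each table.
SELECT t.name AS table_name, i.name AS index_name, i.type AS index_type
FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES t
JOIN INFORMATION_SCHEMA.INNODB_SYS_INDEXES i ON i.table_id = t.table_id
WHERE t.name IN ('test/t_pk', 'test/t_uk', 'test/t_hidden');
```

In this view, an automatically generated clustered index is typically reported under the name GEN_CLUST_INDEX.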
How the Clustered Index Speeds Up Queries
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
How Secondary Indexes Relate to the Clustered Index
All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
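Because each secondary index record carries the primary key columns, a query that needs only the indexed column and the primary key can often be answered from the secondary index alone. The customers table below is hypothetical and serves only as a sketch.

```sql
-- Short integer primary key keeps every secondary index entry compact.
CREATE TABLE customers (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(100) NOT NULL,
  city VARCHAR(50),
  KEY idx_city (city)
) ENGINE=InnoDB;

-- Entries in idx_city hold (city, id), so no clustered index lookup
-- should be needed to return id for a given city.
EXPLAIN SELECT id FROM customers WHERE city = 'Oslo';
```

When the secondary index covers the query in this way, the Extra column of the EXPLAIN output typically shows "Using index".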
For coding guidelines to take advantage of InnoDB clustered and secondary indexes, see Section 8.3.2, “Using Primary Keys”; Section 8.3, “Optimization and Indexes”; and Section 8.5, “Optimizing for InnoDB Tables”.
14.2.5.3 InnoDB FULLTEXT Indexes
FULLTEXT indexes are created on text-based columns (CHAR, VARCHAR, or TEXT columns) to help speed up queries and DML operations on data contained within those columns, omitting any words that are defined as stopwords.
A FULLTEXT index can be defined as part of a CREATE TABLE statement, or added later using ALTER TABLE or CREATE INDEX.
Full-text searching is performed using MATCH() ... AGAINST syntax. For usage information, see Section 12.9, “Full-Text Search Functions”.
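As a minimal sketch of adding a FULLTEXT index after table creation and then searching it, consider the following. The articles table, its index name, and the sample rows are hypothetical.

```sql
-- Hypothetical table created without a full-text index.
CREATE TABLE articles (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200),
  body TEXT
) ENGINE=InnoDB;

-- Add the index afterwards; ALTER TABLE articles ADD FULLTEXT ... is an
-- alternative way to do the same thing.
CREATE FULLTEXT INDEX ft_title_body ON articles (title, body);

INSERT INTO articles (title, body) VALUES
  ('MySQL Tutorial', 'An introduction to database management'),
  ('Optimizing MySQL', 'Tips for tuning queries and indexes');

-- The column list in MATCH() must match the columns of the FULLTEXT index.
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('database' IN NATURAL LANGUAGE MODE);
```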
Full-Text Index Design
InnoDB FULLTEXT indexes have an inverted index design. Inverted indexes store a list of words, and for each word, a list of documents that the word appears in. To support proximity search, position information for each word is also stored, as a byte offset.
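The stored position information is what makes a proximity search possible. The sketch below uses the boolean-mode @distance operator; the notes table and its rows are hypothetical, and the distance of 8 is an arbitrary illustrative value.

```sql
-- Hypothetical table with a full-text index defined at creation time.
CREATE TABLE notes (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT,
  FULLTEXT KEY ft_body (body)
) ENGINE=InnoDB;

INSERT INTO notes (body) VALUES
  ('It was a bright cold day in April'),
  ('The day was cold, and only much later in the week did it turn bright');

-- Match rows where both words occur within the given distance of each other.
SELECT id, body
FROM notes
WHERE MATCH(body) AGAINST('"bright cold" @8' IN BOOLEAN MODE);
```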
Full-Text Index Tables
For each InnoDB FULLTEXT index, a set of index tables is created, as shown in the following example:
CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200),
FULLTEXT idx (opening_line)
) ENGINE=InnoDB;
mysql> SELECT table_id, name, space FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES
       WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name                                               | space |
+----------+----------------------------------------------------+-------+
|      333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 |   289 |
|      334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 |   290 |
|      335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 |   291 |
|      336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 |   292 |
|      337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 |   293 |
|      338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 |   294 |
|      330 | test/FTS_0000000000000147_BEING_DELETED            |   286 |
|      331 | test/FTS_0000000000000147_BEING_DELETED_CACHE      |   287 |
|      332 | test/FTS_0000000000000147_CONFIG                   |   288 |
|      328 | test/FTS_0000000000000147_DELETED                  |   284 |
|      329 | test/FTS_0000000000000147_DELETED_CACHE            |   285 |
|      327 | test/opening_lines                                 |   283 |
+----------+----------------------------------------------------+-------+
The first six tables represent the inverted index and are referred to as auxiliary index tables. When incoming documents are tokenized, the individual words (also referred to as “tokens”) are inserted into the index tables along with position information and the associated Document ID (DOC_ID). The words are fully sorted and partitioned among the six index tables based on the character set sort weight of the word's first character.
The inverted index is partitioned into six auxiliary index tables to support parallel index creation. By default, two threads tokenize, sort, and insert words and associated data into the index tables. The number of threads is configurable using the innodb_ft_sort_pll_degree option. When creating FULLTEXT indexes on large tables, consider increasing the number of threads.
Auxiliary index table names are prefixed with FTS_ and postfixed with INDEX_*. Each index table is associated with the indexed table by a hex value in the index table name that matches the table_id of the indexed table. For example, the table_id of the test/opening_lines table is 327, for which the hex value is 0x147. As shown in the preceding example, the “147” hex value appears in the names of index tables that are associated with the test/opening_lines table.
A hex value representing the index_id of the FULLTEXT index also appears in auxiliary index table names. For example, in the auxiliary table name test/FTS_0000000000000147_00000000000001c9_INDEX_1, the hex value 1c9 has a decimal value of 457. The index defined on the opening_lines table (idx) can be identified by querying the INFORMATION_SCHEMA.INNODB_SYS_INDEXES table for this value (457).
mysql> SELECT index_id, name, table_id, space FROM INFORMATION_SCHEMA.INNODB_SYS_INDEXES
       WHERE index_id=457;
+----------+------+----------+-------+
| index_id | name | table_id | space |
+----------+------+----------+-------+
|      457 | idx  |      327 |   283 |
+----------+------+----------+-------+
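The hexadecimal values embedded in the auxiliary table names can be converted with built-in functions, as in this short sketch. The column aliases are introduced here for illustration.

```sql
-- table_id 327 in hexadecimal, and index_id 0x1c9 in decimal.
SELECT HEX(327)            AS table_id_hex,   -- 147
       CONV('1c9', 16, 10) AS index_id_dec;   -- 457

-- The 16-character, zero-padded forms that appear in the table names.
SELECT LPAD(HEX(327), 16, '0')        AS table_part,  -- 0000000000000147
       LOWER(LPAD(HEX(457), 16, '0')) AS index_part;  -- 00000000000001c9
```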
Index tables are stored in their own tablespace when innodb_file_per_table is enabled. If innodb_file_per_table is disabled, index tables are stored in the InnoDB system tablespace (space 0).
Note
Due to a bug introduced in MySQL 5.6.5, index tables are created in the InnoDB system tablespace (space 0) when innodb_file_per_table is enabled. The bug is fixed in MySQL 5.6.20 and MySQL 5.7.5 (Bug#18635485).
The other index tables shown in the preceding example are used for deletion handling and for storing the internal state of the FULLTEXT index.
• FTS_*_DELETED and FTS_*_DELETED_CACHE: Contain the document IDs (DOC_ID) for documents that are deleted but whose data is not yet removed from the full-text index. The FTS_*_DELETED_CACHE is the in-memory version of the FTS_*_DELETED table.
• FTS_*_BEING_DELETED and FTS_*_BEING_DELETED_CACHE: Contain the document IDs (DOC_ID) for documents that are deleted and whose data is currently in the process of being removed from the full-text index. The FTS_*_BEING_DELETED_CACHE table is the in-memory version of the FTS_*_BEING_DELETED table.
• FTS_*_CONFIG: Stores information about the internal state of the FULLTEXT index. Most importantly, it stores the FTS_SYNCED_DOC_ID, which identifies documents that have been parsed and flushed to disk. In case of crash recovery, FTS_SYNCED_DOC_ID values are used to identify documents that have not been flushed to disk so that the documents can be re-parsed and added back to the FULLTEXT index cache. To view the data in this table, query the INFORMATION_SCHEMA.INNODB_FT_CONFIG table.
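The contents of these internal tables are exposed through INFORMATION_SCHEMA once a target table is selected. The sketch below assumes the test/opening_lines table from the earlier example and an account with the SUPER privilege.

```sql
-- Select which table's full-text index the INNODB_FT_* views report on.
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Internal state of the index, including the synced document ID.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;

-- Document IDs that are deleted but not yet removed from the index,
-- and those currently being removed.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DELETED;
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_BEING_DELETED;
```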
Full-Text Index Cache
When a document is inserted, it is tokenized, and the individual words and associated data are inserted into the FULLTEXT index. This process, even for small documents, could result in numerous small insertions into the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, InnoDB uses a FULLTEXT index cache to temporarily cache index table insertions for recently inserted rows. This in-memory cache structure holds insertions until the cache is full and then batch flushes them to disk (to the auxiliary index tables). You can query the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table to view tokenized data for recently inserted rows.
The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.
The innodb_ft_cache_size variable is used to configure the full-text index cache size (on a per-table basis), which affects how often the full-text index cache is flushed. You can also define a global full-text index cache size limit for all tables in a given instance using the innodb_ft_total_cache_size option.
The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the full-text auxiliary tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.
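As a minimal sketch of inspecting the cache, the statements below assume a table named opening_lines with a FULLTEXT index in a schema named test (the schema name is an assumption for illustration). The INNODB_FT_INDEX_CACHE table reports on whichever table is named in the innodb_ft_aux_table variable, so that variable is set first.

```sql
-- Point the full-text INFORMATION_SCHEMA tables at one table.
-- 'test/opening_lines' is a hypothetical schema/table name.
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Tokens for recently inserted rows that are still held in the cache.
SELECT WORD, DOC_COUNT, DOC_ID, POSITION
  FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
 LIMIT 10;

-- Current per-table and instance-wide cache size settings.
SHOW GLOBAL VARIABLES LIKE 'innodb_ft%cache_size';
```

If the first query returns no rows, the cache for that table has probably already been flushed to the auxiliary index tables.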
InnoDB Full-Text Document ID and FTS_DOC_ID Column
InnoDB uses a unique document identifier referred to as a Document ID (DOC_ID) to map words in the full-text index to document records where the word appears. The mapping requires an FTS_DOC_ID column on the indexed table. If an FTS_DOC_ID column is not defined, InnoDB automatically adds a hidden FTS_DOC_ID column when the full-text index is created. The following example demonstrates this behavior.
The following table definition does not include an FTS_DOC_ID column:
CREATE TABLE opening_lines (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  opening_line TEXT(500),
  author VARCHAR(200),
  title VARCHAR(200)
) ENGINE=InnoDB;
When you create a full-text index on the table using CREATE FULLTEXT INDEX syntax, a warning is returned which reports that InnoDB is rebuilding the table to add the FTS_DOC_ID column.
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (0.19 sec)
Records: 0 Duplicates: 0 Warnings: 1
mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
+---------+------+--------------------------------------------------+
The same warning is returned when using ALTER TABLE to add a full-text index to a table that does not have an FTS_DOC_ID column. If you create a full-text index at CREATE TABLE time and do not specify an FTS_DOC_ID column, InnoDB adds a hidden FTS_DOC_ID column, without warning.
Defining an FTS_DOC_ID column at CREATE TABLE time reduces the time required to create a full-text index on a table that is already loaded with data. If an FTS_DOC_ID column is defined on a table prior to loading data, the table and its indexes do not have to be rebuilt to add the new column. If you are not concerned with CREATE FULLTEXT INDEX performance, leave out the FTS_DOC_ID column to have InnoDB create it for you. InnoDB creates a hidden FTS_DOC_ID column along with a unique index (FTS_DOC_ID_INDEX) on the FTS_DOC_ID column. If you want to create your own FTS_DOC_ID column, the column must be defined as BIGINT UNSIGNED NOT NULL and named FTS_DOC_ID (all upper case), as in the following example:
Note
The FTS_DOC_ID column does not need to be defined as an AUTO_INCREMENT column, but AUTO_INCREMENT can make loading data easier.
CREATE TABLE opening_lines (
  FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  opening_line TEXT(500),
  author VARCHAR(200),
  title VARCHAR(200)
) ENGINE=InnoDB;
If you choose to define the FTS_DOC_ID column yourself, you are responsible for managing the column to avoid empty or duplicate values. FTS_DOC_ID values cannot be reused, which means FTS_DOC_ID values must be ever increasing.
Optionally, you can create the required unique FTS_DOC_ID_INDEX (all upper case) on the FTS_DOC_ID column.
CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);
If you do not create the FTS_DOC_ID_INDEX, InnoDB creates it automatically.
Before MySQL 5.6.31, the permitted gap between the largest used FTS_DOC_ID value and new FTS_DOC_ID value is 10000. In MySQL 5.6.31 and later, the permitted gap is 65535.
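The sketch below shows one way to supply FTS_DOC_ID values yourself. The table name opening_lines_manual and the choice to number rows from 1 are assumptions for illustration; the point is only that every new value is larger than any value used before.

```sql
-- Hypothetical table with a user-managed FTS_DOC_ID column
-- (no AUTO_INCREMENT, so the application assigns every value).
CREATE TABLE opening_lines_manual (
  FTS_DOC_ID BIGINT UNSIGNED NOT NULL PRIMARY KEY,
  opening_line TEXT(500),
  FULLTEXT idx (opening_line)
) ENGINE=InnoDB;

-- Values are supplied explicitly and only ever grow.
INSERT INTO opening_lines_manual (FTS_DOC_ID, opening_line) VALUES
  (1, 'Call me Ishmael.'),
  (2, 'It was a pleasure to burn.');

-- Largest value currently stored. If rows have been deleted, the largest
-- value ever used may be higher, so track it separately in that case.
SELECT MAX(FTS_DOC_ID) AS largest_stored_doc_id
  FROM opening_lines_manual;
```

Choosing the next value from a counter that the application maintains, instead of from MAX(), avoids reusing the ID of a deleted row.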
InnoDB Full-Text Index Deletion Handling
Deleting a record that has a full-text index column could result in numerous small deletions in the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, the Document ID (DOC_ID) of a deleted document is logged in a special FTS_*_DELETED table whenever a record is deleted from an indexed table, and the indexed record remains in the full-text index. Before returning query results, information in the FTS_*_DELETED table is used to filter out deleted Document IDs. The benefit of this design is that deletions are fast and inexpensive. The drawback is that the size of the index is not immediately reduced after deleting records. To remove full-text index entries for deleted records, you must run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to rebuild the full-text index. For more information, see Optimizing InnoDB Full-Text Indexes.
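The cleanup step described above can be sketched as follows. The schema name test is an assumption; opening_lines is the example table used in this section.

```sql
-- Optional: see how many deleted Document IDs are still being filtered.
-- 'test/opening_lines' is a hypothetical schema/table name.
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';
SELECT COUNT(*) AS deleted_doc_ids
  FROM INFORMATION_SCHEMA.INNODB_FT_DELETED;

-- Make OPTIMIZE TABLE work on the full-text index data only.
SET GLOBAL innodb_optimize_fulltext_only = ON;
OPTIMIZE TABLE opening_lines;

-- Restore the default so later OPTIMIZE TABLE runs behave normally.
SET GLOBAL innodb_optimize_fulltext_only = OFF;
```

Resetting the variable afterwards matters because it is global and would otherwise change what OPTIMIZE TABLE does for every InnoDB table.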
InnoDB Full-Text Index Transaction Handling
InnoDB FULLTEXT indexes have special transaction handling characteristics due to their caching and batch processing behavior. Specifically, updates and insertions on a FULLTEXT index are processed at transaction commit time, which means that a FULLTEXT search can only see committed data. The following example demonstrates this behavior. The FULLTEXT search only returns a result after the inserted lines are committed.
mysql> CREATE TABLE opening_lines (
         id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
         opening_line TEXT(500),
         author VARCHAR(200),
         title VARCHAR(200),
         FULLTEXT idx (opening_line)
       ) ENGINE=InnoDB;
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
Monitoring InnoDB Full-Text Indexes
You can monitor and examine the special text-processing aspects of InnoDB FULLTEXT indexes by querying the following INFORMATION_SCHEMA tables:
INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED
You can also view basic information for FULLTEXT indexes and tables by querying INNODB_SYS_INDEXES and INNODB_SYS_TABLES.
See Section 14.13.4, “InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables” for more information.
14.2.5.4 Physical Structure of an InnoDB Index
All InnoDB indexes are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16KB.
When new records are inserted into an InnoDB clustered index, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full. If the fill factor of an index page drops below 1/2, InnoDB tries to contract the index tree to free the page.
You can configure the page size for all InnoDB tablespaces in a MySQL instance by setting the innodb_page_size configuration option before creating the instance. Once the page size for an instance is set, you cannot change it. Supported sizes are 16KB, 8KB, and 4KB, corresponding to the option values 16k, 8k, and 4k.
A MySQL instance using a particular InnoDB page size cannot use data files or log files from an instance that uses a different page size.
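To relate the fill levels above to bytes, the following sketch reads the page size of the running instance and does the arithmetic. It is a rough illustration only: page headers and other per-page overhead are ignored, and the column aliases are made up for this example.

```sql
-- Page size chosen when the instance was created, in bytes.
SELECT @@innodb_page_size AS page_bytes;

-- Bytes corresponding to the 15/16 fill level reached by sequential
-- inserts, and to the 1/2 level below which InnoDB tries to contract
-- the index tree.
SELECT @@innodb_page_size * 15 DIV 16 AS bytes_at_15_16,
       @@innodb_page_size DIV 2       AS bytes_at_1_2;
```

With the default 16KB page (16384 bytes), the second query evaluates to 15360 and 8192.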
14.2.5.5 Change Buffer
The change buffer is a special data structure that caches changes to secondary index pages when affected pages are not in the buffer pool. The buffered changes, which may result from INSERT, UPDATE, or DELETE operations (DML), are merged later when the pages are loaded into the buffer pool by other read operations.
Unlike clustered indexes, secondary indexes are usually non-unique, and inserts into secondary indexes happen in a relatively random order. Similarly, deletes and updates may affect secondary index pages that are not adjacently located in an index tree. Merging cached changes at a later time, when affected pages are read into the buffer pool by other operations, avoids the substantial random access I/O that would be required to read secondary index pages in from disk.
Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately.
Change buffer merging may take several hours when there are numerous secondary indexes to update and many affected rows. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. Change buffer merging may also continue to occur after a transaction is committed. In fact, change buffer merging may continue to occur after a server shutdown and restart (see Section 14.19.2, “Forcing InnoDB Recovery” for more information).
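If you would rather have the merge work finished before a restart than continue afterwards, one option is to request a slow shutdown. The sketch below assumes you are free to accept a longer shutdown time.

```sql
-- 0 requests a slow shutdown: InnoDB does a full purge and a change
-- buffer merge before the server stops.
SET GLOBAL innodb_fast_shutdown = 0;

-- Then stop the server by your usual means, for example with
-- mysqladmin shutdown from the command line.
```

The shutdown itself can then take a long time on an instance with many buffered changes, so this is a trade of shutdown time against post-restart I/O.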
In memory, the change buffer occupies part of the InnoDB buffer pool. On disk, the change buffer is part of the system tablespace, so that index changes remain buffered across database restarts.
The type of data cached in the change buffer is governed by the innodb_change_buffering configuration option. For more information, see Section 14.4.5, “Configuring InnoDB Change Buffering”. You can also configure the maximum change buffer size. For more information, see Section 14.4.5.1, “Configuring the Change Buffer Maximum Size”.
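A minimal sketch of checking and adjusting these two settings at runtime follows. The specific values are examples chosen for illustration, not recommendations.

```sql
-- Which operations are buffered, and the maximum change buffer size
-- as a percentage of the buffer pool.
SELECT @@innodb_change_buffering       AS buffered_operations,
       @@innodb_change_buffer_max_size AS max_percent_of_buffer_pool;

-- Buffer only insert operations.
SET GLOBAL innodb_change_buffering = 'inserts';

-- Let the change buffer use up to 25 percent of the buffer pool.
SET GLOBAL innodb_change_buffer_max_size = 25;

-- Return to buffering all supported operations.
SET GLOBAL innodb_change_buffering = 'all';
```

Changing either setting under a realistic workload and watching the Ibuf line of SHOW ENGINE INNODB STATUS is one way to judge the effect.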
Monitoring the Change Buffer
The following options are available for change buffer monitoring:
l InnoDB Standard Monitor output includes status information for the change buffer. To view monitor data, issue the SHOW ENGINE INNODB STATUS command.
mysql> SHOW ENGINE INNODB STATUS\G
Change buffer status information is located under the INSERT BUFFER AND ADAPTIVE HASH INDEX heading and appears similar to the following:
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 4425293, used cells 32, node heap has 1 buffer(s)
13577.57 hash searches/s, 202.47 non-hash searches/s
For a description of each data point, see Section 14.15.3, “InnoDB Standard Monitor and Lock Monitor Output”.
l The INFORMATION_SCHEMA.INNODB_METRICS table provides most of the data points found in InnoDB Standard Monitor output, plus other data points. To view change buffer metrics and a description of each, issue the following query:
mysql> SELECT NAME, COMMENT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME LIKE '%ibuf%'\G
For INNODB_METRICS table usage information, see Section 14.13.6, “InnoDB INFORMATION_SCHEMA Metrics Table”.
l The INFORMATION_SCHEMA.INNODB_BUFFER_PAGE table provides metadata about each page in the buffer pool, including change buffer index and change buffer bitmap pages. Change buffer pages are identified by PAGE_TYPE. IBUF_INDEX is the page type for change buffer index pages, and IBUF_BITMAP is the page type for change buffer bitmap pages.
Warning
Querying the INNODB_BUFFER_PAGE table can introduce significant performance overhead. To avoid impacting performance, reproduce the issue you want to investigate on a test instance and run your queries on the test instance.
For example, you can query the INNODB_BUFFER_PAGE table to determine the approximate number of IBUF_INDEX and IBUF_BITMAP pages as a percentage of total buffer pool pages.
SELECT
  (SELECT COUNT(*)
     FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE
    WHERE PAGE_TYPE LIKE 'IBUF%'
  ) AS change_buffer_pages,
  (SELECT COUNT(*)
     FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE
  ) AS total_pages,
  (SELECT ((change_buffer_pages/total_pages)*100)
  ) AS change_buffer_page_percentage;
+---------------------+-------------+-------------------------------+
| change_buffer_pages | total_pages | change_buffer_page_percentage |
+---------------------+-------------+-------------------------------+
|                  25 |        8192 |                        0.3052 |
+---------------------+-------------+-------------------------------+
For information about other data provided by the INNODB_BUFFER_PAGE table, see Section 21.29.16, “The INFORMATION_SCHEMA INNODB_BUFFER_PAGE Table”. For related usage information, see Section 14.13.5, “InnoDB INFORMATION_SCHEMA Buffer Pool Tables”.
l Performance Schema provides change buffer mutex wait instrumentation for advanced performance monitoring. To view change buffer instrumentation, issue the following query:
mysql> SELECT * FROM performance_schema.setup_instruments
WHERE NAME LIKE '%wait/synch/mutex/innodb/ibuf%';
+-------------------------------------------------------+---------+-------+
| NAME                                                  | ENABLED | TIMED |
+-------------------------------------------------------+---------+-------+
| wait/synch/mutex/innodb/ibuf_bitmap_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/ibuf_mutex                    | YES     | YES   |
| wait/synch/mutex/innodb/ibuf_pessimistic_insert_mutex | YES     | YES   |
+-------------------------------------------------------+---------+-------+
For information about monitoring InnoDB mutex waits, see Section 14.14.1, “Monitoring InnoDB Mutex Waits Using Performance Schema”.
14.2.5.6 Adaptive Hash Indexes
The adaptive hash index (AHI) lets InnoDB perform more like an in-memory database on systems with appropriate combinations of workload and ample memory for the buffer pool, without sacrificing any transactional features or reliability. This feature is enabled by the innodb_adaptive_hash_index option, or turned off by --skip-innodb_adaptive_hash_index at server startup.
Based on the observed pattern of searches, MySQL builds a hash index using a prefix of the index key. The prefix of the key can be any length, and it may be that only some of the values in the B-tree appear in the hash index. Hash indexes are built on demand for those pages of the index that are often accessed.
If a table fits almost entirely in main memory, a hash index can speed up queries by enabling direct lookup of any element, turning the index value into a sort of pointer. InnoDB has a mechanism that monitors index searches. If InnoDB notices that queries could benefit from building a hash index, it does so automatically.
With some workloads, the speedup from hash index lookups greatly outweighs the extra work to monitor index lookups and maintain the hash index structure. Sometimes, the read/write lock that guards access to the adaptive hash index can become a source of contention under heavy workloads, such as multiple concurrent joins. Queries with LIKE operators and % wildcards also tend not to benefit from the AHI. For workloads where the adaptive hash index is not needed, turning it off reduces unnecessary performance overhead. Because it is difficult to predict in advance whether this feature is appropriate for a particular system, consider running benchmarks with it both enabled and disabled, using a realistic workload. The architectural changes in MySQL 5.6 and higher make more workloads suitable for disabling the adaptive hash index than in earlier releases, although it is still enabled by default.
The hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on the pattern of searches that InnoDB observes for the B-tree index. A hash index can be partial, covering only those pages of the index that are often accessed.
You can monitor the use of the adaptive hash index and the contention for its use in the SEMAPHORES section of the output of the SHOW ENGINE INNODB STATUS command. If you see many threads waiting on an RW-latch created in btr0sea.c, then it might be useful to disable adaptive hash indexing.
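One way to run the comparison suggested above is to toggle the feature at runtime around a benchmark. This is a sketch only; the benchmark itself is whatever realistic workload you already use.

```sql
-- Check whether the adaptive hash index is currently enabled.
SELECT @@innodb_adaptive_hash_index;

-- Disable it, then run the benchmark workload.
SET GLOBAL innodb_adaptive_hash_index = OFF;

-- Inspect the SEMAPHORES section for waits on RW-latches created in
-- btr0sea.c, and the hash searches/s figures further down the output.
SHOW ENGINE INNODB STATUS;

-- Re-enable it and repeat the same benchmark for comparison.
SET GLOBAL innodb_adaptive_hash_index = ON;
```

Because the setting is global, it is safest to try this on a test instance that carries a copy of the production workload.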
For more information about the performance characteristics of hash indexes, see Section 8.3.8, “Comparison of B-Tree and Hash Indexes”.
14.2.5.7 Physical Row Structure
The physical row structure of an InnoDB table depends on the row format specified when the table is created. By default, InnoDB uses the Antelope file format and its COMPACT row format. The REDUNDANT format is available to retain compatibility with older versions of MySQL. When you enable the innodb_file_per_table setting, you can also make use of the newer Barracuda file format, with its DYNAMIC and COMPRESSED row formats, as explained in Section 14.9, “InnoDB Row Storage and Row Formats” and Section 14.7, “InnoDB Table Compression”.
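The sketch below shows how a row format might be requested when a table is created. The table names t_dynamic and t_redundant and their columns are hypothetical.

```sql
-- The Barracuda row formats need file-per-table tablespaces and the
-- Barracuda file format to be enabled first.
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_file_format = 'Barracuda';

-- Hypothetical table that asks for the DYNAMIC row format.
CREATE TABLE t_dynamic (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  body TEXT
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

-- Hypothetical table that asks for the older REDUNDANT row format.
CREATE TABLE t_redundant (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  body TEXT
) ENGINE=InnoDB ROW_FORMAT=REDUNDANT;
```

If the two settings are not enabled, the DYNAMIC request may not take effect, so it is worth confirming the resulting row format after creating the table.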
To check the row format of an InnoDB table, you can use SHOW TABLE STATUS. For example:
mysql> SHOW TABLE STATUS IN test1\G
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 16384
      Data_free: 0
 Auto_increment: 1
    Create_time: 2014-10-31 16:02:01
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
You can also check the row format of an InnoDB table by querying INFORMATION_SCHEMA.INNODB_SYS_TABLES.
mysql> SELECT NAME, ROW_FORMAT FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE NAME='test1/t1';
+----------+------------+
| NAME     | ROW_FORMAT |
+----------+------------+
| test1/t1 | Compact    |
+----------+------------+
The COMPACT row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed, COMPACT format is likely to be faster. If the workload is a rare case that is limited by CPU speed, COMPACT format might be slower.
Rows in InnoDB tables that use REDUNDANT row format have the following characteristics:
l Each index record contains a 6-byte header. The header is used to link together consecutive records, and also in row-level locking.
l Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
l If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
l Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index.
l A record contains a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
l Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. InnoDB does not truncate trailing spaces from VARCHAR columns.
l An SQL NULL value reserves one or two bytes in the record directory. Besides that, an SQL NULL value reserves zero bytes in the data part of the record if stored in a variable length column. In a fixed-length column, it reserves the fixed length of the column in the data part of the record. Reserving the fixed space for NULL values enables an update of the column from NULL to a non-NULL value to be done in place without causing fragmentation of the index page.
Rows in InnoDB tables that use COMPACT row format have the following characteristics:
l Each index record contains a 5-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking.
l The variable-length part of the record header contains a bit vector for indicating NULL columns. If the number of columns in the index that can be NULL is N, the bit vector occupies CEILING(N/8) bytes. (For example, if there are anywhere from 9 to 16 columns that can be NULL, the bit vector uses two bytes.) Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.
l For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes will only be needed if part of the column is stored externally in overflow pages or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For an externally stored column, the 2-byte length indicates the length of the internally stored part plus the 20-byte pointer to the externally stored part. The internal part is 768 bytes, so the length is 768+20. The 20-byte pointer stores the true length of the column.
l The record header is followed by the data contents of the non-NULL columns.
l Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
l If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
l Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index. If any of these primary key fields are variable length, the record header for each secondary index will have a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
l Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. InnoDB does not truncate trailing spaces from VARCHAR columns.
l Internally, InnoDB attempts to store utf8 CHAR(N) and utf8mb4 CHAR(N) columns in N bytes by trimming trailing spaces. If the byte length of a CHAR(N) column value exceeds N bytes, InnoDB trims trailing spaces to a minimum of the column value byte length. The maximum length of a CHAR(N) column is the maximum character byte length × N, as reported by the CHARACTER_OCTET_LENGTH column of the INFORMATION_SCHEMA.COLUMNS table.
l InnoDB reserves a minimum of N bytes for CHAR(N). Reserving the minimum space N in many cases enables column updates to be done in place without causing fragmentation of the index page.
l By comparison, for ROW_FORMAT=REDUNDANT, utf8 and utf8mb4 CHAR(N) columns occupy the maximum character byte length × N. ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED handle CHAR storage in the same way as ROW_FORMAT=COMPACT.
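Two of the sizes in the COMPACT list above can be checked with plain SQL. The first query simply evaluates the CEILING(N/8) rule for the NULL bit vector at a few values of N. The second reads the maximum byte length of CHAR columns from INFORMATION_SCHEMA.COLUMNS for the test1.t1 table used earlier in this section; whether that table has any CHAR columns is not stated, so the result may be empty.

```sql
-- Bytes used by the NULL bit vector for N nullable columns.
SELECT CEILING(8/8)  AS n_8,    -- 1
       CEILING(9/8)  AS n_9,    -- 2
       CEILING(16/8) AS n_16,   -- 2
       CEILING(17/8) AS n_17;   -- 3

-- Declared length in characters and maximum length in bytes
-- (maximum character byte length x N) for each CHAR column.
SELECT COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH
  FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA = 'test1'
   AND TABLE_NAME = 't1'
   AND DATA_TYPE = 'char';
```

For a utf8 CHAR(10) column, CHARACTER_OCTET_LENGTH reports 30, which is the space a REDUNDANT row sets aside, whereas a COMPACT row reserves a minimum of 10 bytes.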
DYNAMIC and COMPRESSED row formats are variations of the COMPACT row format. For information about these row formats, see Section 14.9.3, “DYNAMIC and COMPRESSED Row Formats”.
14.2.6 InnoDB Mutex and Read/Write Lock Implementation
In MySQL and InnoDB, multiple threads of execution access shared data structures. InnoDB synchronizes these accesses with its own implementation of mutexes and read/write locks. Historically, InnoDB protected the internal state of a read/write lock with an InnoDB mutex, and the internal state of an InnoDB mutex was protected by a Pthreads mutex, as in IEEE Std 1003.1c (POSIX.1c).
On many platforms, atomic operations can often be used to synchronize the actions of multiple threads more efficiently than Pthreads. Each operation to acquire or release a lock can be done in fewer CPU instructions, wasting less time when threads contend for access to shared data structures. This in turn means greater scalability on multi-core platforms.
On platforms that support Atomic operations, InnoDB now implements mutexes and read/write locks with the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access instead of using the Pthreads approach. More specifically, InnoDB compiled with GCC version 4.1.2 or later uses the atomic builtins instead of a pthread_mutex_t to implement InnoDB mutexes and read/write locks.
在支持原子操作的平台上,InnoDB现在通过使用GNU Compiler Collection (GCC)为原子内存访问而提供的内置函数实现了mutexes and read/write locks,而不是使用Pthreads的方法。更具体地说,通过GCC 4.1.2或者更新版本编译的InnoDB使用是哦那个原子的内置命令代替pthread_mutex_t实现了InnoDB的mutexes and read/write locks。
On 32-bit Microsoft Windows, InnoDB implements mutexes (but not read/write locks) with hand-written assembler instructions. Beginning with Microsoft Windows 2000, functions for Interlocked Variable Access are available that are similar to the built-in functions provided by GCC. On Windows 2000 and higher, InnoDB makes use of the Interlocked functions, which support read/write locks and 64-bit platforms.
在32-bit Microsoft Windows上,InnoDB通过hand-written汇编指令实现了mutex(不是read/write locks)。从Microsoft Windows 2000开始,Interlocked Variable Access的函数就可用了,它类似于GCC提供的内置函数。在Windows 2000及更高的版本里,InnoDB会使用Interlocked函数,它支持read/write locks以及64-bit的平台。
Solaris 10 introduced library functions for atomic operations, and InnoDB uses these functions by default. When MySQL is compiled on Solaris 10 or later with a compiler that does not support the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access, InnoDB uses the library functions.
Solaris 10引进了原子操作库函数,InnoDB会默认使用这些函数。当MySQL在Solaris 10或者更高的版本使用不支持GNU Compiler Collection (GCC)为原子内存访问而提供内置函数进行编译时,InnoDB就会使用库函数。
On platforms where the GCC, Windows, or Solaris functions for atomic memory access are not available, InnoDB uses the traditional Pthreads method of implementing mutexes and read/write locks.
在有GCC的平台上,Windows, or Solaris为了原子内存方位的函数是不可用的,InnoDB会使用传统的Pthreads方法来实现mutexes and read/write locks。
When MySQL starts, InnoDB writes a message to the log file indicating whether atomic memory access is used for mutexes, for mutexes and read/write locks, or neither. If suitable tools are used to build InnoDB and the target CPU supports the atomic operations required, InnoDB uses the built-in functions for mutexing. If, in addition, the compare-and-swap operation can be used on thread identifiers (pthread_t), then InnoDB uses the instructions for read-write locks as well.
当MySQL启动的时候,InnoDB会写一条消息到日志文件里来表明原子内存访问是否用于mutexes,或者mutexes and read/write locks,又或者都不是。如果使用了合适的工具来构建InnoDB而且目标CPU也支持必须的原子操作,InnoDB就会为mutex使用内置函数。另外,如果比较和交换操作被用在了线程标识(pthread_t)上,InnoDB还会为read-write locks使用这个指令。
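The three outcomes reported at startup can be summarized as a small decision function. This is only a sketch of the rule as stated above: the two flag names and the returned labels are hypothetical and are not the text that InnoDB actually writes to the log.

```python
def atomics_usage(has_atomic_ops, cas_on_thread_id):
    """Classify how atomic memory access is used, per the rules above.

    has_atomic_ops:   build tools and target CPU support the required atomic operations
    cas_on_thread_id: compare-and-swap can also be used on thread identifiers
    """
    if not has_atomic_ops:
        return "neither"                        # traditional Pthreads method
    if cas_on_thread_id:
        return "mutexes and read/write locks"
    return "mutexes only"

print(atomics_usage(True, False))   # mutexes only
```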
If you are building from source, ensure that the build process properly takes advantage of your platform capabilities.
For more information about the performance implications of locking, see Section 8.11, “Optimizing Locking Operations”.
======================================================================The following is repeated content======================================================================
14.2.1 MySQL and the ACID Model
The ACID model is a set of database design principles that emphasize aspects of reliability that are important for business data and mission-critical applications. MySQL includes components such as the InnoDB storage engine that adhere closely to the ACID model, so that data is not corrupted and results are not distorted by exceptional conditions such as software crashes and hardware malfunctions. When you rely on ACID-compliant features, you do not need to reinvent the wheel of consistency checking and crash recovery mechanisms. In cases where you have additional software safeguards, ultra-reliable hardware, or an application that can tolerate a small amount of data loss or inconsistency, you can adjust MySQL settings to trade some of the ACID reliability for greater performance or throughput.
The following sections discuss how MySQL features, in particular the InnoDB storage engine, interact with the categories of the ACID model:
• A: atomicity.
• C: consistency.
• I: isolation.
• D: durability.
Atomicity
The atomicity aspect of the ACID model mainly involves InnoDB transactions. Related MySQL features include:
• Autocommit setting.
• COMMIT statement.
• ROLLBACK statement.
• Operational data from the INFORMATION_SCHEMA tables.
Consistency
The consistency aspect of the ACID model mainly involves internal InnoDB processing to protect data from crashes. Related MySQL features include:
• InnoDB doublewrite buffer.
• InnoDB crash recovery.
Isolation
The isolation aspect of the ACID model mainly involves InnoDB transactions, in particular the isolation level that applies to each transaction. Related MySQL features include:
• Autocommit setting.
• SET TRANSACTION ISOLATION LEVEL statement.
• The low-level details of InnoDB locking. During performance tuning, you see these details through INFORMATION_SCHEMA tables.
Durability
The durability aspect of the ACID model involves MySQL software features interacting with your particular hardware configuration. Because of the many possibilities depending on the capabilities of your CPU, network, and storage devices, this aspect is the most complicated to provide concrete guidelines for. (And those guidelines might take the form of “buy new hardware”.) Related MySQL features include:
• InnoDB doublewrite buffer, turned on and off by the innodb_doublewrite configuration option.
• Configuration option innodb_flush_log_at_trx_commit.
• Configuration option sync_binlog.
• Configuration option innodb_file_per_table.
• Write buffer in a storage device, such as a disk drive, SSD, or RAID array.
• Battery-backed cache in a storage device.
• The operating system used to run MySQL, in particular its support for the fsync() system call.
• Uninterruptible power supply (UPS) protecting the electrical power to all computer servers and storage devices that run MySQL servers and store MySQL data.
• Your backup strategy, such as frequency and types of backups, and backup retention periods.
• For distributed or hosted data applications, the particular characteristics of the data centers where the hardware for the MySQL servers is located, and network connections between the data centers.
14.2.2 InnoDB Multi-Versioning
InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are also used in consistent reads, and they can be discarded only when no transaction remains for which InnoDB has assigned a snapshot that, in a consistent read, might need the information in the update undo log to build an earlier version of a database row.
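The way a consistent read follows roll pointers back to an earlier row version can be sketched in Python. This is a deliberately simplified model, not InnoDB's implementation: the RowVersion class is hypothetical, and visibility is reduced to "the version's transaction ID is not newer than the reader's snapshot", which is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    trx_id: int                        # plays the role of DB_TRX_ID
    value: str
    roll_ptr: Optional["RowVersion"]   # plays the role of DB_ROLL_PTR

def consistent_read(row, snapshot_trx_id):
    """Follow roll pointers until a version visible to the snapshot is found."""
    version = row
    while version is not None and version.trx_id > snapshot_trx_id:
        version = version.roll_ptr     # step back to the earlier version
    return None if version is None else version.value

v1 = RowVersion(trx_id=10, value="a", roll_ptr=None)
v2 = RowVersion(trx_id=20, value="b", roll_ptr=v1)
v3 = RowVersion(trx_id=30, value="c", roll_ptr=v2)

print(consistent_read(v3, snapshot_trx_id=25))  # b
```

In this model, the versions behind v3 stand for update undo log records, which is why they cannot be discarded while a snapshot that might still need them exists.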
Commit your transactions regularly, including those transactions that issue only consistent reads. Otherwise, InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
In the InnoDB multi-versioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. InnoDB only physically removes the corresponding row and its index records when it discards the update undo log record written for the deletion. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement that did the deletion.
If you insert and delete rows in small batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow. In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable. See Section 14.12, “InnoDB Startup Options and System Variables” for more information.
Multi-Versioning and Secondary Indexes
InnoDB multiversion concurrency control (MVCC) treats secondary indexes differently than clustered indexes. Records in a clustered index are updated in-place, and their hidden system columns point to undo log entries from which earlier versions of records can be reconstructed. Unlike clustered index records, secondary index records do not contain hidden system columns, nor are they updated in-place.
When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated.
If a secondary index record is marked for deletion or the secondary index page is updated by a newer transaction, the covering index technique is not used. Instead of returning values from the index structure, InnoDB looks up the record in the clustered index.
However, if the index condition pushdown (ICP) optimization is enabled, and parts of the WHERE condition can be evaluated using only fields from the index, the MySQL server still pushes this part of the WHERE condition down to the storage engine where it is evaluated using the index. If no matching records are found, the clustered index lookup is avoided. If matching records are found, even among delete-marked records, InnoDB looks up the record in the clustered index.
14.2.3 InnoDB Redo Log
14.2.3.1 Group Commit for Redo Log Flushing
The redo log is a disk-based data structure used during crash recovery to correct data written by incomplete transactions. During normal operations, the redo log encodes requests to change InnoDB table data, which result from SQL statements or low-level API calls. Modifications that did not finish updating the data files before an unexpected shutdown are replayed automatically during initialization, and before the connections are accepted. For information about the role of the redo log in crash recovery, see Section 14.16.1, “The InnoDB Recovery Process”.
By default, the redo log is physically represented on disk as a set of files, named ib_logfile0 and ib_logfile1. MySQL writes to the redo log files in a circular fashion. Data in the redo log is encoded in terms of records affected; this data is collectively referred to as redo. The passage of data through the redo log is represented by an ever-increasing LSN value.
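To make the circular write pattern concrete, the following Python sketch maps an ever-increasing LSN to a file name and an offset within that file. It is a simplified model under stated assumptions: file headers and block framing are ignored, and the function name is introduced here for illustration only.

```python
def redo_position(lsn, file_size, files_in_group):
    """Map an ever-increasing LSN to (file name, offset) in a circular log.

    Simplification: file headers and block framing are ignored.
    """
    total = file_size * files_in_group
    pos = lsn % total                      # wrap around at the end of the group
    return "ib_logfile%d" % (pos // file_size), pos % file_size

FILE_SIZE = 48 * 1024 * 1024               # 48MB per file
print(redo_position(FILE_SIZE + 100, FILE_SIZE, 2))      # ('ib_logfile1', 100)
print(redo_position(2 * FILE_SIZE + 100, FILE_SIZE, 2))  # ('ib_logfile0', 100)
```

The second call shows the wrap-around: once the LSN passes the combined size of both files, writing continues at the start of ib_logfile0.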
Disk layout for the redo log is configured using the following options:
• innodb_log_file_size: Defines the size of each redo log file in bytes. By default, redo log files are 50331648 bytes (48MB) in size. The combined size of log files (innodb_log_file_size * innodb_log_files_in_group) cannot exceed a maximum value that is slightly less than 512GB.
• innodb_log_files_in_group: The number of log files in the log group. The default is to create two files named ib_logfile0 and ib_logfile1.
• innodb_log_group_home_dir: The directory path to the InnoDB log files. If you do not specify a value, the log files are created in the MySQL data directory (datadir).
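The combined-size rule for the options above is simple arithmetic, sketched here in Python. The helper names are hypothetical. Because the exact maximum is only described as slightly less than 512GB, the check uses 512GB as an upper bound and is therefore necessary but not sufficient near the boundary.

```python
GIB = 1024 ** 3

def combined_redo_size(log_file_size, log_files_in_group):
    """Combined size in bytes: innodb_log_file_size * innodb_log_files_in_group."""
    return log_file_size * log_files_in_group

def within_limit(log_file_size, log_files_in_group, limit=512 * GIB):
    # The real maximum is slightly less than 512GB, so passing this check
    # does not guarantee that a configuration close to the limit is accepted.
    return combined_redo_size(log_file_size, log_files_in_group) < limit

print(combined_redo_size(50331648, 2))   # 100663296 bytes, that is 96MB
print(within_limit(50331648, 2))         # True
```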
To change your initial redo log configuration, refer to Section 14.5.2, “Changing the Number or Size of InnoDB Redo Log Files”. For information about optimizing redo logging, see Section 8.5.4, “Optimizing InnoDB Redo Logging”.
14.2.3.1 Group Commit for Redo Log Flushing
InnoDB, like any other ACID-compliant database engine, flushes the redo log of a transaction before it is committed. InnoDB uses group commit functionality to group multiple such flush requests together to avoid one flush for each commit. With group commit, InnoDB issues a single write to the log file to perform the commit action for multiple user transactions that commit at about the same time, significantly improving throughput.
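The effect of group commit can be shown with a toy model that counts flushes instead of writing to disk. The RedoLog class and its method names are hypothetical and exist only for this sketch; real group commit is driven by concurrent threads, which this single-threaded example does not attempt to reproduce.

```python
class RedoLog:
    """Toy model: counts flushes instead of touching a disk."""
    def __init__(self):
        self.pending = []     # commit requests waiting for a flush
        self.flushes = 0
        self.durable = []     # transactions whose redo has been flushed

    def request_commit(self, trx_id):
        self.pending.append(trx_id)

    def flush(self):
        """One flush makes every pending commit durable."""
        if self.pending:
            self.flushes += 1
            self.durable.extend(self.pending)
            self.pending.clear()

log = RedoLog()
for trx_id in (1, 2, 3):      # three transactions commit at about the same time
    log.request_commit(trx_id)
log.flush()
print(log.flushes, log.durable)   # 1 [1, 2, 3]
```

Without grouping, the same three commits would each call flush() and cost three flushes instead of one.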
For more information about performance of COMMIT and other transactional operations, see Section 8.5.2, “Optimizing InnoDB Transaction Management”.
14.2.4 InnoDB Undo Logs
An undo log (or rollback segment) is a storage area that holds copies of data modified by active transactions. If another transaction needs to see the original data (as part of a consistent read operation), the unmodified data is retrieved from this storage area. By default, this area is physically part of the system tablespace. However, as of MySQL 5.6.3, undo logs can reside in separate undo tablespaces. For more information, see Section 14.5.7, “Storing InnoDB Undo Logs in Separate Tablespaces”. For more information about undo logs and multi-versioning, see Section 14.2.2, “InnoDB Multi-Versioning”.
InnoDB supports 128 undo logs, each supporting up to 1023 concurrent data-modifying transactions, for a total limit of approximately 128K concurrent data-modifying transactions (read-only transactions do not count against the maximum limit). Each transaction is assigned to one of the undo logs, and remains tied to that undo log for the duration. The innodb_undo_logs option defines how many undo logs are used by InnoDB.
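The "approximately 128K" figure follows directly from the two limits above, as this short Python calculation shows. The function name is hypothetical, and the second call assumes a hypothetical configuration that uses 64 undo logs.

```python
UNDO_LOGS = 128              # undo logs supported by InnoDB
TRX_PER_UNDO_LOG = 1023      # concurrent data-modifying transactions per undo log

def max_concurrent_writers(undo_logs=UNDO_LOGS):
    """Upper bound on concurrent data-modifying transactions."""
    return undo_logs * TRX_PER_UNDO_LOG

print(max_concurrent_writers())      # 130944, roughly 128K
print(max_concurrent_writers(64))    # 65472
```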
14.2.5 InnoDB Table and Index Structures
14.2.5.1 Role of the .frm File for InnoDB Tables
14.2.5.2 Clustered and Secondary Indexes
14.2.5.3 InnoDB FULLTEXT Indexes
14.2.5.4 Physical Structure of an InnoDB Index
14.2.5.6 Adaptive Hash Indexes
14.2.5.7 Physical Row Structure
This section describes how InnoDB tables, indexes, and their associated metadata are represented at the physical level. This information is primarily useful for performance tuning and troubleshooting.
14.2.5.1 Role of the .frm File for InnoDB Tables
MySQL stores its data dictionary information for tables in .frm files in database directories. Unlike other MySQL storage engines, InnoDB also encodes information about the table in its own internal data dictionary inside the tablespace. When MySQL drops a table or a database, it deletes one or more .frm files as well as the corresponding entries inside the InnoDB data dictionary. You cannot move InnoDB tables between databases simply by moving the .frm files.
14.2.5.2 Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.
• When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.
• If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
• If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
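The three rules above amount to a fallback chain, sketched here as a Python function. The function, its argument layout, and the returned labels are hypothetical and chosen for this example; the sketch describes the selection order only, not how InnoDB stores its metadata.

```python
def choose_clustered_index(primary_key, unique_indexes):
    """Pick the clustered index following the three rules above.

    primary_key:    list of column names, or None if no PRIMARY KEY is defined
    unique_indexes: list of (index_name, [(column, nullable), ...]) in definition order
    """
    if primary_key:
        return "PRIMARY"
    for name, columns in unique_indexes:
        if all(not nullable for _, nullable in columns):
            return name                  # first UNIQUE index with only NOT NULL columns
    return "hidden row ID index"         # generated internally on a synthetic column

indexes = [
    ("u_email", [("email", True)]),      # nullable column, so not suitable
    ("u_code", [("code", False)]),
]
print(choose_clustered_index(None, indexes))   # u_code
```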
How the Clustered Index Speeds Up Queries
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
How Secondary Indexes Relate to the Clustered Index
All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
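The two-step lookup through a secondary index can be modeled with two Python dictionaries. This is an illustrative sketch with made-up table data and names; real indexes are B-trees, which the dictionaries do not imitate.

```python
# Clustered index: primary key -> full row.
clustered = {
    1: {"id": 1, "email": "a@example.com", "name": "Ann"},
    2: {"id": 2, "email": "b@example.com", "name": "Bob"},
}
# Secondary index on email: indexed column value -> primary key value.
by_email = {row["email"]: pk for pk, row in clustered.items()}

def find_by_email(email):
    pk = by_email.get(email)         # step 1: search the secondary index
    if pk is None:
        return None
    return clustered[pk]             # step 2: use the primary key in the clustered index

print(find_by_email("b@example.com")["name"])   # Bob
```

Every entry in by_email carries a copy of the primary key value, which is why a long primary key makes each secondary index larger.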
For coding guidelines to take advantage of InnoDB clustered and secondary indexes, see Section 8.3.2, “Using Primary Keys”, Section 8.3, “Optimization and Indexes”, and Section 8.5, “Optimizing for InnoDB Tables”.
14.2.5.3 InnoDB FULLTEXT Indexes
FULLTEXT indexes are created on text-based columns (CHAR, VARCHAR, or TEXT columns) to help speed up queries and DML operations on data contained within those columns, omitting any words that are defined as stopwords.
A FULLTEXT index can be defined as part of a CREATE TABLE statement, or added later using ALTER TABLE or CREATE INDEX.
Full-text searching is performed using MATCH() ... AGAINST syntax. For usage information, see Section 12.9, “Full-Text Search Functions”.
Full-Text Index Design
InnoDB FULLTEXT indexes have an inverted index design. Inverted indexes store a list of words, and for each word, a list of documents that the word appears in. To support proximity search, position information for each word is also stored, as a byte offset.
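A minimal inverted index with byte-offset positions can be sketched in Python as follows. The tokenizer is an assumption made for this example (a plain word-character regular expression, lowercased, with no stopword handling), so it does not reproduce InnoDB's own parsing rules.

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to a list of (doc_id, byte_offset) pairs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for match in re.finditer(r"\w+", text):
            # Position is stored as a byte offset in the UTF-8 encoded text.
            offset = len(text[:match.start()].encode("utf-8"))
            index[match.group().lower()].append((doc_id, offset))
    return index

docs = {1: "Call me Ishmael", 2: "Call it a day"}
index = build_inverted_index(docs)
print(index["call"])   # [(1, 0), (2, 0)]
print(index["me"])     # [(1, 5)]
```

Keeping the offsets alongside the document IDs is what makes a proximity search possible without rereading the documents.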
Full-Text Index Tables
For each InnoDB FULLTEXT index, a set of index tables is created, as shown in the following example:
CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200),
FULLTEXT idx (opening_line)
) ENGINE=InnoDB;
mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
WHERE name LIKE 'test/%';
+----------+----------------------------------------------------+-------+
| table_id | name | space |
+----------+----------------------------------------------------+-------+
| 333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 | 289 |
| 334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 | 290 |
| 335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 | 291 |
| 336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 | 292 |
| 337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 | 293 |
| 338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 | 294 |
| 330 | test/FTS_0000000000000147_BEING_DELETED | 286 |
| 331 | test/FTS_0000000000000147_BEING_DELETED_CACHE | 287 |
| 332 | test/FTS_0000000000000147_CONFIG | 288 |
| 328 | test/FTS_0000000000000147_DELETED | 284 |
| 329 | test/FTS_0000000000000147_DELETED_CACHE | 285 |
| 327 | test/opening_lines | 283 |
+----------+----------------------------------------------------+-------+
The first six tables represent the inverted index and are referred to as auxiliary index tables. When incoming documents are tokenized, the individual words (also referred to as “tokens”) are inserted into the index tables along with position information and the associated Document ID (DOC_ID). The words are fully sorted and partitioned among the six index tables based on the character set sort weight of the word's first character.
The inverted index is partitioned into six auxiliary index tables to support parallel index creation. By default, two threads tokenize, sort, and insert words and associated data into the index tables. The number of threads is configurable using the innodb_ft_sort_pll_degree option. When creating FULLTEXT indexes on large tables, consider increasing the number of threads.
Auxiliary index table names are prefixed with FTS_ and postfixed with INDEX_*. Each index table is associated with the indexed table by a hex value in the index table name that matches the table_id of the indexed table. For example, the table_id of the test/opening_lines table is 327, for which the hex value is 0x147. As shown in the preceding example, the “147” hex value appears in the names of index tables that are associated with the test/opening_lines table.
A hex value representing the index_id of the FULLTEXT index also appears in auxiliary index table names. For example, in the auxiliary table name test/FTS_0000000000000147_00000000000001c9_INDEX_1, the hex value 1c9 has a decimal value of 457. The index defined on the opening_lines table (idx) can be identified by querying the INFORMATION_SCHEMA.INNODB_SYS_INDEXES table for this value (457).
mysql> SELECT index_id, name, table_id, space from INFORMATION_SCHEMA.INNODB_SYS_INDEXES
WHERE index_id=457;
+----------+------+----------+-------+
| index_id | name | table_id | space |
+----------+------+----------+-------+
| 457 | idx | 327 | 283 |
+----------+------+----------+-------+
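The hex-to-decimal relationship between auxiliary table names and the table_id and index_id values can be checked with a short Python helper. The function name and the regular expression are written for this example and assume the naming pattern shown in the listing above.

```python
import re

def parse_fts_table_name(name):
    """Extract the decimal table_id and index_id from an auxiliary index table name."""
    m = re.fullmatch(r"\w+/FTS_([0-9a-f]+)_([0-9a-f]+)_INDEX_\d+", name)
    if m is None:
        raise ValueError("not an auxiliary index table name: %r" % name)
    return int(m.group(1), 16), int(m.group(2), 16)

print(parse_fts_table_name("test/FTS_0000000000000147_00000000000001c9_INDEX_1"))
# (327, 457)
```

The result matches the table_id of test/opening_lines (327) and the index_id of idx (457) from the two queries.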
Index tables are stored in their own tablespace when innodb_file_per_table is enabled. If innodb_file_per_table is disabled, index tables are stored in the InnoDB system tablespace (space 0).
Note
Due to a bug introduced in MySQL 5.6.5, index tables are created in the InnoDB system tablespace (space 0) when innodb_file_per_table is enabled. The bug is fixed in MySQL 5.6.20 and MySQL 5.7.5 (Bug#18635485).
The other index tables shown in the preceding example are used for deletion handling and for storing the internal state of the FULLTEXT index.
• FTS_*_DELETED and FTS_*_DELETED_CACHE: Contain the document IDs (DOC_ID) for documents that are deleted but whose data is not yet removed from the full-text index. The FTS_*_DELETED_CACHE is the in-memory version of the FTS_*_DELETED table.
• FTS_*_BEING_DELETED and FTS_*_BEING_DELETED_CACHE: Contain the document IDs (DOC_ID) for documents that are deleted and whose data is currently in the process of being removed from the full-text index. The FTS_*_BEING_DELETED_CACHE table is the in-memory version of the FTS_*_BEING_DELETED table.
• FTS_*_CONFIG: Stores information about the internal state of the FULLTEXT index. Most importantly, it stores the FTS_SYNCED_DOC_ID, which identifies documents that have been parsed and flushed to disk. In case of crash recovery, FTS_SYNCED_DOC_ID values are used to identify documents that have not been flushed to disk so that the documents can be re-parsed and added back to the FULLTEXT index cache. To view the data in this table, query the INFORMATION_SCHEMA.INNODB_FT_CONFIG table.
Full-Text Index Cache
When a document is inserted, it is tokenized, and the individual words and associated data are inserted into the FULLTEXT index. This process, even for small documents, could result in numerous small insertions into the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, InnoDB uses a FULLTEXT index cache to temporarily cache index table insertions for recently inserted rows. This in-memory cache structure holds insertions until the cache is full and then batch flushes them to disk (to the auxiliary index tables). You can query the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table to view tokenized data for recently inserted rows.
当文本被插入的时候,它会被分词,单个的word和相关的数据会被插入到FULLTEXT索引里。这个处理过程,即使针对的是小的文档,也会产生很多小的插入到辅助索引表的操作,在一个连接点上产生很多最这些表的并发访问。为了避免这样的问题,InnoDB使用FULLTEXT index cache来临时缓存最近插入到索引表的行数据。这个在内存中的缓存结构会保存插入的数据,直到缓存慢了,然后就会把它们批量刷新到磁盘上(到辅助索引表里)。你可以通过查询INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE表来查看最近插入行的分词数据。
The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.
缓存和批量刷新操作避免了频繁更新辅助索引表,避免了在繁忙的insert和update时间上的并发访问题。批处理技术还能够避免插入多个相同的word,最大限度地减少重复条目。相对与刷新每个单独的word,同一个word的插入工作会被合并,然后把单个的条目刷新到磁盘上,这样就提升了插入操作的效率,也使得了辅助索引表尽可能地小。
The innodb_ft_cache_size variable is used to configure the full-text index cache size (on a per-table basis), which affects how often the full-text index cache is flushed. You can also define a global full-text index cache size limit for all tables in a given instance using the innodb_ft_total_cache_size option.
innodb_ft_cache_size变量用于配置full-text index cache的大小(基于每个表),决定了full-text index cache多久刷新一次。你可以使用innodb_ft_total_cache_size参数定义一个全局的针对所有表的full-text index cache大小的限制。
The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the full-text auxiliary tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.
full-text index cache存储了和辅助索引表相同的信息。但是,full-text index cache只缓存了最近插入行的分词数据。已经刷新到磁盘(到full-text辅助表)的数据不会返回到 full-text index cache里。辅助索引表的数据是可以直接查询的,辅助索引表的结果把full-text index cache的结果进行合并的。
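The cache limits and the tokens currently held in memory can be inspected from SQL. The following is a minimal sketch, assuming the opening_lines table used in this section lives in a schema named test (the schema name is an assumption) and that the session may set global variables. The FULLTEXT INFORMATION_SCHEMA tables report on whichever table is named in innodb_ft_aux_table.

```sql
-- Per-table and instance-wide cache limits, in bytes.
SHOW VARIABLES LIKE 'innodb_ft%cache_size';

-- Select which table the INNODB_FT_* tables report on.
-- 'test' is an assumed schema name; the format is db_name/table_name.
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Tokenized data for recently inserted rows that is still in the cache.
SELECT WORD, DOC_COUNT, DOC_ID, POSITION
  FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE
 ORDER BY WORD
 LIMIT 10;
```

Once the cache has been flushed, the same tokens are expected to show up in INNODB_FT_INDEX_TABLE rather than in INNODB_FT_INDEX_CACHE.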
InnoDB Full-Text Document ID and FTS_DOC_ID Column
InnoDB uses a unique document identifier referred to as a Document ID (DOC_ID) to map words in the full-text index to document records where the word appears. The mapping requires an FTS_DOC_ID column on the indexed table. If an FTS_DOC_ID column is not defined, InnoDB automatically adds a hidden FTS_DOC_ID column when the full-text index is created. The following example demonstrates this behavior.
InnoDB使用唯一的文档标识作为Document ID (DOC_ID),用来把full-text index里面的word映射到文档记录里。映射会要求在索引表上有一个FTS_DOC_ID列。如果FTS_DOC_ID没有定义,InnoDB会在full-text index创建的时候自动创建一个隐藏的FTS_DOC_ID列。下面的例子显示了这种行为。
The following table definition does not include an FTS_DOC_ID column:
下面表没有定义FTS_DOC_ID列:
CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
When you create a full-text index on the table using CREATE FULLTEXT INDEX syntax, a warning is returned which reports that InnoDB is rebuilding the table to add the FTS_DOC_ID column.
当你使用CREATE FULLTEXT INDEX在表上创建一个full-text index的时候,会返回一个警告表示InnoDB在表上添加了一个FTS_DOC_ID列。
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (0.19 sec)
Records: 0 Duplicates: 0 Warnings: 1
mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
+---------+------+--------------------------------------------------+
The same warning is returned when using ALTER TABLE to add a full-text index to a table that does not have an FTS_DOC_ID column. If you create a full-text index at CREATE TABLE time and do not specify an FTS_DOC_ID column, InnoDB adds a hidden FTS_DOC_ID column, without warning.
如果没有FTS_DOC_ID列的话在ALTER TABLE添加full-text index的时候也会返回相同的警告。如果你在CREATE TABLE的时候没有没有指定FTS_DOC_ID列,同时也添加了full-text index,InnoDB会添加一个隐藏的FTS_DOC_ID列,而不会有警告。
Defining an FTS_DOC_ID column at CREATE TABLE time reduces the time required to create a full-text index on a table that is already loaded with data. If an FTS_DOC_ID column is defined on a table prior to loading data, the table and its indexes do not have to be rebuilt to add the new column. If you are not concerned with CREATE FULLTEXT INDEX performance, leave out the FTS_DOC_ID column to have InnoDB create it for you. InnoDB creates a hidden FTS_DOC_ID column along with a unique index (FTS_DOC_ID_INDEX) on the FTS_DOC_ID column. If you want to create your own FTS_DOC_ID column, the column must be defined as BIGINT UNSIGNED NOT NULL and named FTS_DOC_ID (all upper case), as in the following example:
在CREATE TABLE的时候就明确定义了FTS_DOC_ID列,这样因为在创建full-text index的可以加载已有的数据,索引可以减少需要的时间。如果FTS_DOC_ID列在加载数据之前就已经定义了,表和索引不需要重建来添加新的列。如果你不关心CREATE FULLTEXT INDEX的性能,可以忽略FTS_DOC_ID列,InnoDB会为你创建它。Innodb创建的隐藏列FTS_DOC_ID上还会有一个唯一索引(FTS_DOC_ID_INDEX)。如果你想要自己创建FTS_DOC_ID列,这列必须定义成BIGINT UNSIGNED NOT NULL并命名为FTS_DOC_ID,如下面的例子:
Note
The FTS_DOC_ID column does not need to be defined as an AUTO_INCREMENT column but AUTO_INCREMENT could make loading data easier.
FTS_DOC_ID不需要定义成一个AUTO_INCREMENT列,但是AUTO_INCREMENT会使得数据加载得更容易。
CREATE TABLE opening_lines (
FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
If you choose to define the FTS_DOC_ID column yourself, you are responsible for managing the column to avoid empty or duplicate values. FTS_DOC_ID values cannot be reused, which means FTS_DOC_ID values must be ever increasing.
如果你选择自己定义FTS_DOC_ID列,那你就要负责管理列不能为空或者有重复值。FTS_DOC_ID的值是不能够重复利用的,那就意味这FTS_DOC_ID的值必须是永远增加的。
Optionally, you can create the required unique FTS_DOC_ID_INDEX (all upper case) on the FTS_DOC_ID column.
根据情况,你可以在FTS_DOC_ID列上创建需要的唯一索引FTS_DOC_ID_INDEX。
CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);
If you do not create the FTS_DOC_ID_INDEX, InnoDB creates it automatically.
如果你没有创建FTS_DOC_ID_INDEX,InnoDB会自动创建它。
Before MySQL 5.6.31, the permitted gap between the largest used FTS_DOC_ID value and new FTS_DOC_ID value is 10000. In MySQL 5.6.31 and later, the permitted gap is 65535.
在MySQL5.6.31之前,在FTS_DOC_ID最大使用的值和新的FTS_DOC_ID值之间允许的间隙是10000。在MySQL5.6.31及以后的版本,这个允许的间隙是65535。
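As an illustration of managing the column yourself, the sketch below assigns explicit, strictly increasing FTS_DOC_ID values. It assumes the opening_lines definition shown above with a user-defined FTS_DOC_ID column; the sample rows and ID values are illustrative only.

```sql
-- Add the full-text index if the table does not have one yet.
CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);

-- Supply explicit, ever-increasing document IDs.
INSERT INTO opening_lines (FTS_DOC_ID, opening_line, author, title) VALUES
  (1, 'Call me Ishmael.', 'Herman Melville', 'Moby-Dick'),
  (2, 'I am an invisible man.', 'Ralph Ellison', 'Invisible Man');

-- Largest value currently stored in the table. The next ID must be larger
-- than any value used so far (including IDs of rows deleted since) and
-- must stay within the permitted gap.
SELECT MAX(FTS_DOC_ID) AS largest_doc_id FROM opening_lines;

INSERT INTO opening_lines (FTS_DOC_ID, opening_line, author, title) VALUES
  (3, 'It was a pleasure to burn.', 'Ray Bradbury', 'Fahrenheit 451');
```

Declaring the column AUTO_INCREMENT, as in the table definition above, avoids this bookkeeping for ordinary inserts.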
InnoDB Full-Text Index Deletion Handling
Deleting a record that has a full-text index column could result in numerous small deletions in the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, the Document ID (DOC_ID) of a deleted document is logged in a special FTS_*_DELETED table whenever a record is deleted from an indexed table, and the indexed record remains in the full-text index. Before returning query results, information in the FTS_*_DELETED table is used to filter out deleted Document IDs. The benefit of this design is that deletions are fast and inexpensive. The drawback is that the size of the index is not immediately reduced after deleting records. To remove full-text index entries for deleted records, you must run OPTIMIZE TABLE on the indexed table with innodb_optimize_fulltext_only=ON to rebuild the full-text index. For more information, see Optimizing InnoDB Full-Text Indexes.
删除一条有full-text index的记录会引起许多在辅助索引上上的删除操作,在时间连接时间点上有很多对这些表的并发访问。为了避免这样的问题,每当从索引表里删除条记录时,被删除文档的Document ID (DOC_ID)会被记录在指定的FTS_*_DELETED表里,索引记录仍然会保留在full-text index里。在返回查询结果之前,FTS_*_DELETED表里的信息会被用于过滤出已删除的Document IDs。这样设计的好处是删除操作会是快速廉价的。缺点是在删除记录后索引的大小不会立刻减小。为了移除已删除记录的full-text index条目,你必须在索引表上通过innodb_optimize_fulltext_only=ON运行OPTIMIZE TABLE来重建full-text index。更多相关的信息可以查看Optimizing InnoDB Full-Text Indexes。
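The deletion workflow described above can be sketched as follows. The example assumes the opening_lines table with a full-text index, in a schema named test (an assumed name), and a session with the privileges needed to set global variables.

```sql
-- 'test' is an assumed schema name.
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

DELETE FROM opening_lines WHERE author = 'Ray Bradbury';

-- Document IDs that are logged as deleted but still present in the index.
SELECT DOC_ID FROM INFORMATION_SCHEMA.INNODB_FT_DELETED;

-- Make OPTIMIZE TABLE process only the full-text index, then run it.
SET GLOBAL innodb_optimize_fulltext_only = ON;
OPTIMIZE TABLE opening_lines;

-- Restore the default so later OPTIMIZE TABLE runs behave normally.
SET GLOBAL innodb_optimize_fulltext_only = OFF;
```

On a large index, more than one OPTIMIZE TABLE run may be needed, because the number of words processed per run is capped by innodb_ft_num_word_optimize.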
InnoDB Full-Text Index Transaction Handling
InnoDB FULLTEXT indexes have special transaction handling characteristics due to their caching and batch processing behavior. Specifically, updates and insertions on a FULLTEXT index are processed at transaction commit time, which means that a FULLTEXT search can only see committed data. The following example demonstrates this behavior. The FULLTEXT search only returns a result after the inserted lines are committed.
InnoDB FULLTEXT index由于它的缓存和批处理行为,有一个特殊的事务处理特点。FULLTEXT index上的update和insert操作会在事务提交的时候处理,这就意味着FULLTEXT搜索只能看到已提交的数据。下面的例子演示了这种情况。FULLTEXT搜索只能返回已提交的insert数据。
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200),
FULLTEXT idx (opening_line)
) ENGINE=InnoDB;
mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        0 |
+----------+
mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
+----------+
| COUNT(*) |
+----------+
|        1 |
+----------+
Monitoring InnoDB Full-Text Indexes
You can monitor and examine the special text-processing aspects of InnoDB FULLTEXT indexes by querying the following INFORMATION_SCHEMA tables:
你可以通过查询下面的INFORMATION_SCHEMA表来监控检查InnoDB FULLTEXT index的各方面内容:
INNODB_FT_CONFIG
INNODB_FT_INDEX_TABLE
INNODB_FT_INDEX_CACHE
INNODB_FT_DEFAULT_STOPWORD
INNODB_FT_DELETED
INNODB_FT_BEING_DELETED
You can also view basic information for FULLTEXT indexes and tables by querying INNODB_SYS_INDEXES and INNODB_SYS_TABLES.
你还能够通过查询INNODB_SYS_INDEXES and INNODB_SYS_TABLES来查看FULLTEXT index和表的基本信息。
See Section 14.13.4, “InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables” for more information.
更多的信息查看Section 14.13.4, “InnoDB INFORMATION_SCHEMA FULLTEXT Index Tables”。
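A few of these tables can be queried together to get a quick picture of one index. This is a sketch only, assuming the opening_lines table in a schema named test (an assumed name); the INNODB_FT_* tables other than INNODB_FT_DEFAULT_STOPWORD report on the table named in innodb_ft_aux_table.

```sql
SET GLOBAL innodb_ft_aux_table = 'test/opening_lines';

-- Internal state of the index, including the synced document ID.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_CONFIG;

-- Words that InnoDB ignores by default when building FULLTEXT indexes.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD LIMIT 5;

-- The indexed table and its FTS_* auxiliary tables, with tablespace IDs.
SELECT table_id, name, space
  FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES
 WHERE name LIKE 'test/%';
```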
14.2.5.4 Physical Structure of an InnoDB Index
All InnoDB indexes are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16KB.
所有的InnoDB索引都是B-trees结构的,索引的记录存储在树的叶子节点上。索引页的默认大小是16KB。
When new records are inserted into an InnoDB clustered index, InnoDB tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full. If the fill factor of an index page drops below 1/2, InnoDB tries to contract the index tree to free the page.
当有新的记录被 插入到InnoDB clustered index的时候,InnoDB会留下1/16的页空余空间用来添加和修改索引记录。如果索引记录是连续的顺序插入的(升序或降序),那么结果达到页空间的15/16就会满了。如果记录是随机顺序插入的,那么达到1/2到15/16的时候就会满了。如果因为删除因素索引页小于了1/2,InnoDB就会视图收缩索引树来节约页空间。
You can configure the page size for all InnoDB tablespaces in a MySQL instance by setting the innodb_page_size configuration option before creating the instance. Once the page size for an instance is set, you cannot change it. Supported sizes are 16KB, 8KB, and 4KB, corresponding to the option values 16k, 8k, and 4k.
你可以在创建实例之前通过设定innodb_page_size参数指定所有InnoDB表空间的页大小。一旦实例的页大小被设定了,你就不能再修改它了。可支持的大小有16KB, 8KB, and 4KB,相对应的参数值是16k, 8k, and 4k。
A MySQL instance using a particular InnoDB page size cannot use data files or log files from an instance that uses a different page size.
MySQL实例使用的InnoDB页大小比较特殊,一个实例里使用的数据文件或者日志文件不能使用不同的页大小。
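The fill fractions mentioned above translate into byte counts for a given page size. The query below is a rough sketch that reads the page size of the running instance and applies the 15/16 and 1/2 fractions. It ignores page header and trailer overhead, and the column aliases are illustrative names.

```sql
SELECT @@innodb_page_size             AS page_size_bytes,
       @@innodb_page_size * 15 DIV 16 AS sequential_insert_fill_bytes,
       @@innodb_page_size DIV 2       AS contraction_threshold_bytes;
```

With the default 16KB page (16384 bytes), the two fractions work out to 15360 and 8192 bytes.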
14.2.5.5 Change Buffer
The change buffer is a special data structure that caches changes to secondary index pages when affected pages are not in the buffer pool. The buffered changes, which may result from INSERT, UPDATE, or DELETE operations (DML), are merged later when the pages are loaded into the buffer pool by other read operations.
change buffer是一个特殊的数据结构,用于在被影响的页不在buffer pool的时候缓存secondary index页的修改。由于DML操作而产生的在buffer中缓存的修改操作,会在由于其他的读操作加载到buffer pool的时候进行合并。
Unlike clustered indexes, secondary indexes are usually non-unique, and inserts into secondary indexes happen in a relatively random order. Similarly, deletes and updates may affect secondary index pages that are not adjacently located in an index tree. Merging cached changes at a later time, when affected pages are read into the buffer pool by other operations, avoids substantial random access I/O that would be required to read-in secondary index pages from disk.
不像clustered index,secondary index通常是非唯一的,插入到secondary index的操作也很有可能是随机顺序的。同样地,delete和update操作会使得secondary index页在索引数里不是相邻的。当受影响的页由于其他操作读到buffer pool的时候,缓存里的修改合并操作会在稍后的时间执行,这样就避免了大量随机的I/O访问,也就避免了从磁盘里读取要求的secondary index页。
Periodically, the purge operation that runs when the system is mostly idle, or during a slow shutdown, writes the updated index pages to disk. The purge operation can write disk blocks for a series of index values more efficiently than if each value were written to disk immediately.
可以在系统大部分空闲的时候周期性运行purge操作,或者在缓慢的关机过程中把修改的索引页写入到磁盘里。和每个值立刻写入到磁盘的方式相比,purge操作可以把一系列的索引值写入到磁盘块里,这样更有效率。
Change buffer merging may take several hours when there are numerous secondary indexes to update and many affected rows. During this time, disk I/O is increased, which can cause a significant slowdown for disk-bound queries. Change buffer merging may also continue to occur after a transaction is committed. In fact, change buffer merging may continue to occur after a server shutdown and restart (see Section 14.19.2, “Forcing InnoDB Recovery” for more information).
当有大量的secondary index要更新,涉及到非常多的行时,change buffer的合并操作会要花费数个小时。在这个时间段里,磁盘I/O将会提升,基于磁盘的查询操作也会显著放慢。change buffer的合并操作也会发生在事务提交之后。实际上,change buffer的合并操作也会发生在实例关闭和重启的时候。
In memory, the change buffer occupies part of the InnoDB buffer pool. On disk, the change buffer is part of the system tablespace, so that index changes remain buffered across database restarts.
在内存里,change buffer会占用部分的InnoDB buffer pool。在磁盘上,change buffer占用的是一部分的系统表空间,所以在数据库重启的过程中索引的修改仍然会保留在 buffer里。
The type of data cached in the change buffer is governed by the innodb_change_buffering configuration option. For more information, see Section 14.4.5, “Configuring InnoDB Change Buffering”. You can also configure the maximum change buffer size. For more information, see Section 14.4.5.1, “Configuring the Change Buffer Maximum Size”.
change buffer通过innodb_change_buffering配置参数管理缓存里的数据。更多的信息查看Section 14.4.5, “Configuring InnoDB Change Buffering”。你还可以修改配置最大的change buffer的大小。更多的信息查看Section 14.4.5.1, “Configuring the Change Buffer Maximum Size”。
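Both settings can be inspected and changed at runtime. The statements below are a minimal sketch; the chosen values are illustrative rather than recommendations, and changing global variables requires the appropriate privileges.

```sql
-- Current buffering mode and maximum size (a percentage of the buffer pool).
SHOW VARIABLES LIKE 'innodb_change_buffer%';

-- Buffer only insert operations. Other accepted values include
-- none, deletes, changes, purges, and all.
SET GLOBAL innodb_change_buffering = 'inserts';

-- Allow the change buffer to use up to 30% of the buffer pool
-- (30 is an illustrative value).
SET GLOBAL innodb_change_buffer_max_size = 30;
```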
Monitoring the Change Buffer
The following options are available for change buffer monitoring:
下面的参数可用于监控change buffer:
l InnoDB Standard Monitor output includes status information for the change buffer. To view monitor data, issue the SHOW ENGINE INNODB STATUS command.
l InnoDB Standard Monitor的输出包括了change buffer的状态信息。可以通过执行SHOW ENGINE INNODB STATUS来查看监控数据。
mysql> SHOW ENGINE INNODB STATUS\G
Change buffer status information is located under the INSERT BUFFER AND ADAPTIVE HASH INDEX heading and appears similar to the following:
change buffer的状态信息在INSERT BUFFER AND ADAPTIVE HASH INDEX的表头下面,呈现出的信息像下面着怎样的:
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
Hash table size 4425293, used cells 32, node heap has 1 buffer(s)
13577.57 hash searches/s, 202.47 non-hash searches/s
For a description of each data point, see Section 14.15.3, “InnoDB Standard Monitor and Lock Monitor Output”.
每个数据点的描述可以查看Section 14.15.3, “InnoDB Standard Monitor and Lock Monitor Output”。
l The INFORMATION_SCHEMA.INNODB_METRICS table provides most of the data points found in InnoDB Standard Monitor output, plus other data points. To view change buffer metrics and a description of each, issue the following query:
l INFORMATION_SCHEMA.INNODB_METRICS表提供了InnoDB Standard Monitor输出的大部分的数据点,另外还增加了其他的数据点。可以执行下面查询来查看change buffer的度量和每个数据点的描述:
mysql> SELECT NAME, COMMENT FROM INFORMATION_SCHEMA.INNODB_METRICS WHERE NAME LIKE '%ibuf%'\G
For INNODB_METRICS table usage information, see Section 14.13.6, “InnoDB INFORMATION_SCHEMA Metrics Table”.
关于INNODB_METRICS表的使用信息可以查看Section 14.13.6, “InnoDB INFORMATION_SCHEMA Metrics Table”。
l The INFORMATION_SCHEMA.INNODB_BUFFER_PAGE table provides metadata about each page in the buffer pool, including change buffer index and change buffer bitmap pages. Change buffer pages are identified by PAGE_TYPE. IBUF_INDEX is the page type for change buffer index pages, and IBUF_BITMAP is the page type for change buffer bitmap pages.
l INFORMATION_SCHEMA.INNODB_BUFFER_PAGE提供了buffer pool里面每个页元数据,包括change buffer索引和change buffer位图页。change buffer页由PAGE_TYPE来标识。change buffer索引的页类型是IBUF_INDEX,change buffer位图页的页类型是IBUF_BITMAP。
Warning
Querying the INNODB_BUFFER_PAGE table can introduce significant performance overhead. To avoid impacting performance, reproduce the issue you want to investigate on a test instance and run your queries on the test instance.
查询INNODB_BUFFER_PAGE表可能会带来显著的性能损耗。为了避免对性能的影响, 请在测试环境上重现你想调查的问题,然后在测试环境上进行查询。
For example, you can query the INNODB_BUFFER_PAGE table to determine the approximate number of IBUF_INDEX and IBUF_BITMAP pages as a percentage of total buffer pool pages.
例如,你想查询INNODB_BUFFER_PAGE表来确认IBUF_INDEX和IBUF_BITMAP页的数量占整个buffer pool的比例。
mysql> SELECT
       (SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE
        WHERE PAGE_TYPE LIKE 'IBUF%') AS change_buffer_pages,
       (SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_BUFFER_PAGE) AS total_pages,
       (SELECT ((change_buffer_pages/total_pages)*100)) AS change_buffer_page_percentage;
+---------------------+-------------+-------------------------------+
| change_buffer_pages | total_pages | change_buffer_page_percentage |
+---------------------+-------------+-------------------------------+
|                  25 |        8192 |                        0.3052 |
+---------------------+-------------+-------------------------------+
For information about other data provided by the INNODB_BUFFER_PAGE table, see Section 21.29.16, “The INFORMATION_SCHEMA INNODB_BUFFER_PAGE Table”. For related usage information, see Section 14.13.5, “InnoDB INFORMATION_SCHEMA Buffer Pool Tables”.
INNODB_BUFFER_PAGE表提供的其他的数据信息,可以查看Section 21.29.16, “The INFORMATION_SCHEMA INNODB_BUFFER_PAGE Table”。相关的使用信息可以查看Section 14.13.5, “InnoDB INFORMATION_SCHEMA Buffer Pool Tables”。
l Performance Schema provides change buffer mutex wait instrumentation for advanced performance monitoring. To view change buffer instrumentation, issue the following query:
l Performance Schema为高级性能监控提供了change buffer mutex wait instrumentation。查看change buffer instrumentation,可以执行下面的查询:
mysql> SELECT * FROM performance_schema.setup_instruments
WHERE NAME LIKE '%wait/synch/mutex/innodb/ibuf%';
+-------------------------------------------------------+---------+-------+
| NAME                                                  | ENABLED | TIMED |
+-------------------------------------------------------+---------+-------+
| wait/synch/mutex/innodb/ibuf_bitmap_mutex             | YES     | YES   |
| wait/synch/mutex/innodb/ibuf_mutex                    | YES     | YES   |
| wait/synch/mutex/innodb/ibuf_pessimistic_insert_mutex | YES     | YES   |
+-------------------------------------------------------+---------+-------+
For information about monitoring InnoDB mutex waits, see Section 14.14.1, “Monitoring InnoDB Mutex Waits Using Performance Schema”.
关于监控InnoDB mutex waits的监控信息,可以查看Section 14.14.1, “Monitoring InnoDB Mutex Waits Using Performance Schema”。
14.2.5.6 Adaptive Hash Indexes
The adaptive hash index (AHI) lets InnoDB perform more like an in-memory database on systems with appropriate combinations of workload and ample memory for the buffer pool, without sacrificing any transactional features or reliability. This feature is enabled by the innodb_adaptive_hash_index option, or turned off by --skip-innodb_adaptive_hash_index at server startup.
adaptive hash index (AHI)通过适当的工作量负载,以及buffer pool中充足的内存,可以使得InnoDB像内存数据库一样运行,而且不会牺牲任何的事务特性或者可靠性。这个特性可以通过innodb_adaptive_hash_index参数打开,或者在实例启动的时候通过--skip-innodb_adaptive_hash_index关闭。
Based on the observed pattern of searches, MySQL builds a hash index using a prefix of the index key. The prefix of the key can be any length, and it may be that only some of the values in the B-tree appear in the hash index. Hash indexes are built on demand for those pages of the index that are often accessed.
基于搜索的观察模式,MySQL使用索引key的前缀来建立hash index。key的前缀可以是任何长度的,可有可能一些B-tree的值出现在hash index里。hash index是构建在这样的需求上的:一部分的索引页是要经常访问的。
If a table fits almost entirely in main memory, a hash index can speed up queries by enabling direct lookup of any element, turning the index value into a sort of pointer. InnoDB has a mechanism that monitors index searches. If InnoDB notices that queries could benefit from building a hash index, it does so automatically.
如果一个表大部分的时候都是完全在内存里的,通过开启任何元素的直接查找,把索引值放入到一个排序的点上,这样hash index就可以加速查询的速度。InnoDB有一个机制来监控索引的搜索。如果InnoDB注意到查询从构建的hash index上有优势的时候,它就会自动这么处理。
With some workloads, the speedup from hash index lookups greatly outweighs the extra work to monitor index lookups and maintain the hash index structure. Sometimes, the read/write lock that guards access to the adaptive hash index can become a source of contention under heavy workloads, such as multiple concurrent joins. Queries with LIKE operators and % wildcards also tend not to benefit from the AHI. For workloads where the adaptive hash index is not needed, turning it off reduces unnecessary performance overhead. Because it is difficult to predict in advance whether this feature is appropriate for a particular system, consider running benchmarks with it both enabled and disabled, using a realistic workload. The architectural changes in MySQL 5.6 and higher make more workloads suitable for disabling the adaptive hash index than in earlier releases, although it is still enabled by default.
在一些工作负载下,通过hash index查找带来的加速远超过监控索引查找和维护hash index结构带来的额外的工作量。有的时候,由read/write守护的adaptive hash index在一些沉重的工作负载下会成为竞争的源,特别是大量并发的join操作的时候。在查询中使用LIKE操作和%通配符也不会从AHI中获得优势。对于一些不需要adaptive hash index的工作场景,关闭它可以减少一些不必要的性能损耗。因为很难预测这种特性是否适合一个特别的系统,所以要使用现实负载分别对开启和关闭的情况进行基准测试。MySQL5.6及更高版本架构的更改使得更多的工作场景比早先的版本更适合关闭adaptive hash index,但是默认情况它还是开启的。
The hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on the pattern of searches that InnoDB observes for the B-tree index. A hash index can be partial, covering only those pages of the index that are often accessed.
hash index总是构建于一个当前的B-tree index上的。InnoDB能够在B-tree key的任意长度的前缀上构建hash index。hash index可以是局部的,仅覆盖在经常访问的那些索引页上。
You can monitor the use of the adaptive hash index and the contention for its use in the SEMAPHORES section of the output of the SHOW ENGINE INNODB STATUS command. If you see many threads waiting on an RW-latch created in btr0sea.c, then it might be useful to disable adaptive hash indexing.
你可以使用SHOW ENGINE INNODB STATUS命令监控adaptive hash index的使用及争用情况。如果你看到很多的线程等待在btr0sea.c创建的RW-latch,那么关闭adaptive hash indexing就会非常有用。
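Because the option is dynamic, the comparison suggested above can be run without restarting the server. The following is a sketch of one possible sequence; the benchmark itself is whatever realistic workload you choose to run between the statements.

```sql
-- Check whether the adaptive hash index is currently enabled.
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- Disable it at runtime, then rerun the benchmark workload.
SET GLOBAL innodb_adaptive_hash_index = OFF;

-- Inspect the SEMAPHORES section for RW-latch waits and the
-- INSERT BUFFER AND ADAPTIVE HASH INDEX section for hash search rates.
SHOW ENGINE INNODB STATUS\G

-- Re-enable it for the comparison run.
SET GLOBAL innodb_adaptive_hash_index = ON;
```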
For more information about the performance characteristics of hash indexes, see Section 8.3.8, “Comparison of B-Tree and Hash Indexes”.
更多hash index性能特性的信息可以查看Section 8.3.8, “Comparison of B-Tree and Hash Indexes”。
14.2.5.7 Physical Row Structure
The physical row structure of an InnoDB table depends on the row format specified when the table is created. By default, InnoDB uses the Antelope file format and its COMPACT row format. The REDUNDANT format is available to retain compatibility with older versions of MySQL. When you enable the innodb_file_per_table setting, you can also make use of the newer Barracuda file format, with its DYNAMIC and COMPRESSED row formats, as explained in Section 14.9, “InnoDB Row Storage and Row Formats” and Section 14.7, “InnoDB Table Compression”.
InnoDB表行的物理结构是基于表创建时选择的行的格式的。默认情况下,InnoDB使用Antelope文件格式和COMPACT的行格式。REDUNDANT格式是为了和MySQL以前的版本兼容。当你开启了innodb_file_per_table,你还可以使用更新的Barracuda文件格式,能够能够使用DYNAMIC和COMPRESSED行格式,如Section 14.9, “InnoDB Row Storage and Row Formats”和Section 14.7, “InnoDB Table Compression”所讲述的。
To check the row format of an InnoDB table, you can use SHOW TABLE STATUS. For example:
你可以使用SHOW TABLE STATUS来检查InnoDB表的当前格式。例如:
mysql> SHOW TABLE STATUS IN test1\G
*************************** 1. row ***************************
           Name: t1
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 16384
      Data_free: 0
 Auto_increment: 1
    Create_time: 2014-10-31 16:02:01
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
You can also check the row format of an InnoDB table by querying INFORMATION_SCHEMA.INNODB_SYS_TABLES.
你还可以通过查询INFORMATION_SCHEMA.INNODB_SYS_TABLES表来检查InnoDB表的行格式。
mysql> SELECT NAME, ROW_FORMAT FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE NAME='test1/t1';
+----------+------------+
| NAME     | ROW_FORMAT |
+----------+------------+
| test1/t1 | Compact    |
+----------+------------+
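To try one of the Barracuda row formats and confirm it with the same INFORMATION_SCHEMA query, something like the following can be used. It is a sketch that assumes the test1 schema from the example above and a hypothetical table named t_dynamic; changing the global settings requires the appropriate privileges.

```sql
-- Barracuda row formats require file-per-table tablespaces.
SET GLOBAL innodb_file_per_table = ON;
SET GLOBAL innodb_file_format = 'Barracuda';

-- t_dynamic is a hypothetical table name used only for this example.
CREATE TABLE test1.t_dynamic (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  payload TEXT
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

-- Confirm the row format that InnoDB actually used.
SELECT NAME, ROW_FORMAT
  FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES
 WHERE NAME = 'test1/t_dynamic';
```

If the prerequisites are not met and strict mode is off, InnoDB may fall back to the COMPACT format with a warning, which is why the final check is worth running.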
The COMPACT row format decreases row storage space by about 20% at the cost of increasing CPU use for some operations. If your workload is a typical one that is limited by cache hit rates and disk speed, COMPACT format is likely to be faster. If the workload is a rare case that is limited by CPU speed, COMPACT format might be slower.
COMPACT行格式能够减少大约20%的行存储空间,以及增加少部分的CPU损耗。如果你的工作场景是普通的,主要的限制在缓存命中率和磁盘的速度上,那么COMPACT格式可能会更快。如果工作场景比较特殊,主要限制在CPU的速度上,那么COMPACT行格式可能会更慢。
Rows in InnoDB tables that use REDUNDANT row format have the following characteristics:
InnoDB使用REDUNDANT行格式会有以下的特定:
l Each index record contains a 6-byte header. The header is used to link together consecutive records, and also in row-level locking.
l 每个索引记录包含6-byte的记录头。这个记录头用来把连续的记录连接起来,还有行级锁。
l Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
l clustered index的记录包含了所有用户定义的列。另外,还包含了 6-byte的事务ID和7-byte的回滚指针。
l If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
l 如果表上没有定义主键,每个clustered index记录还包含6-byte的行ID。
l Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index.
l 每个secondary index记录都包含所有主键列。
l A record contains a pointer to each field of the record. If the total length of the fields in a record is less than 128 bytes, the pointer is one byte; otherwise, two bytes. The array of these pointers is called the record directory. The area where these pointers point is called the data part of the record.
l 每个记录都包含一个指向每个列的指针。如果一条记录所有列的总长度小于128 bytes,这个指针是1byte;否则是2bytes。这些指针的数据称之为记录目录。这些指针指向的区域称之为记录的数据部分。
l Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. InnoDB does not truncate trailing spaces from VARCHAR columns.
l 在内部,InnoDB存储固定长度的字符列如CHAR(10)。InnoDB不会从VARCHAR列上截断尾部的空间。
l An SQL NULL value reserves one or two bytes in the record directory. Besides that, an SQL NULL value reserves zero bytes in the data part of the record if stored in a variable length column. In a fixed-length column, it reserves the fixed length of the column in the data part of the record. Reserving the fixed space for NULL values enables an update of the column from NULL to a non-NULL value to be done in place without causing fragmentation of the index page.
l NULL值会在记录目录里保留一到两个bytes。除此之外,如果是存储在可变长度的列上NULL在记录的数据部分只会占用0个byte。在定长列上,它会在记录的数据部分占用列的固定长度。为NULL值保留固定的空间能够使得把NULL修改成非NULL的时候不会产生索引页的分裂。
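The per-record overhead described in this list can be estimated with simple arithmetic. The sketch below uses two hypothetical user variables: @n_fields is the number of fields stored in the record, and @total_field_bytes is their combined length. It applies only the rules stated above (a 6-byte header plus one pointer per field) and is not a complete model of the record layout.

```sql
-- Hypothetical sample values.
SET @n_fields = 5;
SET @total_field_bytes = 100;

-- One-byte pointers when the fields total less than 128 bytes,
-- two-byte pointers otherwise.
SELECT 6 AS header_bytes,
       @n_fields * IF(@total_field_bytes < 128, 1, 2) AS record_directory_bytes;
```

With these sample values the record directory takes 5 bytes. Raising @total_field_bytes to 128 or more doubles it to 10.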
Rows in InnoDB tables that use COMPACT row format have the following characteristics:
InnoDB表使用COMPACT行格式有下面的特点:
l Each index record contains a 5-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking.
l 每个索引记录的都包含5-byte的头,在可变长度头之前。这个头用来把连续的记录连接起啦,也用于行级锁。
l The variable-length part of the record header contains a bit vector for indicating NULL columns. If the number of columns in the index that can be NULL is N, the bit vector occupies CEILING(N/8) bytes. (For example, if there are anywhere from 9 to 16 columns that can be NULL, the bit vector uses two bytes.) Columns that are NULL do not occupy space other than the bit in this vector. The variable-length part of the header also contains the lengths of variable-length columns. Each length takes one or two bytes, depending on the maximum length of the column. If all columns in the index are NOT NULL and have a fixed length, the record header has no variable-length part.
l 记录头部的可变长部分包含了一个bit的向量用来表明NULL列。如果索引里可能为NULL的列的数量是N,bit向量占用的上限是(N/8) bytes。(例如,有9到15个列可能是NULL的,那么bit向量使用2bytes。)NULL的列和占用bit的向量相比不占用空间。头部可变长的部分同样还包含可变长度列的长度值。如果索引里的所有列都是NOT NULL和定长的,记录的头部就没有可变长部分了。
l For each non-NULL variable-length field, the record header contains the length of the column in one or two bytes. Two bytes will only be needed if part of the column is stored externally in overflow pages or the maximum length exceeds 255 bytes and the actual length exceeds 127 bytes. For an externally stored column, the 2-byte length indicates the length of the internally stored part plus the 20-byte pointer to the externally stored part. The internal part is 768 bytes, so the length is 768+20. The 20-byte pointer stores the true length of the column.
l 对于每个non-NULL可变长的列,记录头部包含了一到两个bytes的列的长度。只有列被存储在了外部的溢出页上,或者最大长度超过255 bytes实际长度超过127 bytes的时候,头部才会占用两个bytes。对于外部存储的列,2-byte的长度表明了内部存储部分的长度加上20-byte的指针指向外部存储部分。内部部分是768 bytes,所以总的长度是768+20。20-byte的指针存储了列的真实长度。
l The record header is followed by the data contents of the non-NULL columns.
l 记录头部之后non-NULL列的数据内容。
l Records in the clustered index contain fields for all user-defined columns. In addition, there is a 6-byte transaction ID field and a 7-byte roll pointer field.
l clustered index里面的记录包含了所有用户定义的列。另外,还有6-byte的事务ID和7-byte的回滚指针。
l If no primary key was defined for a table, each clustered index record also contains a 6-byte row ID field.
l 如果表没有定义主键,每个clustered index记录还包括6-byte的行ID。
l Each secondary index record also contains all the primary key fields defined for the clustered index key that are not in the secondary index. If any of these primary key fields are variable length, the record header for each secondary index will have a variable-length part to record their lengths, even if the secondary index is defined on fixed-length columns.
l 每个secondary index记录也包括所有主键定义的列。如果任意一个主键列是可变长的,那么每个secondary index的记录头就会有一个可变长部分用于记录它们的长度,即使这个secondary index是被定义在一个定长列上的。
l Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. InnoDB does not truncate trailing spaces from VARCHAR columns.
l 在内部,InnoDB在定长格式下像CHAR(10)一样存储定长的字符列。InnoDB不会截断VARCHAR列的尾部空间。
l Internally, InnoDB attempts to store utf8 CHAR(N) and utf8mb4 CHAR(N) columns in N bytes by trimming trailing spaces. If the byte length of a CHAR(N) column value exceeds N bytes, InnoDB trims trailing spaces to a minimum of the column value byte length. The maximum length of a CHAR(N) column is the maximum character byte length × N, as reported by the CHARACTER_OCTET_LENGTH column of the INFORMATION_SCHEMA.COLUMNS table.
l 在内部,InnoDB试图通过裁剪尾部空间来把utf8 CHAR(N) and utf8mb4 CHAR(N)的列存储在N bytes的空间里。如果一个CHAR(N)列值的长度超过N bytes,InnoDB会截断尾部的空间。一个CHAR(N)列的最大长度字符byte长度× N,如果INFORMATION_SCHEMA.COLUMNS表CHARACTER_OCTET_LENGTH列所报告的。
l InnoDB reserves a minimum of N bytes for CHAR(N). Reserving the minimum space N in many cases enables column updates to be done in place without causing fragmentation of the index page.
l InnoDB为CHAR(N)预留最小的N bytes。在很多情况下预留的最小空间N能够列在update的时候在索引页上不会产生碎片。
l By comparison, for ROW_FORMAT=REDUNDANT, utf8 and uft8mb4 columns occupy the maximum character byte length × N. ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED handle CHAR storage in the same way as ROW_FORMAT=COMPACT.
l 通过比较,对于ROW_FORMAT=REDUNDANT,utf8 and uft8mb4列占用最大的字符byte的长度× N。ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED和以和ROW_FORMAT=COMPACT一样的方式处理CHAR的存储。
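The header rules in the list above can be turned into a small size calculation. The following is a minimal sketch only, not InnoDB code: the function names and the tuple layout for variable-length fields are invented for illustration, and the sketch covers only the byte counts stated in the list (5-byte fixed header, CEILING(N/8) NULL bit vector, one or two length bytes per non-NULL variable-length field, and the clustered index system fields).

```python
FIXED_HEADER_BYTES = 5   # fixed part of every index record header
TRX_ID_BYTES = 6         # transaction ID field in clustered index records
ROLL_PTR_BYTES = 7       # roll pointer field in clustered index records
ROW_ID_BYTES = 6         # row ID field, present only without a primary key


def null_bitmap_bytes(nullable_columns):
    """Bytes used by the NULL bit vector: CEILING(N/8)."""
    return (nullable_columns + 7) // 8


def length_bytes(max_len, actual_len, external=False):
    """Length bytes for one non-NULL variable-length field (hypothetical helper).

    Two bytes are needed if the field is partly stored externally, or if the
    maximum length exceeds 255 bytes and the actual length exceeds 127 bytes.
    """
    if external or (max_len > 255 and actual_len > 127):
        return 2
    return 1


def header_bytes(nullable_columns, varlen_fields):
    """Total record header size.

    varlen_fields holds one (max_len, actual_len, external) tuple per
    non-NULL variable-length field; NULL fields store no length.
    """
    total = FIXED_HEADER_BYTES + null_bitmap_bytes(nullable_columns)
    for max_len, actual_len, external in varlen_fields:
        total += length_bytes(max_len, actual_len, external)
    return total


def clustered_index_system_bytes(has_primary_key):
    """Bytes added to each clustered index record by the system fields."""
    total = TRX_ID_BYTES + ROLL_PTR_BYTES
    if not has_primary_key:
        total += ROW_ID_BYTES
    return total


if __name__ == "__main__":
    # 3 nullable columns, one short VARCHAR(100) value, one 300-byte value
    # in a column whose maximum length is 1000 bytes.
    print(header_bytes(3, [(100, 40, False), (1000, 300, False)]))  # 9
    print(clustered_index_system_bytes(has_primary_key=False))      # 19
```

In the example, the header is 5 bytes fixed, 1 byte of NULL bit vector, 1 length byte for the short value, and 2 length bytes for the long one.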
DYNAMIC and COMPRESSED row formats are variations of the COMPACT row format. For information about these row formats, see Section 14.9.3, “DYNAMIC and COMPRESSED Row Formats”.
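The CHAR(N) storage rules for multibyte character sets can be illustrated with a short calculation. This is a sketch under stated assumptions: `char_storage_bytes` is a hypothetical helper, not a MySQL function, and it uses Python's UTF-8 encoder to count bytes, which is assumed to match MySQL's utf8 only for characters of up to three bytes.

```python
def char_storage_bytes(value, n, max_char_bytes, row_format="COMPACT"):
    """Estimate bytes stored for a utf8/utf8mb4 CHAR(N) value (illustrative).

    max_char_bytes is the maximum character byte length of the character
    set (3 for utf8, 4 for utf8mb4).
    """
    if len(value) > n:
        raise ValueError("value does not fit in CHAR(%d)" % n)
    if row_format == "REDUNDANT":
        # REDUNDANT always occupies the maximum character byte length x N.
        return max_char_bytes * n
    # COMPACT, DYNAMIC, and COMPRESSED: trailing spaces are trimmed, but at
    # least N bytes are reserved and the value itself is never shortened.
    value_bytes = len(value.rstrip(" ").encode("utf-8"))
    return max(n, value_bytes)


if __name__ == "__main__":
    print(char_storage_bytes("abc", 10, 3))               # 10
    print(char_storage_bytes("abc", 10, 3, "REDUNDANT"))  # 30
    print(char_storage_bytes("日本語日本語", 10, 3))        # 18
```

The ASCII value fits in the reserved N bytes, while the six 3-byte characters need 18 bytes, which is still below the 30-byte maximum that REDUNDANT would always use.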
14.2.6 InnoDB Mutex and Read/Write Lock Implementation
In MySQL and InnoDB, multiple threads of execution access shared data structures. InnoDB synchronizes these accesses with its own implementation of mutexes and read/write locks. Historically, InnoDB protected the internal state of a read/write lock with an InnoDB mutex, and the internal state of an InnoDB mutex was protected by a Pthreads mutex, as in IEEE Std 1003.1c (POSIX.1c).
On many platforms, atomic operations can synchronize the actions of multiple threads more efficiently than Pthreads. Each operation to acquire or release a lock can be done in fewer CPU instructions, so less time is wasted when threads contend for access to shared data structures. This in turn means greater scalability on multi-core platforms.
On platforms that support atomic operations, InnoDB now implements mutexes and read/write locks with the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access instead of using the Pthreads approach. More specifically, InnoDB compiled with GCC version 4.1.2 or later uses the atomic builtins instead of a pthread_mutex_t to implement InnoDB mutexes and read/write locks.
On 32-bit Microsoft Windows, InnoDB implements mutexes (but not read/write locks) with hand-written assembler instructions. Beginning with Microsoft Windows 2000, functions for Interlocked Variable Access are available that are similar to the built-in functions provided by GCC. On Windows 2000 and higher, InnoDB makes use of the Interlocked functions, which support read/write locks and 64-bit platforms.
Solaris 10 introduced library functions for atomic operations, and InnoDB uses these functions by default. When MySQL is compiled on Solaris 10 or later with a compiler that does not support the built-in functions provided by the GNU Compiler Collection (GCC) for atomic memory access, InnoDB uses the library functions.
On platforms where the GCC, Windows, or Solaris functions for atomic memory access are not available, InnoDB uses the traditional Pthreads method of implementing mutexes and read/write locks.
When MySQL starts, InnoDB writes a message to the log file indicating whether atomic memory access is used for mutexes, for mutexes and read/write locks, or neither. If suitable tools are used to build InnoDB and the target CPU supports the atomic operations required, InnoDB uses the built-in functions for mutexing. If, in addition, the compare-and-swap operation can be used on thread identifiers (pthread_t), then InnoDB uses the instructions for read/write locks as well.
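The selection rules described in this section can be summarized as a decision sketch. It only restates the prose: the function names, parameters, and returned labels are invented for illustration and are not the actual InnoDB log messages or build logic.

```python
def sync_implementation(gcc_atomic_builtins, windows_interlocked,
                        solaris_atomic_lib):
    """Pick the primitive family used for mutexes (illustrative only)."""
    if gcc_atomic_builtins:
        return "GCC atomic builtins"
    if windows_interlocked:
        return "Windows Interlocked functions"
    if solaris_atomic_lib:
        # Used when the compiler lacks the GCC builtins on Solaris 10 or later.
        return "Solaris atomic library functions"
    return "Pthreads"


def atomic_coverage(atomics_available, cas_on_thread_id):
    """Classify what atomic memory access is used for (labels are hypothetical)."""
    if not atomics_available:
        return "neither"
    if cas_on_thread_id:
        # Compare-and-swap works on pthread_t, so read/write locks qualify too.
        return "mutexes and read/write locks"
    return "mutexes only"


if __name__ == "__main__":
    print(sync_implementation(False, False, True))
    print(atomic_coverage(True, False))
```

The three outcomes of `atomic_coverage` correspond to the three cases the startup message distinguishes.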
If you are building from source, ensure that the build process properly takes advantage of your platform capabilities.
For more information about the performance implications of locking, see Section 8.11, “Optimizing Locking Operations”.