Brought to you by Rick James

The Problem

You have a "huge" table, and you need to ALTER it. By "huge" I mean that the ALTER would take an unacceptable amount of time.

Things you might want to do, and for which this is well suited: 
    ⚈  Add/drop columns 
    ⚈  Modify datatypes (INT -> BIGINT, SIGNED -> UNSIGNED) 
    ⚈  Modify character set / collation of column(s) 
    ⚈  Add/drop indexes 
    ⚈  Add/drop/modify PARTITIONing 
    ⚈  Normalize/Denormalize 
    ⚈  Change Engine (MyISAM to InnoDB)

Future Solutions

Percona's pt-online-schema-change (aka pt-osc) can do an ALTER with very little downtime. It does, however, require adding a TRIGGER to the table.

gh-ost is a new and promising competitor to pt-online-schema-change; it uses the binlog.

As of version 5.6.7, ALTER TABLE can do many kinds of ALTERs without blocking other operations. For a list of what 5.7 can and cannot do via ALGORITHM=INPLACE, see the MySQL 5.7 ALTER TABLE documentation.

If you don't have the latest version, or you can't use Percona's solution, read on. 

Overview of Solution

    1.  Build the 'Alter' script to copy rows into a new table (in clumps of rows) 
    2.  Push code to add a Database Layer between client(s) and the database 
    3.  Push code to augment the Database Layer to handle the effect of the Alter 
    4.  Turn on the Alter 
    5.  Push code to deactivate the Alter

The Layer may already be in place; it is 'best practice'. However, after seeing what is ahead (below), you may want to clean up the Layer some. One example is to change a BEGIN...COMMIT into a single API call (if practical). If this leads to a cleaner API, then that helps the API. Later, when you get to step 3, life will be simpler for other reasons.

Steps 2 and 3 are separate because I am assuming you also cannot afford screwups. Step 2 focuses on the API from Client to Layer; Step 3 focuses on tweaking the Layer for this Alter -- they are separable focuses, and it is safer to think about only one at a time.

"Turning on the Alter" is essentially running the script (Step 1) to copy the table over. This may take hours or days.

The guiding principles... 
    ⚈  If the existing table continues to be read/written by the identical code as before, all queries on it should continue to work correctly. 
    ⚈  At all times, rows with id <= $left_off (a "highwater mark") will be correctly transformed, inserted, updated, etc. 
    ⚈  At the end, when $left_off is at the end of the table, the transformation is complete.

Shortcut

Most of the rest of this discussion centers on the complex case of a table that is being modified -- any row could be UPDATEd, INSERTs could go anywhere into the table. And it assumes a single machine, or a single Master.

Some likely special cases are covered near the end ("Alternative...")

Assumptions/Restrictions/Complications

Assumptions 
    ⚈  This discussion assumes you can walk through the original table using an explicit PRIMARY or UNIQUE key. 
    ⚈  Single-column (not 'compound') key is used to walk through the table. 
    ⚈  You have enough disk space to simultaneously hold both the original table and the new table(s). 
    ⚈  There is enough 'wiggle room' in the performance of the databases so that the overhead of this process (100%?) can be handled. 
    ⚈  INSERTs are single-row, UPDATEs are single-table, DELETEs are single-row 
    ⚈  You can modify the Layer to change how INSERTs, etc, operate. 
    ⚈  SELECTs need no modification. (Nice, eh?) 
    ⚈  UPDATE statements do not modify the column being used to walk through the table. 
    ⚈  INSERT..ON DUPLICATE KEY UPDATE.. is not used 
    ⚈  INSERT IGNORE does not depend on a secondary UNIQUE key 
    ⚈  There are no FOREIGN KEYs in the definition of this table, nor any in other tables referencing this table. 
    ⚈  Only one table is involved 
    ⚈  You have all the write operations (no ad hoc queries from users) 
    ⚈  No self-joins in write operations

Applicability 
    ⚈  Engine -- The outline given here should work equally well for MyISAM or InnoDB. 
    ⚈  Replication -- You can perform this on the read/write Master and have it propagate gracefully to the slaves. 
    ⚈  Cluster (Galera, etc) -- It should work.

What to do if assumptions are not met? 
    ⚈  No PRIMARY/UNIQUE key -- A non-unique key can be used, but this may lead to arbitrarily large locks on the table. 
    ⚈  No explicit keys at all -- Punt. (LIMIT+OFFSET is not viable!) 
    ⚈  Inadequate disk space -- Punt. Get space first. 
    ⚈  If the system gets bogged down, two dynamic tunables can be tweaked to slow down the Alter and make it less invasive. 
    ⚈  Multi-row INSERT -- design the Layer's API to make it easy to determine which rows are below $left_off. 
    ⚈  Multi-table UPDATE -- This is a challenge -- especially 'self-joins' that interrogate later parts of the table to decide on changes to earlier parts. 
    ⚈  DELETE .. IN(...) -- pass the IN list in an array for easy handling 
    ⚈  DELETE .. WHERE (multi-row) -- no problem 
    ⚈  UPDATE that changes the key -- You must recognize the situation and write some messy code. (It would be handy if the API helped make this obvious.) 
    ⚈  INSERT..ON DUPLICATE KEY UPDATE.. -- Don't know. 
    ⚈  INSERT IGNORE -- The problem is that it might get INSERTed before the conflicting record is inserted. No workaround available. 
    ⚈  FOREIGN KEYS -- have not investigated 
    ⚈  Extra tables for normalization -- not a big deal 
    ⚈  ad hoc write queries -- must disallow for the duration of the Alter. 
    ⚈  Compound index -- The code is more complex, but doable. See "Iterating through a compound key".
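
For the compound-index case, here is a minimal sketch of walking a table by a two-column key. It is an illustration under assumptions, not the code from the referenced article: Python's built-in sqlite3 stands in for MySQL so the example can run anywhere, and the table name t_existing, the columns a and b, and the function next_clump are hypothetical.

```python
import sqlite3

def next_clump(db, left_a, left_b, clump_size):
    """Return the next clump of (a, b) keys after (left_a, left_b), in key order."""
    return db.execute(
        "SELECT a, b FROM t_existing "
        "WHERE a > ? OR (a = ? AND b > ?) "   # "strictly after" for a two-part key
        "ORDER BY a, b LIMIT ?",
        (left_a, left_a, left_b, clump_size),
    ).fetchall()

# Demo data: five rows keyed by (a, b).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t_existing (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
db.executemany(
    "INSERT INTO t_existing VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)],
)

clumps, left_off = [], (0, 0)   # (0, 0) is less than the first key
while True:
    clump = next_clump(db, *left_off, clump_size=2)
    if not clump:
        break
    clumps.append(clump)
    left_off = clump[-1]        # both parts of the key form the new highwater mark
```

The point of the sketch is only that the highwater mark becomes a pair (left_off_1, left_off_2) and the "greater than" test has to compare both parts.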

Database Layer

It is "best practice" to have a "Layer" between your clients and your database. SQL statements are only known to the Layer, not to the clients. By having the Layer, you are segregating "business logic" from "database details". For the purposes here, the "database details" will be changing; it is better to have such code isolated.

The Layer would be called from the Client with calls like "Insert this stuff...". If the 'stuff' needs massaging to be compatible with MySQL (eg, timestamps), then the Layer is the 'right' place to do the conversion. The Layer is also a good place to hide any "normalization" and "lookup" tables -- clients should not care whether a 'name' is stored in the table directly, or normalized into another table and only an id is put into the "Fact" table.

Keep the API clean and simple; hide any database complexity in the Layer. And, for this task, it will get complex.

For the task at hand, we will depend on the Layer to make it easy to migrate from one table to another, and to do that without any real knowledge of "business logic". This ignorance of the client side of things makes it easier and safer to write the code and have confidence that it is correct.

The Alter

This is a script that does the conversion by copying the data into a new table, which already has the 'new' schema elements (added columns, dropped indexes, etc).

    ⚈  CREATE TABLE -- the new table, with all new indexes/datatypes, etc. 
    ⚈  Change the row in the helper table Migrate (running=1, etc) (below) 
    ⚈  Loop (below) 
    ⚈  Deactivate (running=0) 
    ⚈  RENAME TABLE existing TO old, new TO existing;

Each iteration does: 
    ⚈  Migrates a "clump" of, say, 100 rows from the old table to the new. 
    ⚈  Any transforms (normalization, datatype conversion, etc) are done for those rows. 
    ⚈  Locks out all 'write' operations while handling the clump. (More later)

The starting values for Migrate: 
    ⚈  running = 1 
    ⚈  clump_size = 100 (1000 if the existing table is InnoDB with no secondary keys; <100 if it has lots of indexes) 
    ⚈  delay = 1 (second) 
    ⚈  left_off_* -- whatever key is less than first value (eg, 0 or '')

The Loop should watch for running out of rows. When that happens, it needs to do the Deactivate and RENAME _before_ the Unlock.

One Iteration (Copy one Clump)

    ⚈  Fetch the next row from Migrate 
    ⚈  Find the key of the 100th row after where you 'left_off'. 
    ⚈  Lock (exclusive) 
    ⚈  INSERT INTO new SELECT * FROM existing WHERE id > $left_off AND id <= $hundredth_hence; 
    ⚈  If no more rows, Deactivate and RENAME 
    ⚈  Update row in Migrate 
    ⚈  Unlock 
    ⚈  Sleep 1 second (dynamically tunable)

Notes: 
    ⚈  The "Find" step is outside the Lock/Unlock so as to minimize the time in lock state. 
    ⚈  Risk: with the Find step outside, you could end up with more than 100 rows to move. (Probably not a big deal.) 
    ⚈  Lock/Unlock -- an exclusive LOCK on the Migrate table. Note that all INSERTs/UPDATEs/etc must also Lock that table 
    ⚈  The SELECT step should include any transforms needed, build normalization records, and whatever else is required.
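
The iteration above can be sketched roughly as follows. This is a runnable illustration under assumptions, not a production script: Python's built-in sqlite3 stands in for MySQL, the table names t_existing and t_new and the function copy_one_clump are hypothetical, the Migrate table is cut down to three columns, and the Lock/Unlock and sleep steps appear only as comments because they are MySQL-specific.

```python
import sqlite3

def copy_one_clump(db, left_off, clump_size):
    """Copy the next clump; return (new_left_off, rows_copied)."""
    # "Find" step (outside the lock): key of the clump_size-th row after left_off.
    row = db.execute(
        "SELECT id FROM t_existing WHERE id > ? ORDER BY id LIMIT 1 OFFSET ?",
        (left_off, clump_size - 1),
    ).fetchone()
    if row is None:
        # Fewer than clump_size rows remain: take everything that is left.
        row = db.execute(
            "SELECT MAX(id) FROM t_existing WHERE id > ?", (left_off,)
        ).fetchone()
    upper = row[0]
    if upper is None:
        return left_off, 0  # no more rows: the caller would Deactivate and RENAME
    # In MySQL, the exclusive Lock on Migrate would be taken here.
    with db:  # commit the copy and the bookkeeping together
        copied = db.execute(
            "INSERT INTO t_new SELECT * FROM t_existing WHERE id > ? AND id <= ?",
            (left_off, upper),
        ).rowcount
        db.execute(
            "UPDATE Migrate SET left_off_1 = ?, "
            "clumps_moved = clumps_moved + 1, rows_moved = rows_moved + ?",
            (upper, copied),
        )
    # The Unlock and the sleep(delay) would follow here.
    return upper, copied

# Demo: 7 rows moved in clumps of 3.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t_existing (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE t_new (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE Migrate (left_off_1 INTEGER, clumps_moved INTEGER, rows_moved INTEGER);
    INSERT INTO Migrate VALUES (0, 0, 0);
""")
db.executemany(
    "INSERT INTO t_existing VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 8)]
)
db.commit()

left_off, copied = 0, None
while copied != 0:
    left_off, copied = copy_one_clump(db, left_off, clump_size=3)
```

A real script would replace SELECT * with whatever transforms the new schema needs, and would re-read clump_size and delay from Migrate at the start of each iteration.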

Layer's INSERTs

The database Layer is vital for giving us a simple way to modify all INSERT/UPDATE/REPLACE/DELETE statements. All client write operations must be going through the Layer. (SELECTs should go through the Layer, but that does not matter for this discussion.)

I will discuss 'simple' write operations only.

Around every atomic operation, you need to add the following: 
    ⚈  Lock (read-lock) on Migrate 
    ⚈  Fetch the row from Migrate 
    ⚈  If not running, then skip most of these steps. (Alter has not yet started, or has finished.) 
    ⚈  Perform the SQL statement on old table 
    ⚈  Modify statement to include AND id <= $left_off, and perform it on new table. 
    ⚈  Unlock
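
As a rough sketch of those steps, the Layer's write calls might look like this. Everything here is an assumption made for illustration: Python's built-in sqlite3 stands in for MySQL, the names t_existing, t_new, layer_insert and layer_update are hypothetical, the tables have a single payload column val, and the read-lock on Migrate is shown only as a comment.

```python
import sqlite3

def migrate_state(db):
    """Fetch (running, left_off) from the single-row Migrate table."""
    return db.execute("SELECT running, left_off_1 FROM Migrate").fetchone()

def layer_insert(db, row_id, val):
    # In MySQL, a read-lock on Migrate would be taken before this point.
    running, left_off = migrate_state(db)
    with db:
        db.execute("INSERT INTO t_existing (id, val) VALUES (?, ?)", (row_id, val))
        # Rows above the highwater mark will be picked up later by the Alter.
        if running and row_id <= left_off:
            db.execute("INSERT INTO t_new (id, val) VALUES (?, ?)", (row_id, val))

def layer_update(db, val, where_sql, where_params):
    # where_sql is written by the Layer itself, never passed in by a client.
    running, left_off = migrate_state(db)
    with db:
        db.execute(
            f"UPDATE t_existing SET val = ? WHERE {where_sql}", (val, *where_params)
        )
        if running:
            # Same statement, restricted to the part already migrated.
            db.execute(
                f"UPDATE t_new SET val = ? WHERE ({where_sql}) AND id <= ?",
                (val, *where_params, left_off),
            )

# Demo: the Alter is running and has migrated everything up to id 3.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t_existing (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE t_new (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE Migrate (running INTEGER, left_off_1 INTEGER);
    INSERT INTO t_existing VALUES (1, 'a'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO t_new VALUES (1, 'a'), (3, 'c');
    INSERT INTO Migrate VALUES (1, 3);
""")
layer_insert(db, 2, "b")                 # below the mark: goes to both tables
layer_insert(db, 9, "z")                 # above the mark: old table only
layer_update(db, "X", "id >= ?", (3,))   # new table is touched only for id = 3
```

When running is 0, both functions fall through to the plain statement on the old table, which is the "skip most of these steps" path.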

For "transactions" in BEGIN..COMMIT, either make the whole transaction a single API call, or carefully coordinate the Lock/Unlock with BEGIN..COMMIT. BEGIN should map to the Lock and Fetch steps; COMMIT should map to the Unlock step.

REPLACE should be split into what it actually does -- a DELETE and an INSERT. Both of these should be inside a single Lock/Unlock pair.
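
A minimal sketch of that split, under the same kind of assumptions as any example here: Python's built-in sqlite3 stands in for MySQL, the names t_existing, t_new and layer_replace are hypothetical, and the single Lock/Unlock pair is represented by one transaction.

```python
import sqlite3

def layer_replace(db, row_id, val):
    """REPLACE expressed as DELETE + INSERT, mirrored to the new table if needed."""
    running, left_off = db.execute(
        "SELECT running, left_off_1 FROM Migrate"
    ).fetchone()
    targets = ["t_existing"]
    if running and row_id <= left_off:
        targets.append("t_new")      # row lies in the already-migrated part
    with db:                         # stands in for the single Lock/Unlock pair
        for table in targets:
            db.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))
            db.execute(
                f"INSERT INTO {table} (id, val) VALUES (?, ?)", (row_id, val)
            )

# Demo: highwater mark at id 3.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t_existing (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE t_new (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE Migrate (running INTEGER, left_off_1 INTEGER);
    INSERT INTO t_existing VALUES (1, 'a'), (5, 'e');
    INSERT INTO t_new VALUES (1, 'a');
    INSERT INTO Migrate VALUES (1, 3);
""")
layer_replace(db, 1, "A")   # existing row below the mark: both tables
layer_replace(db, 2, "b")   # brand-new row below the mark: both tables
layer_replace(db, 5, "E")   # above the mark: old table only
```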

INSERT .. ON DUPLICATE KEY UPDATE .., especially if it includes a SELECT, can probably be broken into two steps, as discussed in the normalization section of "High Speed Ingestion" (staging_table#normalization).

As noted earlier, some multi-row and multi-table write operations get more complicated.

Helper Table: Migrate

This table has only 1 row.

    CREATE TABLE Migrate (
        running TINYINT UNSIGNED NOT NULL DEFAULT '0',
        clump_size INT UNSIGNED NOT NULL DEFAULT '100',
        delay FLOAT NOT NULL DEFAULT '1',
        left_off_1 ...,  -- INT, VARCHAR, etc, matching first field in KEY being used
        left_off_2 ...,  -- more field(s), if needed
        clumps_moved INT UNSIGNED NOT NULL DEFAULT '0',
        rows_moved BIGINT UNSIGNED NOT NULL DEFAULT '0',
        lock_time DOUBLE NOT NULL DEFAULT '0',
        move_time DOUBLE NOT NULL DEFAULT '0',
        sleep_time DOUBLE NOT NULL DEFAULT '0',
        last_move TIMESTAMP NOT NULL
    ) ENGINE=MyISAM;

Discussion: 
    ⚈  running is 1 while Alter is running. INSERTs, etc, check this to know whether to INSERT into the new table. 
    ⚈  clump_size says how many rows to move per iteration; "100" is used as an example. This can be dynamically tuned. 
    ⚈  delay is how long to sleep(), in seconds, between iterations; 1 second is a good first guess. This is tunable. 
    ⚈  left_off_* contain the value(s) of the last key used by the last clump. (Initialize to 0 or '', or whatever is less than first key.) 
    ⚈  clumps_moved and rows_moved tally the progress. 
    ⚈  the *_time fields keep track of the seconds consumed by the phases of the loop. 
    ⚈  last_move is a timestamp, overwritten at the end of each iteration. At the end of Alter, it is the finish time. (If the value is too long ago, but not finished, maybe it is broken.) 
    ⚈  Engine=MyISAM was chosen to avoid any risk of involving this in BEGIN..COMMIT transactions. 
    ⚈  Replicating the table is optional; it has no function on Slaves. 
    ⚈  It was not intended to JOIN this table with any other; I don't know if "left_off_*" is useful.

The table must be CREATEd before the Layer has the code in it that looks at the table. The important value (at that point) is running=0.

Tuning on the fly

What to watch... 
    ⚈  In Slaves, keep an eye on Seconds_behind_master (or a heartbeat if you have such) 
    ⚈  (lock_time / clumps_moved) -- This should be close to zero; if it is high, then Client processes are not getting enough time. 
    ⚈  (move_time / clumps_moved) -- The average time that it takes to move a clump. This impacts latency of Client write operations. 
    ⚈  sleep_time versus move_time -- which is getting more time, the Clients, or the Alter.

The following two values in the Migrate table can be changed at will; changes take effect at the start of the next clump.

clump_size (nominally 100) effectively controls how long INSERTs and other writes are blocked at a time. Decrease during busy times; increase during lulls.

delay (nominally 1 second; declared FLOAT), controls the gap between clumps. It is intended to let other operations get in. Also, it somewhat controls the overloading of the replication stream. Increase delay if replication gets behind.
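
If you want to automate some of that tuning, one possible policy is sketched below. The thresholds max_lag and max_avg_move, the halving and doubling rule, and the function name retune are all assumptions made up for illustration; only the directions (shrink clump_size when busy, grow it in a lull, raise delay when replication falls behind) come from the text above.

```python
def retune(migrate, seconds_behind_master, max_lag=10, max_avg_move=0.5):
    """Suggest new (clump_size, delay) values from the Migrate counters.

    `migrate` is a dict holding the columns of the single Migrate row.
    """
    clump_size, delay = migrate["clump_size"], migrate["delay"]
    clumps = max(migrate["clumps_moved"], 1)      # avoid dividing by zero early on
    avg_move = migrate["move_time"] / clumps      # seconds writers are blocked per clump
    if avg_move > max_avg_move:
        clump_size = max(clump_size // 2, 1)      # busy: block writers for less time
    elif avg_move < max_avg_move / 4:
        clump_size *= 2                           # lull: move more rows per clump
    if seconds_behind_master > max_lag:
        delay *= 2                                # replication is behind: widen the gap
    return clump_size, delay

busy = {"clump_size": 100, "delay": 1.0, "clumps_moved": 50, "move_time": 50.0}
quiet = {"clump_size": 100, "delay": 1.0, "clumps_moved": 100, "move_time": 5.0}
busy_result = retune(busy, seconds_behind_master=30)
quiet_result = retune(quiet, seconds_behind_master=0)
```

The returned values would be written back with an UPDATE on Migrate; the Alter picks them up at the start of the next clump.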

Deactivation

After the Alter is finished, the code automatically "deactivates" itself (by setting running=0). No further writes to the new table are needed (its name has been RENAMEd out of existence anyway).

However, the Layer is still checking Migrate, but this is no longer necessary. To clean up the code, you need another code push. This code push should be no more complicated than turning off a flag (a code flag, not the dynamic 'running' flag). The flag should have been built in when Alter is originally written. It will control whether to look at the Migrate table, or simply follow the "not running" path and skip the lock and unlock.
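
To make the distinction between the two flags concrete, here is a small sketch. The names MIGRATION_CODE_ENABLED and choose_write_path are hypothetical, and fetch_running stands for whatever function reads running from the Migrate table.

```python
MIGRATION_CODE_ENABLED = True   # code flag: changed only by a code push

def choose_write_path(fetch_running, code_enabled=MIGRATION_CODE_ENABLED):
    """Return which path a write should take: 'plain' or 'mirrored'."""
    if not code_enabled:
        return "plain"          # never touches Migrate: no lock, no fetch
    # Dynamic flag: read from the Migrate table on every write.
    return "mirrored" if fetch_running() else "plain"

during_alter = choose_write_path(lambda: 1)                       # Alter in progress
after_alter = choose_write_path(lambda: 0)                        # running=0, still checking
after_push = choose_write_path(lambda: 1, code_enabled=False)     # check removed by code push
```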

I recommend leaving these vestiges of the code in place; you are likely to do another Alter in the future. Meanwhile, the code will be sufficiently efficient. (A few "if false" statements are insignificant.)

Alternative - Log table

If the table to be Altered is a simple, append-only, "log", then the job can be done much more simply. There is no need for the Database Layer (though that might be best practice). The Migrate script can probably be run faster.

    ⚈  Migrate 1K-10K records per iteration, no extra locking needed. 
    ⚈  Key being used must be the one for which new rows are sent to the 'end' of the table. 
    ⚈  The final step that involves the RENAME gets tricky.

The RENAME is the only step that needs careful design (and I don't have the details ironed out...). Two possible plans:

Plan A. When the Migrate script discovers that there are no rows left to move, it LOCKs the existing and new tables, does the RENAME, and UNLOCKs. (Could some INSERT slip in? Especially for InnoDB?)

Plan B. When the Migrate script discovers that there are no rows left to move, it does the RENAME, then checks for any extra rows that snuck in. If there are extra rows, it moves them. (What happens to any INSERTs that are trying to run when the RENAME occurs? Maybe they wait? Maybe they die? Does your code automatically retry in that case?)
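
Plan B's "rename, then sweep up stragglers" can be sketched as follows. This only shows the mechanics and does not answer the open questions about INSERTs in flight during the RENAME. Assumptions: Python's built-in sqlite3 stands in for MySQL, the names t_existing, t_new, t_old and rename_then_sweep are hypothetical, and the two ALTER ... RENAME statements take the place of MySQL's single RENAME TABLE statement.

```python
import sqlite3

def rename_then_sweep(db, left_off):
    """Swap the tables, then move rows that arrived after the last clump."""
    # MySQL can do both renames in one statement:
    #   RENAME TABLE existing TO old, new TO existing;
    db.execute("ALTER TABLE t_existing RENAME TO t_old")
    db.execute("ALTER TABLE t_new RENAME TO t_existing")
    with db:
        swept = db.execute(
            "INSERT INTO t_existing SELECT * FROM t_old WHERE id > ?", (left_off,)
        ).rowcount
    return swept

# Demo: the script stopped at id 3, then rows 4 and 5 snuck into the old table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t_existing (id INTEGER PRIMARY KEY, msg TEXT);
    CREATE TABLE t_new (id INTEGER PRIMARY KEY, msg TEXT);
    INSERT INTO t_existing VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');
    INSERT INTO t_new VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")
swept = rename_then_sweep(db, left_off=3)
```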

Alternative - Dual Master

If you are set up with a pair of Masters replicating from each other (for quick recovery from a crash), then this is a much simpler approach. Or it is simpler for the developers -- it can be done entirely by the SEs.

The idea is to do some ALTER on offline server(s), failover, then do the rest of the ALTER(s).

Caveat: This only applies if the replication stream will not be confused by a Master having the 'old' schema, and a Slave having the 'new' schema. (Note: RBR will probably be messed up by COLUMN changes, but not INDEX changes?) If any INSERT or UPDATE would be confused, then you need to use the main approach so that you can have two flavors of statement and know which to apply to the existing versus the new tables.

Caution: Replication will need to be stopped or replication will be stuck, perhaps for as long as the ALTER takes. If the ALTER takes a week, you need enough disk space for a week's worth of binlogs and/or relay logs.

Caution: A really big table could take _days_ to ALTER. There will be multiple, non-overlapping ALTERs (on different machines). Plan accordingly.

Plan A -- Allow the ALTER to replicate, but carefully staged: The idea is to run the ALTER on the Backup Master, and do the failover just when it is ready to start on the Live Master. You can spot that time by knowing when the ALTER finishes on the Backup Master. If the ALTER turns out to be "wrong", fail back to the original Live Master and lick your wounds. Now you have the equivalent of the Backup Master having crashed.

Plan B -- Manually do one machine at a time. This requires SET SESSION SQL_LOG_BIN=0; in front of the ALTER each time. 
    1.  Take a Slave out of rotation and do the ALTER on it. You can do multiple Slaves at a time, but not more than you can afford to have offline (Out Of Rotation) simultaneously. 
    2.  Ditto for the Backup Master. Note: Any slaves connected to this server will become 'stale', so should be offline. 
    3.  Failover to the Backup Master 
    4.  ALTER what had been the Live Master. Again note that its slaves should be offline for the length of the ALTER.

Caveat: The Backup Master will be unavailable for failover during the ALTER. However, you can abort the ALTER if you need to failover.

Caveat: Any machine offline should probably have replication turned on (though stuck) so that the I/O thread is receiving updates. This should avoid having binlogs purged out from under you. However, it means that the receiving machine will have a pileup in the relay logs -- be sure to have enough disk space for such.

Postlog

Original writing -- Feb, 2011;   Refreshed: Sep, 2017


-- Rick James
