http://geek.rohitkalhans.com/2013/09/enhancedMTS-deepdive.html   科学上网

Introduction

Re-applying binary logs generated from highly concurrent master on the slave has always been an area of focus. It is important for various reasons. First, in real-time systems, it becomes extremely important for the slave to keep up with the master. This can only be guaranteed if the slaves’ performance in reapplying the transactions from the binary log is similar (or at-least comparable) to that of master, which is accepting queries directly from multiple clients. Second, in synchronous replication scenarios, having a fast slaves, aids in reducing the response times as seen by the clients to the master. This can be made possible by applying transactions from the binary log in parallel. However if left uncontrolled, a simple round-robin multi-threaded applying will lead to inconsistency and the slave will no longer be the exact replica of the leader.

The infamous out of order commit problem

The Out of order execution of transaction on the slave if left uncontrolled will lead to the slave diverging from the master. Here is an example: consider two transactions T1 and T2 being applied on an initial state.

On Master we apply T1 and T2 in that order.

State0: x= 1, y= 1
T1: { x:= Read(y);
x:= x+1;
Write(x);
Commit; }
State1: x= 2, y= 1
T2: { y:= Read(x);
y:=y+1;
Write(y);
Commit; }
State2: x= 2, y= 3

On the slave however these two transactions commit out of order (Say T2 and then T1).

State0: x= 1, y= 1
T2: { y:= Read(x);
y:= y+1;
Write(y);
Commit; }
State1: x= 1, y= 2
T1: { x:= Read(y);
x:=x+1;
Write(x);
Commit; }
State2: x= 3, y= 2

As we see above the final state state 2 is different in the two cases. Needless to say that we need to control the transactions that can execute in parallel.

Controlled parallelization

The above problem can be solved by controlling what transactions can be executed in parallel with the ones being executed by the slave. This means we need to have some kind of information in the transactions themselves. Interesting to note that we can use the information of parallelization from the master on the slave. Since we have multiple transactions committing at the same time on the master, we can store the information of the transactions that were in the "process of committing" when this transaction committed. Now let's define the phrase "process of committing".
 

The process of committing: On the slave we need to make sure that the transactions that we schedule for parallel execution will be the one which do not have conflicting read and write set. This is the only and the necessary requirement for the slave  workers to work without conflicts. This also implies that if the transactions being executed in parallel do not have intersecting read and write sets, we don't care if they are committed out of order. Since MySQL uses lock based scheduling, all the transactions that have entered the prepared stage but not as yet committed will have disjoint read and write sets and hence can be executed in parallel.

 

Logical clock and commit parent

We have introduced a logical clock. Now before I am tackled by a mathematician from one side and a computer engineer from the other, let me explain. It is a simple counter which is stepped when a binlog group of transaction commits on the master. Essentially this clock is stepped every time the leader execute the flush stage of binlog group commit. The value of this clock is recorded on each transaction when it enters the prepare stage. This recorded value is the "commit parent"
 

The pseudo code is as follows.

During Prepare
trx.commit_parent= commit_clock.get_timestamp();
 
During Commit
for every binlog group
  commit_clock.step();
As it is evident by now the transactions with the same commit parent follow our guiding principle of slave side parallelization i.e. transactions that have entered the prepared stage but has not as yet committed, and hence can be executed in parallel.

In the example we will take up three transactions (T1 T2 T3), two of which have been committed as a part of the same binlog group. T1 enters the prepare stage and get the commit parent as 0 since none of the group have been committed as yet. T1 assigns itself as the leader and then goes on to flush its transaction/statement cache. In the meanwhile transaction T2 enters the prepare stage. It is also assigned the same commit parent "0"(CP as used in the figure) since the commit clock has not as yet been stepped. T2 then goes on a wait for the leader to flush its cache in to the binlog. After the flush has been completed by the leader, it signals T2 to continue and both of them enter the Sync stage, where the leader thread  calls fsync() there by finishing the binlog commit process. The  transaction T3 however enters the prepare stage after the previous group has been synced and there-by ends up getting the next CP.

Another thing to note here is that the "group" of transactions that are being executed in parallel are not bounded by binlog commit group. There is a possibility that a transaction have entered the binlog prepare stage but could not make it to the current binlog group. Our approach takes care of such cases and makes sure that we relax the boundary of the group being executed in parallel on the slave.

 
On the slave we use the existing infrastructure of DB partitioned MTS to execute the tranactions in parallel, simply by modifying the scheduling logic.
 

Conclusion

This feature provides the great enhancement to the existing MySQL replication. To know more about the configuration options of this enhancement refer to this post.
This feature is available inMySQL 5.7.2 release. You can try it out and let us know the feedback.

 

About the author

Rohit Kalhans is a Software Development Engineer based out of Banglore India. His area of focus revolves around Row-based replication and Multi-threaded parallel slave. His interests include system programming, highly-available and scalable systems. In his free time, he plays guitar and piano and loves to write short stories and poems. More Information can be found on his homepage.

MySQL 5.7: Enhanced Multi-threaded slaves的更多相关文章

  1. MySQL复制配置(多主一从)

    复制多主一从 replicaion 原理 复制有三个步骤:(分为三个线程 slave:io线程 sql线程 master:io线程) 1.master将改变记录到二进制日志(binary log)中( ...

  2. MySql集群FAQ----mysql主从配置与集群区别、集群中需要多少台计算机呢?为什么? 等

    抽取一部分显示在这里,如下, What's the difference in using Clustervs using replication? 在复制系统中,一个MySQL主服务器会更新一个或多 ...

  3. MySQL数据很大的时候

    众所周知,mysql在数据量很大的时候查询的效率是很低的,因为假如你需要 OFFSET 100000 LIMIT 5 这样的数据,数据库就需要跳过前100000条数据,才能返回给你你需要的5条数据.由 ...

  4. mysql 2006

    1.在my.ini文件中添加或者修改以下两个变量:wait_timeout=2880000interactive_timeout = 2880000 关于两个变量的具体说明可以google或者看官方手 ...

  5. MySQL高可用架构:mysql+keepalived实现

    系统环境及架构 #主机名 系统版本 mysql版本 ip地址 mysqlMaster <a href="https://www.linuxprobe.com/" title= ...

  6. MySQL之备份

    MySQL备份和备份 备份/还原 冷备:需要停止当前正在运行mysqld,然后直接拷贝或打包数据文件. 半热备:mysqldump+binlog --适合数据量比较小的应用 在线热备:AB复制 --实 ...

  7. linux MySql 的主从复制部署

    MySql 复制 mysql 复制:将某一台主机上的 Mysql 数据复制到其它主机(slaves)上,并重新执行一遍从而实现 当前主机上的 mysql 数据与(master)主机上数据保持一致的过程 ...

  8. java面试一日一题:讲对mysql的MVCC的理解

    问题:请讲下对mysql中MVCC的理解 分析:这个问题要回答的是对MVCC的理解,以及MVCC解决了什么问题这几个方面入手. 回答要点: 主要从以下几点去考虑, 1.什么是MVCC? 2.MVCC用 ...

  9. BlackArch-Tools

    BlackArch-Tools 简介 安装在ArchLinux之上添加存储库从blackarch存储库安装工具替代安装方法BlackArch Linux Complete Tools List 简介 ...

随机推荐

  1. 转载: Vim 练级攻略

    转自:http://coolshell.cn/articles/5426.html  酷壳 vim的学习曲线相当的大(参看各种文本编辑器的学习曲线),所以,如果你一开始看到的是一大堆VIM的命令分类, ...

  2. A Blind Watermarking for 3-D Dynamic Mesh Model Using Distribution of Temporal Wavelet Coefficients

    这周看了一篇动态网格序列水印的论文,由于目前在网格序列上做水印的工作特别少,加之我所看的这篇论文中的叙述相对简洁,理解起来颇为困难.好在请教了博士师兄,思路明朗了许多,也就把这思路整理在此了. 论文作 ...

  3. bzoj1150

    haha,贪心,边界条件折腾了我一会儿 #include<cstdio> #include<cctype> #include<queue> #include< ...

  4. Windows 窗体—— 键盘输入工作原理

    方法 注释 PreFilterMessage 此方法在应用程序级截获排队的(也称为已发送的)Windows 消息. PreProcessMessage 此方法在 Windows 消息处理前在窗体和控件 ...

  5. Using FastCGI to Host PHP Applications on IIS 7 -IIS7 怎么配置 PHP5

    This article describes how to configure the FastCGI module and PHP to host PHP applications on IIS 7 ...

  6. centos6.4 安装erlang

    erlang官网: http://www.erlang.org 下载程序去年:

  7. libpcap/wwinpcap

    winpcap(windows packet capture)是windows平台下一个免费,公共的网络访问系统.开发winpcap这个项目的目的在于为win32应用程序提供访问网络底层的能力.win ...

  8. Nuget~让包包带上自己的配置信息

    我们知道一般开发组件之后,组件都有相关配置项,最常见的作法就是把它写到web.config里,而如果你将这个文件直接放到nuget里打包,在进行安装包包时,会提示你这个文件已经存在,不能去覆盖原来的c ...

  9. 数据库MySQL常用命令复习

    -- 查看数据库 show databases; -- 创建数据库 create database '数据库名'; -- 删除数据库 drop database '数据库名'; -- 选库 use ' ...

  10. 实现CCLayer只显示一个矩形可见区域

    转自:http://blog.csdn.net/while0/article/details/11004147 CCLayer的区域可能会比较大,怎样让它只显示其中一部分区域呢?  这个还是有很多场景 ...