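MHA Online Master Switch: Steps and How It Works
In day-to-day operations you run into scenarios such as a MySQL version upgrade or a hardware upgrade on the master server, where writes have to be moved to another server. How can this switch be done online, quickly, and with little impact on the application?
MHA offers a graceful way to do this: it blocks the application for only about 0.5~2s, and during that window the application can neither read nor write.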
Cluster information
Role              IP address       ServerID   Type
Master            192.168.244.10   1          Write
Candidate master  192.168.244.20   2          Read
Slave             192.168.244.30   3          Read
Monitor host      192.168.244.40              Monitors the cluster
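For the detailed MHA setup steps and principles, see another blog post:
Steps for the online switch
1. Stop MHA monitoring
# masterha_stop --conf=/etc/masterha/app1.cnf
2. Perform the online switch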
# /usr/local/bin/masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.244.20 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
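Where:
--orig_master_is_new_slave turns the original master into a slave of the new master. It is not enabled by default.
--running_updates_limit defaults to 1s: if the replication lag on a slave (Seconds_Behind_Master), or the running time of a DML statement shown by SHOW PROCESSLIST on the master, exceeds this value, the switch is not performed. The command above raises the limit to 10000s.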
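As a minimal sketch, the command line above can be assembled programmatically, which makes the two optional flags explicit. The helper name build_switch_command and its parameters are hypothetical; only the flags that appear in the command above are used.

```python
def build_switch_command(conf, new_master_host, new_master_port=3306,
                         orig_master_is_new_slave=False,
                         running_updates_limit=None):
    """Return the masterha_master_switch argument list for an online switch.

    Hypothetical helper: it only reproduces the flags shown in the article.
    """
    cmd = [
        "/usr/local/bin/masterha_master_switch",
        f"--conf={conf}",
        "--master_state=alive",  # online switch: the current master is still up
        f"--new_master_host={new_master_host}",
        f"--new_master_port={new_master_port}",
    ]
    if orig_master_is_new_slave:
        # Off by default: the old master is not re-attached unless requested.
        cmd.append("--orig_master_is_new_slave")
    if running_updates_limit is not None:
        # Omitted means MHA falls back to its default of 1 second.
        cmd.append(f"--running_updates_limit={running_updates_limit}")
    return cmd


cmd = build_switch_command(
    "/etc/masterha/app1.cnf",
    "192.168.244.20",
    orig_master_is_new_slave=True,
    running_updates_limit=10000,
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on the monitor host. Note that the tool asks for interactive confirmation twice, so it is normally run from a terminal.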
Output of the online switch
Tue Apr :: - [info] MHA::MasterRotate version 0.56.
Tue Apr :: - [info] Starting online master switch..
Tue Apr :: - [info]
Tue Apr :: - [info] * Phase : Configuration Check Phase..
Tue Apr :: - [info]
Tue Apr :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr :: - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Tue Apr :: - [info] Reading server configuration from /etc/masterha/app1.cnf..
Tue Apr :: - [info] GTID failover mode =
Tue Apr :: - [info] Current Alive Master: 192.168.244.10(192.168.244.10:)
Tue Apr :: - [info] Alive Slaves:
Tue Apr :: - [info] 192.168.244.20(192.168.244.20:) Version=5.6.-log (oldest major version between slaves) log-bin:enabled
Tue Apr :: - [info] Replicating from 192.168.244.10(192.168.244.10:)
Tue Apr :: - [info] Primary candidate for the new Master (candidate_master is set)
Tue Apr :: - [info] 192.168.244.30(192.168.244.30:) Version=5.6.-log (oldest major version between slaves) log-bin:enabled
Tue Apr :: - [info] Replicating from 192.168.244.10(192.168.244.10:)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168.244.10:)? (YES/no): yes
Tue Apr :: - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Apr :: - [info] ok.
Tue Apr :: - [info] Checking MHA is not monitoring or doing failover..
Tue Apr :: - [info] Checking replication health on 192.168.244.20..
Tue Apr :: - [info] ok.
Tue Apr :: - [info] Checking replication health on 192.168.244.30..
Tue Apr :: - [info] ok.
Tue Apr :: - [info] 192.168.244.20 can be new master.
Tue Apr :: - [info]
From:
192.168.244.10(192.168.244.10:) (current master)
+--192.168.244.20(192.168.244.20:)
+--192.168.244.30(192.168.244.30:)

To:
192.168.244.20(192.168.244.20:) (new master)
+--192.168.244.30(192.168.244.30:)
+--192.168.244.10(192.168.244.10:)

Starting master switch from 192.168.244.10(192.168.244.10:) to 192.168.244.20(192.168.244.20:)? (yes/NO): yes
Tue Apr :: - [info] Checking whether 192.168.244.20(192.168.244.20:) is ok for the new master..
Tue Apr :: - [info] ok.
Tue Apr :: - [info] 192.168.244.10(192.168.244.10:): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Apr :: - [info] 192.168.244.10(192.168.244.10:): Resetting slave pointing to the dummy host.
Tue Apr :: - [info] ** Phase : Configuration Check Phase completed.
Tue Apr :: - [info]
Tue Apr :: - [info] * Phase : Rejecting updates Phase..
Tue Apr :: - [info]
Tue Apr :: - [info] Executing master ip online change script to disable write on the current master:
Tue Apr :: - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port= --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port= --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Tue Apr :: Set read_only on the new master.. ok.
Tue Apr :: Set read_only= on the orig master.. ok.
Tue Apr :: Killing all application threads..
Tue Apr :: done.
Disabling the VIP an old master: 192.168.244.10
SIOCSIFFLAGS: Cannot assign requested address
Tue Apr :: - [info] ok.
Tue Apr :: - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Apr :: - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Apr :: - [info] ok.
Tue Apr :: - [info] Orig master binlog:pos is mysql-bin.:.
Tue Apr :: - [info] Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:)..
Tue Apr :: - [info] master_pos_wait(mysql-bin.:) completed on 192.168.244.20(192.168.244.20:). Executed events.
Tue Apr :: - [info] done.
Tue Apr :: - [info] Getting new master's binlog name and position..
Tue Apr :: - [info] mysql-bin.:
Tue Apr :: - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.244.20', MASTER_PORT=, MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Apr :: - [info] Executing master ip online change script to allow write on the new master:
Tue Apr :: - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.244.10 --orig_master_ip=192.168.244.10 --orig_master_port= --orig_master_user='monitor' --orig_master_password='monitor123' --new_master_host=192.168.244.20 --new_master_ip=192.168.244.20 --new_master_port= --new_master_user='monitor' --new_master_password='monitor123' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Tue Apr :: Set read_only= on the new master.
Enabling the VIP 192.168.244.188 on the new master: 192.168.244.20
Tue Apr :: - [info] ok.
Tue Apr :: - [info]
Tue Apr :: - [info] * Switching slaves in parallel..
Tue Apr :: - [info]
Tue Apr :: - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:) started, pid:
Tue Apr :: - [info]
Tue Apr :: - [info] Log messages from 192.168.244.30 ...
Tue Apr :: - [info]
Tue Apr :: - [info] Waiting to execute all relay logs on 192.168.244.30(192.168.244.30:)..
Tue Apr :: - [info] master_pos_wait(mysql-bin.:) completed on 192.168.244.30(192.168.244.30:). Executed events.
Tue Apr :: - [info] done.
Tue Apr :: - [info] Resetting slave 192.168.244.30(192.168.244.30:) and starting replication from the new master 192.168.244.20(192.168.244.20:)..
Tue Apr :: - [info] Executed CHANGE MASTER.
Tue Apr :: - [info] Slave started.
Tue Apr :: - [info] End of log messages from 192.168.244.30 ...
Tue Apr :: - [info]
Tue Apr :: - [info] -- Slave switch on host 192.168.244.30(192.168.244.30:) succeeded.
Tue Apr :: - [info] Unlocking all tables on the orig master:
Tue Apr :: - [info] Executing UNLOCK TABLES..
Tue Apr :: - [info] ok.
Tue Apr :: - [info] Starting orig master as a new slave..
Tue Apr :: - [info] Resetting slave 192.168.244.10(192.168.244.10:) and starting replication from the new master 192.168.244.20(192.168.244.20:)..
Tue Apr :: - [info] Executed CHANGE MASTER.
Tue Apr :: - [info] Slave started.
Tue Apr :: - [info] All new slave servers switched successfully.
Tue Apr :: - [info]
Tue Apr :: - [info] * Phase : New master cleanup phase..
Tue Apr :: - [info]
Tue Apr :: - [info] 192.168.244.20: Resetting slave info succeeded.
Tue Apr :: - [info] Switching master to 192.168.244.20(192.168.244.20:) completed successfully.
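How the MHA online switch works
1. Check the current configuration and the state of the master and slaves
This includes reading the MHA configuration file /etc/masterha/app1.cnf and checking the health of the current slaves.
2. Block updates on the current master
This is done mainly through the following steps (in the output above, the read_only and kill steps are performed by the master_ip_online_change script):
1> Wait up to 1.5s ($time_until_read_only*100ms) for current connections to disconnect.
2> Set read_only=1 to block new DML statements.
3> Wait up to 0.5s ($time_until_kill_threads*100ms) for running DML statements to finish.
4> Kill all remaining connections.
5> FLUSH NO_WRITE_TO_BINLOG TABLES (in the output above this actually runs earlier, during the configuration check phase, right after the first confirmation)
6> FLUSH TABLES WITH READ LOCK
3. Wait for the new master to finish executing all relay logs
Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:)..
4. Set read_only to off on the new master and add the VIP
5. Switch the slave over to the new master.
1> Wait for the slave (192.168.244.30) to apply the relay logs produced under the old replication topology, then run CHANGE MASTER to point it at the new master.
2> Release the lock held on the original master.
3> Because the masterha_master_switch command line includes --orig_master_is_new_slave, the original master is also turned into a slave of the new master.
6. Clean up the new master.
This mainly runs RESET SLAVE ALL to clear the previous replication information.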
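The update-blocking part of step 2 can be sketched as two bounded polling loops followed by the lock. This is only an illustration under stated assumptions, not MHA's actual code: the function names, the constants, and the fake callbacks are hypothetical, and FLUSH NO_WRITE_TO_BINLOG TABLES is left out because in the output above it runs before this phase.

```python
import time

TICK_SECONDS = 0.1          # poll interval of 100 ms, as described in step 2
TICKS_UNTIL_READ_ONLY = 15  # wait at most 1.5 s before setting read_only=1
TICKS_UNTIL_KILL = 5        # wait at most 0.5 s before killing connections


def wait_for_threads(count_threads, max_ticks, sleep=time.sleep):
    """Poll until no application threads remain or max_ticks is reached.

    Returns the number of ticks actually waited.
    """
    waited = 0
    while waited < max_ticks and count_threads() > 0:
        sleep(TICK_SECONDS)
        waited += 1
    return waited


def reject_updates(count_threads, run_sql, sleep=time.sleep):
    """Run the update-blocking steps in order; return seconds spent waiting."""
    ticks = wait_for_threads(count_threads, TICKS_UNTIL_READ_ONLY, sleep)
    run_sql("SET GLOBAL read_only=1")
    ticks += wait_for_threads(count_threads, TICKS_UNTIL_KILL, sleep)
    # Placeholder text, not literal SQL: each remaining thread is killed by id.
    run_sql("KILL <remaining application threads>")
    run_sql("FLUSH TABLES WITH READ LOCK")
    return ticks * TICK_SECONDS


# Dry run with fakes: two connections that never disconnect on their own,
# and a sleep function that returns immediately.
statements = []
waited = reject_updates(lambda: 2, statements.append, sleep=lambda s: None)
print(waited, statements)
```

Both loops exit early when no application connections remain, so an idle master spends almost no time here. When connections linger, the two budgets add up to about 2 seconds of waiting, which is in line with the upper end of the 0.5~2s window mentioned at the start.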
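Conditions required for an MHA online switch
Before performing an online switch, MHA examines the current replication state and proceeds only when all of the following conditions hold:
1. The IO thread and the SQL thread are running on every slave.
2. Seconds_Behind_Master on every slave is less than or equal to running_updates_limit, which defaults to 1s when not specified explicitly.
3. On the master, SHOW PROCESSLIST shows no DML statement that has been running longer than running_updates_limit.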
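The three conditions above can be expressed as a small pre-flight check. This is a hedged sketch, not MHA's implementation: can_switch and is_dml are hypothetical names, the input rows are plain dictionaries keyed by the column names of SHOW SLAVE STATUS and SHOW PROCESSLIST, and detecting DML by statement prefix is a simplification.

```python
DML_PREFIXES = ("insert", "update", "delete", "replace")


def is_dml(info):
    """Rough DML detection by statement prefix (a simplification)."""
    return info is not None and info.lstrip().lower().startswith(DML_PREFIXES)


def can_switch(slaves, master_processlist, running_updates_limit=1):
    """Return (ok, reasons) for the three preconditions listed above."""
    reasons = []
    for s in slaves:
        # Condition 1: both replication threads must be running.
        if s["Slave_IO_Running"] != "Yes" or s["Slave_SQL_Running"] != "Yes":
            reasons.append(f"{s['host']}: replication threads not running")
            continue
        # Condition 2: replication lag must not exceed the limit.
        lag = s["Seconds_Behind_Master"]
        if lag is None or lag > running_updates_limit:
            reasons.append(f"{s['host']}: lag {lag}s exceeds limit")
    for p in master_processlist:
        # Condition 3: no long-running DML statement on the master.
        if is_dml(p["Info"]) and p["Time"] > running_updates_limit:
            reasons.append(f"master: DML running for {p['Time']}s")
    return (not reasons, reasons)


# Made-up sample rows for illustration only.
slaves = [
    {"host": "192.168.244.20", "Slave_IO_Running": "Yes",
     "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 0},
    {"host": "192.168.244.30", "Slave_IO_Running": "Yes",
     "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 5},
]
processlist = [{"Time": 3, "Info": "UPDATE t SET c = 1"}]

print(can_switch(slaves, processlist))
print(can_switch(slaves, processlist, running_updates_limit=10000))
```

With the default limit of 1 second the sample data fails on both the lagging slave and the long-running UPDATE. Raising the limit, as the command in this article does with --running_updates_limit=10000, lets the same state pass.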
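Statements recorded on each server during the online switch, with the general log enabled
Note: masterha_master_switch asks for confirmation twice while it runs:
1. It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.244.10(192.168.244.10:3306)? (YES/no):
2. Starting master switch from 192.168.244.10(192.168.244.10:3306) to 192.168.244.20(192.168.244.20:3306)? (yes/NO):
Each of the outputs below contains two blank gaps: the output before the first gap was produced before the first confirmation, and the output before the second gap was produced before the second confirmation.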
Original master 192.168.244.10
:: Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
:: Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
Query SET wait_timeout=
Query SELECT @@global.server_id As Value
Query SELECT VERSION() AS Value
Query SELECT @@global.gtid_mode As Value
Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
Query SHOW MASTER STATUS
Query SELECT @@global.datadir AS Value
Query SELECT @@global.slave_parallel_workers AS Value
Query SHOW SLAVE STATUS
Query SELECT @@global.read_only As Value
Query SELECT @@global.relay_log_purge As Value

:: Query FLUSH NO_WRITE_TO_BINLOG TABLES
Query SELECT GET_LOCK('MHA_Master_High_Availability_Monitor', '') AS Value
Query SHOW PROCESSLIST

:: Query SHOW SLAVE STATUS
Query CHANGE MASTER TO MASTER_HOST='dummy_host'
:: Query SHOW SLAVE STATUS
Query RESET SLAVE /*!50516 ALL */
Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Monitor') As Value
Quit
Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
Query SET sql_log_bin=
Query SHOW PROCESSLIST
Query SELECT @@global.read_only As Value
Query SET GLOBAL read_only=
Query SELECT @@global.read_only As Value
Query SHOW PROCESSLIST
Query SET sql_log_bin=
Quit
Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
Query SET wait_timeout=
Query FLUSH TABLES WITH READ LOCK
Query SHOW MASTER STATUS
:: Query UNLOCK TABLES
Query CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS =
Query SET GLOBAL relay_log_purge=
Query START SLAVE
Connect Out repl@192.168.244.20:
Query SHOW SLAVE STATUS
Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
Quit
New master 192.168.244.20
:: Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
:: Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
Query SET wait_timeout=
Query SELECT @@global.server_id As Value
Query SELECT VERSION() AS Value
Query SELECT @@global.gtid_mode As Value
Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
Query SHOW MASTER STATUS
Query SELECT @@global.datadir AS Value
Query SELECT @@global.slave_parallel_workers AS Value
Query SHOW SLAVE STATUS
Query SELECT @@global.read_only As Value
Query SELECT @@global.relay_log_purge As Value
Query SELECT @@global.relay_log_info_repository AS Value
Query SELECT @@global.datadir AS Value
Query SELECT @@global.relay_log_info_file AS Value
Query SHOW SLAVE STATUS
Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

:: Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '') AS Value
Query SHOW SLAVE STATUS
Query SHOW SLAVE STATUS

:: Query SHOW PROCESSLIST
Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
Query SELECT @@global.read_only As Value
Query SELECT @@global.read_only As Value
Quit
Query SHOW SLAVE STATUS
Query SELECT MASTER_POS_WAIT('mysql-bin.000017','',) AS Result
Query STOP SLAVE SQL_THREAD
Query SHOW SLAVE STATUS
Query SHOW MASTER STATUS
Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
Query SET sql_log_bin=
Query SELECT @@global.read_only As Value
Query SET GLOBAL read_only=
Query SET sql_log_bin=
Quit
Query SELECT @@global.read_only As Value
Connect repl@node3 on
Query SELECT UNIX_TIMESTAMP()
Query SHOW VARIABLES LIKE 'SERVER_ID'
Query SET @master_heartbeat_period=
Query SET @master_binlog_checksum= @@global.binlog_checksum
Query SELECT @master_binlog_checksum
Query SELECT @@GLOBAL.GTID_MODE
Query SHOW VARIABLES LIKE 'SERVER_UUID'
Query SET @slave_uuid= '8a1093c8-1d00-11e7-954f-000c299a5715'
Binlog Dump Log: 'mysql-bin.000010' Pos:
:: Connect repl@node1 on
Query SELECT UNIX_TIMESTAMP()
Query SHOW VARIABLES LIKE 'SERVER_ID'
Query SET @master_heartbeat_period=
Query SET @master_binlog_checksum= @@global.binlog_checksum
Query SELECT @master_binlog_checksum
Query SELECT @@GLOBAL.GTID_MODE
Query SHOW VARIABLES LIKE 'SERVER_UUID'
Query STOP SLAVE
Query SET @slave_uuid= '2a6365e0-1d05-11e7-956d-000c29c64704'
Binlog Dump Log: 'mysql-bin.000010' Pos:
Query SHOW SLAVE STATUS
Query RESET SLAVE /*!50516 ALL */
Query SHOW SLAVE STATUS
Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
Quit
slave 192.168.244.30
:: Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
:: Connect monitor@node4 on
Query set autocommit=
Query SELECT CONNECTION_ID() AS Value
Query SET wait_timeout=
Query SELECT @@global.server_id As Value
Query SELECT VERSION() AS Value
Query SELECT @@global.gtid_mode As Value
Query SHOW GLOBAL VARIABLES LIKE 'log_bin'
Query SHOW MASTER STATUS
Query SELECT @@global.datadir AS Value
Query SELECT @@global.slave_parallel_workers AS Value
Query SHOW SLAVE STATUS
Query SELECT @@global.read_only As Value
Query SELECT @@global.relay_log_purge As Value
Query SELECT @@global.relay_log_info_repository AS Value
Query SELECT @@global.datadir AS Value
Query SELECT @@global.relay_log_info_file AS Value
Query SHOW SLAVE STATUS
Query SELECT Repl_slave_priv AS Value FROM mysql.user WHERE user = 'repl'

:: Query SELECT GET_LOCK('MHA_Master_High_Availability_Failover', '') AS Value
Query SHOW SLAVE STATUS
Query SHOW SLAVE STATUS

:: Query SHOW SLAVE STATUS
:: Query SHOW SLAVE STATUS
Query SELECT MASTER_POS_WAIT('mysql-bin.000017','',) AS Result
Query STOP SLAVE SQL_THREAD
Query SHOW SLAVE STATUS
Query STOP SLAVE
Query STOP SLAVE
Query SHOW SLAVE STATUS
Query RESET SLAVE
Query CHANGE MASTER TO MASTER_HOST = '192.168.244.20' MASTER_USER = 'repl' MASTER_PASSWORD = <secret> MASTER_PORT = MASTER_LOG_FILE = 'mysql-bin.000010' MASTER_LOG_POS =
Query SET GLOBAL relay_log_purge=
Query START SLAVE
Connect Out repl@192.168.244.20:
Query SHOW SLAVE STATUS
:: Query SELECT RELEASE_LOCK('MHA_Master_High_Availability_Failover') As Value
Quit
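The SELECT MASTER_POS_WAIT(...) calls in the logs above block until the slave's SQL thread has applied events up to the given binlog file and position on the master. The comparison behind "has this slave caught up?" can be sketched as follows. The function names are hypothetical, the keys follow the SHOW SLAVE STATUS column names, and the positions in the example are made up because the real ones are not preserved in the logs.

```python
def binlog_index(file_name):
    """Extract the numeric suffix, e.g. 'mysql-bin.000017' -> 17."""
    return int(file_name.rsplit(".", 1)[1])


def has_caught_up(slave_status, target_file, target_pos):
    """True if the slave has executed up to (target_file, target_pos).

    Compares the master binlog coordinates the SQL thread has applied
    (Relay_Master_Log_File, Exec_Master_Log_Pos) with the target.
    """
    executed = (
        binlog_index(slave_status["Relay_Master_Log_File"]),
        slave_status["Exec_Master_Log_Pos"],
    )
    # Tuples compare file index first, then position within the file.
    return executed >= (binlog_index(target_file), target_pos)


# Hypothetical coordinates for illustration.
status = {"Relay_Master_Log_File": "mysql-bin.000017", "Exec_Master_Log_Pos": 120}
print(has_caught_up(status, "mysql-bin.000017", 120))
print(has_caught_up(status, "mysql-bin.000017", 500))
```

Because the original master holds FLUSH TABLES WITH READ LOCK while this wait runs, its binlog position no longer advances, so the target coordinates stay fixed until the slaves have reached them.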
Reference
《深入浅出MySQL》