MHA approach taken from: http://www.cnblogs.com/xuanzhi201111/p/4231412.html

MHA Online Master Switch

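In many cases an existing master has to be migrated to another server: for example, the master has a hardware fault, its RAID controller needs to be rebuilt, or the master is being moved to a more powerful machine. Maintaining the master degrades performance and causes downtime during which, at the very least, data cannot be written. In addition, blocking or killing the currently running sessions can leave the old and new masters with inconsistent data. MHA provides a fast switch with graceful write blocking. The switch takes only 0.5-2 s, and writes are blocked during that time. In many cases 0.5-2 s of blocked writes is acceptable, so switching the master does not require a scheduled maintenance window.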

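The MHA online switch roughly proceeds as follows:
(1) Check the replication setup and identify the current master
(2) Determine the new master
(3) Block writes to the current master
(4) Wait for all slaves to catch up with replication
(5) Allow writes on the new master
(6) Repoint the slaves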
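
Steps (3) to (6) can be read as an ordered list of statements per host. The following is a minimal Python sketch reconstructed from the 0.56 switch log shown later in this post; it is not MHA's actual implementation. The function name `online_switch_plan` and the angle-bracket placeholders are illustrative assumptions, and the app-user handling and thread killing done by master_ip_online_change are left out.

```python
from typing import List, Tuple

# Placeholders only: the real values come from SHOW MASTER STATUS at switch time.
WAIT = "SELECT MASTER_POS_WAIT('<orig binlog file>', <orig binlog pos>)"
REPOINT = "CHANGE MASTER TO <new master coordinates>; START SLAVE"


def online_switch_plan(orig_master: str, new_master: str,
                       other_slaves: List[str]) -> List[Tuple[str, str]]:
    """Return (host, statement) pairs in the order they appear in the log.

    Simplification: the log shows the slaves being switched in parallel,
    while this sketch lists them one after another.
    """
    plan = [
        # (3) block writes: read_only first, then a global read lock
        (new_master, "SET GLOBAL read_only=1"),
        (orig_master, "SET GLOBAL read_only=1"),
        (orig_master, "FLUSH TABLES WITH READ LOCK"),
        (orig_master, "SHOW MASTER STATUS"),
        # (4) the new master must have applied everything from the old one
        (new_master, WAIT),
        (new_master, "SHOW MASTER STATUS"),
        # (5) allow writes on the new master
        (new_master, "SET GLOBAL read_only=0"),
    ]
    # (6) repoint the remaining slaves, then the old master itself
    for slave in other_slaves:
        plan.append((slave, WAIT))
        plan.append((slave, REPOINT))
    plan.append((orig_master, "UNLOCK TABLES"))
    plan.append((orig_master, REPOINT))
    return plan


if __name__ == "__main__":
    for host, stmt in online_switch_plan(
            "192.168.2.128", "192.168.2.129", ["192.168.2.130"]):
        print(f"{host}: {stmt}")
```

Note the ordering: the old master stays locked until the other slaves have been repointed, and it is turned into a slave only after UNLOCK TABLES.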

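Note that the application architecture has to account for two issues during an online switch:

1. Automatically identifying the master and the slaves (the master may move to another machine). Using a VIP largely solves this.

2. Load balancing. You can define an approximate read/write ratio and the share of load each machine can carry; this has to be reconsidered whenever a machine leaves the cluster.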

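To guarantee full data consistency and finish the switch as quickly as possible, an MHA online switch succeeds only if all of the following conditions hold; otherwise the switch fails.

(1) The IO thread is running on every slave

(2) The SQL thread is running on every slave

(3) In the SHOW SLAVE STATUS output of every slave, Seconds_Behind_Master is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.

(4) On the master, SHOW PROCESSLIST shows no update that has been running for more than running_updates_limit seconds.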
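
The four conditions above can be expressed as a small pre-flight check. This is a minimal Python sketch under stated assumptions, not MHA's own code: the `SlaveStatus` and `MasterProcess` classes and the `online_switch_blockers` function are hypothetical, the rows are assumed to have been fetched already from SHOW SLAVE STATUS and SHOW PROCESSLIST, and "update" is approximated as a running statement that starts with a common write keyword.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Assumption: statements starting with these keywords count as updates.
WRITE_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "REPLACE", "ALTER", "LOAD")


@dataclass
class SlaveStatus:
    """Subset of SHOW SLAVE STATUS columns used by the checks."""
    host: str
    slave_io_running: str                 # "Yes" or "No"
    slave_sql_running: str                # "Yes" or "No"
    seconds_behind_master: Optional[int]  # None (NULL) if replication is stopped


@dataclass
class MasterProcess:
    """Subset of SHOW PROCESSLIST columns on the master."""
    command: str  # e.g. "Query" or "Sleep"
    time: int     # seconds spent in the current state
    info: str     # statement text, may be empty


def online_switch_blockers(slaves: Iterable[SlaveStatus],
                           processes: Iterable[MasterProcess],
                           running_updates_limit: int = 1) -> List[str]:
    """Return human-readable reasons why the switch would be refused."""
    problems = []
    for s in slaves:
        if s.slave_io_running != "Yes":                      # condition (1)
            problems.append(f"{s.host}: IO thread is not running")
        if s.slave_sql_running != "Yes":                     # condition (2)
            problems.append(f"{s.host}: SQL thread is not running")
        lag = s.seconds_behind_master
        if lag is None or lag > running_updates_limit:       # condition (3)
            problems.append(f"{s.host}: Seconds_Behind_Master={lag}")
    for p in processes:                                      # condition (4)
        is_update = (p.command == "Query"
                     and p.info.lstrip().upper().startswith(WRITE_KEYWORDS))
        if is_update and p.time > running_updates_limit:
            problems.append(f"master: update running for {p.time}s")
    return problems


if __name__ == "__main__":
    slaves = [SlaveStatus("192.168.2.129", "Yes", "Yes", 0),
              SlaveStatus("192.168.2.130", "Yes", "Yes", 5)]
    print(online_switch_blockers(slaves, []))
    print(online_switch_blockers(slaves, [], running_updates_limit=10000))
```

Passing a large value such as --running_updates_limit=10000, as the switch command in this post does, effectively relaxes conditions (3) and (4).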

The online switch steps are as follows:

Work on the MHA Manager server 192.168.2.131. First, stop MHA monitoring:

192.168.2.131 [root ~]$ masterha_stop --conf=/etc/masterha/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 (wd: /usr/local/bin)
(wd now: ~)
192.168.2.131 [root ~]$

Run the online switch command (the error below was reported by version 0.53 of the manager and node packages):

Starting master switch from 192.168.2.128(192.168.2.128:3306) to 192.168.2.129(192.168.2.129:3306)? (yes/NO): yes
Sun Jan 18 20:06:17 2015 - [info] Checking whether 192.168.2.129(192.168.2.129:3306) is ok for the new master..
Sun Jan 18 20:06:17 2015 - [info] ok.
Sun Jan 18 20:06:17 2015 - [info] 192.168.2.128(192.168.2.128:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sun Jan 18 20:06:17 2015 - [info] 192.168.2.128(192.168.2.128:3306): Resetting slave pointing to the dummy host.
Sun Jan 18 20:06:17 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jan 18 20:06:17 2015 - [info]
Sun Jan 18 20:06:17 2015 - [info] * Phase 2: Rejecting updates Phase..
Sun Jan 18 20:06:17 2015 - [info]
Sun Jan 18 20:06:17 2015 - [info] Executing master ip online change script to disable write on the current master:
Sun Jan 18 20:06:17 2015 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306
Got Error: DBI connect(';host=192.168.2.129;port=3306;mysql_connect_timeout=4','',...) failed: Access denied for user 'root'@'192.168.2.131' (using password: NO) at /usr/local/share/perl5/MHA/DBHelper.pm line 181
at /usr/local/bin/master_ip_online_change line 138
Sun Jan 18 20:06:17 2015 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53

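The cause is that the master_ip_online_change script is incomplete and has to be adapted by hand. The script could not obtain a value for the new_master_password variable, so the online switch failed. As a workaround I hard-coded the MySQL root password into new_master_password. With mha4mysql-manager-0.56 and mha4mysql-node-0.56 this is no longer necessary, because the password is read automatically; earlier versions apparently never got a value when reading new_master_password. (I don't know Perl well, so improvements from others are welcome.)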

Now let's look at how version 0.56 behaves:

192.168.2.131 [root bin]$ masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=192.168.2.129 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Mon Jan 19 01:51:39 2015 - [info] MHA::MasterRotate version 0.56.
Mon Jan 19 01:51:39 2015 - [info] Starting online master switch..
Mon Jan 19 01:51:39 2015 - [info]
Mon Jan 19 01:51:39 2015 - [info] * Phase 1: Configuration Check Phase..
Mon Jan 19 01:51:39 2015 - [info]
Mon Jan 19 01:51:39 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 19 01:51:39 2015 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Jan 19 01:51:39 2015 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Jan 19 01:51:39 2015 - [info] GTID failover mode = 0
Mon Jan 19 01:51:39 2015 - [info] Current Alive Master: 192.168.2.128(192.168.2.128:3306)
Mon Jan 19 01:51:39 2015 - [info] Alive Slaves:
Mon Jan 19 01:51:39 2015 - [info] 192.168.2.129(192.168.2.129:3306) Version=5.5.30-log (oldest major version between slaves) log-bin:enabled
Mon Jan 19 01:51:39 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)
Mon Jan 19 01:51:39 2015 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jan 19 01:51:39 2015 - [info] 192.168.2.130(192.168.2.130:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Mon Jan 19 01:51:39 2015 - [info] Replicating from 192.168.2.128(192.168.2.128:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.2.128(192.168.2.128:3306)? (YES/no): yes
Mon Jan 19 01:51:46 2015 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Jan 19 01:51:46 2015 - [info] ok.
Mon Jan 19 01:51:46 2015 - [info] Checking MHA is not monitoring or doing failover..
Mon Jan 19 01:51:46 2015 - [info] Checking replication health on 192.168.2.129..
Mon Jan 19 01:51:46 2015 - [info] ok.
Mon Jan 19 01:51:46 2015 - [info] Checking replication health on 192.168.2.130..
Mon Jan 19 01:51:46 2015 - [info] ok.
Mon Jan 19 01:51:46 2015 - [info] 192.168.2.129 can be new master.
Mon Jan 19 01:51:46 2015 - [info]
From:
192.168.2.128(192.168.2.128:3306) (current master)
+--192.168.2.129(192.168.2.129:3306)
+--192.168.2.130(192.168.2.130:3306)

To:
192.168.2.129(192.168.2.129:3306) (new master)
+--192.168.2.130(192.168.2.130:3306)
+--192.168.2.128(192.168.2.128:3306)

Starting master switch from 192.168.2.128(192.168.2.128:3306) to 192.168.2.129(192.168.2.129:3306)? (yes/NO): yes
Mon Jan 19 01:51:50 2015 - [info] Checking whether 192.168.2.129(192.168.2.129:3306) is ok for the new master..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.128(192.168.2.128:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.128(192.168.2.128:3306): Resetting slave pointing to the dummy host.
Mon Jan 19 01:51:50 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Phase 2: Rejecting updates Phase..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] Executing master ip online change script to disable write on the current master:
Mon Jan 19 01:51:50 2015 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123456' --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Mon Jan 19 01:51:50 2015 173112 Set read_only on the new master.. ok.
Mon Jan 19 01:51:50 2015 178943 Drpping app user on the orig master..
Mon Jan 19 01:51:50 2015 180438 Set read_only=1 on the orig master.. ok.
Mon Jan 19 01:51:50 2015 183258 Killing all application threads..
Mon Jan 19 01:51:50 2015 183387 done.
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Jan 19 01:51:50 2015 - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Orig master binlog:pos is mysql-bin.000017:107.
Mon Jan 19 01:51:50 2015 - [info] Waiting to execute all relay logs on 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] master_pos_wait(mysql-bin.000017:107) completed on 192.168.2.129(192.168.2.129:3306). Executed 0 events.
Mon Jan 19 01:51:50 2015 - [info] done.
Mon Jan 19 01:51:50 2015 - [info] Getting new master's binlog name and position..
Mon Jan 19 01:51:50 2015 - [info] mysql-bin.000005:61791
Mon Jan 19 01:51:50 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.2.129', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jan 19 01:51:50 2015 - [info] Executing master ip online change script to allow write on the new master:
Mon Jan 19 01:51:50 2015 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=192.168.2.128 --orig_master_ip=192.168.2.128 --orig_master_port=3306 --orig_master_user='root' --orig_master_password='123456' --new_master_host=192.168.2.129 --new_master_ip=192.168.2.129 --new_master_port=3306 --new_master_user='root' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Mon Jan 19 01:51:50 2015 443208 Set read_only=0 on the new master.
Mon Jan 19 01:51:50 2015 444741 Creating app user on the new master..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Switching slaves in parallel..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] -- Slave switch on host 192.168.2.130(192.168.2.130:3306) started, pid: 23040
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] Log messages from 192.168.2.130 ...
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] Waiting to execute all relay logs on 192.168.2.130(192.168.2.130:3306)..
Mon Jan 19 01:51:50 2015 - [info] master_pos_wait(mysql-bin.000017:107) completed on 192.168.2.130(192.168.2.130:3306). Executed 0 events.
Mon Jan 19 01:51:50 2015 - [info] done.
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.130(192.168.2.130:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] End of log messages from 192.168.2.130 ...
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] -- Slave switch on host 192.168.2.130(192.168.2.130:3306) succeeded.
Mon Jan 19 01:51:50 2015 - [info] Unlocking all tables on the orig master:
Mon Jan 19 01:51:50 2015 - [info] Executing UNLOCK TABLES..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Starting orig master as a new slave..
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.128(192.168.2.128:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] All new slave servers switched successfully.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Phase 5: New master cleanup phase..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.129: Resetting slave info succeeded.
Mon Jan 19 01:51:50 2015 - [info] Switching master to 192.168.2.129(192.168.2.129:3306) completed successfully.
192.168.2.131 [root bin]$

Mon Jan 19 01:51:50 2015 - [info] Waiting to execute all relay logs on 192.168.2.130(192.168.2.130:3306)..
Mon Jan 19 01:51:50 2015 - [info] master_pos_wait(mysql-bin.000017:107) completed on 192.168.2.130(192.168.2.130:3306). Executed 0 events.
Mon Jan 19 01:51:50 2015 - [info] done.
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.130(192.168.2.130:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] End of log messages from 192.168.2.130 ...
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] -- Slave switch on host 192.168.2.130(192.168.2.130:3306) succeeded.
Mon Jan 19 01:51:50 2015 - [info] Unlocking all tables on the orig master:
Mon Jan 19 01:51:50 2015 - [info] Executing UNLOCK TABLES..
Mon Jan 19 01:51:50 2015 - [info] ok.
Mon Jan 19 01:51:50 2015 - [info] Starting orig master as a new slave..
Mon Jan 19 01:51:50 2015 - [info] Resetting slave 192.168.2.128(192.168.2.128:3306) and starting replication from the new master 192.168.2.129(192.168.2.129:3306)..
Mon Jan 19 01:51:50 2015 - [info] Executed CHANGE MASTER.
Mon Jan 19 01:51:50 2015 - [info] Slave started.
Mon Jan 19 01:51:50 2015 - [info] All new slave servers switched successfully.
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] * Phase 5: New master cleanup phase..
Mon Jan 19 01:51:50 2015 - [info]
Mon Jan 19 01:51:50 2015 - [info] 192.168.2.129: Resetting slave info succeeded.
Mon Jan 19 01:51:50 2015 - [info] Switching master to 192.168.2.129(192.168.2.129:3306) completed successfully.
192.168.2.131 [root bin]$
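
When a switch is scripted rather than watched by hand, it can help to check the output automatically. The following is a minimal Python sketch, not part of MHA, that collects the phase headings and the final success line from output like the log above. The function name and the regular expressions are assumptions based only on the log format shown here.

```python
import re

# Matches phase headings such as:
#   "... - [info] * Phase 2: Rejecting updates Phase.."
# Lines starting with "**" (phase completed) are deliberately not matched.
PHASE_RE = re.compile(r"\[info\]\s+\*\s+(Phase \d+: .+?)\.\.\s*$")

# Matches the final line printed when the switch succeeded.
DONE_RE = re.compile(r"Switching master to (\S+) completed successfully")


def summarize_switch_log(log_text):
    """Return (phases, new_master) parsed from masterha_master_switch output.

    new_master is None when the success line is absent.
    """
    phases = []
    new_master = None
    for line in log_text.splitlines():
        phase = PHASE_RE.search(line)
        if phase:
            phases.append(phase.group(1))
        done = DONE_RE.search(line)
        if done:
            new_master = done.group(1)
    return phases, new_master


sample = """\
Mon Jan 19 01:51:50 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jan 19 01:51:50 2015 - [info] * Phase 2: Rejecting updates Phase..
Mon Jan 19 01:51:50 2015 - [info] * Switching slaves in parallel..
Mon Jan 19 01:51:50 2015 - [info] * Phase 5: New master cleanup phase..
Mon Jan 19 01:51:50 2015 - [info] Switching master to 192.168.2.129(192.168.2.129:3306) completed successfully.
"""

phases, new_master = summarize_switch_log(sample)
print(phases)
print(new_master)
```

A wrapper script could treat `new_master is None` as a failed or aborted switch and stop before touching anything else.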

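Parameter notes:

--orig_master_is_new_slave: with this option, the original master is reconfigured as a slave of the new master once the switch completes. Without it, the original master is left out of the new replication topology (it is not started as a slave).

--running_updates_limit=10000: the online switch aborts if the candidate master is lagging behind the current master. With this option, the switch is still allowed as long as the delay stays within the given limit (in seconds). According to the MHA documentation, the same limit also applies to write queries still running on the current master, and the default is 1 second. How long the switch itself takes depends on the size of the relay logs that have to be applied during recovery.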
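
To make the role of each option explicit, here is a small Python sketch that assembles the same command line used at the top of this article. The helper name `build_switch_command` and its keyword arguments are hypothetical; only the flags themselves come from the command shown above. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def build_switch_command(conf, new_master_host, new_master_port=3306,
                         orig_master_is_new_slave=True,
                         running_updates_limit=None):
    """Build the argv list for an online (master_state=alive) switch."""
    argv = [
        "masterha_master_switch",
        f"--conf={conf}",
        "--master_state=alive",
        f"--new_master_host={new_master_host}",
        f"--new_master_port={new_master_port}",
    ]
    if orig_master_is_new_slave:
        # Re-attach the old master as a slave of the new master.
        argv.append("--orig_master_is_new_slave")
    if running_updates_limit is not None:
        # Tolerated delay / running-update time, in seconds.
        argv.append(f"--running_updates_limit={running_updates_limit}")
    return argv


argv = build_switch_command("/etc/masterha/app1.cnf", "192.168.2.129",
                            running_updates_limit=10000)
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)
```

Keeping the arguments as a list (rather than one string) means the same value can be passed directly to `subprocess.run` if the switch is later automated.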

The master_ip_online_change script is as follows:

#!/usr/bin/env perl

# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command, $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip, $orig_master_port, $orig_master_user,
  $orig_master_password, $orig_master_ssh_user, $new_master_host,
  $new_master_ip, $new_master_port, $new_master_user,
  $new_master_password, $new_master_ssh_user
);
my $vip = '192.168.2.88/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
my $orig_master_ssh_port = 22;
my $new_master_ssh_port = 22;
GetOptions(
  'command=s' => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s' => \$orig_master_host,
  'orig_master_ip=s' => \$orig_master_ip,
  'orig_master_port=i' => \$orig_master_port,
  'orig_master_user=s' => \$orig_master_user,
  'orig_master_password=s' => \$orig_master_password,
  'orig_master_ssh_user=s' => \$orig_master_ssh_user,
  'new_master_host=s' => \$new_master_host,
  'new_master_ip=s' => \$new_master_ip,
  'new_master_port=i' => \$new_master_port,
  'new_master_user=s' => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
  'new_master_ssh_user=s' => \$new_master_ssh_user,
  'orig_master_ssh_port=i' => \$orig_master_ssh_port,
  'new_master_ssh_port=i' => \$new_master_ssh_port,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh = shift;
  my $my_connection_id = shift;
  my $running_time_threshold = shift;
  my $type = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id = $ref->{Id};
    my $user = $ref->{User};
    my $host = $ref->{Host};
    my $command = $ref->{Command};
    my $state = $ref->{State};
    my $query_time = $ref->{Time};
    my $info = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command) && $command eq "Binlog Dump" );
    next if ( defined($user) && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      $orig_master_handler->disable_log_bin_local();
      print current_time_us() . " Drpping app user on the orig master..\n";
      #FIXME_xxx_drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
            "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
            "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      eval {
        `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
      };
      if ($@) {
        warn $@;
      }
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

    # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
    # If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      $new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      $new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
    "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}
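
The filtering rules in get_threads_util decide which connections the script waits for and finally kills, so they are worth checking in isolation. The following is a Python rendering of the same predicates, applied to rows shaped like SHOW PROCESSLIST output. It is an illustrative sketch, not part of MHA: the function name `filter_threads`, the `kind` parameter (the script's `$type`) and the sample rows are assumptions.

```python
def filter_threads(rows, my_connection_id, running_time_threshold=0, kind=0):
    """Return the processlist rows that count as application threads.

    Mirrors the Perl logic: skip our own connection, replication threads,
    and idle sleepers; higher `kind` values skip progressively more.
    """
    kept = []
    for row in rows:
        command = row.get("Command")
        user = row.get("User")
        query_time = row.get("Time")
        info = row.get("Info")
        if info is not None:
            info = info.strip()

        if row["Id"] == my_connection_id:
            continue  # the script's own connection
        if query_time is not None and query_time < running_time_threshold:
            continue  # not running long enough to matter
        if command == "Binlog Dump":
            continue  # a slave pulling binlogs from this server
        if user == "system user":
            continue  # replication SQL/IO threads
        if command == "Sleep" and query_time is not None and query_time >= 1:
            continue  # idle for a second or more
        if kind >= 1 and command in ("Sleep", "Connect"):
            continue
        if (kind >= 2 and info is not None
                and info.lower().startswith(("select", "show"))):
            continue  # read-only statements
        kept.append(row)
    return kept


rows = [
    {"Id": 1, "User": "root", "Command": "Query", "Time": 0, "Info": "SHOW PROCESSLIST"},
    {"Id": 2, "User": "repl", "Command": "Binlog Dump", "Time": 500, "Info": None},
    {"Id": 3, "User": "app", "Command": "Sleep", "Time": 30, "Info": None},
    {"Id": 4, "User": "app", "Command": "Sleep", "Time": 0, "Info": None},
    {"Id": 5, "User": "app", "Command": "Query", "Time": 2, "Info": "  select * from t"},
    {"Id": 6, "User": "app", "Command": "Query", "Time": 1, "Info": "UPDATE t SET a=1"},
]

ids_default = [r["Id"] for r in filter_threads(rows, my_connection_id=1)]
ids_kind1 = [r["Id"] for r in filter_threads(rows, my_connection_id=1, kind=1)]
ids_kind2 = [r["Id"] for r in filter_threads(rows, my_connection_id=1, kind=2)]
print(ids_default, ids_kind1, ids_kind2)
```

The script above always calls get_threads_util with only the first two arguments, so in practice it uses the default rules: a connection that went idle less than a second ago (Id 4 in the sample) still counts as active and is waited for, then killed.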

For details, see the official documentation: https://code.google.com/p/mysql-master-ha/wiki/Parameters#master_ip_online_change_script (a proxy may be needed to reach it)

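2. Repairing the crashed master

After an automatic failover, the original master has usually been taken out of service. Once that host is repaired, and provided its data is intact, you may want to bring it back as a slave of the new master. The MHA log written at the time of the switch can be used for this. The relevant entry can be found by searching that log for the phrase "All other slaves should start replication from here".

The entry looks like this:

All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.2.129', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, MASTER_USER='repl', MASTER_PASSWORD='xxx';

In other words, once the old master host is repaired, run this CHANGE MASTER statement on it so that it joins the topology as a new slave.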
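
Pulling the statement out of the log by hand is error-prone, so here is a minimal Python sketch of that step. It assumes the log line format shown above, including the `'xxx'` placeholder that stands in for the replication password. The function name and the `repl_password` parameter are hypothetical, and the password substitution is a plain string replace with no SQL escaping.

```python
import re

# Captures the statement from the "Statement should be: ..." log line.
CHANGE_MASTER_RE = re.compile(r"Statement should be:\s*(CHANGE MASTER TO .+?;)")


def extract_change_master(log_text, repl_password=None):
    """Return the last CHANGE MASTER statement found in an MHA log, or None.

    If repl_password is given, it replaces the 'xxx' placeholder.
    Assumption: the password contains no quote characters.
    """
    statement = None
    for line in log_text.splitlines():
        match = CHANGE_MASTER_RE.search(line)
        if match:
            statement = match.group(1)  # keep the most recent one
    if statement is not None and repl_password is not None:
        statement = statement.replace(
            "MASTER_PASSWORD='xxx'", f"MASTER_PASSWORD='{repl_password}'"
        )
    return statement


sample_log = (
    "Mon Jan 19 01:51:50 2015 - [info] All other slaves should start "
    "replication from here. Statement should be: CHANGE MASTER TO "
    "MASTER_HOST='192.168.2.129', MASTER_PORT=3306, "
    "MASTER_LOG_FILE='mysql-bin.000005', MASTER_LOG_POS=61791, "
    "MASTER_USER='repl', MASTER_PASSWORD='xxx';\n"
)

statement = extract_change_master(sample_log)
print(statement)
```

The returned statement would then be run on the repaired host, followed by starting the slave threads, after confirming that the host's data really is consistent with the binlog position in the statement.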

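Existing high-availability solutions can deliver database high availability to some degree, for example MMM, heartbeat+DRBD, and Cluster described in earlier articles, as well as Percona's Galera Cluster. Each has its strengths and weaknesses. The choice depends mainly on the business and on its data-consistency requirements. Where both high availability and data consistency matter, the MHA architecture is recommended.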

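Summary:

1. The MHA manager was not running during the switch.

2. The cluster was running normally.

3. After the switch, the old master becomes the second master, swapping roles with the former second master. (This was observed in a test with a very small amount of data and with master and standby in sync. This kind of switch requires the two to be closely in sync, without a large gap.)

4. The same operation can be run again to switch the roles back.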
