MHA: Manual Failover Procedure (Traditional Replication & GTID Replication)

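This article only walks through the manual failover procedure. For an introduction to MHA, see "MySQL高可用架构之MHA" (MySQL High-Availability Architecture: MHA).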

1. Basic Environment

1.1 Replication Topology

VMware 10.0 + CentOS 6.9 + MySQL 5.7.21

ROLE	HOSTNAME	BASEDIR	DATADIR	IP	PORT
Node1	ZST1	/usr/local/mysql	/data/mysql/mysql3307/data	192.168.85.132	3307
Node2	ZST2	/usr/local/mysql	/data/mysql/mysql3307/data	192.168.85.133	3307
Node3	ZST3	/usr/local/mysql	/data/mysql/mysql3307/data	192.168.85.134	3307

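Both setups use a one-master, two-slave topology, Node1 -> {Node2, Node3}. Traditional replication is built on row-based binlog format with file/position coordinates; GTID replication is built on row-based binlog format with GTIDs.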

1.2 MHA Configuration Files

This article uses MHA 0.56, with both the manager and node packages installed on Node1, Node2, and Node3.
The MHA configuration files are as follows:

# Global configuration file: /etc/masterha/masterha_default.conf
[root@ZST1 masterha]# cat masterha_default.conf
[server default]
#MySQL user and password
user=mydba
password=mysql5721
 
#OS user for SSH
ssh_user=root
 
#Replication user
repl_user=repl
repl_password=repl
 
#Monitoring
ping_interval=
#shutdown_script=/etc/masterha/send_report.sh
 
#Scripts invoked during switchover
master_ip_failover_script=/etc/masterha/master_ip_failover
master_ip_online_change_script=/etc/masterha/master_ip_online_change
 
log_level=debug
[root@ZST1 masterha]# 
 
# Cluster 1 configuration file: /etc/masterha/app1.conf
[root@ZST1 masterha]# cat app1.conf
[server default]
#MHA manager working directory
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/app1.log
remote_workdir=/var/log/masterha/app1
 
[server1]
hostname=192.168.85.132
port=
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=
check_repl_delay=
 
[server2]
hostname=192.168.85.133
port=
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=
check_repl_delay=
 
[server3]
hostname=192.168.85.134
port=
master_binlog_dir=/data/mysql/mysql3307/logs
candidate_master=
check_repl_delay=
[root@ZST1 masterha]#
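
The two files above are layered. The following is a minimal Python sketch of how the settings for one server could be resolved, assuming the usual MHA precedence (a per-server section overrides [server default] in app1.conf, which in turn overrides [server default] in masterha_default.conf). The helper `effective_settings` is hypothetical and not part of MHA; only keys whose values appear in the listings above are used.

```python
import configparser

# Excerpts of the two files shown above (only keys with visible values).
GLOBAL_CONF = """
[server default]
user=mydba
ssh_user=root
log_level=debug
"""

APP_CONF = """
[server default]
manager_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
master_binlog_dir=/data/mysql/mysql3307/logs
"""


def effective_settings(global_text, app_text, server):
    """Merge global defaults, application defaults and one server section.

    Later updates win, which models the assumed precedence:
    global < application < per-server.
    """
    global_cfg = configparser.ConfigParser()
    global_cfg.read_string(global_text)
    app_cfg = configparser.ConfigParser()
    app_cfg.read_string(app_text)

    merged = dict(global_cfg["server default"])
    merged.update(app_cfg["server default"])
    merged.update(app_cfg[server])
    return merged


settings = effective_settings(GLOBAL_CONF, APP_CONF, "server1")
for key in sorted(settings):
    print(f"{key} = {settings[key]}")
```

A merge of this kind is only a reading aid for checking which value a node ends up with; MHA performs its own parsing of these files.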

1.3 Test Data

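By stopping the io_thread on each slave and then writing to the master, we simulate inconsistent data both between the master and the slaves and between the two slaves.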

#First, empty the table
mydba@192.168.85.132,3307 [replcrash]> truncate table py_user;
 
#Node1: insert the first row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
#Node3: stop io_thread
mydba@192.168.85.134,3307 [replcrash]> stop slave io_thread;
 
#Node1: insert the second row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
#Node2: stop io_thread
mydba@192.168.85.133,3307 [replcrash]> stop slave io_thread;
 
#Node1: insert the third row
mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
 
# Final rows on each node
#Node1 has three rows
mydba@192.168.85.132,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
|   3 | 2d8900cc-325d-11e8-88e6-000c29c1 | 2018-03-28 15:54:01 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
3 rows in set (0.00 sec)
mydba@192.168.85.132,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000004 |     1303 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
#Node2 has two rows
mydba@192.168.85.133,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
|   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
2 rows in set (0.00 sec)
mydba@192.168.85.133,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000007 |     8859 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
#Node3 has one row
mydba@192.168.85.134,3307 [replcrash]> select * from py_user;
+-----+----------------------------------+---------------------+-----------+
| uid | name                             | add_time            | server_id |
+-----+----------------------------------+---------------------+-----------+
|   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 | 1323307   |
+-----+----------------------------------+---------------------+-----------+
1 row in set (0.00 sec)
mydba@192.168.85.134,3307 [replcrash]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 |    10322 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

Clearly, slave Node3 lags behind slave Node2, and slave Node2 lags behind master Node1.

2. Manual Failover under Traditional Replication

Manual failover applies when the master has gone down but the MHA manager was not running, so the failover has to be triggered by hand.

2.1 Manual Failover

• Shut down the database service on Node1

# Shut down the database service on Node1
mydba@192.168.85.132,3307 [replcrash]> shutdown;
 
# Replication status on Node2 and Node3
mydba@192.168.85.133,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.133,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 973
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 973
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.133,3307 [replcrash]> 
 
mydba@192.168.85.134,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
mydba@192.168.85.134,3307 [replcrash]> show slave status\G
              Master_Log_File: mysql-bin.000004
          Read_Master_Log_Pos: 643
        Relay_Master_Log_File: mysql-bin.000004
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 643
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
1 row in set (0.00 sec)
mydba@192.168.85.134,3307 [replcrash]>

At this point it makes no difference whether the slaves' io_thread is started: the master is already down, so the io_thread cannot connect to it anyway.
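
The slave status output above already tells us which slave has received the most from the dead master. As a rough Python sketch (not MHA's own code; `binlog_key` and `latest_and_oldest` are hypothetical helpers), the slaves can be ranked by Master_Log_File and Read_Master_Log_Pos:

```python
def binlog_key(log_file, log_pos):
    """Sortable key: numeric suffix of the binlog file name, then position."""
    return (int(log_file.rsplit(".", 1)[1]), log_pos)


# (Master_Log_File, Read_Master_Log_Pos) taken from SHOW SLAVE STATUS above.
received = {
    "192.168.85.133": ("mysql-bin.000004", 973),
    "192.168.85.134": ("mysql-bin.000004", 643),
}


def latest_and_oldest(received):
    """Return (latest_slave, oldest_slave) by received binlog coordinates."""
    ordered = sorted(received, key=lambda host: binlog_key(*received[host]))
    return ordered[-1], ordered[0]


latest, oldest = latest_and_oldest(received)
print("latest slave:", latest)
print("oldest slave:", oldest)
```

With the values above, Node2 (192.168.85.133) comes out as the latest slave and Node3 (192.168.85.134) as the oldest, which matches the lag created in section 1.3.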
• Run the manual failover script, specifying Node3 as the new master

# Manually fail over from the failed Node1
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover

Before the switch the replication topology is Node1 -> {Node2, Node3}; after the manual failover it becomes Node3 -> {Node2}.
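
For repeated drills it can help to assemble the command line programmatically. The sketch below only builds the string and uses exactly the options that appear in the command above; `build_switch_command` is a hypothetical helper, and port 3307 is an assumption taken from the table in section 1.1.

```python
import shlex


def build_switch_command(dead_host, new_host, port,
                         global_conf="/etc/masterha/masterha_default.conf",
                         conf="/etc/masterha/app1.conf"):
    """Build a masterha_master_switch command line for a dead-master failover."""
    args = [
        "masterha_master_switch",
        f"--global_conf={global_conf}",
        f"--conf={conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={port}",
        "--master_state=dead",
        f"--new_master_host={new_host}",
        f"--new_master_port={port}",
        "--ignore_last_failover",
    ]
    # shlex.join quotes any argument that needs it (Python 3.8+).
    return shlex.join(args)


# Port 3307 is assumed from the environment table, not from the command above.
command = build_switch_command("192.168.85.132", "192.168.85.134", 3307)
print(command)
```

The command is printed rather than executed, since masterha_master_switch asks for interactive confirmation before it proceeds.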

2.2 Failover Flow

Log output of the manual failover:

# Manual failover
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
Wed Mar  ::  - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Wed Mar  ::  - [info] Reading application default configuration from /etc/masterha/app1.conf..
Wed Mar  ::  - [info] Reading server configuration from /etc/masterha/app1.conf..
Wed Mar  ::  - [info] MHA::MasterFailover version 0.56.
Wed Mar  ::  - [info] Starting master failover.
Wed Mar  ::  - [info]
==================== 1. Configuration check phase, Start ====================
Wed Mar  ::  - [info] * Phase : Configuration Check Phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [debug] Connecting to servers..
Wed Mar  ::  - [debug]  Connected to: 192.168.85.133(192.168.85.133:), user=mydba
Wed Mar  ::  - [debug]  Number of slave worker threads on host 192.168.85.133(192.168.85.133:):
Wed Mar  ::  - [debug]  Connected to: 192.168.85.134(192.168.85.134:), user=mydba
Wed Mar  ::  - [debug]  Number of slave worker threads on host 192.168.85.134(192.168.85.134:):
Wed Mar  ::  - [debug]  Comparing MySQL versions..
Wed Mar  ::  - [debug]   Comparing MySQL versions done.
Wed Mar  ::  - [debug] Connecting to servers done.
Wed Mar  ::  - [info] GTID failover mode =
Wed Mar  ::  - [info] Dead Servers:
Wed Mar  ::  - [info]   192.168.85.132(192.168.85.132:)
Wed Mar  ::  - [info] Checking master reachability via MySQL(double check)...
Wed Mar  ::  - [info]  ok.
Wed Mar  ::  - [info] Alive Servers:
Wed Mar  ::  - [info]   192.168.85.133(192.168.85.133:)
Wed Mar  ::  - [info]   192.168.85.134(192.168.85.134:)
Wed Mar  ::  - [info] Alive Slaves:
Wed Mar  ::  - [info]   192.168.85.133(192.168.85.133:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Wed Mar  ::  - [debug]    Relay log info repository: FILE
Wed Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Wed Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar  ::  - [info]   192.168.85.134(192.168.85.134:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Wed Mar  ::  - [debug]    Relay log info repository: FILE
Wed Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Wed Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
******************** Confirm whether to proceed ********************
Master 192.168.85.132(192.168.85.132:) is dead. Proceed? (yes/NO): yes
Wed Mar  ::  - [info] Starting Non-GTID based failover.
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] ** Phase : Configuration Check Phase completed.
==================== 1. Configuration check phase, End ====================
Wed Mar  ::  - [info]
==================== 2. Dead master shutdown phase, Start ====================
Wed Mar  ::  - [info] * Phase : Dead Master Shutdown Phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [debug]  Stopping IO thread on 192.168.85.133(192.168.85.133:)..
Wed Mar  ::  - [debug]  Stopping IO thread on 192.168.85.134(192.168.85.134:)..
Wed Mar  ::  - [debug]  Stop IO thread on 192.168.85.134(192.168.85.134:) done.
Wed Mar  ::  - [debug]  Stop IO thread on 192.168.85.133(192.168.85.133:) done.
Wed Mar  ::  - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
Wed Mar  ::  - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
Wed Mar  ::  - [info] Forcing shutdown so that applications never connect to the current master..
Wed Mar  ::  - [info] Executing master IP deactivation script:
Wed Mar  ::  - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --command=stopssh --ssh_user=root
Wed Mar  ::  - [info]  done.
Wed Mar  ::  - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Mar  ::  - [info] * Phase : Dead Master Shutdown Phase completed.
==================== 2. Dead master shutdown phase, End ====================
Wed Mar  ::  - [info]
==================== 3. New master recovery phase, Start ====================
Wed Mar  ::  - [info] * Phase : Master Recovery Phase..
Wed Mar  ::  - [info]
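==================== 3.1 Get the latest slave ====================
******************** The latest slave serves two purposes: (1) it supplies the relay logs that other slaves are missing; (2) its position is the starting point for saving the dead master's binlog ********************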
Wed Mar  ::  - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [debug] Fetching current slave status..
Wed Mar  ::  - [debug]  Fetching current slave status done.
Wed Mar  ::  - [info] The latest binary log file/position on all slaves is mysql-bin.:
Wed Mar  ::  - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Mar  ::  - [info]   192.168.85.133(192.168.85.133:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Wed Mar  ::  - [debug]    Relay log info repository: FILE
Wed Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Wed Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar  ::  - [info] The oldest binary log file/position on all slaves is mysql-bin.:
Wed Mar  ::  - [info] Oldest slaves:
Wed Mar  ::  - [info]   192.168.85.134(192.168.85.134:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Wed Mar  ::  - [debug]    Relay log info repository: FILE
Wed Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Wed Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Wed Mar  ::  - [info]
==================== 3.2 Save the dead master's binlog ====================
Wed Mar  ::  - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] Fetching dead master's binary logs..
******************** Executed on the dead master: take the portion after the latest slave's position ********************
Wed Mar  ::  - [info] Executing command on the dead master 192.168.85.132(192.168.85.132:): save_binary_logs --command=save --start_file=mysql-bin.  --start_pos= --binlog_dir=/data/mysql/mysql3307/logs --output_file=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --debug
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin. pos  to mysql-bin. EOF into /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog ..
parse_init_headers: file=mysql-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Binlog Checksum enabled
parse_init_headers: file=mysql-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Got previous gtids log event: .
parse_init_headers: file=mysql-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  Dumping binlog format description event, from position  to .. ok.
  Dumping effective binlog data from /data/mysql/mysql3307/logs/mysql-bin. position  to tail().. ok.
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Binlog Checksum enabled
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Got previous gtids log event: .
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Concat succeeded.
saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  %       .5KB/s   :
******************** scp the saved master binlog to the working directory on the node where the MHA manager / manual failover is running ********************
Wed Mar  ::  - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
Wed Mar  ::  - [debug] SSH connection test to 192.168.85.133, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
Wed Mar  ::  - [info] HealthCheck: SSH to 192.168.85.133 is reachable.
Wed Mar  ::  - [debug] SSH connection test to 192.168.85.134, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
Wed Mar  ::  - [info] HealthCheck: SSH to 192.168.85.134 is reachable.
Wed Mar  ::  - [info]
==================== 3.3 Elect the new master ====================
Wed Mar  ::  - [info] * Phase 3.3: Determining New Master Phase..
Wed Mar  ::  - [info]
******************** Check whether the latest slave holds the relay logs that the other slaves are missing ********************
Wed Mar  ::  - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Mar  ::  - [info] Checking whether 192.168.85.133 has relay logs from the oldest position..
Wed Mar  ::  - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin. --latest_rmlp= --target_mlf=mysql-bin. --target_rmlp= --server_id= --workdir=/var/log/masterha/app1 --timestamp= --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info  --relay_dir=/data/mysql/mysql3307/data/  --debug  :
    Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
    Relay log found at /data/mysql/mysql3307/data, up to relay-bin.
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-bin., start_pos:.
Target relay log FOUND!
Wed Mar  ::  - [info] OK. 192.168.85.133 has all relay logs.
Wed Mar  ::  - [info] 192.168.85.134 can be new master.
Wed Mar  ::  - [info] New master is 192.168.85.134(192.168.85.134:)
Wed Mar  ::  - [info] Starting master failover..
Wed Mar  ::  - [info]
From:
192.168.85.132(192.168.85.132:) (current master)
 +--192.168.85.133(192.168.85.133:)
 +--192.168.85.134(192.168.85.134:)
 
To:
192.168.85.134(192.168.85.134:) (new master)
 +--192.168.85.133(192.168.85.133:)
 
******************** Confirm whether to perform the switch ********************
Starting master switch from 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:)? (yes/NO): yes
Wed Mar  ::  - [info] New master decided manually is 192.168.85.134(192.168.85.134:)
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Mar  ::  - [info]
******************** On the latest slave, generate the relay-log diff between the new master and the latest slave ********************
Wed Mar  ::  - [info] Server 192.168.85.134 received relay logs up to: mysql-bin.:
Wed Mar  ::  - [info] Need to get diffs from the latest slave(192.168.85.133) up to: mysql-bin.: (using the latest slave's relay logs)
Wed Mar  ::  - [info] Connecting to the latest slave host 192.168.85.133, generating diff relay log files..
Wed Mar  ::  - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.85.134 --latest_mlf=mysql-bin. --latest_rmlp= --target_mlf=mysql-bin. --target_rmlp= --server_id= --diff_file_readtolatest=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info  --relay_dir=/data/mysql/mysql3307/data/  --debug
Wed Mar  ::  - [info]
    Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
    Relay log found at /data/mysql/mysql3307/data, up to relay-bin.
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-bin., start_pos:.
 Concat binary/relay logs from relay-bin. pos  to relay-bin. EOF into /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog ..
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Binlog Checksum enabled
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Got previous gtids log event: .
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Binlog Checksum enabled
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  Dumping binlog format description event, from position  to .. ok.
  Dumping effective binlog data from /data/mysql/mysql3307/data/relay-bin. position  to tail().. ok.
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Binlog Checksum enabled
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Got previous gtids log event: .
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Binlog Checksum enabled
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog .
******************** scp the generated relay-log diff to the new master's working directory ********************
 scp ZST2:/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to root@192.168.85.134() succeeded.
Wed Mar  ::  - [info]  Generating diff files succeeded.
Wed Mar  ::  - [info] Sending binlog..
saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  %       .5KB/s   :
******************** scp the dead master's binlog from the working directory on the MHA manager / manual-failover node to the new master's working directory ********************
Wed Mar  ::  - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.134:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
Wed Mar  ::  - [info]
==================== 3.4 New master applies the differential logs ====================
Wed Mar  ::  - [info] * Phase 3.4: Master Log Apply Phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Mar  ::  - [info] Starting recovery on 192.168.85.134(192.168.85.134:)..
Wed Mar  ::  - [info]  Generating diffs succeeded.
******************** Wait until the new master has applied all of its own relay logs ********************
Wed Mar  ::  - [info] Waiting until all relay logs are applied.
Wed Mar  ::  - [info]  done.
Wed Mar  ::  - [debug]  Stopping SQL thread on 192.168.85.134(192.168.85.134:)..
Wed Mar  ::  - [debug]   done.
Wed Mar  ::  - [info] Getting slave status..
Wed Mar  ::  - [info] This slave(192.168.85.134)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.:). No need to recover from Exec_Master_Log_Pos.
Wed Mar  ::  - [debug] Current max_allowed_packet is .
Wed Mar  ::  - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Wed Mar  ::  - [info] Connecting to the target slave host 192.168.85.134, running recover script..
******************** The new master applies, in order, the relay-log diff from the latest slave and then the binlog saved from the dead master ********************
Wed Mar  ::  - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.134 --slave_ip=192.168.85.134  --slave_port= --apply_files=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.-log --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --debug  --slave_pass=xxx
Wed Mar  ::  - [info]
******************** Concatenate all missing relay logs and binlogs into total_binlog ********************
 Concat all apply files to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog ..
 Copying the first binlog file /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog.. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Binlog Checksum enabled
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
 Got previous gtids log event: .
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
dumped up to pos . ok.
 /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog has effective binlog events from pos .
  Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog position  to tail().. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog .
MySQL client version is 5.7.. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.134:. This may take long time...
Applying log files succeeded.
Wed Mar  ::  - [debug] Setting max_allowed_packet back to  succeeded.
Wed Mar  ::  - [info]  All relay logs were successfully applied.
******************** After applying all relay logs and binlogs, get the new master's current position ********************
Wed Mar  ::  - [info] Getting new master's binlog name and position..
Wed Mar  ::  - [info]  mysql-bin.:
Wed Mar  ::  - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
******************** Bring up the virtual IP; the new master can now serve clients ********************
Wed Mar  ::  - [info] Executing master IP activate script:
Wed Mar  ::  - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port= --new_master_user='mydba' --new_master_password='mysql5721'
Set read_only= on the new master.
Wed Mar  ::  - [info]  OK.
Wed Mar  ::  - [info] ** Finished master recovery successfully.
Wed Mar  ::  - [info] * Phase : Master Recovery Phase completed.
==================== 3. New master recovery phase, End ====================
Wed Mar  ::  - [info]
==================== 4. Slave recovery phase, Start ====================
******************** Slave recovery resembles new-master recovery: first obtain the relay-log diff against the latest slave, then obtain the dead master's binlog ********************
Wed Mar  ::  - [info] * Phase : Slaves Recovery Phase..
Wed Mar  ::  - [info]
==================== 4.1 Generate the differential logs between the latest slave and each slave ====================
Wed Mar  ::  - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] -- Slave diff file generation on host 192.168.85.133(192.168.85.133:) started, pid: . Check tmp log /var/log/masterha/app1/192.168..133_3307_20180328160107.log if it takes time..
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] Log messages from 192.168.85.133 ...
Wed Mar  ::  - [info]
Wed Mar  ::  - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Wed Mar  ::  - [info] End of log messages from 192.168.85.133.
Wed Mar  ::  - [info] -- 192.168.85.133(192.168.85.133:) has the latest relay log events.
Wed Mar  ::  - [info] Generating relay diff files from the latest slave succeeded.
Wed Mar  ::  - [info]
==================== 4.2 Slaves apply the differential logs ====================
Wed Mar  ::  - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:) started, pid: . Check tmp log /var/log/masterha/app1/192.168..133_3307_20180328160107.log if it takes time..
saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog                                                                                                  %       .5KB/s   :
Wed Mar  ::  - [debug] Explicitly disabled relay_log_purge.
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] Log messages from 192.168.85.133 ...
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] Sending binlog..
******************** scp the dead master's binlog from the working directory on the MHA manager / manual-failover node to the slave's working directory ********************
Wed Mar  ::  - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.133:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
Wed Mar  ::  - [info] Starting recovery on 192.168.85.133(192.168.85.133:)..
Wed Mar  ::  - [info]  Generating diffs succeeded.
Wed Mar  ::  - [info] Waiting until all relay logs are applied.
Wed Mar  ::  - [info]  done.
Wed Mar  ::  - [debug]  Stopping SQL thread on 192.168.85.133(192.168.85.133:)..
Wed Mar  ::  - [debug]   done.
Wed Mar  ::  - [info] Getting slave status..
Wed Mar  ::  - [info] This slave(192.168.85.133)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.:). No need to recover from Exec_Master_Log_Pos.
Wed Mar  ::  - [debug] Current max_allowed_packet is .
Wed Mar  ::  - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Wed Mar  ::  - [info] Connecting to the target slave host 192.168.85.133, running recover script..
******************** The slave applies, in order, the relay-log diff from the latest slave and then the binlog saved from the dead master ********************
Wed Mar  ::  - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.133 --slave_ip=192.168.85.133  --slave_port= --apply_files=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.-log --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --debug  --slave_pass=xxx
Wed Mar  ::  - [info]
MySQL client version is 5.7.. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.133:. This may take long time...
Applying log files succeeded.
Wed Mar  ::  - [debug] Setting max_allowed_packet back to  succeeded.
Wed Mar  ::  - [info]  All relay logs were successfully applied.
Wed Mar  ::  - [info]  Resetting slave 192.168.85.133(192.168.85.133:) and starting replication from the new master 192.168.85.134(192.168.85.134:)..
Wed Mar  ::  - [debug]  Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
Wed Mar  ::  - [debug]   done.
Wed Mar  ::  - [info]  Executed CHANGE MASTER.
Wed Mar  ::  - [debug]  Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
Wed Mar  ::  - [debug]   done.
Wed Mar  ::  - [info]  Slave started.
Wed Mar  ::  - [info] End of log messages from 192.168.85.133.
Wed Mar  ::  - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:) succeeded.
Wed Mar  ::  - [info] All new slave servers recovered successfully.
==================== 4. Slave recovery phase, End ====================
Wed Mar  ::  - [info]
==================== 5. New master cleanup phase, Start ====================
Wed Mar  ::  - [info] * Phase : New master cleanup phase..
Wed Mar  ::  - [info]
Wed Mar  ::  - [info] Resetting slave info on the new master..
Wed Mar  ::  - [debug]  Clearing slave info..
Wed Mar  ::  - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
Wed Mar  ::  - [debug]   done.
Wed Mar  ::  - [debug]  SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Wed Mar  ::  - [info]  192.168.85.134: Resetting slave info succeeded.
==================== 5. New master cleanup phase, End ====================
Wed Mar  ::  - [info] Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
Wed Mar  ::  - [debug]  Disconnected from 192.168.85.133(192.168.85.133:)
Wed Mar  ::  - [debug]  Disconnected from 192.168.85.134(192.168.85.134:)
Wed Mar  ::  - [info] 
 
----- Failover Report -----
 
app1: MySQL Master failover 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:) succeeded
 
Master 192.168.85.132(192.168.85.132:) is down!
 
Check MHA Manager logs at ZST3 for details.
 
Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:)
The latest slave 192.168.85.133(192.168.85.133:) has all relay logs for recovery.
Selected 192.168.85.134(192.168.85.134:) as a new master.
192.168.85.134(192.168.85.134:): OK: Applying all logs succeeded.
192.168.85.134(192.168.85.134:): OK: Activated master IP address.
192.168.85.133(192.168.85.133:): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.85.133(192.168.85.133:): OK: Applying all logs succeeded. Slave started, replicating from 192.168.85.134(192.168.85.134:)
192.168.85.134(192.168.85.134:): Resetting slave info succeeded.
Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
[root@ZST3 app1]#

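Manual failover flow

Manual failover (traditional replication)
1. Configuration check: connect to each instance, check service status, check the replication topology
2. Failed master shutdown: stop the IO thread on every slave, remove the virtual IP from the failed master (stopssh)
3. New master recovery
    3.1 Get the latest slave
        It supplies the data that the new master and the other slaves are missing, and its position is the starting point for saving the failed master's binlog
    3.2 Save the failed master's binlog
        Run save_binary_logs on the failed master (only the part after the latest slave's position)
        scp the resulting binlog to the working directory of the host running the manual failover
    3.3 Elect the new master
        Check whether the latest slave still holds the relay-log events that the oldest slave is missing
        Decide the new master and print the topology before and after the switch
        Generate the differential relay-log between the latest slave and the new master, and copy it to the new master's working directory
        scp the failed master's binlog from the manual-failover working directory to the new master's working directory
    3.4 New master applies the differential logs
        Wait for the new master to finish applying its own relay-log
        Apply, in order, the relay-log events missing relative to the latest slave, then the binlog saved from the failed master
        Merge all missing relay-log and binlog events into total_binlog
        Record the new master's binlog file:pos; the other slaves will start replicating from this position
        Bind the virtual IP so the new master can serve traffic
4. Recovery of the other slaves
    4.1 Generate the differential logs
        Generate the differential relay-log between the latest slave and this slave, and copy it to the slave's working directory; scp the failed master's binlog from the manual-failover working directory to the slave's working directory
    4.2 The slave applies the differential logs
        Wait for the slave to finish applying its own relay-log; apply, in order, the relay-log events missing relative to the latest slave, then the binlog saved from the failed master; repoint the slave's replication to the new master
    4.3 If there are several slaves, repeat the steps above
5. New master cleanup: clear the old replication info with STOP SLAVE; RESET SLAVE ALL;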
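
Step 3.1 of the flow above can be illustrated with a small sketch. This is not MHA's code; it only shows the idea of ranking slaves by how far their IO thread has read, assuming the Master_Log_File and Read_Master_Log_Pos columns of SHOW SLAVE STATUS have already been collected. The class and function names are hypothetical, and the file name and positions in the usage note are made-up sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveStatus:
    host: str
    master_log_file: str      # Master_Log_File from SHOW SLAVE STATUS
    read_master_log_pos: int  # Read_Master_Log_Pos from SHOW SLAVE STATUS


def received_position(status):
    """Return a sortable (file index, offset) pair for what the IO thread has read."""
    # Binlog names end in a numeric suffix (e.g. mysql-bin.000001); compare it as a number.
    file_index = int(status.master_log_file.rsplit(".", 1)[1])
    return (file_index, status.read_master_log_pos)


def pick_latest(slaves):
    """The latest slave is the one that has received the most from the failed master."""
    return max(slaves, key=received_position)


def slaves_needing_relay_diff(slaves):
    """Hosts that are behind the latest slave and therefore need a differential relay-log."""
    top = received_position(pick_latest(slaves))
    return [s.host for s in slaves if received_position(s) < top]
```

With sample statuses where 192.168.85.133 has read further than 192.168.85.134, `pick_latest` returns the former and `slaves_needing_relay_diff` lists only the latter, which matches the run above where the relay diff was generated for 192.168.85.134.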

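2.3 Files in the working directory

Because the failover has to fill in missing data, it leaves several kinds of files in each node's working directory.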

# Failed master
[root@ZST1 app1]# ll
total
-rw-r--r--  root root  Mar  : saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
[root@ZST1 app1]#

Dead Master

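saved_master_binlog_from_**: the differential binlog between the failed master and the latest slave; generated on the failed master, then copied to the working directory of the MHA manager node (or of the host running the manual failover)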

# Latest slave
[root@ZST2 app1]# ll
total
-rw-r--r--.  root root   Mar  : relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
-rw-r--r--.  root root  Mar  : relay_log_apply_for_192.168.85.133_3307_20180328160107_err.log
-rw-r--r--.  root root   Mar  : saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
[root@ZST2 app1]#

Latest Slave

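relay_from_read_to_latest_**: the differential relay-log between the latest slave and another slave; generated on the latest slave, then copied to the corresponding slave
saved_master_binlog_from_**: copied from the manager node; it originates on the failed master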

# New master
[root@ZST3 app1]# ll
total
-rw-r--r--.  root root     Mar  : app1.failover.complete
-rw-r--r--.  root root   Mar  : relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
-rw-r--r--.  root root  Mar  : relay_log_apply_for_192.168.85.134_3307_20180328160107_err.log
-rw-r--r--.  root root   Mar  : saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
-rw-r--r--.  root root  Mar  : total_binlog_for_192.168.85.134_3307..binlog
[root@ZST3 app1]#

New Master

relay_from_read_to_latest_**: copied from the latest slave
saved_master_binlog_from_**: copied from the manager node; it originates on the failed master
total_binlog_for_**: the merged set of all missing relay-log and binlog events
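
When checking a working directory after a failover, it can help to label each file by the prefixes described above. The sketch below is only an illustration: the function name and the wording of the role descriptions are my own, and the name pattern (prefix, then host, then port) is inferred from the listings in this post rather than taken from MHA documentation.

```python
import re

# Prefixes described in this post, mapped to a short explanation of each file.
ROLE_BY_PREFIX = {
    "saved_master_binlog_from_": "binlog saved from the failed master",
    "relay_from_read_to_latest_": "relay-log diff generated on the latest slave",
    "total_binlog_for_": "merged relay-log and binlog events for the new master",
}

# <prefix><ipv4>_<port>... as seen in the directory listings above.
NAME_RE = re.compile(
    r"^(?P<prefix>[a-z_]+?_)(?P<host>\d+\.\d+\.\d+\.\d+)_(?P<port>\d+)"
)


def describe(filename):
    """Return role, host and port for a known file name, or None for anything else."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    role = ROLE_BY_PREFIX.get(match.group("prefix"))
    if role is None:
        return None
    return {
        "role": role,
        "host": match.group("host"),
        "port": int(match.group("port")),
    }
```

Files such as app1.failover.complete or the relay_log_apply_for_**_err.log logs are not covered by the three prefixes, so `describe` returns None for them.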
• Decode the differential logs to inspect the events in each file

# Differential relay-log between the latest slave and the other slave
[root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at
# :: server id   end_log_pos  CRC32 0x152b7e41    Start: binlog v , server v 5.7.-log created  ::
# This Format_description_event appears in a relay log and was generated by the slave thread.
# at
# :: server id   end_log_pos  CRC32 0x5ea2e9c6    Previous-GTIDs
# [empty]
# at
#  :: server id   end_log_pos  CRC32 0x2076d50b      Rotate to mysql-bin.  pos:
# at
# :: server id   end_log_pos  CRC32 0x9b1488de    Start: binlog v , server v 5.7.-log created  :: at startup
ROLLBACK/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x838279dd  Rotate to mysql-bin.  pos:
# at
# :: server id   end_log_pos  CRC32 0x9fba3aa7    Anonymous_GTID  last_committed=        sequence_number=       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x112f5399    Query   thread_id=     exec_time=     error_code=
SET TIMESTAMP=/*!*/;
SET @@session.pseudo_thread_id=/*!*/;
SET @@session.foreign_key_checks=, @@session.sql_auto_is_null=, @@session.unique_checks=, @@session.autocommit=/*!*/;
SET @@session.sql_mode=/*!*/;
SET @@session.auto_increment_increment=, @@session.auto_increment_offset=/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=,@@session.collation_connection=,@@session.collation_server=/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x890cf300    Table_map: `replcrash`.`py_user` mapped to number
# at
# :: server id   end_log_pos  CRC32 0xccb038f5    Write_rows: table id  flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @= /* INT meta=0 nullable=0 is_null=0 */
###   @='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at
# :: server id   end_log_pos  CRC32 0xbfda64ba    Xid =
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST3 app1]# 
 
# Differential binlog between the failed master and the latest slave
[root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at
# :: server id   end_log_pos  CRC32 0x9b1488de    Start: binlog v , server v 5.7.-log created  :: at startup
ROLLBACK/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x37f9307d    Previous-GTIDs
# [empty]
# at
# :: server id   end_log_pos  CRC32 0x74680cfa   Anonymous_GTID  last_committed=        sequence_number=       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x3774a1d0   Query   thread_id=     exec_time=     error_code=
SET TIMESTAMP=/*!*/;
SET @@session.pseudo_thread_id=/*!*/;
SET @@session.foreign_key_checks=, @@session.sql_auto_is_null=, @@session.unique_checks=, @@session.autocommit=/*!*/;
SET @@session.sql_mode=/*!*/;
SET @@session.auto_increment_increment=, @@session.auto_increment_offset=/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=,@@session.collation_connection=,@@session.collation_server=/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x1468e6b1   Table_map: `replcrash`.`py_user` mapped to number
# at
# :: server id   end_log_pos  CRC32 0x79523051   Write_rows: table id  flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @= /* INT meta=0 nullable=0 is_null=0 */
###   @='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at
# :: server id   end_log_pos  CRC32 0xb93ce981   Xid =
COMMIT/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x577dc41e   Stop
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST3 app1]# 
 
# All missing relay-log and binlog events, merged
[root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows total_binlog_for_192.168.85.134_3307..binlog
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at
# :: server id   end_log_pos  CRC32 0x152b7e41    Start: binlog v , server v 5.7.-log created  ::
# This Format_description_event appears in a relay log and was generated by the slave thread.
# at
# :: server id   end_log_pos  CRC32 0x5ea2e9c6    Previous-GTIDs
# [empty]
# at
#  :: server id   end_log_pos  CRC32 0x2076d50b      Rotate to mysql-bin.  pos:
# at
# :: server id   end_log_pos  CRC32 0x9b1488de    Start: binlog v , server v 5.7.-log created  :: at startup
ROLLBACK/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x838279dd  Rotate to mysql-bin.  pos:
# at
# :: server id   end_log_pos  CRC32 0x9fba3aa7    Anonymous_GTID  last_committed=        sequence_number=       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x112f5399    Query   thread_id=     exec_time=     error_code=
SET TIMESTAMP=/*!*/;
SET @@session.pseudo_thread_id=/*!*/;
SET @@session.foreign_key_checks=, @@session.sql_auto_is_null=, @@session.unique_checks=, @@session.autocommit=/*!*/;
SET @@session.sql_mode=/*!*/;
SET @@session.auto_increment_increment=, @@session.auto_increment_offset=/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=,@@session.collation_connection=,@@session.collation_server=/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
SET @@session.lc_time_names=/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x890cf300    Table_map: `replcrash`.`py_user` mapped to number
# at
# :: server id   end_log_pos  CRC32 0xccb038f5    Write_rows: table id  flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @= /* INT meta=0 nullable=0 is_null=0 */
###   @='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at
# :: server id   end_log_pos  CRC32 0xbfda64ba    Xid =
COMMIT/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x74680cfa   Anonymous_GTID  last_committed=        sequence_number=       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x3774a1d0   Query   thread_id=     exec_time=     error_code=
SET TIMESTAMP=/*!*/;
BEGIN
/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x1468e6b1   Table_map: `replcrash`.`py_user` mapped to number
# at
# :: server id   end_log_pos  CRC32 0x79523051   Write_rows: table id  flags: STMT_END_F
### INSERT INTO `replcrash`.`py_user`
### SET
###   @= /* INT meta=0 nullable=0 is_null=0 */
###   @='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
###   @='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
###   @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
# at
# :: server id   end_log_pos  CRC32 0xb93ce981   Xid =
COMMIT/*!*/;
# at
# :: server id   end_log_pos  CRC32 0x577dc41e   Stop
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST3 app1]#
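
To confirm which rows each file carries without reading the whole dump, the `###` pseudo-SQL lines printed by mysqlbinlog -vv can be scanned with a few lines of Python. This is a minimal sketch that handles only INSERT row events like the ones above; the function name is hypothetical, and the column indexes and values in the test sample are illustrative because the listings in this post have lost their digits.

```python
import re

INSERT_RE = re.compile(r"^### INSERT INTO `(?P<db>[^`]+)`\.`(?P<table>[^`]+)`")
# Matches lines such as:  ###   @2='abc' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
COLUMN_RE = re.compile(r"^###\s+@(?P<idx>\d+)=(?P<value>.*?)(?:\s+/\*.*\*/)?$")


def inserted_rows(decoded_text):
    """Collect the rows of INSERT events from `mysqlbinlog -vv` text output."""
    rows = []
    current = None
    for line in decoded_text.splitlines():
        if line.startswith(("### UPDATE", "### DELETE")):
            current = None  # other row events are out of scope for this sketch
            continue
        match = INSERT_RE.match(line)
        if match:
            current = {"table": f"{match['db']}.{match['table']}", "columns": {}}
            rows.append(current)
            continue
        match = COLUMN_RE.match(line)
        if match and current is not None:
            # Keep the value as text; strip the quotes mysqlbinlog adds around strings.
            current["columns"][int(match["idx"])] = match["value"].strip("'")
        elif not line.startswith("###"):
            current = None  # the row image ended
    return rows
```

Feeding it the text saved from the three mysqlbinlog commands above should show one row in each of the two differential files and both rows in total_binlog_for_**, which is a quick way to verify the merge.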

After the manual failover the topology is Node3->{Node2}, and the missing data has been filled in automatically.

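3. Manual failover under GTID replication

3.1 Adjusting the MHA configuration file

In GTID mode MHA needs a [binlog*] section, which can point either to a dedicated binlog server or to the master's own binlog directory. Without a [binlog*] section MHA does not fetch binlogs from the master even if the master's host is still reachable, so any events that never reached a slave are lost.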

# Append the binlog server information to the end of app1.conf
[root@ZST1 masterha]# cat app1.conf
...
[binlog1]
hostname=192.168.85.132
master_binlog_dir=/data/mysql/mysql3307/logs
no_master=
[root@ZST1 masterha]#
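
Since a missing [binlog*] section silently costs data in GTID mode, a pre-flight check of the configuration file may be worthwhile. The sketch below uses Python's standard configparser, which happens to read this INI-style layout; it is not part of MHA, and the function name is hypothetical.

```python
import configparser


def binlog_sections(conf_text):
    """Map each [binlog*] section name in an MHA config to its hostname."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        name: parser[name].get("hostname", "")
        for name in parser.sections()
        if name.startswith("binlog")
    }
```

An empty result for app1.conf would mean that a GTID failover only recovers data up to the latest slave, so the check could be run before calling masterha_master_switch.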

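3.2 Manual failover

The topology is again one master with two slaves, Node1->{Node2, Node3}, this time built on Row+GTID. Regenerate the test data, stop the MySQL service on Node1, then run the manual failover script.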

# GTID + manual failover
[root@ZST1 masterha]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
Thu Mar  ::  - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
Thu Mar  ::  - [info] Reading application default configuration from /etc/masterha/app1.conf..
Thu Mar  ::  - [info] Reading server configuration from /etc/masterha/app1.conf..
Thu Mar  ::  - [info] MHA::MasterFailover version 0.56.
Thu Mar  ::  - [info] Starting master failover.
Thu Mar  ::  - [info]
==================== 1. Configuration check phase, Start ====================
Thu Mar  ::  - [info] * Phase : Configuration Check Phase..
Thu Mar  ::  - [info]
Thu Mar  ::  - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
Thu Mar  ::  - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
Thu Mar  ::  - [info] Binlog server 192.168.85.132 is reachable.
Thu Mar  ::  - [debug] Connecting to servers..
Thu Mar  ::  - [debug]  Connected to: 192.168.85.133(192.168.85.133:), user=mydba
Thu Mar  ::  - [debug]  Number of slave worker threads on host 192.168.85.133(192.168.85.133:):
Thu Mar  ::  - [debug]  Connected to: 192.168.85.134(192.168.85.134:), user=mydba
Thu Mar  ::  - [debug]  Number of slave worker threads on host 192.168.85.134(192.168.85.134:):
Thu Mar  ::  - [debug]  Comparing MySQL versions..
Thu Mar  ::  - [debug]   Comparing MySQL versions done.
Thu Mar  ::  - [debug] Connecting to servers done.
Thu Mar  ::  - [info] GTID failover mode =
Thu Mar  ::  - [info] Dead Servers:
Thu Mar  ::  - [info]   192.168.85.132(192.168.85.132:)
Thu Mar  ::  - [info] Checking master reachability via MySQL(double check)...
Thu Mar  ::  - [info]  ok.
Thu Mar  ::  - [info] Alive Servers:
Thu Mar  ::  - [info]   192.168.85.133(192.168.85.133:)
Thu Mar  ::  - [info]   192.168.85.134(192.168.85.134:)
Thu Mar  ::  - [info] Alive Slaves:
Thu Mar  ::  - [info]   192.168.85.133(192.168.85.133:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Mar  ::  - [info]     GTID ON
Thu Mar  ::  - [debug]    Relay log info repository: FILE
Thu Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Thu Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Mar  ::  - [info]   192.168.85.134(192.168.85.134:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Mar  ::  - [info]     GTID ON
Thu Mar  ::  - [debug]    Relay log info repository: FILE
Thu Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Thu Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
******************** Choose whether to proceed ********************
Master 192.168.85.132(192.168.85.132:) is dead. Proceed? (yes/NO): yes
Thu Mar  ::  - [info] Starting GTID based failover.
Thu Mar  ::  - [info]
Thu Mar  ::  - [info] ** Phase : Configuration Check Phase completed.
==================== 1. Configuration check phase, End ====================
Thu Mar  ::  - [info]
==================== 2. Failed master shutdown phase, Start ====================
Thu Mar  ::  - [info] * Phase : Dead Master Shutdown Phase..
Thu Mar  ::  - [info]
Thu Mar  ::  - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
Thu Mar  ::  - [debug]  Stopping IO thread on 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [debug]  Stopping IO thread on 192.168.85.133(192.168.85.133:)..
Thu Mar  ::  - [debug]  Stop IO thread on 192.168.85.133(192.168.85.133:) done.
Thu Mar  ::  - [debug]  Stop IO thread on 192.168.85.134(192.168.85.134:) done.
Thu Mar  ::  - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
Thu Mar  ::  - [info] Forcing shutdown so that applications never connect to the current master..
Thu Mar  ::  - [info] Executing master IP deactivation script:
Thu Mar  ::  - [info]   /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --command=stopssh --ssh_user=root
Thu Mar  ::  - [info]  done.
Thu Mar  ::  - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Thu Mar  ::  - [info] * Phase : Dead Master Shutdown Phase completed.
==================== 2. Failed master shutdown phase, End ====================
Thu Mar  ::  - [info]
==================== 3. New master recovery phase, Start ====================
Thu Mar  ::  - [info] * Phase : Master Recovery Phase..
Thu Mar  ::  - [info]
==================== 3.1 Get the latest slave ====================
******************** The latest slave supplies the data the new master is missing, and its position is the starting point for saving the failed master's binlog ********************
Thu Mar  ::  - [info] * Phase 3.1: Getting Latest Slaves Phase..
Thu Mar  ::  - [info]
Thu Mar  ::  - [debug] Fetching current slave status..
Thu Mar  ::  - [debug]  Fetching current slave status done.
Thu Mar  ::  - [info] The latest binary log file/position on all slaves is mysql-bin.:
Thu Mar  ::  - [info] Retrieved Gtid Set: 90b30799--11e7--000c29c1025c:-
Thu Mar  ::  - [info] Latest slaves (Slaves that received relay log files to the latest):
Thu Mar  ::  - [info]   192.168.85.133(192.168.85.133:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Mar  ::  - [info]     GTID ON
Thu Mar  ::  - [debug]    Relay log info repository: FILE
Thu Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Thu Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Mar  ::  - [info] The oldest binary log file/position on all slaves is mysql-bin.:
Thu Mar  ::  - [info] Retrieved Gtid Set: 90b30799--11e7--000c29c1025c:-
Thu Mar  ::  - [info] Oldest slaves:
Thu Mar  ::  - [info]   192.168.85.134(192.168.85.134:)  Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Thu Mar  ::  - [info]     GTID ON
Thu Mar  ::  - [debug]    Relay log info repository: FILE
Thu Mar  ::  - [info]     Replicating from 192.168.85.132(192.168.85.132:)
Thu Mar  ::  - [info]     Primary candidate for the new Master (candidate_master is set)
Thu Mar  ::  - [info]
==================== 3.3 Elect the new master ====================
Thu Mar  ::  - [info] * Phase 3.3: Determining New Master Phase..
Thu Mar  ::  - [info]
Thu Mar  ::  - [info] 192.168.85.134 can be new master.
Thu Mar  ::  - [info] New master is 192.168.85.134(192.168.85.134:)
Thu Mar  ::  - [info] Starting master failover..
Thu Mar  ::  - [info]
From:
192.168.85.132(192.168.85.132:) (current master)
 +--192.168.85.133(192.168.85.133:)
 +--192.168.85.134(192.168.85.134:)
 
To:
192.168.85.134(192.168.85.134:) (new master)
 +--192.168.85.133(192.168.85.133:)
 
******************** Choose whether to perform the switch ********************
Starting master switch from 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:)? (yes/NO): yes
Thu Mar  ::  - [info] New master decided manually is 192.168.85.134(192.168.85.134:)
Thu Mar  ::  - [info]
Thu Mar  ::  - [info] * Phase 3.3: New Master Recovery Phase..
Thu Mar  ::  - [info]
******************** Wait for the new master to finish applying its own relay-log ********************
Thu Mar  ::  - [info]  Waiting all logs to be applied..
Thu Mar  ::  - [info]   done.
Thu Mar  ::  - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [debug]   done.
Thu Mar  ::  - [info]  Replicating from the latest slave 192.168.85.133(192.168.85.133:) and waiting to apply..
******************** Wait for the latest slave to finish applying its own relay-log ********************
Thu Mar  ::  - [info]  Waiting all logs to be applied on the latest slave..
******************** Point the new master at the latest slave with CHANGE MASTER TO, to fill in the difference ********************
Thu Mar  ::  - [info]  Resetting slave 192.168.85.134(192.168.85.134:) and starting replication from the new master 192.168.85.133(192.168.85.133:)..
Thu Mar  ::  - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [debug]   done.
Thu Mar  ::  - [info]  Executed CHANGE MASTER.
Thu Mar  ::  - [debug]  Starting slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [debug]   done.
Thu Mar  ::  - [info]  Slave started.
Thu Mar  ::  - [info]  Waiting to execute all relay logs on 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [info]  master_pos_wait(mysql-bin.:) completed on 192.168.85.134(192.168.85.134:). Executed  events.
Thu Mar  ::  - [info]   done.
Thu Mar  ::  - [debug]  Stopping SQL thread on 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [debug]   done.
Thu Mar  ::  - [info]   done.
Thu Mar  ::  - [info] -- Saving binlog from host 192.168.85.132 started, pid:
Thu Mar  ::  - [info]
Thu Mar  ::  - [info] Log messages from 192.168.85.132 ...
Thu Mar  ::  - [info]
******************** Runs on the failed master / binlog server; takes only the part after the latest slave's position ********************
Thu Mar  ::  - [info] Fetching binary logs from binlog server 192.168.85.132..
Thu Mar  ::  - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.  --start_pos= --output_file=/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog --handle_raw_binlog= --skip_filter= --disable_log_bin= --manager_version=0.56 --oldest_version=5.7.-log  --debug  --binlog_dir=/data/mysql/mysql3307/logs
  Creating /var/log/masterha/app1 if not exists..    ok.
 Concat binary/relay logs from mysql-bin. pos  to mysql-bin. EOF into /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog ..
Executing command: mysqlbinlog --start-position=  /data/mysql/mysql3307/logs/mysql-bin. >> /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog
 Concat succeeded.
******************** scp the resulting binlog to the working directory of the host running the manual failover ********************
Thu Mar  ::  - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog to local:/var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog succeeded.
Thu Mar  ::  - [info] End of log messages from 192.168.85.132.
Thu Mar  ::  - [info] Saved mysqlbinlog size from 192.168.85.132 is  bytes.
Thu Mar  ::  - [info] Applying differential binlog /var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog ..
Thu Mar  ::  - [info] Differential log apply from binlog server succeeded.
******************** The new master has applied the binlog; record its current position ********************
Thu Mar  ::  - [info] Getting new master's binlog name and position..
Thu Mar  ::  - [info]  mysql-bin.:
Thu Mar  ::  - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=, MASTER_AUTO_POSITION=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Thu Mar  ::  - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin., , 90b30799--11e7--000c29c1025c:-
******************** Activate the virtual IP so the new master can serve traffic ********************
Thu Mar  ::  - [info] Executing master IP activate script:
Thu Mar  ::  - [info]   /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port= --new_master_user='mydba' --new_master_password='mysql5721'
Set read_only= on the new master.
RTNETLINK answers: Cannot assign requested address
RTNETLINK answers: File exists
Thu Mar  ::  - [info]  OK.
Thu Mar  ::  - [info] ** Finished master recovery successfully.
Thu Mar  ::  - [info] * Phase : Master Recovery Phase completed.
==================== 3. New master recovery phase, End ====================
Thu Mar  ::  - [info]
==================== 4. Slave recovery phase, Start ====================
Thu Mar  ::  - [info] * Phase : Slaves Recovery Phase..
Thu Mar  ::  - [info]
Thu Mar  ::  - [info]
==================== 4.1 Slaves CHANGE MASTER TO the new master directly ====================
Thu Mar  ::  - [info] * Phase 4.1: Starting Slaves in parallel..
Thu Mar  ::  - [info]
Thu Mar  ::  - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:) started, pid: . Check tmp log /var/log/masterha/app1/192.168..133_3307_20180329150032.log if it takes time..
Thu Mar  ::  - [info]
Thu Mar  ::  - [info] Log messages from 192.168.85.133 ...
Thu Mar  ::  - [info]
Thu Mar  ::  - [info]  Resetting slave 192.168.85.133(192.168.85.133:) and starting replication from the new master 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [debug]  Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
Thu Mar  ::  - [debug]   done.
Thu Mar  ::  - [info]  Executed CHANGE MASTER.
Thu Mar  ::  - [debug]  Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
Thu Mar  ::  - [debug]   done.
Thu Mar  ::  - [info]  Slave started.
Thu Mar  ::  - [info]  gtid_wait(90b30799--11e7--000c29c1025c:-) completed on 192.168.85.133(192.168.85.133:). Executed  events.
Thu Mar  ::  - [info] End of log messages from 192.168.85.133.
Thu Mar  ::  - [info] -- Slave on host 192.168.85.133(192.168.85.133:) started.
Thu Mar  ::  - [info] All new slave servers recovered successfully.
==================== 4. Slave recovery phase, End ====================
Thu Mar  ::  - [info]
==================== 5. New master cleanup phase, Start ====================
Thu Mar  ::  - [info] * Phase : New master cleanup phase..
Thu Mar  ::  - [info]
Thu Mar  ::  - [info] Resetting slave info on the new master..
Thu Mar  ::  - [debug]  Clearing slave info..
Thu Mar  ::  - [debug]  Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
Thu Mar  ::  - [debug]   done.
Thu Mar  ::  - [debug]  SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
Thu Mar  ::  - [info]  192.168.85.134: Resetting slave info succeeded.
==================== 5. New master cleanup phase, End ====================
Thu Mar  ::  - [info] Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
Thu Mar  ::  - [debug]  Disconnected from 192.168.85.133(192.168.85.133:)
Thu Mar  ::  - [debug]  Disconnected from 192.168.85.134(192.168.85.134:)
Thu Mar  ::  - [info] 
 
----- Failover Report -----
 
app1: MySQL Master failover 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:) succeeded
 
Master 192.168.85.132(192.168.85.132:) is down!
 
Check MHA Manager logs at ZST1 for details.
 
Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:)
Selected 192.168.85.134(192.168.85.134:) as a new master.
192.168.85.134(192.168.85.134:): OK: Applying all logs succeeded.
192.168.85.134(192.168.85.134:): OK: Activated master IP address.
192.168.85.133(192.168.85.133:): OK: Slave started, replicating from 192.168.85.134(192.168.85.134:)
192.168.85.134(192.168.85.134:): Resetting slave info succeeded.
Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
[root@ZST1 masterha]#

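Manual failover flow

Manual failover (GTID)
1. Configuration check: connect to each instance, check service status, check the replication topology
2. Failed master shutdown: stop the IO thread on every slave, remove the virtual IP from the failed master (stopssh)
3. New master recovery
    3.1 Get the latest slave
        It supplies the data the new master is missing, and its position is the starting point for saving the failed master's binlog
    3.2 Elect the new master
        Decide the new master and print the topology before and after the switch
    3.3 Recover the new master
        3.3.1 Fill in the difference between the new master and the latest slave
            Wait for the new master to finish applying its own relay-log; wait for the latest slave to finish applying its own relay-log; point the new master at the latest slave with CHANGE MASTER TO to fill in the difference
        3.3.2 Fill in the difference between the new master and the failed master
            Run save_binary_logs on the failed master / binlog server; scp the resulting binlog to the working directory of the host running the manual failover; the new master applies the binlog and its current position is recorded; bind the virtual IP so the new master can serve traffic
4. Recovery of the other slaves
    4.1 Reset replication: RESET SLAVE; CHANGE MASTER TO the new master;
    4.2 If there are several slaves, repeat the step above
5. New master cleanup: clear the old replication info with STOP SLAVE; RESET SLAVE ALL;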
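
Step 4.1 above boils down to one statement per slave, the same one MHA prints in the log ("All other slaves should start replication from here"). The helper below is a hypothetical sketch that assembles that statement. The value MASTER_AUTO_POSITION=1 is my assumption for the digit missing from the log (it is the setting that enables GTID auto-positioning in MySQL), and port 3307 in the usage note comes from the environment table at the top of this post.

```python
def change_master_statement(host, port, user, password):
    """Build a GTID auto-position CHANGE MASTER TO statement for a slave."""

    def quote(value):
        # Minimal escaping for a single-quoted MySQL string literal.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, MASTER_PORT={int(port)}, "
        "MASTER_AUTO_POSITION=1, "
        f"MASTER_USER={quote(user)}, MASTER_PASSWORD={quote(password)};"
    )
```

Calling `change_master_statement("192.168.85.134", 3307, "repl", "xxx")` yields a statement of the same shape as the one in the log; because the position is negotiated through GTIDs, no binlog file name or offset has to be passed.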

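3.3 Differences between manual failover under traditional and GTID replication

To get a detailed failover log, it is recommended to:
• set log_level=debug in the MHA configuration file
• create data differences among Node1, Node2 and Node3
• try both Node2 and Node3 as the new master
For manual failover under GTID, also enable the general log so that you can see how the new master catches up with the latest slave.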

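Aspect	Traditional	GTID
Is missing data filled in?	As long as the failed master's host is still reachable, all data is filled in by default	The master or a binlog server must be listed under [binlog*] in the configuration file for the differential log on the dead master to be applied; otherwise data is only filled in up to the latest slave
How data is filled in	The new master and the other slaves pull relay-log events from the latest slave	The new master pulls binlog events from the latest slave
	The new master and every other slave get a differential relay-log against the latest slave and apply it (file relay_from_read_to_latest_**)	The new master runs CHANGE MASTER TO the latest slave to fill in the difference
	The new master and the other slaves apply the differential binlog between the latest slave and the dead master (file saved_master_binlog_from_**)	Once the new master has caught up with the latest slave, save_binary_logs generates the differential binlog against the dead master, which is then applied (file saved_binlog_binlog1_**)
		The other slaves do not apply any differential log; they simply CHANGE MASTER TO the new master
Files generated	relay_from_read_to_latest_**: the differential relay-log between the latest slave and another slave; generated on the latest slave, then copied to the corresponding slave	saved_binlog_binlog1_**: the differential binlog between the failed master and the latest slave; generated on the failed master / binlog server, then copied to the working directory of the host running the manual failover
	saved_master_binlog_from_**: the differential binlog between the failed master and the latest slave; generated on the failed master, copied first to the working directory of the host running the manual failover, then to the other slaves	
	The files can be parsed with mysqlbinlog	The file could not be parsed with mysqlbinlog in my test. I may have been doing it wrong, but the commands that generate the files do differ slightly: in the GTID log above, save_binary_logs appends mysqlbinlog output to the file, so it probably already holds decoded text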

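In a GTID environment, save_binary_logs is used only to handle the dead master's data (the master is down, so CHANGE MASTER TO is not possible). Everything else is filled in by the replication threads via CHANGE MASTER TO, and the procedure no longer depends on the latest slave's relay-log.
Overall, MHA is somewhat heavyweight in a GTID environment; those who are able to can script the procedure themselves:
Identify Latest_Slave -> New_Master: CHANGE MASTER TO Latest_Slave -> mysqlbinlog ./binlogserver/binlog --start-position > New_Master -> Other_Slave: CHANGE MASTER TO New_Master
With enhanced semi-synchronous replication, it is almost guaranteed that every binlog event on Dead_Master has already reached Latest_Slave, which makes failover simpler still.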
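
The self-scripted sequence can be written down as an ordered plan before any statement is executed. The function below is only a planning sketch under my own naming: it returns (host, action) pairs in the order described above, skips the catch-up step when the new master already is the latest slave, and leaves the execution of each action to whatever tooling you use.

```python
def gtid_failover_plan(latest_slave, new_master, slaves):
    """Return the ordered (host, action) steps of a hand-rolled GTID failover."""
    steps = []
    if new_master != latest_slave:
        # The new master first catches up from the most advanced slave.
        steps.append(
            (new_master, f"CHANGE MASTER TO {latest_slave} and wait until caught up")
        )
    # Then it applies whatever only the dead master / binlog server still holds.
    steps.append(
        (new_master, "apply binlog saved from the dead master or binlog server")
    )
    # Finally every remaining slave is repointed to the new master.
    for slave in slaves:
        if slave != new_master:
            steps.append((slave, f"CHANGE MASTER TO {new_master}"))
    return steps
```

For the run in this post, with 192.168.85.133 as the latest slave and 192.168.85.134 as the new master, the plan has three steps, mirroring the order of the GTID failover log.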