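This article only walks through the manual failover procedure. For an introduction to MHA, see "MySQL高可用架构之MHA" (MySQL High-Availability Architecture: MHA).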

1. Basic Environment

1.1 Replication Topology

VMware 10.0 + CentOS 6.9 + MySQL 5.7.21

ROLE   HOSTNAME  BASEDIR           DATADIR                     IP              PORT
Node1  ZST1      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.132  3307
Node2  ZST2      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.133  3307
Node3  ZST3      /usr/local/mysql  /data/mysql/mysql3307/data  192.168.85.134  3307

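Traditional replication is set up with row-based binlogs and file/position coordinates; GTID replication is set up with row-based binlogs and GTIDs. Both use the same one-master, two-slave topology: Node1->{Node2, Node3}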

1.2 MHA Configuration Files

This article uses MHA 0.56, with both the manager and node packages installed on all of Node1, Node2, and Node3.
The MHA configuration files are as follows:

    # Global configuration file: /etc/masterha/masterha_default.conf
    [root@ZST1 masterha]# cat masterha_default.conf
    [server default]
    # MySQL user and password
    user=mydba
    password=mysql5721

    # System SSH user
    ssh_user=root

    # Replication user
    repl_user=repl
    repl_password=repl

    # Monitoring
    ping_interval=
    #shutdown_script=/etc/masterha/send_report.sh

    # Scripts invoked during a switch
    master_ip_failover_script=/etc/masterha/master_ip_failover
    master_ip_online_change_script=/etc/masterha/master_ip_online_change

    log_level=debug
    [root@ZST1 masterha]#

    # Cluster 1 configuration file: /etc/masterha/app1.conf
    [root@ZST1 masterha]# cat app1.conf
    [server default]
    # MHA manager working directory
    manager_workdir=/var/log/masterha/app1
    manager_log=/var/log/masterha/app1/app1.log
    remote_workdir=/var/log/masterha/app1

    [server1]
    hostname=192.168.85.132
    port=
    master_binlog_dir=/data/mysql/mysql3307/logs
    candidate_master=
    check_repl_delay=

    [server2]
    hostname=192.168.85.133
    port=
    master_binlog_dir=/data/mysql/mysql3307/logs
    candidate_master=
    check_repl_delay=

    [server3]
    hostname=192.168.85.134
    port=
    master_binlog_dir=/data/mysql/mysql3307/logs
    candidate_master=
    check_repl_delay=
    [root@ZST1 masterha]#
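
The two files are layered: masterha_default.conf carries the global `[server default]` values, while app1.conf adds the cluster's own `[server default]` values plus one `[serverN]` section per node. As a rough illustration of that layering (this is not MHA's own parser, which is written in Perl), the Python sketch below reads both files with the standard `configparser` module and returns the effective settings per server. The function name `load_mha_servers` and the shortened sample text are hypothetical, and the sample port 3307 is assumed from the table in section 1.1 because the port values are blank in the listing.

```python
import configparser


def load_mha_servers(global_text, app_text):
    """Return one dict per [serverN] section with both [server default] blocks merged in."""
    merged_defaults = {}
    servers = []
    for text in (global_text, app_text):
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        for section in parser.sections():
            values = dict(parser.items(section))
            if section == "server default":
                merged_defaults.update(values)  # the later (application) file wins
            elif section.startswith("server"):
                servers.append((section, values))
    # Per-server values override the merged defaults.
    return [{"section": name, **merged_defaults, **values} for name, values in servers]


# Shortened, hypothetical sample based on the listing (port assumed to be 3307).
GLOBAL_CONF = """
[server default]
# MySQL user
user=mydba
ssh_user=root
"""

APP_CONF = """
[server default]
manager_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
port=3307

[server3]
hostname=192.168.85.134
port=3307
"""

servers = load_mha_servers(GLOBAL_CONF, APP_CONF)
for server in servers:
    print(server["section"], server["hostname"], server["port"], server["ssh_user"])
```

Resolving the defaults only after both files are read keeps the result independent of where the `[server default]` block sits inside each file.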

1.3 Test Data

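Stop the io_thread on the slaves one at a time while continuing to write to the master. This simulates inconsistent data both between master and slaves and between the two slaves.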

    # First, empty the table
    mydba@192.168.85.132,3307 [replcrash]> truncate table py_user;

    # Node1: insert the first row
    mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
    # Node3: stop io_thread
    mydba@192.168.85.134,3307 [replcrash]> stop slave io_thread;

    # Node1: insert the second row
    mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;
    # Node2: stop io_thread
    mydba@192.168.85.133,3307 [replcrash]> stop slave io_thread;

    # Node1: insert the third row
    mydba@192.168.85.132,3307 [replcrash]> insert into py_user(name,add_time,server_id) select left(uuid(),32),now(),@@server_id;

    # Final state of each node
    # Node1 has three rows
    mydba@192.168.85.132,3307 [replcrash]> select * from py_user;
    +-----+----------------------------------+---------------------+-----------+
    | uid | name                             | add_time            | server_id |
    +-----+----------------------------------+---------------------+-----------+
    |   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 |   1323307 |
    |   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 |   1323307 |
    |   3 | 2d8900cc-325d-11e8-88e6-000c29c1 | 2018-03-28 15:54:01 |   1323307 |
    +-----+----------------------------------+---------------------+-----------+
    3 rows in set (0.00 sec)
    mydba@192.168.85.132,3307 [replcrash]> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000004 |     1303 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)
    # Node2 has two rows
    mydba@192.168.85.133,3307 [replcrash]> select * from py_user;
    +-----+----------------------------------+---------------------+-----------+
    | uid | name                             | add_time            | server_id |
    +-----+----------------------------------+---------------------+-----------+
    |   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 |   1323307 |
    |   2 | 272f15ee-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:50 |   1323307 |
    +-----+----------------------------------+---------------------+-----------+
    2 rows in set (0.00 sec)
    mydba@192.168.85.133,3307 [replcrash]> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000007 |     8859 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)
    # Node3 has one row
    mydba@192.168.85.134,3307 [replcrash]> select * from py_user;
    +-----+----------------------------------+---------------------+-----------+
    | uid | name                             | add_time            | server_id |
    +-----+----------------------------------+---------------------+-----------+
    |   1 | 153dc6bf-325d-11e8-88e6-000c29c1 | 2018-03-28 15:53:20 |   1323307 |
    +-----+----------------------------------+---------------------+-----------+
    1 row in set (0.00 sec)
    mydba@192.168.85.134,3307 [replcrash]> show master status;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000002 |    10322 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)

Clearly, slave Node3 lags behind slave Node2, and slave Node2 lags behind master Node1.
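
The row counts follow from the order of operations: a slave only receives the inserts that reached it before its io_thread was stopped. The Python sketch below replays that sequence as simple bookkeeping. It assumes each slave had fully caught up at the moment its io_thread was stopped; the event list and the function `rows_per_node` are illustrative only and are not part of MySQL or MHA.

```python
def rows_per_node(events, slaves=("Node2", "Node3")):
    """Replay ("insert",) and ("stop_io", node) events and count the rows each node ends up with."""
    master_rows = 0
    frozen = {}  # slave -> number of rows it had received when its io_thread stopped
    for event in events:
        if event[0] == "insert":
            master_rows += 1
        elif event[0] == "stop_io":
            frozen[event[1]] = master_rows
    result = {"Node1": master_rows}
    for slave in slaves:
        # A slave whose io_thread never stopped would keep up with the master.
        result[slave] = frozen.get(slave, master_rows)
    return result


# Same order as the test above: insert, stop Node3, insert, stop Node2, insert.
EVENTS = [
    ("insert",),
    ("stop_io", "Node3"),
    ("insert",),
    ("stop_io", "Node2"),
    ("insert",),
]

counts = rows_per_node(EVENTS)
print(counts)  # {'Node1': 3, 'Node2': 2, 'Node3': 1}
```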

2. Manual Failover Under Traditional Replication

Manual failover scenario: the master has gone down while mha_manager was not running, so the failover has to be triggered by hand.

2.1 Manual Failover

• Shut down the database service on Node1

    # Shut down the database service on Node1
    mydba@192.168.85.132,3307 [replcrash]> shutdown;

    # Replication status on Node2 and Node3
    mydba@192.168.85.133,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
    PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
    mydba@192.168.85.133,3307 [replcrash]> show slave status\G
    Master_Log_File: mysql-bin.000004
    Read_Master_Log_Pos: 973
    Relay_Master_Log_File: mysql-bin.000004
    Slave_IO_Running: No
    Slave_SQL_Running: Yes
    Exec_Master_Log_Pos: 973
    Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
    1 row in set (0.00 sec)
    mydba@192.168.85.133,3307 [replcrash]>

    mydba@192.168.85.134,3307 [replcrash]> pager cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running'
    PAGER set to 'cat | egrep 'Master_Log_File|Relay_Master_Log_File|Read_Master_Log_Pos|Exec_Master_Log_Pos|Running''
    mydba@192.168.85.134,3307 [replcrash]> show slave status\G
    Master_Log_File: mysql-bin.000004
    Read_Master_Log_Pos: 643
    Relay_Master_Log_File: mysql-bin.000004
    Slave_IO_Running: No
    Slave_SQL_Running: Yes
    Exec_Master_Log_Pos: 643
    Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
    1 row in set (0.00 sec)
    mydba@192.168.85.134,3307 [replcrash]>
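
The two `show slave status` outputs already tell which slave is ahead: both read from mysql-bin.000004, but Node2 has received events up to position 973 and Node3 only up to 643. The Python sketch below ranks slaves by (Master_Log_File, Read_Master_Log_Pos), which matches the "latest" and "oldest" slaves reported in the failover log. It is an illustrative helper, not MHA code; the names `binlog_key` and `latest_and_oldest` and the dictionary layout are assumptions.

```python
def binlog_key(status):
    """Sort key: numeric binlog file suffix first, then the position received so far."""
    file_index = int(status["Master_Log_File"].rsplit(".", 1)[1])
    return (file_index, status["Read_Master_Log_Pos"])


def latest_and_oldest(slaves):
    """Return (latest_host, oldest_host) according to the received binlog coordinates."""
    ordered = sorted(slaves, key=lambda host: binlog_key(slaves[host]))
    return ordered[-1], ordered[0]


# Values copied from the two status outputs above.
SLAVES = {
    "192.168.85.133": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 973},
    "192.168.85.134": {"Master_Log_File": "mysql-bin.000004", "Read_Master_Log_Pos": 643},
}

latest, oldest = latest_and_oldest(SLAVES)
print("latest:", latest, "oldest:", oldest)
```

Comparing the file suffix as an integer before the position keeps the ordering correct when slaves have stopped in different binlog files.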

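At this point it makes no difference whether the io_thread on the slaves is started: the master is already down, so the io_thread cannot connect to it anyway.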
• Run the manual failover script, specifying Node3 as the new master

    # Manual failover away from the failed Node1
    [root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover

Before the switch the replication topology is Node1->{Node2, Node3}; after the manual failover it becomes Node3->{Node2}.
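
When the switch has to be scripted, it can help to assemble the `masterha_master_switch` arguments in one place. The Python sketch below only builds and prints the argument list; it does not run anything. It uses the same options as the command in this article. The function name `build_switch_command` is hypothetical, and port 3307 is assumed from the table in section 1.1, since the port values in the command shown here are blank.

```python
import shlex


def build_switch_command(dead_host, new_host, port,
                         global_conf="/etc/masterha/masterha_default.conf",
                         app_conf="/etc/masterha/app1.conf"):
    """Build the argv for a manual failover from a dead master to a chosen new master."""
    return [
        "masterha_master_switch",
        f"--global_conf={global_conf}",
        f"--conf={app_conf}",
        f"--dead_master_host={dead_host}",
        f"--dead_master_port={port}",
        "--master_state=dead",
        f"--new_master_host={new_host}",
        f"--new_master_port={port}",
        "--ignore_last_failover",
    ]


# Node1 is the dead master, Node3 the designated new master (port assumed to be 3307).
argv = build_switch_command("192.168.85.132", "192.168.85.134", 3307)
print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.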

2.2 Failover Process

Log output of the manual failover:

  1. # Manual failover
  2. [root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover
  3. --dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
  4. Wed Mar :: - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
  5. Wed Mar :: - [info] Reading application default configuration from /etc/masterha/app1.conf..
  6. Wed Mar :: - [info] Reading server configuration from /etc/masterha/app1.conf..
  7. Wed Mar :: - [info] MHA::MasterFailover version 0.56.
  8. Wed Mar :: - [info] Starting master failover.
  9. Wed Mar :: - [info]
  10. ==================== Configuration check phase, Start ====================
  11. Wed Mar :: - [info] * Phase : Configuration Check Phase..
  12. Wed Mar :: - [info]
  13. Wed Mar :: - [debug] Connecting to servers..
  14. Wed Mar :: - [debug] Connected to: 192.168.85.133(192.168.85.133:), user=mydba
  15. Wed Mar :: - [debug] Number of slave worker threads on host 192.168.85.133(192.168.85.133:):
  16. Wed Mar :: - [debug] Connected to: 192.168.85.134(192.168.85.134:), user=mydba
  17. Wed Mar :: - [debug] Number of slave worker threads on host 192.168.85.134(192.168.85.134:):
  18. Wed Mar :: - [debug] Comparing MySQL versions..
  19. Wed Mar :: - [debug] Comparing MySQL versions done.
  20. Wed Mar :: - [debug] Connecting to servers done.
  21. Wed Mar :: - [info] GTID failover mode =
  22. Wed Mar :: - [info] Dead Servers:
  23. Wed Mar :: - [info] 192.168.85.132(192.168.85.132:)
  24. Wed Mar :: - [info] Checking master reachability via MySQL(double check)...
  25. Wed Mar :: - [info] ok.
  26. Wed Mar :: - [info] Alive Servers:
  27. Wed Mar :: - [info] 192.168.85.133(192.168.85.133:)
  28. Wed Mar :: - [info] 192.168.85.134(192.168.85.134:)
  29. Wed Mar :: - [info] Alive Slaves:
  30. Wed Mar :: - [info] 192.168.85.133(192.168.85.133:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  31. Wed Mar :: - [debug] Relay log info repository: FILE
  32. Wed Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  33. Wed Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  34. Wed Mar :: - [info] 192.168.85.134(192.168.85.134:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  35. Wed Mar :: - [debug] Relay log info repository: FILE
  36. Wed Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  37. Wed Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  38. ******************** Choose whether to continue ********************
  39. Master 192.168.85.132(192.168.85.132:) is dead. Proceed? (yes/NO): yes
  40. Wed Mar :: - [info] Starting Non-GTID based failover.
  41. Wed Mar :: - [info]
  42. Wed Mar :: - [info] ** Phase : Configuration Check Phase completed.
  43. ==================== Configuration check phase, End ====================
  44. Wed Mar :: - [info]
  45. ==================== Dead master shutdown phase, Start ====================
  46. Wed Mar :: - [info] * Phase : Dead Master Shutdown Phase..
  47. Wed Mar :: - [info]
  48. Wed Mar :: - [debug] Stopping IO thread on 192.168.85.133(192.168.85.133:)..
  49. Wed Mar :: - [debug] Stopping IO thread on 192.168.85.134(192.168.85.134:)..
  50. Wed Mar :: - [debug] Stop IO thread on 192.168.85.134(192.168.85.134:) done.
  51. Wed Mar :: - [debug] Stop IO thread on 192.168.85.133(192.168.85.133:) done.
  52. Wed Mar :: - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
  53. Wed Mar :: - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
  54. Wed Mar :: - [info] Forcing shutdown so that applications never connect to the current master..
  55. Wed Mar :: - [info] Executing master IP deactivation script:
  56. Wed Mar :: - [info] /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --command=stopssh --ssh_user=root
  57. Wed Mar :: - [info] done.
  58. Wed Mar :: - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
  59. Wed Mar :: - [info] * Phase : Dead Master Shutdown Phase completed.
  60. ==================== Dead master shutdown phase, End ====================
  61. Wed Mar :: - [info]
  62. ==================== New master recovery phase, Start ====================
  63. Wed Mar :: - [info] * Phase : Master Recovery Phase..
  64. Wed Mar :: - [info]
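  65. ==================== 3.1 Get the latest slave ====================
  66. ******************** The latest slave serves two purposes: (1) supplying the relay log events that other slaves are missing; (2) providing the starting point for saving the dead master's binlog ********************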
  67. Wed Mar :: - [info] * Phase 3.1: Getting Latest Slaves Phase..
  68. Wed Mar :: - [info]
  69. Wed Mar :: - [debug] Fetching current slave status..
  70. Wed Mar :: - [debug] Fetching current slave status done.
  71. Wed Mar :: - [info] The latest binary log file/position on all slaves is mysql-bin.:
  72. Wed Mar :: - [info] Latest slaves (Slaves that received relay log files to the latest):
  73. Wed Mar :: - [info] 192.168.85.133(192.168.85.133:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  74. Wed Mar :: - [debug] Relay log info repository: FILE
  75. Wed Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  76. Wed Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  77. Wed Mar :: - [info] The oldest binary log file/position on all slaves is mysql-bin.:
  78. Wed Mar :: - [info] Oldest slaves:
  79. Wed Mar :: - [info] 192.168.85.134(192.168.85.134:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  80. Wed Mar :: - [debug] Relay log info repository: FILE
  81. Wed Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  82. Wed Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  83. Wed Mar :: - [info]
  84. ==================== 3.2 Save the dead master's binlog ====================
  85. Wed Mar :: - [info] * Phase 3.2: Saving Dead Master''s Binlog Phase..
  86. Wed Mar :: - [info]
  87. Wed Mar :: - [info] Fetching dead master''s binary logs..
  88. ******************** Executed on the dead master: take the part of the binlog after the latest slave's position ********************
  89. Wed Mar :: - [info] Executing command on the dead master 192.168.85.132(192.168.85.132:): save_binary_logs --command=save --start_file=mysql-bin. --start_pos= --binlog_dir=/data/mysql/mysql3307/logs --output_file=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --debug
  90. Creating /var/log/masterha/app1 if not exists.. ok.
  91. Concat binary/relay logs from mysql-bin. pos to mysql-bin. EOF into /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog ..
  92. parse_init_headers: file=mysql-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  93. Binlog Checksum enabled
  94. parse_init_headers: file=mysql-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  95. Got previous gtids log event: .
  96. parse_init_headers: file=mysql-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  97. Dumping binlog format description event, from position to .. ok.
  98. Dumping effective binlog data from /data/mysql/mysql3307/logs/mysql-bin. position to tail().. ok.
  99. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  100. Binlog Checksum enabled
  101. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  102. Got previous gtids log event: .
  103. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  104. Concat succeeded.
  105. saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog % .5KB/s :
  106. ******************** scp the saved master binlog to the working directory on the MHA manager node (here, the node running the manual failover) ********************
  107. Wed Mar :: - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
  108. Wed Mar :: - [debug] SSH connection test to 192.168.85.133, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
  109. Wed Mar :: - [info] HealthCheck: SSH to 192.168.85.133 is reachable.
  110. Wed Mar :: - [debug] SSH connection test to 192.168.85.134, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
  111. Wed Mar :: - [info] HealthCheck: SSH to 192.168.85.134 is reachable.
  112. Wed Mar :: - [info]
  113. ==================== 3.3 Elect the new master ====================
  114. Wed Mar :: - [info] * Phase 3.3: Determining New Master Phase..
  115. Wed Mar :: - [info]
  116. ******************** Check whether the latest slave holds the relay logs that the other slaves are missing ********************
  117. Wed Mar :: - [info] Finding the latest slave that has all relay logs for recovering other slaves..
  118. Wed Mar :: - [info] Checking whether 192.168.85.133 has relay logs from the oldest position..
  119. Wed Mar :: - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin. --latest_rmlp= --target_mlf=mysql-bin. --target_rmlp= --server_id= --workdir=/var/log/masterha/app1 --timestamp= --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info --relay_dir=/data/mysql/mysql3307/data/ --debug :
  120. Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
  121. Relay log found at /data/mysql/mysql3307/data, up to relay-bin.
  122. Fast relay log position search succeeded.
  123. Target relay log file/position found. start_file:relay-bin., start_pos:.
  124. Target relay log FOUND!
  125. Wed Mar :: - [info] OK. 192.168.85.133 has all relay logs.
  126. Wed Mar :: - [info] 192.168.85.134 can be new master.
  127. Wed Mar :: - [info] New master is 192.168.85.134(192.168.85.134:)
  128. Wed Mar :: - [info] Starting master failover..
  129. Wed Mar :: - [info]
  130. From:
  131. 192.168.85.132(192.168.85.132:) (current master)
  132. +--192.168.85.133(192.168.85.133:)
  133. +--192.168.85.134(192.168.85.134:)
  134.  
  135. To:
  136. 192.168.85.134(192.168.85.134:) (new master)
  137. +--192.168.85.133(192.168.85.133:)
  138.  
  139. ******************** Choose whether to perform the switch ********************
  140. Starting master switch from 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:)? (yes/NO): yes
  141. Wed Mar :: - [info] New master decided manually is 192.168.85.134(192.168.85.134:)
  142. Wed Mar :: - [info]
  143. Wed Mar :: - [info] * Phase 3.3: New Master Diff Log Generation Phase..
  144. Wed Mar :: - [info]
  145. ******************** On the latest slave, generate the relay log events that the new master is missing relative to the latest slave ********************
  146. Wed Mar :: - [info] Server 192.168.85.134 received relay logs up to: mysql-bin.:
  147. Wed Mar :: - [info] Need to get diffs from the latest slave(192.168.85.133) up to: mysql-bin.: (using the latest slave''s relay logs)
  148. Wed Mar :: - [info] Connecting to the latest slave host 192.168.85.133, generating diff relay log files..
  149. Wed Mar :: - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=192.168.85.134 --latest_mlf=mysql-bin. --latest_rmlp= --target_mlf=mysql-bin. --target_rmlp= --server_id= --diff_file_readtolatest=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info --relay_dir=/data/mysql/mysql3307/data/ --debug
  150. Wed Mar :: - [info]
  151. Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
  152. Relay log found at /data/mysql/mysql3307/data, up to relay-bin.
  153. Fast relay log position search succeeded.
  154. Target relay log file/position found. start_file:relay-bin., start_pos:.
  155. Concat binary/relay logs from relay-bin. pos to relay-bin. EOF into /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog ..
  156. parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  157. Binlog Checksum enabled
  158. parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  159. Got previous gtids log event: .
  160. parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  161. parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  162. Binlog Checksum enabled
  163. parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  164. parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  165. Dumping binlog format description event, from position to .. ok.
  166. Dumping effective binlog data from /data/mysql/mysql3307/data/relay-bin. position to tail().. ok.
  167. parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  168. Binlog Checksum enabled
  169. parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  170. Got previous gtids log event: .
  171. parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  172. parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  173. Binlog Checksum enabled
  174. parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  175. parse_init_headers: file=relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  176. Concat succeeded.
  177. Generating diff relay log succeeded. Saved at /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog .
  178. ******************** scp the generated relay log diff to the new master's working directory ********************
  179. scp ZST2:/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to root@192.168.85.134() succeeded.
  180. Wed Mar :: - [info] Generating diff files succeeded.
  181. Wed Mar :: - [info] Sending binlog..
  182. saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog % .5KB/s :
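  183. ******************** scp the dead master's binlog from the working directory on the MHA manager node (here, the node running the manual failover) to the new master's working directory ********************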
  184. Wed Mar :: - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.134:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
  185. Wed Mar :: - [info]
  186. ==================== 3.4 New master applies the differential logs ====================
  187. Wed Mar :: - [info] * Phase 3.4: Master Log Apply Phase..
  188. Wed Mar :: - [info]
  189. Wed Mar :: - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
  190. Wed Mar :: - [info] Starting recovery on 192.168.85.134(192.168.85.134:)..
  191. Wed Mar :: - [info] Generating diffs succeeded.
  192. ******************** Wait until the new master has applied its own relay log ********************
  193. Wed Mar :: - [info] Waiting until all relay logs are applied.
  194. Wed Mar :: - [info] done.
  195. Wed Mar :: - [debug] Stopping SQL thread on 192.168.85.134(192.168.85.134:)..
  196. Wed Mar :: - [debug] done.
  197. Wed Mar :: - [info] Getting slave status..
  198. Wed Mar :: - [info] This slave(192.168.85.134)''s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.:). No need to recover from Exec_Master_Log_Pos.
  199. Wed Mar :: - [debug] Current max_allowed_packet is .
  200. Wed Mar :: - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
  201. Wed Mar :: - [info] Connecting to the target slave host 192.168.85.134, running recover script..
  202. ******************** The new master applies, in order, the relay log events it is missing relative to the latest slave, then the binlog saved from the dead master ********************
  203. Wed Mar :: - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.134 --slave_ip=192.168.85.134 --slave_port= --apply_files=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.-log --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --debug --slave_pass=xxx
  204. Wed Mar :: - [info]
  205. ******************** Concatenate all missing relay log and binlog events into total_binlog ********************
  206. Concat all apply files to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog ..
  207. Copying the first binlog file /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog.. ok.
  208. Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog.. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  209. Binlog Checksum enabled
  210. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  211. Got previous gtids log event: .
  212. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
  213. dumped up to pos . ok.
  214. /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog has effective binlog events from pos .
  215. Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog position to tail().. ok.
  216. Concat succeeded.
  217. All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog .
  218. MySQL client version is 5.7.. Using --binary-mode.
  219. Applying differential binary/relay log files /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.134:. This may take long time...
  220. Applying log files succeeded.
  221. Wed Mar :: - [debug] Setting max_allowed_packet back to succeeded.
  222. Wed Mar :: - [info] All relay logs were successfully applied.
  223. ******************** After the new master has applied all relay log and binlog events, read its current binlog position ********************
  224. Wed Mar :: - [info] Getting new master''s binlog name and position..
  225. Wed Mar :: - [info] mysql-bin.:
  226. Wed Mar :: - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
  227. ******************** Bring up the virtual IP so that the new master can serve clients ********************
  228. Wed Mar :: - [info] Executing master IP activate script:
  229. Wed Mar :: - [info] /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port= --new_master_user='mydba' --new_master_password='mysql5721'
  230. Set read_only= on the new master.
  231. Wed Mar :: - [info] OK.
  232. Wed Mar :: - [info] ** Finished master recovery successfully.
  233. Wed Mar :: - [info] * Phase : Master Recovery Phase completed.
  234. ==================== New master recovery phase, End ====================
  235. Wed Mar :: - [info]
  236. ==================== Slave recovery phase, Start ====================
  237. ******************** Slave recovery works like new master recovery: first obtain the relay log diff against the latest slave, then fetch the dead master's binlog ********************
  238. Wed Mar :: - [info] * Phase : Slaves Recovery Phase..
  239. Wed Mar :: - [info]
  240. ==================== 4.1 Generate the differential logs between the latest slave and the other slaves ====================
  241. Wed Mar :: - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
  242. Wed Mar :: - [info]
  243. Wed Mar :: - [info] -- Slave diff file generation on host 192.168.85.133(192.168.85.133:) started, pid: . Check tmp log /var/log/masterha/app1/192.168..133_3307_20180328160107.log if it takes time..
  244. Wed Mar :: - [info]
  245. Wed Mar :: - [info] Log messages from 192.168.85.133 ...
  246. Wed Mar :: - [info]
  247. Wed Mar :: - [info] This server has all relay logs. No need to generate diff files from the latest slave.
  248. Wed Mar :: - [info] End of log messages from 192.168.85.133.
  249. Wed Mar :: - [info] -- 192.168.85.133(192.168.85.133:) has the latest relay log events.
  250. Wed Mar :: - [info] Generating relay diff files from the latest slave succeeded.
  251. Wed Mar :: - [info]
  252. ==================== 4.2 Slaves apply the differential logs ====================
  253. Wed Mar :: - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
  254. Wed Mar :: - [info]
  255. Wed Mar :: - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:) started, pid: . Check tmp log /var/log/masterha/app1/192.168..133_3307_20180328160107.log if it takes time..
  256. saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog % .5KB/s :
  257. Wed Mar :: - [debug] Explicitly disabled relay_log_purge.
  258. Wed Mar :: - [info]
  259. Wed Mar :: - [info] Log messages from 192.168.85.133 ...
  260. Wed Mar :: - [info]
  261. Wed Mar :: - [info] Sending binlog..
  262. ******************** scp the dead master's binlog from the working directory on the MHA manager node (here, the node running the manual failover) to the slave's working directory ********************
  263. Wed Mar :: - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog to root@192.168.85.133:/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog succeeded.
  264. Wed Mar :: - [info] Starting recovery on 192.168.85.133(192.168.85.133:)..
  265. Wed Mar :: - [info] Generating diffs succeeded.
  266. Wed Mar :: - [info] Waiting until all relay logs are applied.
  267. Wed Mar :: - [info] done.
  268. Wed Mar :: - [debug] Stopping SQL thread on 192.168.85.133(192.168.85.133:)..
  269. Wed Mar :: - [debug] done.
  270. Wed Mar :: - [info] Getting slave status..
  271. Wed Mar :: - [info] This slave(192.168.85.133)''s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.:). No need to recover from Exec_Master_Log_Pos.
  272. Wed Mar :: - [debug] Current max_allowed_packet is .
  273. Wed Mar :: - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
  274. Wed Mar :: - [info] Connecting to the target slave host 192.168.85.133, running recover script..
  275. ******************** The slave applies, in order, the relay log events it is missing relative to the latest slave, then the binlog saved from the dead master ********************
  276. Wed Mar :: - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.133 --slave_ip=192.168.85.133 --slave_port= --apply_files=/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.-log --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --debug --slave_pass=xxx
  277. Wed Mar :: - [info]
  278. MySQL client version is 5.7.. Using --binary-mode.
  279. Applying differential binary/relay log files /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog on 192.168.85.133:. This may take long time...
  280. Applying log files succeeded.
  281. Wed Mar :: - [debug] Setting max_allowed_packet back to succeeded.
  282. Wed Mar :: - [info] All relay logs were successfully applied.
  283. Wed Mar :: - [info] Resetting slave 192.168.85.133(192.168.85.133:) and starting replication from the new master 192.168.85.134(192.168.85.134:)..
  284. Wed Mar :: - [debug] Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
  285. Wed Mar :: - [debug] done.
  286. Wed Mar :: - [info] Executed CHANGE MASTER.
  287. Wed Mar :: - [debug] Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
  288. Wed Mar :: - [debug] done.
  289. Wed Mar :: - [info] Slave started.
  290. Wed Mar :: - [info] End of log messages from 192.168.85.133.
  291. Wed Mar :: - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:) succeeded.
  292. Wed Mar :: - [info] All new slave servers recovered successfully.
  293. ==================== Phase 4: slave recovery, End ====================
  294. Wed Mar :: - [info]
  295. ==================== Phase 5: new master cleanup, Start ====================
  296. Wed Mar :: - [info] * Phase : New master cleanup phase..
  297. Wed Mar :: - [info]
  298. Wed Mar :: - [info] Resetting slave info on the new master..
  299. Wed Mar :: - [debug] Clearing slave info..
  300. Wed Mar :: - [debug] Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
  301. Wed Mar :: - [debug] done.
  302. Wed Mar :: - [debug] SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
  303. Wed Mar :: - [info] 192.168.85.134: Resetting slave info succeeded.
  304. ==================== Phase 5: new master cleanup, End ====================
  305. Wed Mar :: - [info] Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
  306. Wed Mar :: - [debug] Disconnected from 192.168.85.133(192.168.85.133:)
  307. Wed Mar :: - [debug] Disconnected from 192.168.85.134(192.168.85.134:)
  308. Wed Mar :: - [info]
  309.  
  310. ----- Failover Report -----
  311.  
  312. app1: MySQL Master failover 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:) succeeded
  313.  
  314. Master 192.168.85.132(192.168.85.132:) is down!
  315.  
  316. Check MHA Manager logs at ZST3 for details.
  317.  
  318. Started manual(interactive) failover.
  319. Invalidated master IP address on 192.168.85.132(192.168.85.132:)
  320. The latest slave 192.168.85.133(192.168.85.133:) has all relay logs for recovery.
  321. Selected 192.168.85.134(192.168.85.134:) as a new master.
  322. 192.168.85.134(192.168.85.134:): OK: Applying all logs succeeded.
  323. 192.168.85.134(192.168.85.134:): OK: Activated master IP address.
  324. 192.168.85.133(192.168.85.133:): This host has the latest relay log events.
  325. Generating relay diff files from the latest slave succeeded.
  326. 192.168.85.133(192.168.85.133:): OK: Applying all logs succeeded. Slave started, replicating from 192.168.85.134(192.168.85.134:)
  327. 192.168.85.134(192.168.85.134:): Resetting slave info succeeded.
  328. Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
  329. [root@ZST3 app1]#

Manual failover flow

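  Manual failover (traditional replication)
  1. Configuration check: connect to every instance, check service status, check the replication topology
  2. Dead master shutdown: stop the IO thread on every slave, remove the virtual IP from the failed master (stopssh)
  3. New master recovery
     3.1 Identify the latest slave
         It is used to fill in the data missing on the new master and the other slaves, and its position is the starting point for saving the failed master's binlog
     3.2 Save the failed master's binlog
         Run save_binary_logs on the failed master (only the part after the latest slave's position)
         scp the resulting binlog to the working directory where the manual failover runs
     3.3 Elect the new master
         Check whether the latest slave still has the relay-log events that the oldest slave is missing
         Decide the new master and print the topology before and after the switch
         Generate the differential relay log between the latest slave and the new master, and copy it to the new master's working directory
         scp the failed master's binlog from the manual failover working directory to the new master's working directory
     3.4 Apply the differential logs on the new master
         Wait until the new master has applied its own relay log
         Apply, in order, the relay-log events missing relative to the latest slave, then the binlog saved from the failed master
         Merge all missing relay-log and binlog events into total_binlog
         Record the new master's binlog file:pos; the other slaves start replicating from this position
         Bind the virtual IP so that the new master can serve traffic
  4. Recovery of the other slaves
     4.1 Generate the differential logs
         Generate the differential relay log between the latest slave and this slave, and copy it to the slave's working directory; scp the failed master's binlog from the manual failover working directory to the slave's working directory
     4.2 Apply the differential logs on the slave
         Wait until the slave has applied its own relay log; apply, in order, the relay-log events missing relative to the latest slave, then the binlog saved from the failed master; repoint the slave's replication at the new master
     4.3 If there are several slaves, repeat the steps above for each of them
  5. New master cleanup: clear the old replication info with STOP SLAVE; RESET SLAVE ALL;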
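As a rough illustration of step 3.1, the sketch below ranks slaves by how far their IO thread has read the master's binlog. It is not MHA's implementation (MHA is written in Perl); `SlaveStatus`, `received_position`, `latest_slaves` and `slaves_needing_relay_diff` are hypothetical names, the positions in the demo are made up, and the code assumes binlog file names differ only in their numeric suffix.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SlaveStatus:
    """The two SHOW SLAVE STATUS fields used for ranking (hypothetical type)."""
    host: str
    master_log_file: str       # Master_Log_File, e.g. "mysql-bin.000002"
    read_master_log_pos: int   # Read_Master_Log_Pos


def received_position(slave: SlaveStatus) -> Tuple[int, int]:
    """Return a sortable (file number, offset) pair for what the IO thread has read."""
    file_number = int(slave.master_log_file.rsplit(".", 1)[1])
    return (file_number, slave.read_master_log_pos)


def latest_slaves(slaves: List[SlaveStatus]) -> List[SlaveStatus]:
    """Slaves that have received the most binlog from the failed master."""
    best = max(received_position(s) for s in slaves)
    return [s for s in slaves if received_position(s) == best]


def slaves_needing_relay_diff(slaves: List[SlaveStatus]) -> List[SlaveStatus]:
    """Slaves that are behind the latest slave and need a differential relay log."""
    best = max(received_position(s) for s in slaves)
    return [s for s in slaves if received_position(s) < best]


if __name__ == "__main__":
    # Positions are invented for illustration; only the hosts come from this article.
    cluster = [
        SlaveStatus("192.168.85.133", "mysql-bin.000002", 1200),
        SlaveStatus("192.168.85.134", "mysql-bin.000002", 900),
    ]
    print([s.host for s in latest_slaves(cluster)])
    print([s.host for s in slaves_needing_relay_diff(cluster)])
```

Comparing (file number, offset) as a tuple keeps the ordering correct across binlog rotations, which a comparison on the offset alone would not.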

2.3、Working directory files

The failover has to fill in missing data, which leaves several kinds of files in each node's working directory.

  # Failed master
  [root@ZST1 app1]# ll
  total
  -rw-r--r-- root root Mar : saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
  [root@ZST1 app1]#

Dead Master

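saved_master_binlog_from_**: the differential binlog between the failed master and the latest slave. It is generated on the failed master, then copied to the working directory on the MHA manager node (or wherever the manual failover runs).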

  # Latest slave
  [root@ZST2 app1]# ll
  total
  -rw-r--r--. root root Mar : relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
  -rw-r--r--. root root Mar : relay_log_apply_for_192.168.85.133_3307_20180328160107_err.log
  -rw-r--r--. root root Mar : saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
  [root@ZST2 app1]#

Latest Slave

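relay_from_read_to_latest_**: the differential relay log between the latest slave and another slave. It is generated on the latest slave, then copied to the slave it was built for.
saved_master_binlog_from_**: copied from the manager node; it originates on the failed master.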

  # New master
  [root@ZST3 app1]# ll
  total
  -rw-r--r--. root root Mar : app1.failover.complete
  -rw-r--r--. root root Mar : relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
  -rw-r--r--. root root Mar : relay_log_apply_for_192.168.85.134_3307_20180328160107_err.log
  -rw-r--r--. root root Mar : saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
  -rw-r--r--. root root Mar : total_binlog_for_192.168.85.134_3307..binlog
  [root@ZST3 app1]#

New Master

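relay_from_read_to_latest_**: copied from the latest slave.
saved_master_binlog_from_**: copied from the manager node; it originates on the failed master.
total_binlog_for_**: all the missing relay-log and binlog events merged into one file.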
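The file names in these listings suggest which files each node applies, and in what order. A minimal sketch, assuming the naming patterns seen in this run hold in general and that every node uses the same port, as in this setup; `files_to_apply` is a hypothetical helper, not part of MHA.

```python
from typing import List


def files_to_apply(target_host: str, port: int, timestamp: str,
                   dead_master_host: str, target_is_latest: bool) -> List[str]:
    """List, in apply order, the differential files a node needs (traditional replication).

    The name patterns are copied from the directory listings in this article.
    """
    files = []
    if not target_is_latest:
        # Events the latest slave received but this node did not.
        files.append(
            f"relay_from_read_to_latest_{target_host}_{port}_{timestamp}.binlog"
        )
    # Events that never left the failed master.
    files.append(
        f"saved_master_binlog_from_{dead_master_host}_{port}_{timestamp}.binlog"
    )
    return files


if __name__ == "__main__":
    # New master in this run: behind the latest slave, so it needs both files.
    for name in files_to_apply("192.168.85.134", 3307, "20180328160107",
                               "192.168.85.132", target_is_latest=False):
        print(name)
```

For the latest slave (192.168.85.133 in this run) only the saved master binlog remains, which matches the single file passed to --apply_files in its recovery log.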
• Parse the differential logs to inspect the events in each file

  1. # Differential relay log between the latest slave and the other slave
  2. [root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows relay_from_read_to_latest_192.168.85.134_3307_20180328160107.binlog
  3. /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
  4. /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
  5. DELIMITER /*!*/;
  6. # at
  7. # :: server id end_log_pos CRC32 0x152b7e41 Start: binlog v , server v 5.7.-log created ::
  8. # This Format_description_event appears in a relay log and was generated by the slave thread.
  9. # at
  10. # :: server id end_log_pos CRC32 0x5ea2e9c6 Previous-GTIDs
  11. # [empty]
  12. # at
  13. # :: server id end_log_pos CRC32 0x2076d50b Rotate to mysql-bin. pos:
  14. # at
  15. # :: server id end_log_pos CRC32 0x9b1488de Start: binlog v , server v 5.7.-log created :: at startup
  16. ROLLBACK/*!*/;
  17. # at
  18. # :: server id end_log_pos CRC32 0x838279dd Rotate to mysql-bin. pos:
  19. # at
  20. # :: server id end_log_pos CRC32 0x9fba3aa7 Anonymous_GTID last_committed= sequence_number= rbr_only=yes
  21. /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
  22. SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
  23. # at
  24. # :: server id end_log_pos CRC32 0x112f5399 Query thread_id= exec_time= error_code=
  25. SET TIMESTAMP=/*!*/;
  26. SET @@session.pseudo_thread_id=/*!*/;
  27. SET @@session.foreign_key_checks=, @@session.sql_auto_is_null=, @@session.unique_checks=, @@session.autocommit=/*!*/;
  28. SET @@session.sql_mode=/*!*/;
  29. SET @@session.auto_increment_increment=, @@session.auto_increment_offset=/*!*/;
  30. /*!\C utf8 *//*!*/;
  31. SET @@session.character_set_client=,@@session.collation_connection=,@@session.collation_server=/*!*/;
  32. SET @@session.time_zone='SYSTEM'/*!*/;
  33. SET @@session.lc_time_names=/*!*/;
  34. SET @@session.collation_database=DEFAULT/*!*/;
  35. BEGIN
  36. /*!*/;
  37. # at
  38. # :: server id end_log_pos CRC32 0x890cf300 Table_map: `replcrash`.`py_user` mapped to number
  39. # at
  40. # :: server id end_log_pos CRC32 0xccb038f5 Write_rows: table id flags: STMT_END_F
  41. ### INSERT INTO `replcrash`.`py_user`
  42. ### SET
  43. ### @= /* INT meta=0 nullable=0 is_null=0 */
  44. ### @='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
  45. ### @='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
  46. ### @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
  47. # at
  48. # :: server id end_log_pos CRC32 0xbfda64ba Xid =
  49. COMMIT/*!*/;
  50. SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
  51. DELIMITER ;
  52. # End of log file
  53. /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
  54. /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
  55. [root@ZST3 app1]#
  56.  
  57. # Differential binlog between the failed master and the latest slave
  58. [root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows saved_master_binlog_from_192.168.85.132_3307_20180328160107.binlog
  59. /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
  60. /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
  61. DELIMITER /*!*/;
  62. # at
  63. # :: server id end_log_pos CRC32 0x9b1488de Start: binlog v , server v 5.7.-log created :: at startup
  64. ROLLBACK/*!*/;
  65. # at
  66. # :: server id end_log_pos CRC32 0x37f9307d Previous-GTIDs
  67. # [empty]
  68. # at
  69. # :: server id end_log_pos CRC32 0x74680cfa Anonymous_GTID last_committed= sequence_number= rbr_only=yes
  70. /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
  71. SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
  72. # at
  73. # :: server id end_log_pos CRC32 0x3774a1d0 Query thread_id= exec_time= error_code=
  74. SET TIMESTAMP=/*!*/;
  75. SET @@session.pseudo_thread_id=/*!*/;
  76. SET @@session.foreign_key_checks=, @@session.sql_auto_is_null=, @@session.unique_checks=, @@session.autocommit=/*!*/;
  77. SET @@session.sql_mode=/*!*/;
  78. SET @@session.auto_increment_increment=, @@session.auto_increment_offset=/*!*/;
  79. /*!\C utf8 *//*!*/;
  80. SET @@session.character_set_client=,@@session.collation_connection=,@@session.collation_server=/*!*/;
  81. SET @@session.time_zone='SYSTEM'/*!*/;
  82. SET @@session.lc_time_names=/*!*/;
  83. SET @@session.collation_database=DEFAULT/*!*/;
  84. BEGIN
  85. /*!*/;
  86. # at
  87. # :: server id end_log_pos CRC32 0x1468e6b1 Table_map: `replcrash`.`py_user` mapped to number
  88. # at
  89. # :: server id end_log_pos CRC32 0x79523051 Write_rows: table id flags: STMT_END_F
  90. ### INSERT INTO `replcrash`.`py_user`
  91. ### SET
  92. ### @= /* INT meta=0 nullable=0 is_null=0 */
  93. ### @='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
  94. ### @='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
  95. ### @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
  96. # at
  97. # :: server id end_log_pos CRC32 0xb93ce981 Xid =
  98. COMMIT/*!*/;
  99. # at
  100. # :: server id end_log_pos CRC32 0x577dc41e Stop
  101. SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
  102. DELIMITER ;
  103. # End of log file
  104. /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
  105. /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
  106. [root@ZST3 app1]#
  107.  
  108. # All missing relay-log and binlog events, merged
  109. [root@ZST3 app1]# mysqlbinlog -vv --base64-output=decode-rows total_binlog_for_192.168.85.134_3307..binlog
  110. /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
  111. /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
  112. DELIMITER /*!*/;
  113. # at
  114. # :: server id end_log_pos CRC32 0x152b7e41 Start: binlog v , server v 5.7.-log created ::
  115. # This Format_description_event appears in a relay log and was generated by the slave thread.
  116. # at
  117. # :: server id end_log_pos CRC32 0x5ea2e9c6 Previous-GTIDs
  118. # [empty]
  119. # at
  120. # :: server id end_log_pos CRC32 0x2076d50b Rotate to mysql-bin. pos:
  121. # at
  122. # :: server id end_log_pos CRC32 0x9b1488de Start: binlog v , server v 5.7.-log created :: at startup
  123. ROLLBACK/*!*/;
  124. # at
  125. # :: server id end_log_pos CRC32 0x838279dd Rotate to mysql-bin. pos:
  126. # at
  127. # :: server id end_log_pos CRC32 0x9fba3aa7 Anonymous_GTID last_committed= sequence_number= rbr_only=yes
  128. /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
  129. SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
  130. # at
  131. # :: server id end_log_pos CRC32 0x112f5399 Query thread_id= exec_time= error_code=
  132. SET TIMESTAMP=/*!*/;
  133. SET @@session.pseudo_thread_id=/*!*/;
  134. SET @@session.foreign_key_checks=, @@session.sql_auto_is_null=, @@session.unique_checks=, @@session.autocommit=/*!*/;
  135. SET @@session.sql_mode=/*!*/;
  136. SET @@session.auto_increment_increment=, @@session.auto_increment_offset=/*!*/;
  137. /*!\C utf8 *//*!*/;
  138. SET @@session.character_set_client=,@@session.collation_connection=,@@session.collation_server=/*!*/;
  139. SET @@session.time_zone='SYSTEM'/*!*/;
  140. SET @@session.lc_time_names=/*!*/;
  141. SET @@session.collation_database=DEFAULT/*!*/;
  142. BEGIN
  143. /*!*/;
  144. # at
  145. # :: server id end_log_pos CRC32 0x890cf300 Table_map: `replcrash`.`py_user` mapped to number
  146. # at
  147. # :: server id end_log_pos CRC32 0xccb038f5 Write_rows: table id flags: STMT_END_F
  148. ### INSERT INTO `replcrash`.`py_user`
  149. ### SET
  150. ### @= /* INT meta=0 nullable=0 is_null=0 */
  151. ### @='272f15ee-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
  152. ### @='2018-03-28 15:53:50' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
  153. ### @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
  154. # at
  155. # :: server id end_log_pos CRC32 0xbfda64ba Xid =
  156. COMMIT/*!*/;
  157. # at
  158. # :: server id end_log_pos CRC32 0x74680cfa Anonymous_GTID last_committed= sequence_number= rbr_only=yes
  159. /*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
  160. SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
  161. # at
  162. # :: server id end_log_pos CRC32 0x3774a1d0 Query thread_id= exec_time= error_code=
  163. SET TIMESTAMP=/*!*/;
  164. BEGIN
  165. /*!*/;
  166. # at
  167. # :: server id end_log_pos CRC32 0x1468e6b1 Table_map: `replcrash`.`py_user` mapped to number
  168. # at
  169. # :: server id end_log_pos CRC32 0x79523051 Write_rows: table id flags: STMT_END_F
  170. ### INSERT INTO `replcrash`.`py_user`
  171. ### SET
  172. ### @= /* INT meta=0 nullable=0 is_null=0 */
  173. ### @='2d8900cc-325d-11e8-88e6-000c29c1' /* VARSTRING(96) meta=96 nullable=1 is_null=0 */
  174. ### @='2018-03-28 15:54:01' /* DATETIME(0) meta=0 nullable=1 is_null=0 */
  175. ### @='' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
  176. # at
  177. # :: server id end_log_pos CRC32 0xb93ce981 Xid =
  178. COMMIT/*!*/;
  179. # at
  180. # :: server id end_log_pos CRC32 0x577dc41e Stop
  181. SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
  182. DELIMITER ;
  183. # End of log file
  184. /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
  185. /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
  186. [root@ZST3 app1]#

After the manual failover the topology is Node3->{Node2}, and the missing data was filled in automatically.

3、Manual failover under GTID replication

3.1、MHA configuration changes

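In GTID mode MHA needs a [binlog*] section, which can point at a dedicated binlog server or at the master's own binlog directory. Without a [binlog*] section, MHA does not pull binlogs from the master even when the master's host is still up, so every event that had not yet reached a slave is lost.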

  # Append the binlog server section to the end of app1.conf
  [root@ZST1 masterha]# cat app1.conf
  ...
  [binlog1]
  hostname=192.168.85.132
  master_binlog_dir=/data/mysql/mysql3307/logs
  no_master=
  [root@ZST1 masterha]#
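Because a missing [binlog*] section silently changes what gets recovered, it may be worth checking the configuration before a GTID failover. The sketch below uses Python's standard configparser, which happens to read this INI-like layout; `binlog_sections` is a hypothetical helper, and `SAMPLE_CONF` is a trimmed-down stand-in for app1.conf rather than the full file.

```python
import configparser
from typing import List


def binlog_sections(conf_text: str) -> List[str]:
    """Return the names of all [binlog*] sections found in an MHA-style config."""
    # interpolation=None so that a literal '%' in a value is never treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return [name for name in parser.sections() if name.startswith("binlog")]


# Trimmed-down example; only keys that appear in this article are used.
SAMPLE_CONF = """
[server default]
manager_workdir=/var/log/masterha/app1

[server1]
hostname=192.168.85.132
master_binlog_dir=/data/mysql/mysql3307/logs

[binlog1]
hostname=192.168.85.132
master_binlog_dir=/data/mysql/mysql3307/logs
"""

if __name__ == "__main__":
    found = binlog_sections(SAMPLE_CONF)
    if found:
        print("binlog sections:", found)
    else:
        print("no [binlog*] section: events left on the dead master would not be recovered")
```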

3.2、Manual failover

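With the one-master, two-slave topology Node1->{Node2, Node3} rebuilt on Row+GTID, regenerate the test data, stop the database service on Node1, and run the manual failover command.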

  1. # GTID + manual failover
  2. [root@ZST1 masterha]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover
  3. --dead_master_ip=<dead_master_ip> is not set. Using 192.168.85.132.
  4. Thu Mar :: - [info] Reading default configuration from /etc/masterha/masterha_default.conf..
  5. Thu Mar :: - [info] Reading application default configuration from /etc/masterha/app1.conf..
  6. Thu Mar :: - [info] Reading server configuration from /etc/masterha/app1.conf..
  7. Thu Mar :: - [info] MHA::MasterFailover version 0.56.
  8. Thu Mar :: - [info] Starting master failover.
  9. Thu Mar :: - [info]
  10. ==================== Phase 1: configuration check, Start ====================
  11. Thu Mar :: - [info] * Phase : Configuration Check Phase..
  12. Thu Mar :: - [info]
  13. Thu Mar :: - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
  14. Thu Mar :: - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
  15. Thu Mar :: - [info] Binlog server 192.168.85.132 is reachable.
  16. Thu Mar :: - [debug] Connecting to servers..
  17. Thu Mar :: - [debug] Connected to: 192.168.85.133(192.168.85.133:), user=mydba
  18. Thu Mar :: - [debug] Number of slave worker threads on host 192.168.85.133(192.168.85.133:):
  19. Thu Mar :: - [debug] Connected to: 192.168.85.134(192.168.85.134:), user=mydba
  20. Thu Mar :: - [debug] Number of slave worker threads on host 192.168.85.134(192.168.85.134:):
  21. Thu Mar :: - [debug] Comparing MySQL versions..
  22. Thu Mar :: - [debug] Comparing MySQL versions done.
  23. Thu Mar :: - [debug] Connecting to servers done.
  24. Thu Mar :: - [info] GTID failover mode =
  25. Thu Mar :: - [info] Dead Servers:
  26. Thu Mar :: - [info] 192.168.85.132(192.168.85.132:)
  27. Thu Mar :: - [info] Checking master reachability via MySQL(double check)...
  28. Thu Mar :: - [info] ok.
  29. Thu Mar :: - [info] Alive Servers:
  30. Thu Mar :: - [info] 192.168.85.133(192.168.85.133:)
  31. Thu Mar :: - [info] 192.168.85.134(192.168.85.134:)
  32. Thu Mar :: - [info] Alive Slaves:
  33. Thu Mar :: - [info] 192.168.85.133(192.168.85.133:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  34. Thu Mar :: - [info] GTID ON
  35. Thu Mar :: - [debug] Relay log info repository: FILE
  36. Thu Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  37. Thu Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  38. Thu Mar :: - [info] 192.168.85.134(192.168.85.134:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  39. Thu Mar :: - [info] GTID ON
  40. Thu Mar :: - [debug] Relay log info repository: FILE
  41. Thu Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  42. Thu Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  43. ******************** Confirm whether to proceed ********************
  44. Master 192.168.85.132(192.168.85.132:) is dead. Proceed? (yes/NO): yes
  45. Thu Mar :: - [info] Starting GTID based failover.
  46. Thu Mar :: - [info]
  47. Thu Mar :: - [info] ** Phase : Configuration Check Phase completed.
  48. ==================== Phase 1: configuration check, End ====================
  49. Thu Mar :: - [info]
  50. ==================== Phase 2: dead master shutdown, Start ====================
  51. Thu Mar :: - [info] * Phase : Dead Master Shutdown Phase..
  52. Thu Mar :: - [info]
  53. Thu Mar :: - [debug] SSH connection test to 192.168.85.132, option -o StrictHostKeyChecking=no -o PasswordAuthentication=no -o BatchMode=yes -o ConnectTimeout=, timeout
  54. Thu Mar :: - [debug] Stopping IO thread on 192.168.85.134(192.168.85.134:)..
  55. Thu Mar :: - [debug] Stopping IO thread on 192.168.85.133(192.168.85.133:)..
  56. Thu Mar :: - [debug] Stop IO thread on 192.168.85.133(192.168.85.133:) done.
  57. Thu Mar :: - [debug] Stop IO thread on 192.168.85.134(192.168.85.134:) done.
  58. Thu Mar :: - [info] HealthCheck: SSH to 192.168.85.132 is reachable.
  59. Thu Mar :: - [info] Forcing shutdown so that applications never connect to the current master..
  60. Thu Mar :: - [info] Executing master IP deactivation script:
  61. Thu Mar :: - [info] /etc/masterha/master_ip_failover --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --command=stopssh --ssh_user=root
  62. Thu Mar :: - [info] done.
  63. Thu Mar :: - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
  64. Thu Mar :: - [info] * Phase : Dead Master Shutdown Phase completed.
  65. ==================== Phase 2: dead master shutdown, End ====================
  66. Thu Mar :: - [info]
  67. ==================== Phase 3: new master recovery, Start ====================
  68. Thu Mar :: - [info] * Phase : Master Recovery Phase..
  69. Thu Mar :: - [info]
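  70. ==================== 3.1 Identify the latest slave ====================
  71. ******************** The latest slave is used to fill in the data missing on the new master, and its position is the starting point for saving the failed master's binlog ********************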
  72. Thu Mar :: - [info] * Phase 3.1: Getting Latest Slaves Phase..
  73. Thu Mar :: - [info]
  74. Thu Mar :: - [debug] Fetching current slave status..
  75. Thu Mar :: - [debug] Fetching current slave status done.
  76. Thu Mar :: - [info] The latest binary log file/position on all slaves is mysql-bin.:
  77. Thu Mar :: - [info] Retrieved Gtid Set: 90b30799--11e7--000c29c1025c:-
  78. Thu Mar :: - [info] Latest slaves (Slaves that received relay log files to the latest):
  79. Thu Mar :: - [info] 192.168.85.133(192.168.85.133:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  80. Thu Mar :: - [info] GTID ON
  81. Thu Mar :: - [debug] Relay log info repository: FILE
  82. Thu Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  83. Thu Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  84. Thu Mar :: - [info] The oldest binary log file/position on all slaves is mysql-bin.:
  85. Thu Mar :: - [info] Retrieved Gtid Set: 90b30799--11e7--000c29c1025c:-
  86. Thu Mar :: - [info] Oldest slaves:
  87. Thu Mar :: - [info] 192.168.85.134(192.168.85.134:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
  88. Thu Mar :: - [info] GTID ON
  89. Thu Mar :: - [debug] Relay log info repository: FILE
  90. Thu Mar :: - [info] Replicating from 192.168.85.132(192.168.85.132:)
  91. Thu Mar :: - [info] Primary candidate for the new Master (candidate_master is set)
  92. Thu Mar :: - [info]
  93. ==================== 3.3 Elect the new master ====================
  94. Thu Mar :: - [info] * Phase 3.3: Determining New Master Phase..
  95. Thu Mar :: - [info]
  96. Thu Mar :: - [info] 192.168.85.134 can be new master.
  97. Thu Mar :: - [info] New master is 192.168.85.134(192.168.85.134:)
  98. Thu Mar :: - [info] Starting master failover..
  99. Thu Mar :: - [info]
  100. From:
  101. 192.168.85.132(192.168.85.132:) (current master)
  102. +--192.168.85.133(192.168.85.133:)
  103. +--192.168.85.134(192.168.85.134:)
  104.  
  105. To:
  106. 192.168.85.134(192.168.85.134:) (new master)
  107. +--192.168.85.133(192.168.85.133:)
  108.  
  109. ******************** Confirm whether to perform the switch ********************
  110. Starting master switch from 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:)? (yes/NO): yes
  111. Thu Mar :: - [info] New master decided manually is 192.168.85.134(192.168.85.134:)
  112. Thu Mar :: - [info]
  113. Thu Mar :: - [info] * Phase 3.3: New Master Recovery Phase..
  114. Thu Mar :: - [info]
  115. ******************** Wait until the new master has applied its own relay log ********************
  116. Thu Mar :: - [info] Waiting all logs to be applied..
  117. Thu Mar :: - [info] done.
  118. Thu Mar :: - [debug] Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
  119. Thu Mar :: - [debug] done.
  120. Thu Mar :: - [info] Replicating from the latest slave 192.168.85.133(192.168.85.133:) and waiting to apply..
  121. ******************** Wait until the latest slave has applied its own relay log ********************
  122. Thu Mar :: - [info] Waiting all logs to be applied on the latest slave..
  123. ******************** Point the new master at the latest slave with CHANGE MASTER TO so that replication fills in the difference ********************
  124. Thu Mar :: - [info] Resetting slave 192.168.85.134(192.168.85.134:) and starting replication from the new master 192.168.85.133(192.168.85.133:)..
  125. Thu Mar :: - [debug] Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
  126. Thu Mar :: - [debug] done.
  127. Thu Mar :: - [info] Executed CHANGE MASTER.
  128. Thu Mar :: - [debug] Starting slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
  129. Thu Mar :: - [debug] done.
  130. Thu Mar :: - [info] Slave started.
  131. Thu Mar :: - [info] Waiting to execute all relay logs on 192.168.85.134(192.168.85.134:)..
  132. Thu Mar :: - [info] master_pos_wait(mysql-bin.:) completed on 192.168.85.134(192.168.85.134:). Executed events.
  133. Thu Mar :: - [info] done.
  134. Thu Mar :: - [debug] Stopping SQL thread on 192.168.85.134(192.168.85.134:)..
  135. Thu Mar :: - [debug] done.
  136. Thu Mar :: - [info] done.
  137. Thu Mar :: - [info] -- Saving binlog from host 192.168.85.132 started, pid:
  138. Thu Mar :: - [info]
  139. Thu Mar :: - [info] Log messages from 192.168.85.132 ...
  140. Thu Mar :: - [info]
  141. ******************** Runs on the failed master / binlog server; takes only the part after the latest slave's position ********************
  142. Thu Mar :: - [info] Fetching binary logs from binlog server 192.168.85.132..
  143. Thu Mar :: - [info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin. --start_pos= --output_file=/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog --handle_raw_binlog= --skip_filter= --disable_log_bin= --manager_version=0.56 --oldest_version=5.7.-log --debug --binlog_dir=/data/mysql/mysql3307/logs
  144. Creating /var/log/masterha/app1 if not exists.. ok.
  145. Concat binary/relay logs from mysql-bin. pos to mysql-bin. EOF into /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog ..
  146. Executing command: mysqlbinlog --start-position= /data/mysql/mysql3307/logs/mysql-bin. >> /var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog
  147. Concat succeeded.
  148. ******************** scp the resulting binlog to the working directory where the manual failover runs ********************
  149. Thu Mar :: - [info] scp from root@192.168.85.132:/var/log/masterha/app1/saved_binlog_binlog1_20180329150032.binlog to local:/var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog succeeded.
  150. Thu Mar :: - [info] End of log messages from 192.168.85.132.
  151. Thu Mar :: - [info] Saved mysqlbinlog size from 192.168.85.132 is bytes.
  152. Thu Mar :: - [info] Applying differential binlog /var/log/masterha/app1/saved_binlog_192.168.85.132_binlog1_20180329150032.binlog ..
  153. Thu Mar :: - [info] Differential log apply from binlog server succeeded.
  154. ******************** The new master finishes applying the binlog; record its current position ********************
  155. Thu Mar :: - [info] Getting new master''s binlog name and position..
  156. Thu Mar :: - [info] mysql-bin.:
  157. Thu Mar :: - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.85.134', MASTER_PORT=, MASTER_AUTO_POSITION=, MASTER_USER='repl', MASTER_PASSWORD='xxx';
  158. Thu Mar :: - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin., , 90b30799--11e7--000c29c1025c:-
  159. ******************** Bring up the virtual IP so that the new master can serve traffic ********************
  160. Thu Mar :: - [info] Executing master IP activate script:
  161. Thu Mar :: - [info] /etc/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.85.132 --orig_master_ip=192.168.85.132 --orig_master_port= --new_master_host=192.168.85.134 --new_master_ip=192.168.85.134 --new_master_port= --new_master_user='mydba' --new_master_password='mysql5721'
  162. Set read_only= on the new master.
  163. RTNETLINK answers: Cannot assign requested address
  164. RTNETLINK answers: File exists
  165. Thu Mar :: - [info] OK.
  166. Thu Mar :: - [info] ** Finished master recovery successfully.
  167. Thu Mar :: - [info] * Phase : Master Recovery Phase completed.
  168. ==================== Phase 3: new master recovery, End ====================
  169. Thu Mar :: - [info]
  170. ==================== Phase 4: slave recovery, Start ====================
  171. Thu Mar :: - [info] * Phase : Slaves Recovery Phase..
  172. Thu Mar :: - [info]
  173. Thu Mar :: - [info]
  174. ==================== 4.1 The slave simply runs CHANGE MASTER TO New_Master ====================
  175. Thu Mar :: - [info] * Phase 4.1: Starting Slaves in parallel..
  176. Thu Mar :: - [info]
  177. Thu Mar :: - [info] -- Slave recovery on host 192.168.85.133(192.168.85.133:) started, pid: . Check tmp log /var/log/masterha/app1/192.168..133_3307_20180329150032.log if it takes time..
  178. Thu Mar :: - [info]
  179. Thu Mar :: - [info] Log messages from 192.168.85.133 ...
  180. Thu Mar :: - [info]
  181. Thu Mar :: - [info] Resetting slave 192.168.85.133(192.168.85.133:) and starting replication from the new master 192.168.85.134(192.168.85.134:)..
  182. Thu Mar :: - [debug] Stopping slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
  183. Thu Mar :: - [debug] done.
  184. Thu Mar :: - [info] Executed CHANGE MASTER.
  185. Thu Mar :: - [debug] Starting slave IO/SQL thread on 192.168.85.133(192.168.85.133:)..
  186. Thu Mar :: - [debug] done.
  187. Thu Mar :: - [info] Slave started.
  188. Thu Mar :: - [info] gtid_wait(90b30799--11e7--000c29c1025c:-) completed on 192.168.85.133(192.168.85.133:). Executed events.
  189. Thu Mar :: - [info] End of log messages from 192.168.85.133.
  190. Thu Mar :: - [info] -- Slave on host 192.168.85.133(192.168.85.133:) started.
  191. Thu Mar :: - [info] All new slave servers recovered successfully.
  192. ==================== Phase 4: slave recovery, End ====================
  193. Thu Mar :: - [info]
  194. ==================== Phase 5: new master cleanup, Start ====================
  195. Thu Mar :: - [info] * Phase : New master cleanup phase..
  196. Thu Mar :: - [info]
  197. Thu Mar :: - [info] Resetting slave info on the new master..
  198. Thu Mar :: - [debug] Clearing slave info..
  199. Thu Mar :: - [debug] Stopping slave IO/SQL thread on 192.168.85.134(192.168.85.134:)..
  200. Thu Mar :: - [debug] done.
  201. Thu Mar :: - [debug] SHOW SLAVE STATUS shows new master does not replicate from anywhere. OK.
  202. Thu Mar :: - [info] 192.168.85.134: Resetting slave info succeeded.
  203. ==================== Phase 5: new master cleanup, End ====================
  204. Thu Mar :: - [info] Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
  205. Thu Mar :: - [debug] Disconnected from 192.168.85.133(192.168.85.133:)
  206. Thu Mar :: - [debug] Disconnected from 192.168.85.134(192.168.85.134:)
  207. Thu Mar :: - [info]
  208.  
  209. ----- Failover Report -----
  210.  
  211. app1: MySQL Master failover 192.168.85.132(192.168.85.132:) to 192.168.85.134(192.168.85.134:) succeeded
  212.  
  213. Master 192.168.85.132(192.168.85.132:) is down!
  214.  
  215. Check MHA Manager logs at ZST1 for details.
  216.  
  217. Started manual(interactive) failover.
  218. Invalidated master IP address on 192.168.85.132(192.168.85.132:)
  219. Selected 192.168.85.134(192.168.85.134:) as a new master.
  220. 192.168.85.134(192.168.85.134:): OK: Applying all logs succeeded.
  221. 192.168.85.134(192.168.85.134:): OK: Activated master IP address.
  222. 192.168.85.133(192.168.85.133:): OK: Slave started, replicating from 192.168.85.134(192.168.85.134:)
  223. 192.168.85.134(192.168.85.134:): Resetting slave info succeeded.
  224. Master failover to 192.168.85.134(192.168.85.134:) completed successfully.
  225. [root@ZST1 masterha]#

Manual failover flow

  1. 手动Failover(GTID)
  2. 、配置检查:连接各实例,检查服务状态,检查主从关系
  3. 、故障Master关闭:停止各Slave上的IO Thread,故障Master虚拟IP摘除(stopssh)
  4. 、新Master恢复
  5. 3.1、获取最新的Slave
  6. 用于补全新Master缺少的数据;用于save故障Masterbinlog的起始点
  7. 3.2、选举新Master
  8. 确定新Master,得到切换前后结构
  9. 3.3、新Master恢复
  10. 3.3.、补全新Master与最新Slave差异
  11. 等待新Master应用完自己的relay-log;等待最新Slave应用完自己的relay-log;将新Master change到最新Slave,以补全差异数据
  12. 3.3.、补全新Master与故障Master差异
  13. 故障Master/BinlogServer上执行save_binary_logs;将得到的binlog scp到手动Failover运行的工作目录;新Master应用完binlog,得到当前位置;绑定虚拟IP,新Master可以对外提供服务
  14. 、其他Slave恢复
  15. 4.1、重置复制,RESET SLAVE;CHANGE MASTER TO New Master;
  16. 4.2、如果存在多个Slaves,重复上述操作
  17. 、新Master清理:清理旧的复制信息STOP SLAVE;RESET SLAVE ALL;
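Step 3.1 (identifying the latest slave) is the part of the flow that everything else hangs on, so here is a minimal sketch of the comparison. It is not MHA's own code. It assumes each slave's SHOW SLAVE STATUS output has already been reduced to the Master_Log_File and Read_Master_Log_Pos fields; the SlaveStatus class, the function names, and the binlog names and positions in the example are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveStatus:
    """Hypothetical container for two SHOW SLAVE STATUS fields of one slave."""
    host: str
    master_log_file: str      # Master_Log_File
    read_master_log_pos: int  # Read_Master_Log_Pos


def binlog_sequence(file_name: str) -> int:
    """Return the numeric suffix of a binlog name such as 'mysql-bin.000003'."""
    return int(file_name.rsplit(".", 1)[1])


def latest_slave(slaves):
    """Pick the slave whose IO thread has read furthest into the master's binlog."""
    if not slaves:
        raise ValueError("no live slaves to choose from")
    # Compare the binlog file sequence first, then the position inside that file.
    return max(
        slaves,
        key=lambda s: (binlog_sequence(s.master_log_file), s.read_master_log_pos),
    )


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the log above.
    slaves = [
        SlaveStatus("192.168.85.133", "mysql-bin.000002", 1234),
        SlaveStatus("192.168.85.134", "mysql-bin.000002", 890),
    ]
    print(latest_slave(slaves).host)
```

Comparing the file sequence before the position matters once the master has rotated its binlog: a slave at a low position in a newer file is still ahead of one at a high position in the older file.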

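3.3、Differences between the manual failover flow under traditional and GTID replication

To get a detailed failover log, I recommend:
• Setting log_level=debug in the MHA configuration file
• Simulating data differences across Node1, Node2 and Node3
• Running the failover with Node2 and then with Node3 chosen as the new master
For manual failover under GTID, also turn on the general log so you can see how the gap between the new master and the latest slave gets filled.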

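Is the missing data filled in?
  Traditional: as long as the master's server itself is still up, all data is filled in by default.
  GTID: the master / binlog server has to be configured under [binlog*] in the configuration file before the differential log on the dead master can be applied; otherwise data is applied only up to the latest slave.

How the data is filled in
  Traditional:
  • The new master / other slaves pull the latest slave's relay log
  • Every new master / other slave generates the relay log that differs from the latest slave and applies it (file relay_from_read_to_latest_**)
  • The new master / other slaves apply the binlog difference between the latest slave and the dead master (file saved_master_binlog_from_**)
  GTID:
  • The new master pulls the latest slave's binlog
  • The new master runs CHANGE MASTER TO the latest slave to fill in the data it is missing relative to the latest slave
  • Once the new master has caught up with the latest slave, save_binary_logs generates the binlog difference against the dead master, which is then applied (file saved_binlog_binlog1_**)
  • The other slaves do not need to apply any differential log; they simply CHANGE MASTER TO new_master

Generated files
  Traditional:
  • relay_from_read_to_latest_**: the relay log difference between the latest slave and another slave; generated on the latest slave, then copied to the corresponding slave
  • saved_master_binlog_from_**: the binlog difference between the dead master and the latest slave; generated on the dead master, copied first to the working directory of the host running the manual failover, then copied to the other slaves
  • The files can be parsed with mysqlbinlog ~.~
  GTID:
  • saved_master_binlog_from_**: the binlog difference between the dead master and the latest slave; generated on the dead master / binlog server, then copied to the working directory of the host running the manual failover
  • The files cannot be parsed with mysqlbinlog (・ω・) Maybe I was doing it wrong, but the commands that produce them really are slightly different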

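Under GTID, save_binary_logs is used only for the dead master's data (the master is down, so there is nothing to CHANGE MASTER TO). Everything else is filled in directly through CHANGE MASTER TO, letting the replication threads do the work. It also no longer depends on the latest slave's relay log.
Overall, MHA is a bit bloated in a GTID environment. If you are up to it, you can script the process yourself:
Identify Latest_Slave -> New_Master: change master to Latest_Slave -> mysqlbinlog ./binlogserver/binlog --start-position > New_Master -> Other_Slave: change master to New_Master
If you use enhanced (lossless) semi-synchronous replication, you can be fairly sure that all of the binlog on Dead_Master has reached Latest_Slave, and failover becomes even simpler (⊙_⊙)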
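The do-it-yourself outline above can be sketched as a small plan generator. This is only an illustration under stated assumptions, not a replacement for MHA: it prints the steps instead of executing them, the function names are hypothetical, the binlog path and start position are placeholders, and the replication password is left as the literal placeholder <repl_password>. It uses only CHANGE MASTER TO with MASTER_AUTO_POSITION=1 and mysqlbinlog --start-position.

```python
def change_master_sql(master_host, master_port, repl_user="repl"):
    """Build a GTID auto-positioning CHANGE MASTER TO statement.

    The password is a placeholder on purpose; fill it in from your own config.
    """
    return (
        f"CHANGE MASTER TO MASTER_HOST='{master_host}', MASTER_PORT={master_port}, "
        f"MASTER_USER='{repl_user}', MASTER_PASSWORD='<repl_password>', "
        "MASTER_AUTO_POSITION=1; START SLAVE;"
    )


def failover_plan(latest_slave, new_master, other_slaves, port,
                  saved_binlog, start_position):
    """Return the ordered (target_host, command) steps of the outline above."""
    steps = []
    # 1. New master catches up with the latest slave (skipped if they are the same host).
    #    A real script must wait here until the new master has applied everything.
    if new_master != latest_slave:
        steps.append((new_master, change_master_sql(latest_slave, port)))
    # 2. Apply the dead master's remaining binlog events to the new master.
    steps.append((
        new_master,
        f"mysqlbinlog --start-position={start_position} {saved_binlog} "
        f"| mysql -h {new_master} -P {port}",
    ))
    # 3. Every other slave is re-pointed at the new master.
    for slave in other_slaves:
        steps.append((slave, "STOP SLAVE; RESET SLAVE; "
                             + change_master_sql(new_master, port)))
    # 4. Finally, clear the old replication info on the new master.
    steps.append((new_master, "STOP SLAVE; RESET SLAVE ALL;"))
    return steps


if __name__ == "__main__":
    plan = failover_plan(
        latest_slave="192.168.85.133",
        new_master="192.168.85.134",
        other_slaves=["192.168.85.133"],
        port=3307,
        saved_binlog="./binlogserver/binlog",  # placeholder path from the outline
        start_position=4,                      # placeholder start position
    )
    for host, command in plan:
        print(f"[{host}] {command}")
```

Because the slaves use auto-positioning, the plan never has to compute a file name and position for them; the only explicit position is the one handed to mysqlbinlog for the dead master's binlog.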
