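mysql-master-ha: high availability for the MySQL master.
Common MySQL high-availability solutions include the following:
Name | Principle | Characteristics |
mysqlmha | A Perl script sends heartbeats to the MySQL master; after the master goes down, a new master is elected | Requires changing the routing information in the proxy layer. |
mmm | Multi-Master Replication Manager. The master and standby share an IP through a VIP; a monitor node polls periodically to check whether the master is available and switches master and standby when it is not | Polling has an interval; if the master goes down within that interval, many transactions fail. Not recommended. |
heartbeat + drbd | The master and standby share an IP through a VIP and only one master serves traffic; the standby master replicates at the block level through DRBD, similar to RAID 1 | Not suitable for large clusters. |
mysql cluster | The NDB storage engine provides a storage cluster, and a cluster of mysqld nodes provides access | Few companies use it, there is little experience with large-scale deployment, and it has many problems. |
percona galera cluster | Galera provides data synchronization for InnoDB, and every MySQL node is a read-write node. The principle is somewhat similar to mysql cluster: both work at the storage-cluster level | In theory, replication between nodes slows down as nodes are added, because each node replicates to the other n-1 nodes. |
As the table shows, only mysqlmha and percona galera cluster remain as candidates. We will find a separate time to test galera cluster.
We test mysqlmha first.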
Test environment:
OS: CentOS 6.5
MySQL: 5.6.27-log
mysql-master-ha: 0.56
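As in the documentation at https://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6, we used the following MySQL test topology:
MySQL master (1 host)
MySQL slave (2 hosts)
mysql-master-ha manager (1 host)
Install the mysql-master-ha node package on all machines, and install the mysql-master-ha manager package on the manager machine. For installation steps see: https://code.google.com/p/mysql-master-ha/wiki/Tutorial
After setting up replication from the MySQL master to the MySQL slaves, we start masterha_manager on the manager node: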
masterha_manager --conf=./app1.conf
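The post does not show the contents of app1.conf. As a rough sketch only, the Python snippet below renders a hypothetical app1.conf whose values are reconstructed from the manager log further down (hosts server6 to server8, port 3307, SSH as root, work directory /var/log/masterha/app1, a 3 second ping interval). The parameter names are ones MHA documents, but which of them the author actually set, and the choice of manager_workdir for the work directory, are assumptions.

```python
import configparser
import io


def build_app1_conf() -> str:
    """Render a hypothetical app1.conf for the three-server test topology."""
    cfg = configparser.ConfigParser()
    # Assumed values, guessed from the manager log; not the author's real file.
    cfg["server default"] = {
        "user": "root",
        "password": "xxx",
        "ssh_user": "root",
        "repl_user": "repl_user",
        "repl_password": "xxx",
        "manager_workdir": "/var/log/masterha/app1",
        "master_binlog_dir": "/var/lib/mysql,/var/log/mysql",
        "ping_interval": "3",
    }
    # One [serverN] section per MySQL instance (master and both slaves).
    for index, host in enumerate(("server6", "server7", "server8"), start=1):
        cfg[f"server{index}"] = {"hostname": host, "port": "3307"}
    out = io.StringIO()
    cfg.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    print(build_app1_conf())
```

Generating the file from code is only a convenience for the example; in practice the file would normally be written by hand.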
Then we kill the master by running killall -9 mysqld mysqld_safe on the master node and watch the log output on the manager node to understand how mysql-master-ha provides high availability for the MySQL master:
[root@server9 mysql-master-ha-conf]# masterha_manager --conf=./app1.conf
Tue Nov 10 10:49:40 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 10 10:49:40 2015 - [info] Reading application default configuration from ./app1.conf..
Tue Nov 10 10:49:40 2015 - [info] Reading server configuration from ./app1.conf..
Tue Nov 10 10:49:40 2015 - [info] MHA::MasterMonitor version 0.56.
Tue Nov 10 10:49:40 2015 - [info] GTID failover mode = 0
1. Detect that the old master is unavailable.
Tue Nov 10 10:49:40 2015 - [info] Dead Servers:
Tue Nov 10 10:49:40 2015 - [info] server7(server7:3307)
Tue Nov 10 10:49:40 2015 - [info] Alive Servers:
Tue Nov 10 10:49:40 2015 - [info] server6(server6:3307)
Tue Nov 10 10:49:40 2015 - [info] server8(server8:3307)
Tue Nov 10 10:49:40 2015 - [info] Alive Slaves:
Tue Nov 10 10:49:40 2015 - [info] server6(server6:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:40 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:40 2015 - [info] server8(server8:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:40 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:40 2015 - [warning] MySQL master is not currently alive!
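2. Check the slave configuration, check SSH connectivity to the master, and check that the master's binlog backup script (save_binary_logs) works.
Check SSH connectivity to the slaves, then log in to each slave and check the relay-log script apply_diff_relay_logs: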
Tue Nov 10 10:49:40 2015 - [info] Checking slave configurations..
Tue Nov 10 10:49:40 2015 - [info] read_only=1 is not set on slave server6(server6:3307).
Tue Nov 10 10:49:40 2015 - [warning] relay_log_purge=0 is not set on slave server6(server6:3307).
Tue Nov 10 10:49:40 2015 - [info] read_only=1 is not set on slave server8(server8:3307).
Tue Nov 10 10:49:40 2015 - [info] Checking replication filtering settings..
Tue Nov 10 10:49:40 2015 - [info] Replication filtering check ok.
Tue Nov 10 10:49:40 2015 - [info] GTID (with auto-pos) is not supported
Tue Nov 10 10:49:40 2015 - [info] Starting SSH connection tests..
Tue Nov 10 10:49:41 2015 - [info] All SSH connection tests passed successfully.
Tue Nov 10 10:49:41 2015 - [info] Checking MHA Node version..
Tue Nov 10 10:49:42 2015 - [info] Version check ok.
Tue Nov 10 10:49:42 2015 - [info] Getting current master (maybe dead) info ..
Tue Nov 10 10:49:42 2015 - [info] Identified master is server7(server7:3307).
Tue Nov 10 10:49:42 2015 - [info] Checking SSH publickey authentication settings on the current master..
Tue Nov 10 10:49:42 2015 - [info] HealthCheck: SSH to server7 is reachable.
Tue Nov 10 10:49:43 2015 - [info] Master MHA Node version is 0.56.
Tue Nov 10 10:49:43 2015 - [info] Checking recovery script configurations on server7(server7:3307)..
Tue Nov 10 10:49:43 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000004
Tue Nov 10 10:49:43 2015 - [info] Connecting to root@server7(server7:22)..
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to master-bin.000004
Tue Nov 10 10:49:43 2015 - [info] Binlog setting check done.
Tue Nov 10 10:49:43 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Nov 10 10:49:43 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server6 --slave_ip=server6 --slave_port=3307 --workdir=/var/log/masterha/app1 --target_version=5.6.27-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Nov 10 10:49:43 2015 - [info] Connecting to root@server6(server6:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to mysqld-relay-bin.000007
Temporary relay log file is /var/lib/mysql/mysqld-relay-bin.000007
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 10 10:49:43 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=server8 --slave_ip=server8 --slave_port=3307 --workdir=/var/log/masterha/app1 --target_version=5.6.27-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Tue Nov 10 10:49:43 2015 - [info] Connecting to root@server8(server8:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to slave-relay-bin.000004
Temporary relay log file is /var/lib/mysql/slave-relay-bin.000004
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Nov 10 10:49:44 2015 - [info] Slaves settings check done.
Tue Nov 10 10:49:44 2015 - [info]
server7(server7:3307) (current master)
+--server6(server6:3307)
+--server8(server8:3307)
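3. Confirm that MySQL on the old master is unreachable (the ping to port 3307 fails; setting up a secondary check is recommended so that the old master is confirmed unreachable from multiple routes). After several consecutive failed pings to port 3307 on the old master, prepare to start master failover: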
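The consecutive-failure rule described in this step can be sketched in Python as follows. This is only an illustration, not MHA's Perl implementation: the threshold of 4 failures and the 3 second interval are taken from what the log of this test run shows, tcp_ping is a plain TCP connect standing in for MHA's MySQL-level ping, and the function names are made up for the example.

```python
import socket
import time
from typing import Callable, Optional


def tcp_ping(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(ping: Callable[[], bool],
            max_failures: int = 4,
            interval: float = 3.0,
            sleep: Callable[[float], None] = time.sleep,
            max_checks: Optional[int] = None) -> bool:
    """Return True once ping has failed max_failures times in a row.

    A successful ping resets the failure counter. If max_checks is given
    and is exhausted first, return False (master still considered alive).
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if ping():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                return True
        sleep(interval)
    return False


if __name__ == "__main__":
    # Hypothetical usage: blocks until server7:3307 fails 4 checks in a row.
    if monitor(lambda: tcp_ping("server7", 3307)):
        print("master is not reachable, start failover")
```

A secondary check from another route would be one more callable consulted before returning True; it is left out to keep the sketch short.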
----
Tue Nov 10 10:49:44 2015 - [warning] master_ip_failover_script is not defined.
Tue Nov 10 10:49:44 2015 - [warning] shutdown_script is not defined.
Tue Nov 10 10:49:44 2015 - [error][/usr/local/share/perl5/MHA/Server.pm, ln457] Checking slave status failed on server6(server6:3307). err=Got error when executing SHOW SLAVE STATUS. MySQL server has gone away
Tue Nov 10 10:49:44 2015 - [info] Set master ping interval 3 seconds.
Tue Nov 10 10:49:44 2015 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Nov 10 10:49:44 2015 - [info] Starting ping health check on server7(server7:3307)..
Tue Nov 10 10:49:44 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 10 10:49:44 2015 - [warning] Connection failed 1 time(s)..
Tue Nov 10 10:49:44 2015 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/save_binary_logs_test --manager_version=0.56 --binlog_prefix=master-bin
Creating /var/log/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to master-bin.000004
Tue Nov 10 10:49:44 2015 - [info] HealthCheck: SSH to server7 is reachable.
Tue Nov 10 10:49:47 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 10 10:49:47 2015 - [warning] Connection failed 2 time(s)..
Tue Nov 10 10:49:50 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 10 10:49:50 2015 - [warning] Connection failed 3 time(s)..
Tue Nov 10 10:49:53 2015 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Tue Nov 10 10:49:53 2015 - [warning] Connection failed 4 time(s)..
Tue Nov 10 10:49:53 2015 - [warning] Master is not reachable from health checker!
Tue Nov 10 10:49:53 2015 - [warning] Master server7(server7:3307) is not reachable!
Tue Nov 10 10:49:53 2015 - [warning] SSH is reachable.
Tue Nov 10 10:49:53 2015 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and ./app1.conf again, and trying to connect to all servers to check server status..
Tue Nov 10 10:49:53 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 10 10:49:53 2015 - [info] Reading application default configuration from ./app1.conf..
Tue Nov 10 10:49:53 2015 - [info] Reading server configuration from ./app1.conf..
Tue Nov 10 10:49:53 2015 - [info] GTID failover mode = 0
Tue Nov 10 10:49:53 2015 - [info] Dead Servers:
Tue Nov 10 10:49:53 2015 - [info] server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] Alive Servers:
Tue Nov 10 10:49:53 2015 - [info] server6(server6:3307)
Tue Nov 10 10:49:53 2015 - [info] server8(server8:3307)
Tue Nov 10 10:49:53 2015 - [info] Alive Slaves:
Tue Nov 10 10:49:53 2015 - [info] server6(server6:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] server8(server8:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] Checking slave configurations..
Tue Nov 10 10:49:53 2015 - [info] read_only=1 is not set on slave server6(server6:3307).
Tue Nov 10 10:49:53 2015 - [warning] relay_log_purge=0 is not set on slave server6(server6:3307).
Tue Nov 10 10:49:53 2015 - [info] read_only=1 is not set on slave server8(server8:3307).
Tue Nov 10 10:49:53 2015 - [info] Checking replication filtering settings..
Tue Nov 10 10:49:53 2015 - [info] Replication filtering check ok.
Tue Nov 10 10:49:53 2015 - [info] Master is down!
Tue Nov 10 10:49:53 2015 - [info] Terminating monitoring script.
Tue Nov 10 10:49:53 2015 - [info] Got exit code 20 (Master dead).
Tue Nov 10 10:49:53 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Nov 10 10:49:53 2015 - [info] Reading application default configuration from ./app1.conf..
Tue Nov 10 10:49:53 2015 - [info] Reading server configuration from ./app1.conf..
Tue Nov 10 10:49:53 2015 - [info] MHA::MasterFailover version 0.56.
4. Start master failover:
1) Phase 1: configuration check:
Tue Nov 10 10:49:53 2015 - [info] Starting master failover.
Tue Nov 10 10:49:53 2015 - [info]
Tue Nov 10 10:49:53 2015 - [info] * Phase 1: Configuration Check Phase..
Tue Nov 10 10:49:53 2015 - [info]
Tue Nov 10 10:49:53 2015 - [info] GTID failover mode = 0
Tue Nov 10 10:49:53 2015 - [info] Dead Servers:
Tue Nov 10 10:49:53 2015 - [info] server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] Checking master reachability via MySQL(double check)...
Tue Nov 10 10:49:53 2015 - [info] ok.
Tue Nov 10 10:49:53 2015 - [info] Alive Servers:
Tue Nov 10 10:49:53 2015 - [info] server6(server6:3307)
Tue Nov 10 10:49:53 2015 - [info] server8(server8:3307)
Tue Nov 10 10:49:53 2015 - [info] Alive Slaves:
Tue Nov 10 10:49:53 2015 - [info] server6(server6:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] server8(server8:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] Starting Non-GTID based failover.
Tue Nov 10 10:49:53 2015 - [info]
Tue Nov 10 10:49:53 2015 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Nov 10 10:49:53 2015 - [info]
2) Phase 2: shut down the dead master (master_ip_failover_script is not configured, so invalidating the dead master's IP address is skipped):
Tue Nov 10 10:49:53 2015 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Nov 10 10:49:53 2015 - [info]
Tue Nov 10 10:49:53 2015 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Nov 10 10:49:53 2015 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Tue Nov 10 10:49:53 2015 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Nov 10 10:49:53 2015 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Nov 10 10:49:53 2015 - [info]
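3) Phase 3: master recovery:
A: Find the latest slave:
From SHOW SLAVE STATUS, the slave with the largest Master_Log_File + Read_Master_Log_Pos (the master binlog position its I/O thread has read up to) is the latest.
Because server6 and server8 have read up to the same master position, both are the oldest and the latest slaves.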
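As a small illustration of that comparison, the sketch below orders slaves by (binlog file number, position) and returns the latest and oldest groups. The input dictionary stands in for values read from SHOW SLAVE STATUS on each slave; the function names are invented for the example, and this is not MHA's own code.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (binlog file name, byte offset)


def binlog_key(position: Position) -> Tuple[int, int]:
    """Turn ('master-bin.000004', 120) into the sortable key (4, 120)."""
    file_name, offset = position
    return int(file_name.rsplit(".", 1)[1]), offset


def latest_and_oldest(slaves: Dict[str, Position]) -> Tuple[List[str], List[str]]:
    """Return (latest slaves, oldest slaves) by master position read so far."""
    keys = {name: binlog_key(pos) for name, pos in slaves.items()}
    newest = max(keys.values())
    oldest = min(keys.values())
    latest_names = sorted(n for n, k in keys.items() if k == newest)
    oldest_names = sorted(n for n, k in keys.items() if k == oldest)
    return latest_names, oldest_names


if __name__ == "__main__":
    # Positions as reported in the log of this test run.
    status = {
        "server6": ("master-bin.000004", 120),
        "server8": ("master-bin.000004", 120),
    }
    print(latest_and_oldest(status))
```

Comparing the numeric file suffix first and the offset second avoids treating a small offset in a newer file as older than a large offset in the previous file.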
Tue Nov 10 10:49:53 2015 - [info] * Phase 3: Master Recovery Phase..
Tue Nov 10 10:49:53 2015 - [info]
Tue Nov 10 10:49:53 2015 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Nov 10 10:49:53 2015 - [info]
Tue Nov 10 10:49:53 2015 - [info] The latest binary log file/position on all slaves is master-bin.000004:120
Tue Nov 10 10:49:53 2015 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Nov 10 10:49:53 2015 - [info] server6(server6:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] server8(server8:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] The oldest binary log file/position on all slaves is master-bin.000004:120
Tue Nov 10 10:49:53 2015 - [info] Oldest slaves:
Tue Nov 10 10:49:53 2015 - [info] server6(server6:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info] server8(server8:3307) Version=5.6.27-log (oldest major version between slaves) log-bin:enabled
Tue Nov 10 10:49:53 2015 - [info] Replicating from server7(server7:3307)
Tue Nov 10 10:49:53 2015 - [info]
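B: Save the dead master's binlog:
Starting from the master binlog position that the latest slave has read, extract the rest of the dead master's binlog and prepend the binlog format description information.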
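The extraction can be pictured as a byte-range copy: the file header up to the end of the format description event, followed by everything from the latest slave's position to the end of the file. The sketch below works on an arbitrary byte file and assumes a single binlog file; save_binlog_tail and its parameters are hypothetical names, and the real save_binary_logs script does more (multiple files, checksums) than is shown here.

```python
import os
import tempfile


def save_binlog_tail(src_path: str, dst_path: str,
                     start_pos: int, header_end: int) -> int:
    """Copy bytes [0, header_end) and [start_pos, EOF) of src into dst.

    Returns the number of bytes copied after the header, i.e. the size of
    the events that the latest slave has not received yet.
    """
    if start_pos < header_end:
        raise ValueError("start_pos must not fall inside the header")
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read(header_end))   # format description part
        src.seek(start_pos)
        tail = src.read()                 # events missing on the slaves
        dst.write(tail)
    return len(tail)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "master-bin.000004")
    dst = os.path.join(workdir, "saved.binlog")
    with open(src, "wb") as fh:
        fh.write(b"H" * 120)              # fake 120-byte header, no events
    # As in the log: position 120 in a 120-byte file, so nothing to save.
    print(save_binlog_tail(src, dst, start_pos=120, header_end=120))
```

In the test run the latest slave's position equals the file size, which is why the log reports that there are no effective data events to save.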
Tue Nov 10 10:49:53 2015 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Nov 10 10:49:53 2015 - [info]
Tue Nov 10 10:49:53 2015 - [info] Fetching dead master's binary logs..
Tue Nov 10 10:49:53 2015 - [info] Executing command on the dead master server7(server7:3307): save_binary_logs --command=save --start_file=master-bin.000004 --start_pos=120 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/var/log/masterha/app1/saved_master_binlog_from_server7_3307_20151110104953.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56
Creating /var/log/masterha/app1 if not exists.. ok.
Concat binary/relay logs from master-bin.000004 pos 120 to master-bin.000004 EOF into /var/log/masterha/app1/saved_master_binlog_from_server7_3307_20151110104953.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 120.. ok.
No need to dump effective binlog data from /var/lib/mysql/master-bin.000004 (pos starts 120, filesize 120). Skipping.
Binlog Checksum enabled
/var/log/masterha/app1/saved_master_binlog_from_server7_3307_20151110104953.binlog has no effective data events.
Event not exists.
Tue Nov 10 10:49:54 2015 - [info] Additional events were not found from the orig master. No need to save.
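C: Elect the new master. Because all slaves received the same master position, they do not need to resync with each other.
One of them, server6, was picked at random and promoted to new master.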
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] * Phase 3.3: Determining New Master Phase..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Nov 10 10:49:54 2015 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Nov 10 10:49:54 2015 - [info] Searching new master from slaves..
Tue Nov 10 10:49:54 2015 - [info] Candidate masters from the configuration file:
Tue Nov 10 10:49:54 2015 - [info] Non-candidate masters:
Tue Nov 10 10:49:54 2015 - [info] New master is server6(server6:3307)
Tue Nov 10 10:49:54 2015 - [info] Starting master failover..
Tue Nov 10 10:49:54 2015 - [info]
From:
server7(server7:3307) (current master)
+--server6(server6:3307)
+--server8(server8:3307)
To:
server6(server6:3307) (new master)
+--server8(server8:3307)
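D: New master diff log generation phase.
The new master (server6) already contains all relay logs, which means there is no difference between the dead master's binlog and the new master's relay log (this is the so-called diff between the dead master's binlog and the new master's relay log).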
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 10 10:49:54 2015 - [info]
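E: New master log apply phase: the difference between the new master's relay log and the dead master's binlog is replayed on the new master.
Then the new master's binlog file and position are obtained (master-bin.000003, pos=120). At this point the dead master and the new master are in sync.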
Tue Nov 10 10:49:54 2015 - [info] * Phase 3.4: Master Log Apply Phase..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Nov 10 10:49:54 2015 - [info] Starting recovery on server6(server6:3307)..
Tue Nov 10 10:49:54 2015 - [info] This server has all relay logs. Waiting all logs to be applied..
Tue Nov 10 10:49:54 2015 - [info] done.
Tue Nov 10 10:49:54 2015 - [info] All relay logs were successfully applied.
Tue Nov 10 10:49:54 2015 - [info] Getting new master's binlog name and position..
Tue Nov 10 10:49:54 2015 - [info] master-bin.000003:120
Tue Nov 10 10:49:54 2015 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='server6 or server6', MASTER_PORT=3307, MASTER_LOG_FILE='master-bin.000003', MASTER_LOG_POS=120, MASTER_USER='repl_user', MASTER_PASSWORD='xxx';
Tue Nov 10 10:49:54 2015 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Tue Nov 10 10:49:54 2015 - [info] ** Finished master recovery successfully.
Tue Nov 10 10:49:54 2015 - [info] * Phase 3: Master Recovery Phase completed.
Tue Nov 10 10:49:54 2015 - [info]
4) Phase 4: in parallel, for each slave, compare its relay log position with the new master's, apply the missing part on the slave, then run CHANGE MASTER TO pointing at the new master and start the slave:
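The per-slave recovery described here can be written down as a small plan builder. It only produces a list of steps as strings and does not talk to MySQL; the CHANGE MASTER TO text follows the statement printed in the log above, while plan_slave_recovery, change_master_sql, and the wording of the non-SQL steps are invented for the example.

```python
from typing import List, Tuple

Position = Tuple[str, int]  # (binlog file name, byte offset)


def change_master_sql(host: str, port: int, position: Position, user: str) -> str:
    """Build the CHANGE MASTER TO statement (password left as a placeholder)."""
    log_file, log_pos = position
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='xxx';"
    )


def plan_slave_recovery(slave_read_pos: Position, latest_read_pos: Position,
                        new_master_host: str, new_master_port: int,
                        new_master_binlog_pos: Position,
                        repl_user: str) -> List[str]:
    """List the steps needed to attach one slave to the new master."""
    steps: List[str] = []
    if slave_read_pos != latest_read_pos:
        # The slave missed events that the latest slave has: apply the diff.
        steps.append(
            f"apply relay-log diff from {slave_read_pos[0]}:{slave_read_pos[1]} "
            f"to {latest_read_pos[0]}:{latest_read_pos[1]}"
        )
    steps.append("wait until the local relay log is fully applied")
    steps.append(change_master_sql(new_master_host, new_master_port,
                                   new_master_binlog_pos, repl_user))
    steps.append("START SLAVE;")
    return steps


if __name__ == "__main__":
    # server8 in this test run: already at the latest position.
    for step in plan_slave_recovery(("master-bin.000004", 120),
                                    ("master-bin.000004", 120),
                                    "server6", 3307,
                                    ("master-bin.000003", 120), "repl_user"):
        print(step)
```

Because each slave's plan depends only on its own position, the plans can be executed in parallel, which is what the phase name in the log suggests.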
Tue Nov 10 10:49:54 2015 - [info] * Phase 4: Slaves Recovery Phase..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] -- Slave diff file generation on host server8(server8:3307) started, pid: 34574. Check tmp log /var/log/masterha/app1/server8_3307_20151110104953.log if it takes time..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] Log messages from server8 ...
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Nov 10 10:49:54 2015 - [info] End of log messages from server8.
Tue Nov 10 10:49:54 2015 - [info] -- server8(server8:3307) has the latest relay log events.
Tue Nov 10 10:49:54 2015 - [info] Generating relay diff files from the latest slave succeeded.
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] -- Slave recovery on host server8(server8:3307) started, pid: 34576. Check tmp log /var/log/masterha/app1/server8_3307_20151110104953.log if it takes time..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] Log messages from server8 ...
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] Starting recovery on server8(server8:3307)..
Tue Nov 10 10:49:54 2015 - [info] This server has all relay logs. Waiting all logs to be applied..
Tue Nov 10 10:49:54 2015 - [info] done.
Tue Nov 10 10:49:54 2015 - [info] All relay logs were successfully applied.
Tue Nov 10 10:49:54 2015 - [info] Resetting slave server8(server8:3307) and starting replication from the new master server6(server6:3307)..
Tue Nov 10 10:49:54 2015 - [info] Executed CHANGE MASTER.
Tue Nov 10 10:49:54 2015 - [info] Slave started.
Tue Nov 10 10:49:54 2015 - [info] End of log messages from server8.
Tue Nov 10 10:49:54 2015 - [info] -- Slave recovery on host server8(server8:3307) succeeded.
Tue Nov 10 10:49:54 2015 - [info] All new slave servers recovered successfully.
Tue Nov 10 10:49:54 2015 - [info]
5) Phase 5: reset the slave info on the new master and bring the new master into service:
Tue Nov 10 10:49:54 2015 - [info] * Phase 5: New master cleanup phase..
Tue Nov 10 10:49:54 2015 - [info]
Tue Nov 10 10:49:54 2015 - [info] Resetting slave info on the new master..
Tue Nov 10 10:49:54 2015 - [info] server6: Resetting slave info succeeded.
Tue Nov 10 10:49:54 2015 - [info] Master failover to server6(server6:3307) completed successfully.
Tue Nov 10 10:49:54 2015 - [info]
6) Phase 6: failover report:
----- Failover Report -----
app1: MySQL Master failover server7(server7:3307) to server6(server6:3307) succeeded
Master server7(server7:3307) is down!
Check MHA Manager logs at server9 for details.
Started automated(non-interactive) failover.
The latest slave server6(server6:3307) has all relay logs for recovery.
Selected server6(server6:3307) as a new master.
server6(server6:3307): OK: Applying all logs succeeded.
server8(server8:3307): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
server8(server8:3307): OK: Applying all logs succeeded. Slave started, replicating from server6(server6:3307)
server6(server6:3307): Resetting slave info succeeded.
Master failover to server6(server6:3307) completed successfully.
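In short, mysql-master-ha works as follows:
1. Detect that the master is unavailable (by pinging the port, plus a user-supplied script for confirmation).
2. Among the slaves, choose the one whose relay log has received the most of the master's binlog as the new master, i.e. find the latest slave.
3. On the dead master, find the difference between the master's binlog and what the latest slave has received, then replay it on the new master. At this point the new master is consistent with the dead master.
4. For each of the other slaves, find its relay-log difference from the new master and replay it, then run CHANGE MASTER TO the new master. At this point the new master and the slaves are consistent.
This completes the MySQL master failover.
However, step 3 only works if SSH to the old master's physical machine is still available, i.e. the physical node itself is still up.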
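Alibaba's MySQL HA takes the following approach:
1. A ZooKeeper (zk) node probes MySQL availability.
2. If the dead master can be restarted, restart it and do not switch masters. This is the ideal case.
3. If the dead master cannot be restarted, the slaves compete for a zk lock to become the new master (the zk agent of each MySQL slave creates an ephemeral sequential znode under /lock and checks whether its node is the smallest one under /lock; if so, it holds the lock; if not, it watches the children of /lock and waits for the lock). The dead master then uses binlog rollback (a reverse replay of the log difference between the dead master and the new master) to become consistent with the new master.
4. If a slave's relay log exec and read positions differ, the handling is: reset slave, then change master to the new master.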
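The lock rule in step 3 (smallest sequential node under /lock wins) can be illustrated without a ZooKeeper server by modelling the children of /lock as a dictionary. The node names, the lock- prefix, and the helper functions are assumptions made for this sketch; a real agent would use a ZooKeeper client and watches instead.

```python
from typing import Dict, Optional


def sequence_of(node_name: str) -> int:
    """Extract the sequence number from a name like 'lock-0000000002'."""
    return int(node_name.rsplit("-", 1)[1])


def lock_holder(children: Dict[str, str]) -> Optional[str]:
    """Return the owner of the smallest sequential node, or None if empty.

    children maps znode name -> owning slave, mimicking the ephemeral
    sequential nodes each slave's agent creates under /lock.
    """
    if not children:
        return None
    smallest = min(children, key=sequence_of)
    return children[smallest]


def has_lock(my_node: str, children: Dict[str, str]) -> bool:
    """True if my_node is the smallest node, i.e. this agent holds the lock."""
    return my_node == min(children, key=sequence_of)


if __name__ == "__main__":
    children = {"lock-0000000003": "server8", "lock-0000000002": "server6"}
    print(lock_holder(children))          # owner of the smallest node
    # When the holder's ephemeral node disappears, the next smallest wins.
    del children["lock-0000000002"]
    print(lock_holder(children))
```

In this model the winner is decided purely by the order in which the nodes were created; replication position plays no part in it.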
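However, the slides (PPT) leave a few questions open. Does step 3 guarantee that the slave that grabs the zk lock is the latest slave? In step 4 the slave does a reset slave outright, discarding its relay log and resyncing from the new master; is that too inefficient? Also, judging from the slides, the zk lock approach seems logically sound: the dead master's binlog is rolled back so that the old and new masters agree, and the other slaves then sync with the new master. Still, it is somewhat confusing to follow.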
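Tencent's approach:
1. Use semi-synchronous replication so that the master returns the transaction commit to the client only after its binlog has reached a slave and been written to the relay log.
2. A zk agent also monitors master availability; when the master is unavailable, the routing information in the database proxy is changed to mark the master as unavailable.
3. Each slave reports the latest master binlog position it has read into its relay log. The scheduler picks the latest slave (the one that has read the largest old-master binlog position) as the new master and requires the new master to finish replaying its relay log (meaning the storage engine then contains all the changes recorded in the relay log).
4. Repoint the other slaves' master to the new master.
Note that the Tencent TDSQL documentation does not mention the log difference between the new master and the old master, nor the relay-log difference between the new master and the other slaves. The former is solved by semi-sync, and the latter is handled by position catch-up after the other slaves change master to the new master. For example, if the new master has reached master-bin.00002, position 135, while a slave has only reached position 120 of master-bin.00001, the slave simply changes master to the new master at master-bin.00001, 120 and catches up from there.
5. Update the slave info on the new master.
6. Update the proxy's routing information; the master failover is complete.
Here, Tencent TDSQL's approach is easier to understand.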
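Overall, the three agree on synchronizing the binlog between the old master and the new master: all of them must ensure that the old master and the new master are fully consistent (mysql-master-ha uses Perl scripts to bring the new master up to the old master's position in bulk, Tencent uses semi-sync, and Alibaba rolls back the old master's log). For synchronizing logs between the new master and the other slaves, mysql-master-ha requires each slave to be fully consistent with the new master, whereas Tencent and Alibaba do not (mysql-master-ha emphasizes data consistency, while Alibaba and Tencent emphasize system availability, since a master-slave architecture requires care about replication lag anyway).
Overall, I personally find Tencent's TDSQL design simpler, whereas with mysql-master-ha, after the master failover you still have to notify the database proxy layer to update its master/standby routing. That is a topic for later (we would have to implement TDSQL's functionality ourselves).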