一、简介
  MHA 是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中,MHA能做到在0~30秒之内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA能在最大程度上保证数据的一致性,以达到真正意义上的高可用。

  该软件由两部分组成:MHA Manager(管理节点)  和  MHA Node(数据节点):

  MHA Manager可以单独部署在一台独立机器上管理多个master-slave集群,也可以部署在一台slave上.MHA Manager探测集群的node节点,当发现master出现故障的时候,它可以自动将具有最新数据的slave提升为新的master,然后将所有其它的slave导向新的master上.整个故障转移过程对应用程序是透明的。

   MHA node运行在每台MySQL服务器上(master/slave/manager),它通过监控具备解析和清理logs功能的脚本来加快故障转移的。

  目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库

  MHA 0.56 开始,就可以支持GTID了,GTID 天生就是为了HA而产生的。 MHA + GTID 可以非常好的解决HA的问题,GTID支持多层级的复制故障转移,MHA不行

温馨提示:

一定要关闭防火墙或者开放相关策略        -  -!

更新系统时间(不是必须):
查看是否是CST上海时区,若不是执行更改: ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
同步网络时间:ntpdate -u asia.pool.ntp.org

注意:在做MHA之前一定要确保主从的数据是一致的,就是两边的执行环境一致。否则故障切换时slave无法自动指向新master!这时就需要手动清除指向老master的记录了。所以为了避免这个问题的出现,一定要确保slave与master的binlog日志执行环境时一致的。

MHA环境
环境 版本
CentOS  7  
MHA 0.57
MHA结构
Master 50.116 写入
Candicate master 50.115
slave_Manager 50.28

警告:搭建过程均参考这两位大神的博客:(有些日志的输出信息,懒得重现了,就直接copy的他们的。)

http://www.cnblogs.com/gomysql/p/3675429.html

http://www.cnblogs.com/xuanzhi201111/p/4231412.html?spm=5176.100239.blogcont52048.7.HR7na7

二 、安装MHA

1.)安装MHA node节点所依赖的perl模块(DBD:mysql):

  rpm -ivh http://dl.fedoraproject.org/pub/epel/7Server/x86_64/e/epel-release-7-10.noarch.rpm
  yum install perl-DBD-MySQL -y

2.)安装MHA node所需依赖:(所有节点)

  yum install -y perl-devel
  yum install -y perl-CPAN

3.)由谷歌维护的MHA 页面自12年就不在更新了,想要最新版本的MHA需要去自己找资源。(上方有给出)

在安装manager之前先安装这些依赖:

yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes  -y

下载Node与Manager 软件包,解压后进入目录中执行如下操作安装:

  perl Makefile.PL
  make && make install

Node安装完成后会在/usr/local/bin/下生成这些文件:

-r-xr-xr-x   root root     1月   : apply_diff_relay_logs
-r-xr-xr-x root root 1月 : filter_mysqlbinlog
-r-xr-xr-x root root 1月 : purge_relay_logs
-r-xr-xr-x root root 1月 : save_binary_logs Node脚本说明:(这些工具通常由MHA Manager的脚本触发,无需人为操作)
save_binary_logs //保存和复制master的二进制日志
apply_diff_relay_logs //识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog //去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs //清除中继日志(不会阻塞SQL线程)

Manager安装完成后会在/usr/local/bin 下生成如下文件:

-r-xr-xr-x.  root root     1月   : apply_diff_relay_logs
-r-xr-xr-x. root root 1月 : filter_mysqlbinlog
-r-xr-xr-x. root root 1月 : masterha_check_repl
-r-xr-xr-x. root root 1月 : masterha_check_ssh
-r-xr-xr-x. root root 1月 : masterha_check_status
-r-xr-xr-x. root root 1月 : masterha_conf_host
-r-xr-xr-x. root root 1月 : masterha_manager
-r-xr-xr-x. root root 1月 : masterha_master_monitor
-r-xr-xr-x. root root 1月 : masterha_master_switch
-r-xr-xr-x. root root 1月 : masterha_secondary_check
-r-xr-xr-x. root root 1月 : masterha_stop
-r-xr-xr-x. root root 1月 : purge_relay_logs
-r-xr-xr-x. root root 1月 : save_binary_logd

复制mha4mysql-manager-0.53/samples/scripts/目录下的脚本到/usr/local/bin目录:

-rwxr-xr-x.  root root   1月     master_ip_failover  //自动切换时vip管理的脚本,不是必须,如果我们使用keepalived的,我们可以自己编写脚本完成对vip的管理,比如监控mysql,如果mysql异常,我们停止keepalived就行,这样vip就会自动漂移
-rwxr-xr-x. root root 1月 master_ip_online_change //在线切换时vip的管理,不是必须,同样可以可以自行编写简单的shell完成
-rwxr-xr-x. root root 1月 power_manager //故障发生后关闭主机的脚本,不是必须
-rwxr-xr-x. root root 1月 send_report //因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成

三、配置MHA

1.)配置SSH登录无密码验证(使用key登录,工作中常用,最好不要禁掉密码登录,如果禁了,可能会有问题)

manager_slave:(当管理节点在本地时需要ssh免秘钥到本地)
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.50.115
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.50.116
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.50.28

slave: 
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.50.116
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.50.28

master:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.50.115
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.50.28

2.)创建MHA的工作目录   mkdir -p /etc/masterha 、并且创建相关配置文件:  cp  mha4mysql-manager-0.56/samples/conf/app1.cnf   /etc/masterha/

[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/opt/mysql/log
master_ip_failover_script= /usr/local/bin/master_ip_failover
master_ip_online_change_script= /usr/local/bin/master_ip_online_change
password=
user=root
ping_interval=
remote_workdir=/tmp
repl_password=
repl_user=root
report_script=/usr/local/bin/send_report
secondary_check_script= /usr/local/bin/masterha_secondary_check -s server03 -s server02
shutdown_script=""
ssh_user=root [server1]
hostname=192.168.50.115
port= [server2]
hostname=192.168.50.116
candidate_master=
check_repl_delay= [server3]
hostname=192.168.50.28
port=

3.)设置relay log的清除方式(在每个slave节点上)

在50.28与50.116 执行:mysql -uroot -p123456 -e "set global relay_log_purge=0"

注意:

MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF,采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。定期清除中继日志需要考虑到复制延时的问题。在ext3的文件系统下,删除大的文件需要一定的时间,会导致严重的复制延时。为了避免复制延时,需要暂时为中继日志创建硬链接,因为在linux系统中通过硬链接删除大文件速度会很快。(在mysql数据库中,删除大表时,通常也采用建立硬链接的方式)

设置定期清理relay脚本(两台slave服务器):

[root@bogon ~]# vim purge_relay_log.sh

#!/bin/bash
user=root
passwd=
port=
log_dir='/data/masterha/log'
work_dir='/data'
purge='/usr/local/bin/purge_relay_logs' if [ ! -d $log_dir ]
then
mkdir $log_dir -p
fi $purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log >&1 参数说明:
--user mysql                      //用户名
--password mysql //密码
--port //端口号
--workdir //指定创建relay log的硬链接的位置,默认是/var/tmp,由于系统不同分区创建硬链接文件会失败,故需要执行硬链接具体位置,成功执行脚本后,硬链接的中继日志文件被删除
--disable_relay_log_purge //默认情况下,如果relay_log_purge=1,脚本会什么都不清理,自动退出,通过设定这个参数,当relay_log_purge=1的情况下会将relay_log_purge设置为0。清理relay log之后,最后将参数设置为OFF。
 

[root@bogon ~]# crontab -l

  * * * /bin/bash /root/purge_relay_log.sh

#两台slave的清除时间不要是一致的,不然等到要恢复的时候就尴尬了。

purge_relay_logs脚本删除中继日志不会阻塞SQL线程。下面我们手动执行看看什么情况:

[root@bogon ~]# purge_relay_logs --user=root --password=123456 --port=3306 -disable_relay_log_purge --workdir=/data/

-- ::: purge_relay_logs script started.
Found relay_log.info: /data/mysql/relay-log.info
Removing hard linked relay log files localhost-relay-bin* under /data/.. done.
Current relay log file: /data/mysql/localhost-relay-bin.
Archiving unused relay log files (up to /data/mysql/localhost-relay-bin.) ...
Creating hard link for /data/mysql/localhost-relay-bin. under /data//localhost-relay-bin.000001 .. ok.
Creating hard links for unused relay log files completed.
Executing SET GLOBAL relay_log_purge=; FLUSH LOGS; sleeping a few seconds so that SQL thread can delete older relay log files (if it keeps up); SET GLOBAL relay_log_purge=; .. ok.
Removing hard linked relay log files localhost-relay-bin* under /data/.. done.
-- ::: All relay log purging operations succeeded.

4.)检查SSH配置(server01 192.168.50.28 Monitor 监控节点上操作),如下:

[root@bogon ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf

Fri Nov 3 15:29:01 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Nov 3 15:29:01 2017 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Fri Nov 3 15:29:01 2017 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Fri Nov 3 15:29:01 2017 - [info] Starting SSH connection tests..
Fri Nov 3 15:29:03 2017 - [debug]
Fri Nov 3 15:29:02 2017 - [debug] Connecting via SSH from root@192.168.50.28(192.168.50.28:22) to root@192.168.50.116(192.168.50.116:22)..
Fri Nov 3 15:29:03 2017 - [debug] ok.
Fri Nov 3 15:29:03 2017 - [debug] Connecting via SSH from root@192.168.50.28(192.168.50.28:22) to root@192.168.50.115(192.168.50.115:22)..
Fri Nov 3 15:29:03 2017 - [debug] ok.
Fri Nov 3 15:29:03 2017 - [debug]
Fri Nov 3 15:29:01 2017 - [debug] Connecting via SSH from root@192.168.50.115(192.168.50.115:22) to root@192.168.50.116(192.168.50.116:22)..
Fri Nov 3 15:29:03 2017 - [debug] ok.
Fri Nov 3 15:29:03 2017 - [debug] Connecting via SSH from root@192.168.50.115(192.168.50.115:22) to root@192.168.50.28(192.168.50.28:22)..
Fri Nov 3 15:29:03 2017 - [debug] ok.
Fri Nov 3 15:29:03 2017 - [debug]
Fri Nov 3 15:29:01 2017 - [debug] Connecting via SSH from root@192.168.50.116(192.168.50.116:22) to root@192.168.50.115(192.168.50.115:22)..
Fri Nov 3 15:29:02 2017 - [debug] ok.
Fri Nov 3 15:29:02 2017 - [debug] Connecting via SSH from root@192.168.50.116(192.168.50.116:22) to root@192.168.50.28(192.168.50.28:22)..
Fri Nov 3 15:29:03 2017 - [debug] ok.
Fri Nov 3 15:29:03 2017 - [info] All SSH connection tests passed successfully.

5.)检查整个复制环境状况(192.168.50.28 Monitor 监控节点上操作),如下:

[root@bogon ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf                           (错误环境我就不重现了,直接复制的网上的案例,所以别太在意IP什么的,只看搭建流程和error的解决方法就好。)

Sun Jan  ::  - [info]   Executing command: save_binary_logs --command=test --start_pos= --binlog_dir=/data/mysql --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.
Sun Jan :: - [info] Connecting to root@192.168.2.128(192.168.2.128)..
Creating /tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql, up to mysql-bin.
Sun Jan :: - [info] Master setting check done.
Sun Jan :: - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jan :: - [info] Executing command : apply_diff_relay_logs --command=test --slave_user=root --slave_host=192.168.2.129 --slave_ip=192.168.2.129 --slave_port= --workdir=/tmp --target_version=5.5.-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Sun Jan :: - [info] Connecting to root@192.168.2.129(192.168.2.129:)..
Can't exec "mysqlbinlog": 没有那个文件或目录 at /usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!
at /usr/local/bin/apply_diff_relay_logs line
Sun Jan :: - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln193] Slaves settings check failed!
Sun Jan :: - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln372] Slave configuration failed.
Sun Jan :: - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln383] Error happend on checking configurations. at /usr/local/bin/masterha_check_repl line
Sun Jan :: - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln478] Error happened on monitoring servers.
Sun Jan :: - [info] Got exit code (Not master dead). MySQL Replication Health is NOT OK!

如果发现如下错误:

Can't exec "mysqlbinlog": No such file or directory at /usr/local/share/perl5/MHA/BinlogManager.pm line 99.
mysqlbinlog version not found!
Testing mysql connection and privileges..sh: mysql: command not found
mysql command failed with rc 127:0!

可以通过以下方法解决(在所有节点上执行):

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql

还是报错,纠结N久,才发现原因是:原来Failover两种方式:一种是虚拟IP地址,一种是全局配置文件。MHA并没有限定使用哪一种方式,而是让用户自己选择,虚拟IP地址的方式会牵扯到其它的软件,比如keepalive软件,而且还要修改脚本master_ip_failover。

所以先暂时注释master_ip_failover_script= /usr/local/bin/master_ip_failover这个选项。后面引入keepalived后和修改该脚本以后再开启该选项,如下:

192.168.2.131 [root ~]$ grep master_ip_failover /etc/masterha/app1.cnf
#master_ip_failover_script= /usr/local/bin/master_ip_failover
192.168.2.131 [root ~]$ masterha_check_repl --conf=/etc/masterha/app1.cnf
Sun Jan :: - [info] Slaves settings check done.
Sun Jan :: - [info]
192.168.2.128 (current master)
+--192.168.2.129
+--192.168.2.130 Sun Jan :: - [info] Checking replication health on 192.168.2.129..
Sun Jan :: - [info] ok.
Sun Jan :: - [info] Checking replication health on 192.168.2.130..
Sun Jan :: - [info] ok.
Sun Jan :: - [warning] master_ip_failover_script is not defined.
Sun Jan :: - [warning] shutdown_script is not defined.
Sun Jan :: - [info] Got exit code (Not master dead). MySQL Replication Health is OK.

6.)控制MHA Manager的运行状态

查询:masterha_check_status --conf=/etc/masterha/app1.cnf

启动:nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &

关闭:masterha_stop --conf=/etc/masterha/app1.cnf

查看日志:

tail -f /var/log/masterha/app1/manager.log
Sun Jan  ::  - [warning] master_ip_failover_script is not defined.
Sun Jan :: - [warning] shutdown_script is not defined.
Sun Jan :: - [info] Set master ping interval seconds.
Sun Jan :: - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s server03 -s server02
Sun Jan :: - [info] Starting ping health check on 192.168.50.116(192.168.50.116:)..
Sun Jan :: - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

启动参数说明:

--remove_dead_master_conf      //该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。

--manger_log                   //日志存放位置

--ignore_last_failover         //在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面我设置的/data产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover。

7.)配置VIP
vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;另外一种通过脚本方式启动虚拟ip的方式(即不需要keepalived或者heartbeat类似的软件)。
下面先介绍通过安装keepalived来管理虚拟IP的浮动:

(1)下载软件进行并进行安装(两台master,准确的说一台是master,另外一台是备选master,在没有切换以前是slave)

  http://www.keepalived.org/software/keepalived-1.3.8.tar.gz

  进入解压完后的目录执行:

  ./configure --prefix=/usr/local/keepalived ;make && make install

  制作快捷启动:

  cp keepalived/etc/init.d/keepalived /etc/init.d/
  cp keepalived/etc/sysconfig/keepalived /etc/sysconfig
  mkdir /etc/keepalived
  cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
  cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

主master的配置:

! Configuration File for keepalived

global_defs {
notification_email {
1*******@qq.com
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 127.0.0.1
smtp_connect_timeout
router_id MySQL-HA
} vrrp_instance VI_1 {
state backup
interface ens192
virtual_router_id
priority
advert_int authentication {
auth_type PASS
auth_pass
}
virtual_ipaddress {
192.168.50.123
}
}

备用master的配置:

! Configuration File for keepalived

global_defs {
notification_email {
11*******@qq.com
}
notification_email_from Alexandre.Cassen@firewall.loc
smtp_server 127.0.0.1
smtp_connect_timeout
router_id MySQL-HA
} vrrp_instance VI_1 {
state BACKUP
interface ens160
virtual_router_id
priority
advert_int
nopreempt
authentication {
auth_type PASS
auth_pass
}
virtual_ipaddress {
192.168.50.123
}
}

其中router_id MySQL HA表示设定keepalived组的名称,将192.168.50.123这个虚拟ip绑定到该主机的ens的网卡上,并且设置了状态为backup模式。priority 150表示设置的优先级为150。nopreempt 允许一个priority比较低的节点作为master,即使有priority更高的节点启动,nopreemt必须在state为BACKUP的节点上才生效。(还有一个细节要注意的,要看清楚自己的网卡是eth0做模拟VIP,还是eth1)

在master与备用master依次启动:

/etc/init.d/keepalived start

执行命令 ip a (注意ifconfig命令无法查看到配置的虚拟IP)

注意:
上面两台服务器的keepalived都设置为了BACKUP模式,在keepalived中2种模式,分别是master->backup模式和backup->backup模式。这两种模式有很大区别。在master->backup模式下,一旦主库宕机,虚拟ip会自动漂移到从库,当主库修复后,keepalived启动后,还会把虚拟ip抢占过来,即使设置了非抢占模式(nopreempt)抢占ip的动作也会发生。在backup->backup模式下,当主库宕机后虚拟ip会自动漂移到从库上,当原主库恢复和keepalived服务启动后,并不会抢占新主的虚拟ip,即使是优先级高于从库的优先级别,也不会发生抢占。为了减少ip漂移次数,通常是把修复好的主库当做新的备库。

(8)MHA引入keepalived(MySQL服务进程挂掉时通过MHA 停止keepalived):

要想把keepalived服务引入MHA,我们只需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。

1、编辑脚本/usr/local/bin/master_ip_failover,修改后如下

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all'; use Getopt::Long; my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
); my $vip = '192.168.0.88';
my $ssh_start_vip = "/etc/init.d/keepalived start";
my $ssh_stop_vip = "/etc/init.d/keepalived stop"; GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) { my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
        exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0  unless  ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

把#master_ip_failover_script= /usr/local/bin/master_ip_failover打开

 [root ~]$ grep 'master_ip_failover_script' /etc/masterha/app1.cnf
  master_ip_failover_script= /usr/local/bin/master_ip_failover

然后再次检测:

 masterha_check_repl --conf=/etc/masterha/app1.cnf
IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK
Tue Nov :: - [info] OK.
Tue Nov :: - [warning] shutdown_script is not defined.
Tue Nov :: - [info] Got exit code (Not master dead). MySQL Replication Health is OK.

查看日志如下面一样表示正在监控:tail -f /var/log/masterha/app1/manager.log

Tue Nov  ::  - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

9.)在管理节点更改hosts

[root@bogon ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.50.115 server01
192.168.50.116 server02
192.168.50.28 server03

10.)停掉Master的MySQL服务模拟宕机,然后到管理节点查看日志:

tail -f /var/log/masterha/app1/manager.log  (最后一段的展示)

----- Failover Report -----

app1: MySQL Master failover 192.168.50.116(192.168.50.116:3306) to 192.168.50.115(192.168.50.115:3306) succeeded

Master 192.168.50.116(192.168.50.116:3306) is down!

Check MHA Manager logs at bogon:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.50.116(192.168.50.116:3306)
Selected 192.168.50.115(192.168.50.115:3306) as a new master.
192.168.50.115(192.168.50.115:3306): OK: Applying all logs succeeded.
192.168.50.115(192.168.50.115:3306): OK: Activated master IP address.
192.168.50.28(192.168.50.28:3306): OK: Slave started, replicating from 192.168.50.115(192.168.50.115:3306)
192.168.50.115(192.168.50.115:3306): Resetting slave info succeeded.
Master failover to 192.168.50.115(192.168.50.115:3306) completed successfully.
Tue Nov 21 16:32:05 2017 - [info] Sending mail..
Unknown option: conf

11.)在之前的Master(192.168.50.116)上查看一下vip,发现已经消失。然后在候选master查看正常。完成切换。

四、报错案例:

1.

复制检测时的案例:ERROR 1142 (42000) at line 1: CREATE command denied to user 'root'@'192.168.50.28' for table 'apply_diff_relay_logs_test'

解决:grant all privileges on *.* to 'root'@'192.168.50.%' identified by '12345678';

2.mha0.53版本BUG

1.)模拟故障发生,进行切换的日志报错:Got ERROR: Use of uninitialized value $msg in scalar chomp at /usr/local/share/perl5/MHA/ManagerConst.pm line 90.

解决方法有两个:

(1.1)网上说这是一个0.53的bug,建议更换0.56

(1.2)/usr/local/share/perl5/MHA/ManagerConst.pm 在此段中添加______的代码
our $log_fmt = sub {
my %args = @_;
my $msg = $args{message};
+ $msg = "" unless($msg);
chomp $msg;
if ( $args{level} eq "error" ) {
my ( $ln, $script ) = ( caller(4) )[ 2, 1 ];

2.)采用GTID主从复制后,模拟故障发生,切换成功,但slave无法指向新master,这是mha0.53版本的一个BUG更换MHA0.56及以上即可

[error][/usr/local/share/perl5/MHA/Server.pm, ln714] Checking slave status failed on 192.168.50.28(192.168.50.28:).
[error][/usr/local/share/perl5/MHA/Server.pm, ln817] Starting slave IO/SQL thread on 192.168.50.28(192.168.50.28:) failed!
Mon Nov :: - [info] End of log messages from 192.168.50.28.
Mon Nov :: - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln1537] Master failover to 192.168.50.115(192.168.50.115:) done, but recovery on slave partially failed. ----- Failover Report ----- app1: MySQL Master failover 192.168.50.116 to 192.168.50.115 Master 192.168.50.116 is down! Check MHA Manager logs at bogon:/var/log/masterha/app1/manager.log for details. Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.50.116.
The latest slave 192.168.50.115(192.168.50.115:) has all relay logs for recovery.
Selected 192.168.50.115 as a new master.
192.168.50.115: OK: Applying all logs succeeded.
192.168.50.115: OK: Activated master IP address.
192.168.50.28: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.50.28: WARN: Applying all logs succeeded. But starting slave failed.
Master failover to 192.168.50.115(192.168.50.115:) done, but recovery on slave partially failed.
Mon Nov :: - [info] Sending mail..
Option new_slave_hosts requires an argument
Unknown option: conf

3.故障切换成功后,启动监控时的日志:

[warning] SQL Thread is stopped(no error) on 192.168.50.115(192.168.50.115:3306)
[error][/usr/local/share/perl5/MHA/ServerManager.pm, ln732] Multi-master configuration is Master configurations are as below:
Master 192.168.50.115(192.168.50.115:3306), replicating from 192.168.50.116(192.168.50.116:3306)
Master 192.168.50.116(192.168.50.116:3306), dead

解决:

使用启动时自动删除老主的参数(--remove_dead_master_conf ),但前提条件是,主库down机后切换的环境必须是正常的,无任何报错情况下可使用。反之该参数无效,需要手动删除相应配置。

4.故障切换成功后,启动监控时的日志:
[warning] SQL Thread is stopped(no error) on 192.168.50.115(192.168.50.115:3306)
[error][/usr/local/share/perl5/MHA/ServerManager.pm, ln622] Master 192.168.50.116:3306 from which slave 192.168.50.115(192.168.50.115:3306) replicates is not defined in the configuration file!

由于新master此时还是存有指向老master的slave状态的,所以mha将新master当成了一个slave,而新master的slave所指向的是旧的且已经宕机的老master,所以报错,解决办法是清除新master的salve信息。

stop salve;

reset slave all;

正常情况下MHA会自动清除新master的slave指向状态的。所以这可能是主从的binlog执行环境不一致导致的,也可能是mha0.53版本BUG导致的。

5.设置MHA为master->backup模式

老master恢复后抢占主,此时需要将manager关闭,然后在配置文件中将老master添加回[server]字段,接着清除老master的slave信息并将新master change到老master,然后开启manager监控,在将新master服务关闭,此时MHA会进行故障转移的操作。(需要将keeplived设置为master——》backup模式,VIP会自动飘到优先级高的去)

基于GTID搭建MHA的更多相关文章

  1. 5.7 并行复制配置 基于GTID 搭建中从 基于GTID的备份与恢复,同步中断处理

    5.7 并行复制配置 基于GTID 搭建中从 基于GTID的备份与恢复,同步中断处理 这个文章包含三个部分 1:gtid的多线程复制2:同步中断处理3:GTID的备份与恢复 下面文字相关的东西 大部分 ...

  2. 基于GTID搭建主从MySQL

    目录 基于gtid搭建主从MySQL 一.GTID的使用 二.GTID的简介 三.GTID的构成 四.查看GTID的执行情况 4.1 gtid_executed 4.2 gtid_own 4.3 gt ...

  3. Percona XtraBackup不锁库搭建slave数据库-基于GTID

    Percona XtraBackup不锁库搭建slave数据库-基于GTID 1.下载安装epel源并安装 wget http://ftp.cuhk.edu.hk/pub/linux/fedora-e ...

  4. 基于GTID的Mysql-Mha高可用方案探索

    声明: 本篇文章内容整理来源于互联网以及本人自己的梳理总结,目的是从零到一的搭建起来mysql mha高可用架构. 一.软件概述 MHA(Master High Availability)目前在MyS ...

  5. Mysql基于GTID复制模式-运维小结 (完整篇)

    先来看mysql5.6主从同步操作时遇到的一个报错:mysql> change master to master_host='192.168.10.59',master_user='repli' ...

  6. Mysql5.7实现主从复制、基于GTID的主从复制、并行复制

    (一.主从复制) 一.mysql主从复制原理    mysql的默认复制方式是主从复制.Mysql内建的复制功能是构建大型,高性能应用程序的基础.将Mysql的数据分布到多个系统上去,这种分布的机制, ...

  7. MySQL 5.7基于GTID复制的常见问题和修复步骤(二)

    [问题二] 有一个集群(MySQL5.7.23)切换后复制slave报1236,其实是不小心在slave上执行了事务导致 Got fatal error 1236 from master when r ...

  8. MySQL5.7配置基于GTID的复制

    MySQL5.7下配置GTID复制的方法: 修改主库和从库的配置文件,加入下列部分的配置项:主库:[mysqld]log-bin=mysql-binbinlog_format= ROWgtid-mod ...

  9. GTID与MHA

    MHA 基于binlog文件位置的复制 * Phase 3: Master Recovery Phase.. * Phase 3.1: Getting Latest Slaves Phase.. La ...

随机推荐

  1. Spark2.0机器学习系列之7: MLPC(多层神经网络)

    Spark2.0 MLPC(多层神经网络分类器)算法概述 MultilayerPerceptronClassifier(MLPC)这是一个基于前馈神经网络的分类器,它是一种在输入层与输出层之间含有一层 ...

  2. HDU3123:GCC(同余模简单题)

    题目:http://acm.hdu.edu.cn/showproblem.php?pid=3123 题意很简单,就是同余模的简单应用. 代码如下: #include <iostream> ...

  3. BC#29A:GTY's math problem(math) B:GTY's birthday gift(矩阵快速幂)

    A: HDU5170 这题让比较a^b与c^d的大小.1<=a,b,c,d<=1000. 显然这题没法直接做,要利用对数来求,但是在math库中有关的对数函数返回的都是浮点数,所以这又要涉 ...

  4. 1141 PAT Ranking of Institutions[难]

    1141 PAT Ranking of Institutions (25 分) After each PAT, the PAT Center will announce the ranking of ...

  5. Linux界面交互与目录结构

    一.交互通道 Linux系统环境默认有六个命令交互通道和一个图形界面交互通道,默认进入的是图形界面通道. 命令交互模式切换:ctrl+alt+F1-F6 图形界面交互模式:ctrl+alt+F7   ...

  6. 聚类之k-means

    1.介绍 k-means算法以k为参数(所期望的簇的个数),把n个对象分成k个簇(单层划分),用质心(数据点的平均值)定义簇的原型.使得簇内具有较高的相似度,而簇间的相似度较低. 通过聚类,我们能够发 ...

  7. yield的表达式形式的应用(待补充)

    1.yield的表达式形式应用的定义: 在一个生成器函数内,将yield赋值给一个变量,这就是yield的表达式形式.也叫生成器的表达式形式 2.send方法的定义: (1)定义: yield的表达式 ...

  8. Rundeck概况

    1.Rundeck介绍 RunDeck是用Java/Grails写的开源工具,帮助用户在数据中心或者云环境中自动化各种操作和流程.通过命令行或者web界面,用户可以对任意数量的服务器进行操作,大大降低 ...

  9. 跨域问题-jsonp

    前端同源策略并不会拦截静态资源请求,那么就将接口伪装成资源,然后后端配合返回一个前端预定义的方法调用,将返回值放入调用该函数的形参即可 <!DOCTYPE html> <html l ...

  10. java synchronized关键字的底层实现

    每个对象都有一个锁(Monitor,监视器锁),class对象也有锁,如果synchronized关键字修饰同步代码块,通过反编译可以看到,其实是有个monitorenter和monitorexit指 ...