MySQL高可用集群MHA方案

爱奇艺在用的数据库高可用方案

MHA 是目前比较成熟及流行的 MySQL 高可用解决方案,很多互联网公司正是直接使用或者基于 MHA 的架构进行改造实现 MySQL 的高可用。

MHA 能在 30 秒内对故障进行转移,并最大程度的保障数据的一致性。MHA 由两个模块组成:Manager 和 Node。

什么是MHA

MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,是一套优秀的作为MySQL高可用性环境下故障切换和主从提升的高可用软件。

MHA作用是保证MySQL主从复制集群中的master高可用性,也就保证整个数据库集群业务不被故障影响。

  • master故障时,MHA会在30s内实现故障自动检测+故障转移
  • 选择一个最优的slave接替为新的master,并且保证new_master和其他slave继续保持数据一致性

高可用性HA、high availability

指的是一个经过设计的系统,能保证减少架构故障时的停工时间,保证业务程序的高度可用性

超哥也在各种运维业务场景下,接触过HA软件

无论是web、数据库、还是后端

MHA架构

整个MHA软件由两部分角色组成,即MHA Manager(管理节点)和MHA Node(数据节点)。

MHA Manager服务可以独立部署在一台linux机器,也可以部署在某一台主从复制从节点或者其他应用服务器节点上。

MHA Node服务需要运行在每一个MySQL服务器上。

MHA Manager会定时通过主库上的MHA Node服务监测主库,当master出现故障时,它可以自动将最优slave(可以提前指定或由MHA判定)提升为新的master,然后让所有其他的从库与新的主库重新保持正常的复制状态。

故障的整个切换和转移的过程对客户以及应用程序几乎是完全透明的(也就是用户不会感知到有故障发生)

MHA工作原理

MHA主要功能

  • master宕机、切换新的master,且保证其他slave和新的master保持一致复制
  • 故障切换过程中,集群数据丢失量最小

一、选择新master

old_master宕机,在集群中选择一个新的slave作为new_master,这要根据MHA的配置,如根据其他slave的binlog位置点,选择最新的slave作为new_master

二、数据补全

进行故障切换、转移之前,必须要进行数据补全,否则即使故障切换了,数据丢了那也是不允许的

数据补全过程

  • old_master数据库服务器还可以连接,MHA会SSH连接主库,保存主库所有的binlog

    • 若ssh无法连接,放弃主库的binlog数据
  • 以切换好的new_master主库的binlog位置点位基准点,通过relay_log进行数据补全,使得其他所有slave和new_master数据一直
  • 将宕机时从old_master上保存下来的binlog日志(如果存在的话)恢复到所有的数据库节点.

三、角色切换

  • 已选择好的new_master正式提升为主库角色
  • 其他的slave和new_master保持主从复制关系

四、有关master主库IP切换的问题,可以结合keepalived的VIP漂移来实现

MHA软件包介绍

MHA由2部分组成

Manager节点

Node节点

Manager节点命令

Node命令

MHA特点

  • old_master宕机,slave快速切换为new_master
  • 部署MHA与不会对现有的MySQL集群做大量改动
  • MHA——manager功能强大,可以管理上百个节点、多套mysql集群
  • 可以监控mysql状态,隔N秒向master发送ping包,性能不受影响
  • 只要mySQL主从复制支持的存储引擎,MHA也都支持,不限于InnoDB

MHA部署

2台linux及以上

多个mysql实例之间实现复制

这里超哥准备四台linux机器

master1 10.211.55.12 MHA-node

Slave1 10.211.55.9 MHA-node

Slave2 10.211.55.11 MHA-node,MHA-Manager

Client 10.211.55.18 空

注意时间同步

ntpdate -u ntp.aliyun.com

准备好一主两从-GTID

MHA需要支持一主多从架构,至少三台数据库,三台机器,基于GTID的主从复制,三个配置文件,仅有server-id不同

一个master

一个备用master

一个slave

额外配置my.cnf

relay_log_purge = 0              #<==不自动删除relay log,以便于宕机后修复数据。
log-bin=/mm_data/3306/mysql-bin #<==从库开启binlog,以便于宕机
后修复数据。
expire_logs_days = 7 #<==自动删除7天前的binlog。
log-slave-updates = 1 #<==从库开启Binlog,以便于宕机后修复数据。

master基础配置

My.cnf

[client]
socket=/mm_data/3306/mysql.sock [mysqld]
socket=/mm_data/3306/mysql.sock
basedir=/application/mysql-5.6.40-linux-glibc2.12-x86_64/
datadir=/mm_data/3306/data
log-bin=/mm_data/3306/mysql-bin
character-set-server=utf8 server-id=12
expire-logs-days=1
binlog_format=row
gtid_mode=on
enforce_gtid_consistency=1
log_slave_updates=1 [mysqld_safe]
log-error=/mm_data/3306/mysql_3306_error.log
pid-file=/mm_data/3306/mysqld_3306.pid
sql_mode=NO_ENGINE_SUBSTITUTION,STRTICT_TRANS_TABLES

数据目录

[root@mysql-server56 tools]# /mm_data/3306/mysql_3306 restart
Restarting MySQL...
Stoping MySQL...
Starting MySQL...
[root@mysql-server56 tools]# ls /mm_data/3306/
data my.cnf mysql_3306 mysql_3306_error.log mysql-bin.000001 mysql-bin.000002 mysql-bin.index mysqld_3306.pid mysql.sock
[root@mysql-server56 tools]#
[root@mysql-server56 tools]# netstat -tunlp|grep mysql
tcp6 0 0 :::3306 :::* LISTEN 11040/mysqld
[root@mysql-server56 tools]#

主库信息

mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000002
Position: 151
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec) mysql> mysql> show slave status;
Empty set (0.00 sec) mysql>

创建复制账号

mysql> grant replication slave on *.* to 'repl_chaoge'@'10.211.55.%' identified by 'chaoge668';
Query OK, 0 rows affected (0.00 sec) mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec) mysql>

数据导出

[root@mysql-server56 3306]# mysqldump -S /mm_data/3306/mysql.sock  -A -B -x --set-gtid-purged=OFF|gzip > /mm_data/m1_alldb_$(date +%F).sql.gz

数据发给所有slave

[root@mysql-server56 tools]# scp -rp /mm_data/m1_alldb_2021-04-27.sql.gz root@10.211.55.9:/mm_data/
m1_alldb_2021-04-27.sql.gz 100% 177KB 28.6MB/s 00:00 [root@mysql-server56 tools]# scp -rp /mm_data/m1_alldb_2021-04-27.sql.gz root@10.211.55.11:/mm_data/
m1_alldb_2021-04-27.sql.gz 100% 177KB 38.5MB/s 00:00
[root@mysql-server56 tools]#

查看GTID信息

mysql> show global variables like '%gtid%';
+---------------------------------+------------------------------------------+
| Variable_name | Value |
+---------------------------------+------------------------------------------+
| binlog_gtid_simple_recovery | OFF |
| enforce_gtid_consistency | ON |
| gtid_executed | 20bdbbd9-a5cd-11eb-a6af-001c4279bcf3:1-5 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
| simplified_binlog_gtid_recovery | OFF |
+---------------------------------+------------------------------------------+
7 rows in set (0.00 sec) mysql> mysql> show global variables like 'server%';
+----------------+--------------------------------------+
| Variable_name | Value |
+----------------+--------------------------------------+
| server_id | 12 |
| server_id_bits | 32 |
| server_uuid | 20bdbbd9-a5cd-11eb-a6af-001c4279bcf3 |
+----------------+--------------------------------------+
3 rows in set (0.00 sec)

slave1~10.211.55.9~基础配置

my.cnf server-id=9

[client]
socket=/mm_data/3306/mysql.sock [mysqld]
socket=/mm_data/3306/mysql.sock
basedir=/application/mysql-5.6.40-linux-glibc2.12-x86_64/
datadir=/mm_data/3306/data
log-bin=/mm_data/3306/mysql-bin
character-set-server=utf8 server-id=9
expire-logs-days=1
binlog_format=row
gtid_mode=on
enforce_gtid_consistency=1
log_slave_updates=1 [mysqld_safe]
log-error=/mm_data/3306/mysql_3306_error.log
pid-file=/mm_data/3306/mysqld_3306.pid
sql_mode=NO_ENGINE_SUBSTITUTION,STRTICT_TRANS_TABLES

导入master数据

[root@chaoge_slave1 3306]# zcat /mm_data/m1_alldb_2021-04-27.sql.gz |mysql -S /mm_data/3306/mysql.sock

[root@chaoge_slave1 3306]# ./mysql_3306 restart
Restarting MySQL...
Stoping MySQL...
Starting MySQL...
[root@chaoge_slave1 3306]# ls
data my.cnf mysql_3306 mysql_3306_error.log mysql-bin.000001 mysql-bin.index mysqld_3306.pid mysql.sock
[root@chaoge_slave1 3306]#

授权master-info

mysql> change master to
-> master_host='10.211.55.12',
-> master_port=3306,
-> master_user='repl_chaoge',
-> master_password='chaoge668',
-> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.00 sec)

启动slave

mysql> start slave;

mysql> show slave status\G

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.211.55.12
Master_User: repl_chaoge
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 644733
Relay_Log_File: mysqld_3306-relay-bin.000002
Relay_Log_Pos: 408
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

slave2~10.211.55.11~基础配置

my.cnf

server-id=11

[client]
socket=/mm_data/3306/mysql.sock [mysqld]
socket=/mm_data/3306/mysql.sock
basedir=/application/mysql-5.6.40-linux-glibc2.12-x86_64/
datadir=/mm_data/3306/data
log-bin=/mm_data/3306/mysql-bin
character-set-server=utf8 server-id=11
expire-logs-days=1
binlog_format=row
gtid_mode=on
enforce_gtid_consistency=1
log_slave_updates=1 [mysqld_safe]
log-error=/mm_data/3306/mysql_3306_error.log
pid-file=/mm_data/3306/mysqld_3306.pid
sql_mode=NO_ENGINE_SUBSTITUTION,STRTICT_TRANS_TABLES

导入master数据

[root@chaoge_slave2 3306]# zcat /mm_data/m1_alldb_2021-04-27.sql.gz |mysql -S /mm_data/3306/mysql.sock
[root@chaoge_slave2 3306]# ./mysql_3306 restart
Restarting MySQL...
Stoping MySQL...
Starting MySQL...
[root@chaoge_slave2 3306]# ls
data mysql_3306 mysql-bin.000001 mysql-bin.000003 mysqld_3306.pid
my.cnf mysql_3306_error.log mysql-bin.000002 mysql-bin.index mysql.sock
[root@chaoge_slave2 3306]#

授权master-info

change master to
master_host='10.211.55.12',
master_port=3306,
master_user='repl_chaoge',
master_password='chaoge668',
master_auto_position=1;

启动slave

mysql> start slave;

mysql> show slave status\G

主从结果

主从结果是正常的

SSH免密登录

MHA_Manager管理节点是通过ssh服务连接其他node节点进行探测、以及获取数据,必须提前做好ssh免密连接

最终结果就是,四台机器,可以任意ssh免密互相登录,包括机器本身

# 生成公私钥
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa > /dev/null 2>&1 # 传输公钥
ssh-copy-id -i ~/.ssh/id_dsa.pub 10.211.55.9
ssh-copy-id -i ~/.ssh/id_dsa.pub 10.211.55.11
ssh-copy-id -i ~/.ssh/id_dsa.pub 10.211.55.12
ssh-copy-id -i ~/.ssh/id_dsa.pub 10.211.55.18

所有节点依赖安装

所有节点,安装MHA基础依赖

配置好yum源

yum install -y perl-DBD-MySQL \
perl-Config-Tiny \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-ExtUtils-CBuilder \
perl-ExtUtils-MakeMaker \
perl-CPAN

所有mysql节点安装MHA-node

安装rpm包

wget --no-check-certificate https://qiniu.wsfnk.com/mha4mysql-node-0.58-0.el7.centos.noarch.rpm

rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

检查rpm安装出的命令

[root@chaoge_slave1 tools]# ls -l /usr/bin/*_*log*
-rwxr-xr-x 1 root root 17639 Mar 23 2018 /usr/bin/apply_diff_relay_logs
-rwxr-xr-x 1 root root 15704 Aug 9 2019 /usr/bin/db_log_verify
-rwxr-xr-x 1 root root 33032 Aug 9 2019 /usr/bin/db_printlog
-rwxr-xr-x 1 root root 4807 Mar 23 2018 /usr/bin/filter_mysqlbinlog
-rwxr-xr-x 1 root root 8337 Mar 23 2018 /usr/bin/purge_relay_logs
-rwxr-xr-x 1 root root 7525 Mar 23 2018 /usr/bin/save_binary_logs
-rwxr-xr-x. 1 root root 7910 Aug 4 2017 /usr/bin/scsi_logging_level
-rwxr-xr-x. 1 root root 94696 Aug 4 2017 /usr/bin/sg_logs

MHA工具会检测mysql命令,这里还需要加一个软连接

# 三台mysql节点,都执行该命令
ln -s /application/mysql-5.6.40-linux-glibc2.12-x86_64/bin/mysqlbinlog /usr/bin/mysqlbinlog

所有节点,创建MHA管理账号,三台机器都操作

mysql> grant all privileges on *.* to mha@'10.211.55.%' identified by 'mha_chaoge';
Query OK, 0 rows affected (0.00 sec) mysql>

MHA-Manager管理节点

MHA管理节点可以装在任何节点,超哥这里就给安装到了slave02 节点

因为Manager管理节点,通过ssh检测mysql集群,如果master节点服务器宕机,或者网络故障,MHA也无法完成故障切换了。

因此mha-manager不能装在master节点

wget --no-check-certificate https://qiniu.wsfnk.com/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

[root@chaoge_slave2 tools]# ls mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@chaoge_slave2 tools]#

安装管理节点的依赖

yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes

安装MHA-Manager的包

[root@chaoge_slave2 tools]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
Preparing... ################################# [100%]
Updating / installing...
1:mha4mysql-manager-0.58-0.el7.cent################################# [100%]
[root@chaoge_slave2 tools]#

检查mha-manager的命令(slave2机器)

[root@chaoge_slave2 tools]# ls -l /usr/bin/masterha_*
-rwxr-xr-x 1 root root 1995 Mar 23 2018 /usr/bin/masterha_check_repl
-rwxr-xr-x 1 root root 1779 Mar 23 2018 /usr/bin/masterha_check_ssh
-rwxr-xr-x 1 root root 1865 Mar 23 2018 /usr/bin/masterha_check_status
-rwxr-xr-x 1 root root 3201 Mar 23 2018 /usr/bin/masterha_conf_host
-rwxr-xr-x 1 root root 2517 Mar 23 2018 /usr/bin/masterha_manager
-rwxr-xr-x 1 root root 2165 Mar 23 2018 /usr/bin/masterha_master_monitor
-rwxr-xr-x 1 root root 2373 Mar 23 2018 /usr/bin/masterha_master_switch
-rwxr-xr-x 1 root root 5172 Mar 23 2018 /usr/bin/masterha_secondary_check
-rwxr-xr-x 1 root root 1739 Mar 23 2018 /usr/bin/masterha_stop
[root@chaoge_slave2 tools]#

创建MHA配置文件

基础信息如下,这里不需要操作

**主从账密**

mysql> grant replication slave on *.* to 'repl_chaoge'@'10.211.55.%' identified by 'chaoge668';

**MHA账密**

mysql> grant all privileges on *.* to mha@'10.211.55.%' identified by 'mha_chaoge';
Query OK, 0 rows affected (0.00 sec) # 一定记住,刷新用户表
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec) **主机host信息**,所有机器都做好/etc/hosts解析 10.211.55.11 chaoge_slave2 ,mha-manager安装在这里 10.211.55.9 chaoge_slave1 10.211.55.12 mysql-server56 ,如果master挂了,需要其他slave去检测

注释版

[root@chaoge_slave2 tools]# mkdir -p /etc/mha           #<==在/etc下创建mha目录。
[root@chaoge_slave2 tools]# mkdir -p /var/log/mha/app1 #<==在/etc下创建mha目录。
[root@chaoge_slave2 tools]# vim /etc/mha/app1.cnf #<==编辑mha配置文件,增加配置内容。 [server default] #<==默认模块标签。
manager_log=/var/log/mha/app1/manager.log #<==配置日志路径。
manager_workdir=/var/log/mha/app1.log #<==配置工作日志路径。
master_binlog_dir=/mm_data/3306/data/ #<==配置MHA保存主库binlog日志的路径。
user=mha #<==MySQL数据中授权的用户。
password=mha_chaoge #<==MySQL数据中授权的用户。
ping_interval=2 #<==设置监控主库发送ping数据包的时间间隔,
若尝试三次没有回应则自动进行failover。
repl_user=repl_chaoge #<==主从复制对应的用户。
repl_password=chaoge668 #<==主从复制用户对应的密码。
ssh_user=root #<==ssh远程连接服务器的用户。
report_script=/usr/local/send_report #<==设置故障发生切换后触发执行的脚本。
secondary_check_script=/usr/local/bin/masterha_secondary_check -s 10.211.55.9 -s 10.211.55.11 --user=root --master_host=mysql-server56 --master_ip=10.211.55.12 --master_port=3306
#<==当MHA Manager节点到MASTER节点(mysql-server56)的监控之间出现问题时,MHA Manager将会尝试从其他路径登录到MASTER(mysql-server56)节点。
#<==注:此配置在MHA Manager节点只有单独一台机器时起作用。意思就是,在Manager节点联系不上Master时,通过两个从节点(chaoge_slave1 、chaoge_slave2)去探视Master(mysql-server56)节点的状态。 shutdown_script="" #<==设置故障发生后执行主机脚本关闭故障机(防止故障机活过来发生脑裂)
[server1] #<==第一个mysql-master主机模块标签。
hostname=10.211.55.12 #<==第一个mysql-master主机IP。
port=3306 #<==第一个mysql主机端口。 [server2]
hostname=10.211.55.11
port=3306
candidate_master=1 #<==设定此参数后,server2标签的主机,将优先作为主库,宕机的候选服务器(切换主库优先选择)。
check_repl_delay=0 #<==设定此参数后,MHA会忽略主从复制延迟,将此服务器作为后选主机。 [server3]
hostname=10.211.55.9
port=3306

最终版配置文件

[root@chaoge_slave2 tools]# mkdir -p /etc/mha
[root@chaoge_slave2 tools]# mkdir -p /var/log/mha/app1
[root@chaoge_slave2 tools]# # 配置文件如下
[server default]
manager_log=/var/log/mha/app1/manager.log
manager_workdir=/var/log/mha/app1.log
master_binlog_dir=/mm_data/3306/data/
# 该脚本暂时先注释
#master_ip_failover_script=/usr/local/bin/master_ip_failover
user=mha
password=mha_chaoge
ping_interval=2
repl_user=repl_chaoge
repl_password=chaoge668
ssh_user=root
shutdown_script="" [server1]
candidate_master=1
check_repl_delay=0
hostname=10.211.55.12
port=3306 [server2]
hostname=10.211.55.11
port=3306
candidate_master=1
check_repl_delay=0 [server3]
hostname=10.211.55.9
port=3306

自愈检测脚本

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all'; use Getopt::Long; my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
); my $vip = '';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth1:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth1:$key down"; GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) { my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
} sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
} sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

预备启动MHA

检测如下MHA运行条件

  • SSH免密登录
  • MySQL主从复制

检测SSH登录

如下结果表明服务器之间的SSH免密登录没有问题

[root@chaoge_slave2 tools]# masterha_check_ssh --conf=/etc/mha/app1.conf
Tue Apr 27 15:06:51 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 27 15:06:51 2021 - [info] Reading application default configuration from /etc/mha/app1.conf..
Tue Apr 27 15:06:51 2021 - [info] Reading server configuration from /etc/mha/app1.conf..
Tue Apr 27 15:06:51 2021 - [info] Starting SSH connection tests..
Tue Apr 27 15:06:52 2021 - [debug]
Tue Apr 27 15:06:51 2021 - [debug] Connecting via SSH from root@10.211.55.12(10.211.55.12:22) to root@10.211.55.11(10.211.55.11:22)..
Tue Apr 27 15:06:51 2021 - [debug] ok.
Tue Apr 27 15:06:51 2021 - [debug] Connecting via SSH from root@10.211.55.12(10.211.55.12:22) to root@10.211.55.9(10.211.55.9:22)..
Tue Apr 27 15:06:51 2021 - [debug] ok.
Tue Apr 27 15:06:52 2021 - [debug]
Tue Apr 27 15:06:51 2021 - [debug] Connecting via SSH from root@10.211.55.11(10.211.55.11:22) to root@10.211.55.12(10.211.55.12:22)..
Tue Apr 27 15:06:51 2021 - [debug] ok.
Tue Apr 27 15:06:51 2021 - [debug] Connecting via SSH from root@10.211.55.11(10.211.55.11:22) to root@10.211.55.9(10.211.55.9:22)..
Tue Apr 27 15:06:52 2021 - [debug] ok.
Tue Apr 27 15:06:53 2021 - [debug]
Tue Apr 27 15:06:52 2021 - [debug] Connecting via SSH from root@10.211.55.9(10.211.55.9:22) to root@10.211.55.12(10.211.55.12:22)..
Tue Apr 27 15:06:52 2021 - [debug] ok.
Tue Apr 27 15:06:52 2021 - [debug] Connecting via SSH from root@10.211.55.9(10.211.55.9:22) to root@10.211.55.11(10.211.55.11:22)..
Tue Apr 27 15:06:52 2021 - [debug] ok.
Tue Apr 27 15:06:53 2021 - [info] All SSH connection tests passed successfully.
[root@chaoge_slave2 tools]#

检测主从复制情况

MHA也提供了主从复制检测

[root@chaoge_slave2 3306]# masterha_check_repl --conf=/etc/mha/app1.conf
Tue Apr 27 16:54:50 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 27 16:54:50 2021 - [info] Reading application default configuration from /etc/mha/app1.conf..
Tue Apr 27 16:54:50 2021 - [info] Reading server configuration from /etc/mha/app1.conf..
Tue Apr 27 16:54:50 2021 - [info] MHA::MasterMonitor version 0.58.
Tue Apr 27 16:54:51 2021 - [info] GTID failover mode = 1
Tue Apr 27 16:54:51 2021 - [info] Dead Servers:
Tue Apr 27 16:54:51 2021 - [info] Alive Servers:
Tue Apr 27 16:54:51 2021 - [info] 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 16:54:51 2021 - [info] 10.211.55.11(10.211.55.11:3306)
Tue Apr 27 16:54:51 2021 - [info] 10.211.55.9(10.211.55.9:3306)
Tue Apr 27 16:54:51 2021 - [info] Alive Slaves:
Tue Apr 27 16:54:51 2021 - [info] 10.211.55.11(10.211.55.11:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Tue Apr 27 16:54:51 2021 - [info] GTID ON
Tue Apr 27 16:54:51 2021 - [info] Replicating from 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 16:54:51 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Apr 27 16:54:51 2021 - [info] 10.211.55.9(10.211.55.9:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Tue Apr 27 16:54:51 2021 - [info] GTID ON
Tue Apr 27 16:54:51 2021 - [info] Replicating from 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 16:54:51 2021 - [info] Current Alive Master: 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 16:54:51 2021 - [info] Checking slave configurations..
Tue Apr 27 16:54:51 2021 - [info] read_only=1 is not set on slave 10.211.55.11(10.211.55.11:3306).
Tue Apr 27 16:54:51 2021 - [info] read_only=1 is not set on slave 10.211.55.9(10.211.55.9:3306).
Tue Apr 27 16:54:51 2021 - [info] Checking replication filtering settings..
Tue Apr 27 16:54:51 2021 - [info] binlog_do_db= , binlog_ignore_db=
Tue Apr 27 16:54:51 2021 - [info] Replication filtering check ok.
Tue Apr 27 16:54:51 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Apr 27 16:54:51 2021 - [info] Checking SSH publickey authentication settings on the current master..
Tue Apr 27 16:54:51 2021 - [info] HealthCheck: SSH to 10.211.55.12 is reachable.
Tue Apr 27 16:54:51 2021 - [info]
10.211.55.12(10.211.55.12:3306) (current master)
+--10.211.55.11(10.211.55.11:3306)
+--10.211.55.9(10.211.55.9:3306) Tue Apr 27 16:54:51 2021 - [info] Checking replication health on 10.211.55.11..
Tue Apr 27 16:54:51 2021 - [info] ok.
Tue Apr 27 16:54:51 2021 - [info] Checking replication health on 10.211.55.9..
Tue Apr 27 16:54:51 2021 - [info] ok.
Tue Apr 27 16:54:51 2021 - [warning] master_ip_failover_script is not defined.
Tue Apr 27 16:54:51 2021 - [warning] shutdown_script is not defined.
Tue Apr 27 16:54:51 2021 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.
[root@chaoge_slave2 3306]#

结果总结

结果必须和超哥一样,全都是info级别的日志信息,而不得有error日志

并且提示mysql replication health is OK,表示复制检查正常。

配置VIP漂移

上面超哥是临时关闭了VIP的漂移脚本

# 该脚本暂时先注释
#master_ip_failover_script=/usr/local/bin/master_ip_failover

这个作用是当master发生故障,迁移后数据库IP发生变化,解决这个问题,因此得使用VIP进行漂移

MHA已经提供好了perl脚本

我们只需要创建VIP即可,首先在master节点上创建

# 创建
[root@mysql-server56 3306]# ifconfig eth0:1 10.211.55.77/24 # 删除
[root@mysql-server56 3306]# ifconfig eth0:1 del 10.211.55.77/24 # 关闭
[root@mysql-server56 3306]# ifconfig eth0:1 down

使用vip脚本

脚本代码

master_ip_failover_script=/usr/local/bin/master_ip_failover

如下

vim /usr/local/bin/master_ip_failover

给与执行权限

chmod +x /usr/local/bin/master_ip_failover

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all'; use Getopt::Long; my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
); my $vip = '10.211.55.77/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down"; GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
); exit &main(); sub main { print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) { my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
} sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
} sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

再次检测MHA程序

[root@chaoge_slave2 3306]# masterha_check_repl --conf=/etc/mha/app1.conf
Tue Apr 27 17:29:19 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Apr 27 17:29:19 2021 - [info] Reading application default configuration from /etc/mha/app1.conf..
Tue Apr 27 17:29:19 2021 - [info] Reading server configuration from /etc/mha/app1.conf..
Tue Apr 27 17:29:19 2021 - [info] MHA::MasterMonitor version 0.58.
Tue Apr 27 17:29:20 2021 - [info] GTID failover mode = 1
Tue Apr 27 17:29:20 2021 - [info] Dead Servers:
Tue Apr 27 17:29:20 2021 - [info] Alive Servers:
Tue Apr 27 17:29:20 2021 - [info] 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 17:29:20 2021 - [info] 10.211.55.11(10.211.55.11:3306)
Tue Apr 27 17:29:20 2021 - [info] 10.211.55.9(10.211.55.9:3306)
Tue Apr 27 17:29:20 2021 - [info] Alive Slaves:
Tue Apr 27 17:29:20 2021 - [info] 10.211.55.11(10.211.55.11:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Tue Apr 27 17:29:20 2021 - [info] GTID ON
Tue Apr 27 17:29:20 2021 - [info] Replicating from 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 17:29:20 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Apr 27 17:29:20 2021 - [info] 10.211.55.9(10.211.55.9:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Tue Apr 27 17:29:20 2021 - [info] GTID ON
Tue Apr 27 17:29:20 2021 - [info] Replicating from 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 17:29:20 2021 - [info] Current Alive Master: 10.211.55.12(10.211.55.12:3306)
Tue Apr 27 17:29:20 2021 - [info] Checking slave configurations..
Tue Apr 27 17:29:20 2021 - [info] read_only=1 is not set on slave 10.211.55.11(10.211.55.11:3306).
Tue Apr 27 17:29:20 2021 - [info] read_only=1 is not set on slave 10.211.55.9(10.211.55.9:3306).
Tue Apr 27 17:29:20 2021 - [info] Checking replication filtering settings..
Tue Apr 27 17:29:20 2021 - [info] binlog_do_db= , binlog_ignore_db=
Tue Apr 27 17:29:20 2021 - [info] Replication filtering check ok.
Tue Apr 27 17:29:20 2021 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Apr 27 17:29:20 2021 - [info] Checking SSH publickey authentication settings on the current master..
Tue Apr 27 17:29:20 2021 - [info] HealthCheck: SSH to 10.211.55.12 is reachable.
Tue Apr 27 17:29:20 2021 - [info]
10.211.55.12(10.211.55.12:3306) (current master)
+--10.211.55.11(10.211.55.11:3306)
+--10.211.55.9(10.211.55.9:3306) Tue Apr 27 17:29:20 2021 - [info] Checking replication health on 10.211.55.11..
Tue Apr 27 17:29:20 2021 - [info] ok.
Tue Apr 27 17:29:20 2021 - [info] Checking replication health on 10.211.55.9..
Tue Apr 27 17:29:20 2021 - [info] ok.
Tue Apr 27 17:29:20 2021 - [info] Checking master_ip_failover_script status:
Tue Apr 27 17:29:20 2021 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.211.55.12 --orig_master_ip=10.211.55.12 --orig_master_port=3306 IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 10.211.55.77/24=== Checking the Status of the script.. OK
Tue Apr 27 17:29:20 2021 - [info] OK.
Tue Apr 27 17:29:20 2021 - [warning] shutdown_script is not defined.
Tue Apr 27 17:29:20 2021 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.

一切OK,启动MHA

启动MHA

在MHA的manager节点,启动MHA进程

nohup masterha_manager --conf=/etc/mha/app1.conf --ignore_last_failover /var/log/mha/app1/manager.log 2>&1 &

命令参数:
--remove_dead_master_conf 该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。
--manger_log 日志存放位置
--ignore_last_failover 在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。该参数代表忽略上次MHA触发切换产生的文件,默认情况下,MHA发生切换后会在日志目录,也就是上面设置的manager_workdir目录中产生app1.failover.complete文件,下次再次切换的时候如果发现该目录下存在该文件将不允许触发切换,除非在第一次切换后收到删除该文件,为了方便,这里设置为--ignore_last_failover。 # 停止命令
masterha_stop --conf=/etc/mha/app1.conf

运行命令,检测日志

[root@chaoge_slave2 3306]# nohup masterha_manager --conf=/etc/mha/app1.conf --ignore_last_failover /var/log/mha/app1/manager.log 2>&1 &
[1] 31466
[root@chaoge_slave2 3306]# nohup: ignoring input and appending output to ‘nohup.out’ [root@chaoge_slave2 3306]# ps -ef|grep master
root 31466 25629 2 17:33 pts/0 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app1.conf --ignore_last_failover /var/log/mha/app1/manager.log
root 31488 25629 0 17:33 pts/0 00:00:00 grep --color=auto master
[root@chaoge_slave2 3306]# tail -f /var/log/mha/app1/manager.log IN SCRIPT TEST====/sbin/ifconfig eth1:1 down==/sbin/ifconfig eth1:1 10.211.55.77/24=== Checking the Status of the script.. OK
Tue Apr 27 17:33:36 2021 - [info] OK.
Tue Apr 27 17:33:36 2021 - [warning] shutdown_script is not defined.
Tue Apr 27 17:33:36 2021 - [info] Set master ping interval 2 seconds.
Tue Apr 27 17:33:36 2021 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Tue Apr 27 17:33:36 2021 - [info] Starting ping health check on 10.211.55.12(10.211.55.12:3306)..
Tue Apr 27 17:33:36 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

检测MHA状态

检测mysql主从集群的状态

[root@chaoge_slave2 ~]# masterha_check_status --conf=/etc/mha/app1.conf
app1 (pid:31466) is running(0:PING_OK), master:10.211.55.12

检查VIP当前在哪,在当前的mysql-master机器

[root@mysql-server56 3306]# ifconfig  eth0:1
eth0:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.211.55.77 netmask 255.255.255.0 broadcast 10.211.55.255
ether 00:1c:42:79:bc:f3 txqueuelen 1000 (Ethernet) [root@mysql-server56 3306]#

第一次进行VIP脚本自动切换,VIP必须在master机器上

停止master主库

查看结果

  • master切换
  • VIP切换
[root@mysql-server56 3306]# /mm_data/3306/mysql_3306 stop
Stoping MySQL...

见证MHA漂移结果

最终发生了如下变化

  • MHA软件在切换后会自动停止进程
  • VIP发生漂移
  • 主从复制关系发生变化

slave01机器,主从角色发生变化

[root@chaoge_slave1 3306]# mysql -S /mm_data/3306/mysql.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 28
Server version: 5.6.40-log MySQL Community Server (GPL) Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.211.55.11
Master_User: repl_chaoge
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 1929
Relay_Log_File: mysqld_3306-relay-bin.000003
Relay_Log_Pos: 1405
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

slave01机器,已经没有slave的角色,成为了主库

[root@chaoge_slave2 ~]# mysql -S /mm_data/3306/mysql.sock
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 29
Server version: 5.6.40-log MySQL Community Server (GPL) Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show slave status\G
Empty set (0.00 sec) mysql>

MySQL高可用集群MHA方案的更多相关文章

  1. mysql高可用集群——MHA架构

    目录1.下载2.搭建mha 2.1 系统配置 2.2 架构 2.3 添加ssh公钥信任 2.4 安装mha节点 2.5 manager配置文件 2.6 检查 2.7 启动manager进程 2.8 碰 ...

  2. Mysql 高可用集群PXC

    PXC是percona公司的percona  xtraDB  cluster,简称PXC.它是基于Galera协议的高可用集群方案.可以实现多个节点间的数据同步复制以及读写,并且可保障数据库的服务高可 ...

  3. MySQL高可用集群方案

    一.Mysql高可用解决方案 方案一:共享存储 一般共享存储采用比较多的是 SAN/NAS 方案. 方案二:操作系统实时数据块复制 这个方案的典型场景是 DRBD,DRBD架构(MySQL+DRBD+ ...

  4. Mysql高可用集群环境介绍

    MySQL高可用集群环境搭建 01.MySQL高可用环境方案 02.MySQL主从复制原理 03.MySQL主从复制作用 04.在Linux环境上安装MySQL 05.在MySQL集群环境上配置主从复 ...

  5. 部署MYSQL高可用集群

                                                  mysql-day08     部署MYSQL高可用集群 u 集群架构                   ...

  6. keepalived+MySQL高可用集群

    基于keepalived搭建MySQL的高可用集群   MySQL的高可用方案一般有如下几种: keepalived+双主,MHA,MMM,Heartbeat+DRBD,PXC,Galera Clus ...

  7. MongoDB高可用集群配置方案

    原文链接:https://www.jianshu.com/p/e7e70ca7c7e5 高可用性即HA(High Availability)指的是通过尽量缩短因日常维护操作(计划)和突发的系统崩溃(非 ...

  8. MongoDB分片技术原理和高可用集群配置方案

    一.Sharding分片技术 1.分片概述 当数据量比较大的时候,我们需要把数分片运行在不同的机器中,以降低CPU.内存和Io的压力,Sharding就是数据库分片技术. MongoDB分片技术类似M ...

  9. corosync+pacemaker and drbd实现mysql高可用集群

    DRBD:Distributed Replicated Block Device 分布式复制块设备,原理图如下 DRBD 有主双架构和双主架构的,当处于主从架构时,这个设备一定只有一个节点是可以读写的 ...

随机推荐

  1. cmake错误的解决

    安装Mysql时出现:CMake Error: Error executing cmake:: LoadCache(). Aborting. 对比:/usr/bin/cmake 和 /usr/loca ...

  2. mongo数据同步的三种方案

    (一)直接复制data目录(需要停止源和目标的mongo服务)1.针对目标mongo服务已经存在,并正在运行的(mongo2-->mongo).执行步骤:(1).停止源/目标服务器的mongo服 ...

  3. 大数据Hadoop入门教程 | (二)Linux

    使用finalShell可以提供文件目录图形化 完整Linux命令整理参考大佬博客:Linux常见文件管理命令 - Mr_Walker - 博客园 Linux文件系统基础知识 Linux文件系统概念 ...

  4. C++ 遍历磁盘文件 非递归方法 和递归方法

    1: 非递归方法:  一起学习 寻找快乐 // File Name: CSearch.h #pragma once #include <vector> #include <atlst ...

  5. 基于Apache Hudi构建分析型数据湖

    为了有机地发展业务,每个组织都在迅速采用分析. 在分析过程的帮助下,产品团队正在接收来自用户的反馈,并能够以更快的速度交付新功能. 通过分析提供的对用户的更深入了解,营销团队能够调整他们的活动以针对特 ...

  6. 【AGC】如何快速部署Serverless Url缩短模板

    使用场景 Serverless短URL生成模板实现您将在云数据库服务中URL缩短的诉求.使用此模板后,仅需在云数据库服务侧配置长URL值.Serverless短URL生成模板会在后台与Bitly AP ...

  7. window桌面背景图片

    通过修改注册表项: \HKEY_CURRENT_USER\Control Panel\Desktop下的几个值,及可以将我们想要的图片设置成桌面的背景图 TileWallpaper   Wallpap ...

  8. CodeForces - 1701C

    Problem - C - Codeforces 题意: 每个位置对应一种适合的工人,适合的工人工作消耗1h,不适合2h,每个工人不能同时工作多个机器,问将所有机器工作完毕的最小时间是多少. 题解: ...

  9. 邮箱的代理发送Send as权限不生效

     邮箱的代理发送Sendas权限不生效 最近,有需求为用户添加其它邮箱的代理发送Sendas权限.在Exchange的管理单元里添加完毕后,发现没有效果,客户端提示你没有权限以用户的名义发送邮件 ...

  10. Django 使用VScode 创建工程

    一.VSCode 创建Django 工程 VSCode 官方: https://code.visualstudio.com 1 mysite(项目名),创建Django 项目,可以和虚拟环境放在同一目 ...