一、简介

MHA的逻辑是,为了保证其MySQL的高可用,会有一个StandBy状态的master.在mysql故障切换的过程中,MHA 能做到在 0~30 秒内自动完成数据库的故障切换操作,并且在进行故障切换的过程中,MHA 能最大程度的保证数据的一致性,以达到相对意义上的高可用。

1.1MHA角色

如下图,整个 MHA 架构分为

  • MHA Manager 节点
  • MHA Node 节点

    其中 MHA Manager 节点是单点部署,MHA Node 节点是部署在每个需要监控的 MySQL 集群节点上的。MHA Manager 会定时探测集群中的 Master 节点,当 Master 出现故障时,它可以自动将最新数据的 Standby Master 或 Slave 提升为新的 Master,然后将其他的 Slave 重新指向新的 Master。

二、MHA的工具

Manager节点:

  • masterha_check_ssh:MHA 依赖的 ssh 环境监测工具;
  • masterha_check_repl:MYSQL 复制环境检测工具;
  • masterga_manager:MHA 服务主程序;
  • masterha_check_status:MHA 运行状态探测工具;
  • masterha_master_monitor:MYSQL master 节点可用性监测工具;
  • masterha_master_swith:master:节点切换工具;
  • masterha_conf_host:添加或删除配置的节点;
  • masterha_stop:关闭 MHA 服务的工具。

Node节点:(这些工具通常由MHA Manager的脚本触发,无需人为操作)

  • save_binary_logs:保存和复制 master 的二进制日志;
  • apply_diff_relay_logs:识别差异的中继日志事件并应用于其他 slave;
  • purge_relay_logs:清除中继日志(不会阻塞 SQL 线程);

自定义扩展:

  • secondary_check_script:通过多条网络路由检测master的可用性;
  • master_ip_failover_script:更新application使用的masterip;
  • report_script:发送报告;
  • init_conf_load_script:加载初始配置参数;
  • master_ip_online_change_script;更新master节点ip地址。

三、MHA部署过程

3.1.1 配置

MHA 对 MYSQL 复制环境有特殊要求,例如各节点都要开启二进制日志及中继日志,各从节点必须显示启用其read-only属性,并关闭relay_log_purge功能等,这里对配置做事先说明。

3.1.2 环境规划

机器名 IP 角色 备注
manager 172.30.200.100 manager控制器 用于管理和故障切换
master 172.30.200.101 数据库主服务器 开启binlog,relay-log。关闭relay_log_purge
slave1 172.30.200.102 数据库从服务器 开启binlog,relay-log。关闭relay_log_purge
slave2 172.30.200.103 数据库从服务器 开启binlog,relay-log。关闭relay_log_purge

在各个节点的/etc/hosts文件配置内容添加如下:

[root@localhost ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.30.200.100 arpmgr
172.30.200.101 arpmaster
172.30.200.102 arpslave1
172.30.200.103 arpslave2

创建binlog的目录

mkdir -p /data/mysqldata/binlog
chown -R mysql:mysql /data/mysqldata/binlog

101节点配置:

server-id = 200101

log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14 lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

102节点配置:

server-id = 200102

log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
expire-logs-days = 14 read_only = ON
relay_log_purge = 0 lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

103节点配置:


server-id = 200103 log-bin = /data/mysqldata/binlog/mysql-bin
binlog_format= row
max_binlog_size= 512m
relay-log = /data/mysqldata/binlog/relay-bin
read_only = ON
relay_log_purge = 0
expire-logs-days = 14 lower_case_table_names = 1
character-set-server = utf8
log_slave_updates = 1

3.1.3 配置一主多从

master节点配置:

MariaDB [(none)]>grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';
MariaDB [(none)]> show master status;
+------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 | 548 | | |
+------------------+----------+--------------+------------------+
1 row in set (0.000 sec)

slave节点配置:

grant replication slave,replication client on *.* to 'repl'@'172.30.200.%' identified by 'repl7101';

change master to master_host='172.30.200.101',
master_user='repl',
master_password='repl7101',
master_log_file='mysql-bin.000001',
master_log_pos=548; start slave;
show slave status\G;

至此,一主多从配置完毕。

3.2 MHA配置

3.2.1 master权限授予

可以在所有节点上面配置,其拥有管理权限,目前只需在master结点上设置权限:

grant all on *.* to 'mhaadmin'@'172.30.%.%' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';

3.2.2 ssh互信

四个节点都执行如下语句:

ssh-keygen -t rsa
ssh-copy-id -i .ssh/id_rsa.pub root@arpmgr

然后在arpmgr结点上面,可以看到authorized_keys文件的信息内容如下:

[root@localhost .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDY3yFhR5uzcFEpd+q+1Uw/cRF9ZRraygms7OZoefLFzY/ydSi6yYuCighG8WquvRep7XDNjFI71HAUagSoXiyPoCe1lqEnzpxSc+fQpIeQqEhUmLJ2bk+R83EskzwRGh+S/D4yp/swWz1vRgUGoTWevLCs33q7ZrsM8i+jB0uwZmzOV+CyQAPW9vLkRjZa4y1sx65lbR0HbdTQWQYZ4IyZauoU8XQjAIOs/CdLw2nBt8dPO53jT7NS7Ywx6eu/Wj9k/sYVVZT3jTb+pBIVs+Du5+tdUDX5aLKzxINpLlqNhorNevoC9iE0Ame1qvYonQfyWQ52Ae0y+58vFfG6PyV3 root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC/ZPihYSC6ArawKRU75aQRVSFsQ5S89SrYHGWdzyluB4spj+UDUmWH1kLGYr715/HD5hh22KdLmIs7R4jviOeao1HK52fpMvklYaNtYRHuV63Zkg5sOLvLfhrHdta9wuHlW1NyWx75+wIl2LvKBRtnSddwf5ZvitJ/kChf2gpNhHAWidyjGsPoJdr0OBCNHvz1y6oON6cnMb07ExaIjptRnkbCOU0QSVjFq4+Jmh8zTTbJC2up50s15gSfWXH0+WLXmJXJGkvgHdSYqw4vJt/l25f5qAKKZsfnyfC0iyct4GyHPF6trpvQ/c2lqr/Rg4xLWgdxlyt4aBJYl5adIRK/ root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDba26wV0KwQNTb4pKuiFDCcVMNRLGMXSiJC8ucN4/KIqzoOYJ747QL8GL5F8ePnRaZ1rtOwdjnlTiC0a4Tcg4JLs+JSnJgzvepuixmGgSJfLbJ36iN1WFh6fP2GZEDdR7Qum4sBUpQyYJ20Kf9rKfQQv2wq6csK5IlFk/OoO+zTySauLnYvRxvKY2avVDXPPFJvpqimKXn59MIAoJr6YEKvncbYyqvrSgUy7klZDys9IIjYcWfO7VKjQ5bwbHrrKtNbedME+KPQld7e8ZVL66Omik4Z6ip7DQEHRKWMmuBIpL99AgOOjPLbzJFWLUPOwvy3DtmEBnZ+0NVf/1obC11 root@localhost.localdomain
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwrgZtGC31EgixeY4SVl4h64m1r8LdL3hM4Be2/I+6Xw7hCzZyKKTAFgz9W/ukfx6WmZwoqp1VO/7Jp6KO1FhYOi5u0q6J1KIObFNp+3E6cB2P0q39WqmZpQ9cNPYrbs9U2Ej0L0JwUtf/xLh334PaSlv/LcNy+p1dWya2OqsBeraiXZ4MgEBzcb+0twkpfpD327VgT/mRHPmA6fPRJOOJti1u4isHeotE4i13YIqQYfBfmbfiLdXKAvgI8FuTf0i91Re/FUBOgBfBcJbqIQNR0Nh5wZ/LvNxkstDQvypZIZwiK+wN+aZZOQ7jF/+997Z9QQleC9OOoHOJR7+fisLb root@localhost.localdomain

正好有四个形如ssh-rsa 相关 的密钥信息。

把如上的公钥信息,拷贝到其余四台服务器上面:

scp authorized_keys root@arpmaster:~/.ssh/
scp authorized_keys root@arpslave1:~/.ssh/
scp authorized_keys root@arpslave2:~/.ssh/

测试ssh是否可用

[root@localhost .ssh]# ssh arpmaster
[root@localhost ~]# ssh arpslave1
[root@localhost ~]# ssh arpslave2
[root@localhost ~]# ssh arpmgr

3.2.3 安装mha包

mha安装包分为两个,一个是node,另外一个是manager

四个节点安装:mha4mysql-node-0.57-0.el7.centos.noarch.rpm

管理节点安装:mha4mysql-manager-0.57-0.el7.centos.noarch.rpm

在安装``mha4mysql-node-0.57-0.el7.centos.noarch.rpm过程中,有对perl-DBD-mysql,perl-DBI`前置依赖,安装步骤如下:

yum install perl-DBD-mysql perl-DBI

在安装``mha4mysql-manager-0.57-0.el7.centos.noarch.rpm过程中,有对perl`前置依赖,安装步骤如下:

安装yum 扩展包
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker perl-Config-Tiny perl-Log-Dispatch-* perl-Parallel-ForkManager

然后安装信息,都成功,如下:

[root@localhost ~]# rpm -ivh mha4mysql-node-0.57-0.el7.centos.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
[root@localhost ~]# rpm -ivh mha4mysql-manager-0.57-0.el7.centos.noarch.rpm
准备中... ################################# [100%]
正在升级/安装...
1:mha4mysql-manager-0.58-0.el7.cent################################# [100%]

0.58中有一个super_read_only不可用在mariadb,所以使用0.57版本。

3.2.4 MHA管理节点配置

[root@localhost ~]# cd /etc/mha_master/
[root@localhost mha_master]# vi /etc/mha_master/mha.cnf

配置文件内容如下:

[server default]
user=mhaadmin
password=mha7101
manager_workdir=/etc/mha_master/app1
manager_log=/etc/mha_master/manager.log
remote_workdir=/data/mha_master/app1
repl_user=repl
repl_password=repl7101
ping_interval=1
[server1]
hostname=172.30.200.101
ssh_port=22
[server2]
hostname=172.30.200.102
ssh_port=22
candidate_master=1
[server3]
hostname=172.30.200.103
ssh_port=22
no_master=1

3.2.5 MHA节点检测

  1. 在管理节点检测ssh连通性如下:

    [root@localhost ~]# masterha_check_ssh -conf=/etc/mha_master/mha.cnf
    
    

    有如下日志,代表正常:

    Thu Jan  9 14:43:09 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
    Thu Jan 9 14:43:09 2020 - [info] Reading application default configuration from /etc/mha_master/mha.cnf..
    root@172.30.200.103(172.30.200.103:22) to root@172.30.200.101(172.30.200.101:22)..
    Thu Jan 9 14:43:11 2020 - [debug] ok.
    Thu Jan 9 14:43:11 2020 - [debug] Connecting via SSH from root@172.30.200.103(172.30.200.103:22) to root@172.30.200.102(172.30.200.102:22)..
    Thu Jan 9 14:43:11 2020 - [debug] ok.
    Thu Jan 9 14:43:12 2020 - [info] All SSH connection tests passed successfully.
  2. 检测MySQL replication是否正常

    masterha_check_repl --conf=/etc/mha_master/mha.cnf

    有如下日志,说明正常:

    Thu Jan  9 14:44:54 2020 - [info] Slaves settings check done.
    Thu Jan 9 14:44:54 2020 - [info]
    172.30.200.101(172.30.200.101:3306) (current master)
    +--172.30.200.102(172.30.200.102:3306)
    +--172.30.200.103(172.30.200.103:3306) Thu Jan 9 14:44:54 2020 - [info] Checking replication health on 172.30.200.102..
    Thu Jan 9 14:44:54 2020 - [info] ok.
    Thu Jan 9 14:44:54 2020 - [info] Checking replication health on 172.30.200.103..
    Thu Jan 9 14:44:54 2020 - [info] ok.
    Thu Jan 9 14:44:54 2020 - [warning] master_ip_failover_script is not defined.
    Thu Jan 9 14:44:54 2020 - [warning] shutdown_script is not defined.
    Thu Jan 9 14:44:54 2020 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.

3.2.6 MHA启动

  1. 启动mha manager:

    nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
  2. 检测master节点状态:

    [root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf
    mha (pid:31709) is running(0:PING_OK), master:172.30.200.101

    说明主数据库172.30.200.101启动正常。

  3. 关闭mha manager:

    masterha_stop -conf=/etc/mha_master/mha.cnf

3.2.7 MHA模拟故障

  1. master直接kill mysql节点

    [root@localhost ~]# ps -ef |grep mysql
    root 19864 1 0 08:51 ? 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysqldata --pid-file=/data/mysqldata/localhost.localdomain.pid
    mysql 19976 19864 0 08:51 ? 00:00:13 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysqldata --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=/data/mysqldata/mysqld.log --pid-file=/data/mysqldata/localhost.localdomain.pid --socket=/tmp/mysql.sock
    root 22166 21525 0 14:55 pts/0 00:00:00 grep --color=auto mysql
    [root@localhost ~]# kill -9 19864 19976
  2. MHA转移日志。

    [root@localhost ~]# tail -f /etc/mha_master/manager.log
    
    From:
    172.30.200.101(172.30.200.101:3306) (current master)
    +--172.30.200.102(172.30.200.102:3306)
    +--172.30.200.103(172.30.200.103:3306) To:
    172.30.200.102(172.30.200.102:3306) (new master)
    +--172.30.200.103(172.30.200.103:3306) Master 172.30.200.101(172.30.200.101:3306) is down! Check MHA Manager logs at localhost.localdomain:/etc/mha_master/manager.log for details. Started automated(non-interactive) failover.
    The latest slave 172.30.200.102(172.30.200.102:3306) has all relay logs for recovery.
    Selected 172.30.200.102(172.30.200.102:3306) as a new master.
    172.30.200.102(172.30.200.102:3306): OK: Applying all logs succeeded.
    172.30.200.103(172.30.200.103:3306): This host has the latest relay log events.
    Generating relay diff files from the latest slave succeeded.
    172.30.200.103(172.30.200.103:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.30.200.102(172.30.200.102:3306)
    172.30.200.102(172.30.200.102:3306): Resetting slave info succeeded.
    Master failover to 172.30.200.102(172.30.200.102:3306) completed successfully.

    从上述日志来看,172.30.200.102已经成为了新的master,而172.30.200.103还是slave数据库。

3.2.8 修复已损坏的节点加入MHA中

由于这里是实验环境,可以不到处mysqldump的备份。如果是生产环境恢复,可以停掉slave的SQL thread,记住对应的pos的位置,然后备份出数据,保证数据一致性之后,同步数据,恢复损坏的结点。

change master to master_host='172.30.200.102',
master_user='repl',
master_password='repl7101',
master_log_file='mysql-bin.000003',
master_log_pos=401;

查看slave状态:

MariaDB [(none)]> start slave;

MariaDB [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.30.200.102
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000003
Read_Master_Log_Pos: 401
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

再次启动,如下:

[root@localhost ~]# nohup masterha_manager --conf=/etc/mha_master/mha.cnf &> /etc/mha_master/manager.log &
[root@localhost ~]# masterha_check_status --conf=/etc/mha_master/mha.cnf
mha (pid:31905) is running(0:PING_OK), master:172.30.200.101

至此,MHA实验完毕。由于生产环境会用到VIP,后续会继续编写。

MHA 问题集锦

错误一

日志错误:

Thu Jan  9 11:31:36 2020 - [info]   Connecting to root@172.30.200.102(172.30.200.102:22)..
Can't exec "mysqlbinlog": 没有那个文件或目录 at /usr/share/perl5/vendor_perl/MHA/BinlogManager.pm line 106.
mysqlbinlog version command failed with rc 1:0, please verify PATH, LD_LIBRARY_PATH, and client options

解决方法:

ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog

错误二

日志错误:

Checking if super_read_only is defined and turned on..DBI connect(';host=172.30.200.102;port=3306','mhaadmin',...) failed: Access denied for user 'mhaadmin'@'arpslave1' (using password: YES) at /usr/share/perl5/vendor_perl/MHA/SlaveUtil.pm line 239

解决方法:

manager节点,执行:

grant all on *.* to 'mhaadmin'@'arpmgr' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpmaster' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave1' identified by 'mha7101';
grant all on *.* to 'mhaadmin'@'arpslave2' identified by 'mha7101';

错误三

日志如下:

    Testing mysql connection and privileges..sh: mysql: 未找到命令
mysql command failed with rc 127:0!

解决方法:

ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql

[MariaDB]MHA高可用部署-实验的更多相关文章

  1. Rancher Server HA的高可用部署实验-学习笔记

    转载于https://blog.csdn.net/csdn_duomaomao/article/details/78771731 Rancher Server HA的高可用部署实验-学习笔记 一.机器 ...

  2. MySQL MHA 高可用集群部署及故障切换

    MySQL MHA 高可用集群部署及故障切换 1.概念 2.搭建MySQL + MHA 1.概念: a)MHA概念 : MHA(MasterHigh Availability)是一套优秀的MySQL高 ...

  3. MHA 高可用集群搭建(二)

    MHA 高可用集群搭建安装scp远程控制http://www.cnblogs.com/kevingrace/p/5662839.html yum install openssh-clients mys ...

  4. MHA高可用架构与Atlas读写分离

    1.1 MHA简介 1.1.1 MHA软件介绍 MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton ...

  5. Mysql MHA高可用集群架构

    ** 记得之前发过一篇文章,名字叫<浅析MySQL高可用架构>,之后一直有很多小伙伴在公众号后台或其它渠道问我,何时有相关的深入配置管理文章出来,因此,民工哥,也将对前面的各类架构逐一进行 ...

  6. MHA高可用 MHA+Keepalive

    MHA高可用 MHA简介 MHA(Master High Availability)目前在MySQL高可用方面是一个相对成熟的解决方案,它由日本DeNA公司youshimaton(现就职于Facebo ...

  7. Keepalived+LVS实现LNMP网站的高可用部署

    Keepalived+LVS实现LNMP网站的高可用部署   项目需求   当我们访问某个网站的时候可以在浏览器中输入IP或者域名链接到Web Server进行访问,如果这个Web Server挂了, ...

  8. MHA高可用配置及故障切换

    MHA高可用配置及故障切换 目录 MHA高可用配置及故障切换 一.案例概述 二.案例前置知识点 1. MHA概述 2. MHA的组成 (1)MHA Manager(管理节点) (2)MHA Node( ...

  9. kubernetes 1.15.1 高可用部署 -- 从零开始

    这是一本书!!! 一本写我在容器生态圈的所学!!! 重点先知: 1. centos 7.6安装优化 2. k8s 1.15.1 高可用部署 3. 网络插件calico 4. dashboard 插件 ...

随机推荐

  1. H3C路由器Telnet服务配置命令(续)

  2. .net core 文件下载

    public IActionResult Dowanload(string id,int numTypes) { try { var memory = new MemoryStream(): //mo ...

  3. supersocket Silverlight 策略服务器

    <?xml version="1.0"?> <configuration> <configSections> <section name= ...

  4. window10+python3.7安装tensorflow--gpu tensorflow 安装

    能安装GPU的前提是:1.显卡支持CUDA (1)右击我的电脑–属性 (2)打开设备管理器   (3)显示适配器 我的电脑是支持CUDA的 2.pip 版本 >= 8.1查看pip版本 :pip ...

  5. WebService 基础知识点和用Postman调试

    阅读连接:Retrofit 用Soap协议访问WebService 详解 参考 1.java发HTTP POST请求(内容为xml格式) 2. android解析XML总结(SAX.Pull.Dom三 ...

  6. Eclipse设置默认编码为UTF-8

    需要设置的几处地方为: Window->Preferences->General ->Content Type->Text->JSP 最下面设置为UTF-8 Window ...

  7. 高级教程: 作出动态决策和 Bi-LSTM CRF 重点

    动态 VS 静态深度学习工具集 Pytorch 是一个 动态 神经网络工具包. 另一个动态工具包的例子是 Dynet (我之所以提这个是因为使用 Pytorch 和 Dynet 是十分类似的. 如果你 ...

  8. 后缀数组 + LCP加速多模式匹配算法 O(m+logn)

    #include <iostream> #include <cstdio> #include <cstring> #include <algorithm> ...

  9. Python--day32--复习:https和http的区别;黏包;黏包问题的解决方式;

    1,https和http的区别: https比较安全,传输的时候先对内容进行加密,收到后再进行解密:它的传输内容不容易拦截,就算拦截下来了,也是加密的,看不懂.但是要买证书,一年要好几万,小公司承担不 ...

  10. multipart/form-data与application/octet-stream的区别、application/x-www-form-urlencoded

    情景再现: 上传文件到.net(wcf)后台时,总是上传不成功,后台要求Content-Type是application/octet-stream,我使用multipart/form-data方式,c ...