Mysql 5.6 MHA (gtid) on Kylin
mha on Kylin
ip hostname repl role mha role
192.168.19.69 mysql1 master node
192.168.19.73 mysql2 slave node
192.168.19.74 mysql3 slave node/manager
mysql version MySQL Community Server 5.6.15
1 系统配置
[root@localhost ~]# cat /etc/issue
\nKylin 3.3
Kernel \r on an \m
\KRelease (Trial)
[root@localhost ~]# cat /etc/redhat-release
Kylin Linux release 3.3.1707 (Core)
[root@localhost ~]# sysctl -a |grep swap
vm.swappiness = 30
[root@localhost ~]# vim /etc/sysctl.conf
vm.swappiness = 10
[root@localhost ~]# sysctl -p
vm.swappiness = 10
[root@localhost ~]# sysctl -a |grep swap
vm.swappiness = 10
[root@localhost ~]# cat /sys/block/sda/queue/rotational
1
[root@localhost ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@localhost ~]# vim /etc/profile
HISTTIMEFORMAT="%Y:%M:%D %H-%m-%s"
export=HISTTIMEFORMAT
[root@localhost ~]# date
Thu Jul 12 03:09:27 EDT 2018
[root@localhost ~]# hostname
localhost.localdomain
[root@localhost ~]# vim /etc/hosts
127.0.0.1 mysql1
[root@localhost ~]# vim /etc/sysconfig/network
HOSTNAME=mysql1
[root@localhost ~]# getenforce
Enforcing
[root@localhost ~]# vim /etc/selinux/config
SELINUX=disabled
[root@localhost ~]# systemctl stop firewalld.service
[root@localhost ~]# systemctl disable firewalld.service
[root@localhost ~]# reboot
[root@localhost ~]# hostnamectl set-hostname mysql1
[root@localhost ~]# hostname
mysql1
[root@mysql1 ~]# systemctl status firewalld.service
--# vim /etc/hosts
192.168.19.69 mysql1
192.168.19.73 mysql2
192.168.19.74 mysql3
2 安装mysql
[root@mysql1 ~]# groupadd mysql
[root@mysql1 ~]# useradd -g mysql mysql
[root@mysql1 ~]# passwd mysql
[root@mysql1 ~]# ulimit -n
1024
[root@mysql1 ~]# vim /etc/security/limits.conf
mysql soft nproc 65535
mysql hard nproc 65535
mysql soft nofile 65535
mysql hard nofile 65535
[root@mysql1 ~]# ulimit -n 65535
[root@mysql1 ~]# vim /home/mysql/.bash_profile
export LANG=en_US.UTF-8
export PATH=/usr/local/mysql/bin:$PATH
export MYSQL_PS1="(\u@\h:\p) [\d]> "
yum安装相应的依赖包
[root@mysql1 ~]# yum -y install lrzsz
[root@mysql1 ~]# yum install -y gcc gcc-* make cmake gcc-c++ libaio libaio-devel bison bison-devel autoconf automake zlib* libxml* ncurses-devel libmcrypt* libtool-ltdl-devel* --skip-broken
[root@mysql1 ~]# yum install -y openssl openssl-devel ncurses ncurses-devel
[root@mysql1 ~]# mkdir -p /data/mysqldata/{3306/{data,tmp,binlog,slave,log/iblog},backup,scripts}
[root@mysql1 soft]# yum list|grep mysql
Repository gcc-4.9 is listed more than once in the configuration
Repository other is listed more than once in the configuration
mysql-community-client.x86_64 5.6.15-4.ky3 other
mysql-community-common.x86_64 5.6.15-4.ky3 other
mysql-community-devel.x86_64 5.6.15-4.ky3 other
mysql-community-embedded.x86_64 5.6.15-4.ky3 other
mysql-community-embedded-devel.x86_64
mysql-community-libs.x86_64 5.6.15-4.ky3 other
mysql-community-server.x86_64 5.6.15-4.ky3 other
mysql-community-test.x86_64 5.6.15-4.ky3 other
mysql-connector-odbc.x86_64 5.2.5-6.ky3.kb2 base
[root@mysql1 soft]# tar -zxvf mysql-5.6.15-linux-glibc2.5-x86_64.tar.gz -C /usr/local/
[root@mysql1 soft]# cd /usr/local/
[root@mysql1 local]# ln -s mysql-5.6.15-linux-glibc2.5-x86_64 mysql
[root@mysql1 local]# chown -R mysql:mysql /usr/local/mysql/
[root@mysql1 local]# chown -R mysql:mysql /data/mysqldata/
--my.cnf
[mysql@mysql1 3306]$ /usr/local/mysql/scripts/mysql_install_db --defaults-file=/data/mysqldata/3306/my.cnf --datadir=/data/mysqldata/3306/data --basedir=/usr/local/mysql --user=mysql
[mysql@mysql1 3306]$ /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/mysqldata/3306/my.cnf &
[mysql@mysql1 3306]$ ps -ef|grep mysql
root 2673 2652 0 04:08 pts/1 00:00:00 su - mysql
mysql 2674 2673 0 04:08 pts/1 00:00:00 -bash
mysql 2880 2674 0 04:15 pts/1 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/mysqldata/3306/my.cnf
mysql 3865 2880 11 04:15 pts/1 00:00:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysqldata/3306/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysqldata/3306/data --plugin-dir=/usr/local/mysql/lib/plugin --log-error=/data/mysqldata/3306/log/mysql-error.log --open-files-limit=65535 --pid-file=/data/mysqldata/3306/data/mysql1.pid --socket=/data/mysqldata/3306/mysql.sock --port=3306
mysql 3900 2674 0 04:15 pts/1 00:00:00 ps -ef
mysql 3901 2674 0 04:15 pts/1 00:00:00 grep --color=auto mysql
[mysql@mysql1 3306]$ mysql -S /data/mysqldata/3306/mysql.sock
(root@localhost:mysql.sock) [(none)]> select user,host,password from mysql.user;
+------+-----------+----------+
| user | host | password |
+------+-----------+----------+
| root | localhost | |
| root | mysql1 | |
| root | 127.0.0.1 | |
| root | ::1 | |
| | localhost | |
| | mysql1 | |
+------+-----------+----------+
(root@localhost:mysql.sock) [(none)]> delete from mysql.user where (user,host) not in(select 'root','localhost');
Query OK, 5 rows affected (0.01 sec)
(root@localhost:mysql.sock) [(none)]> select user,host,password from mysql.user;
+------+-----------+----------+
| user | host | password |
+------+-----------+----------+
| root | localhost | |
+------+-----------+----------+
1 row in set (0.00 sec)
(root@localhost:mysql.sock) [(none)]> update mysql.user set password=password('***');
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
(root@localhost:mysql.sock) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> truncate table mysql.db;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> grant ALL PRIVILEGES ON *.* TO 'system'@'%' IDENTIFIED BY "***";
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> grant ALL PRIVILEGES ON *.* TO 'system'@'127.0.0.1' IDENTIFIED BY "***";
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> select user,host,password from mysql.user;
+--------+-----------+-------------------------------------------+
| user | host | password |
+--------+-----------+-------------------------------------------+
| root | localhost | *E74858DB86EBA20BC33D0AECAE8A8108C56B17FA |
| system | 127.0.0.1 | *E74858DB86EBA20BC33D0AECAE8A8108C56B17FA |
| system | % | *E74858DB86EBA20BC33D0AECAE8A8108C56B17FA |
+--------+-----------+-------------------------------------------+
[mysql@mysql1 3306]$ mysql -uroot -p -S /data/mysqldata/3306/mysql.sock
Enter password:
--生产环境严格用户权限,密码控制
grant select ,insert ,update,delete on test.* to 'test_prog'@'%'identified by "***" ;
flush privileges;
show grants for test_prog@'%';
revoke insert ,update,delete on test.* from test_prog@'%';
grant insert ,update,delete on test.* to 'test_prog'@'%';
3 主从搭建
192.168.19.69 master
192.168.19.73 slave1
192.168.19.74 slave2
69
(root@localhost:mysql.sock) [(none)]> GRANT REPLICATION SLAVE,RELOAD,SUPER ON *.* to repl@'192.168.%' identified by "***";
--mysqldump
/usr/local/mysql/bin/mysqldump -h 127.0.0.1 -usystem -p'mysql' -A -R -S /data/mysqldata/3306/mysql.sock \
--single-transaction --default-character-set=utf8 --master-data=2 --set-gtid-purged=OFF | gzip > /data/mysqldata/backup/mysqldumpfullbk_20180712.sql.gz
scp mysqldumpfullbk_20180712.sql mysql@192.168.19.73:/data/mysqldata/backup/.
scp mysqldumpfullbk_20180712.sql mysql@192.168.19.74:/data/mysqldata/backup/.
--mysql -usystem -pmysql -h127.0.0.1 -P3306 <mysqldumpfullbk_20180712.sql
(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| enforce_gtid_consistency | ON |
| gtid_executed | |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+-------+
5 rows in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 151 | | | |
+------------------+----------+--------------+------------------+-------------------+
73 (system@127.0.0.1:3306) [(none)]> show master status;
+------------------+-----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000001 | 191447949 | | | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+------------------+-----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid_executed%';
+---------------+--------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------+
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+---------------+--------------------------------------------+
[system@3306][(none)]> show global variables like '%gtid%';
+--------------------------+--------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | d4e95eb8-85ae-11e8-8540-000c29450242:1-362 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+--------------------------------------------+
show global variables like '%gtid_executed%';
reset master;
CHANGE MASTER TO
MASTER_HOST='192.168.19.69',
MASTER_USER='repl',
MASTER_PASSWORD='***',
MASTER_PORT=3306,
MASTER_AUTO_POSITION = 1;
74 (system@127.0.0.1:3306) [(none)]> show master status;
+------------------+-----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000001 | 191447949 | | | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+------------------+-----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show global variables like 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------+
| server_uuid | e68afff3-85b5-11e8-856e-005056bff00b |
(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid_executed%';
+---------------+--------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------+
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+---------------+--------------------------------------------+
(system@127.0.0.1:3306) [mysql]> show global variables like '%gtid%';
+--------------------------+--------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
| gtid_mode | ON |
| gtid_owned | |
[system@3306][(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.19.69
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 558
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 768
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 558
Relay_Log_Space: 965
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 201807121
Master_UUID: cab5ec92-85ab-11e8-852c-005056bf5092
Master_Info_File: /data/mysqldata/3306/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-2
Executed_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-2
Auto_Position: 1
69 master
[mysql@mysql1 scripts]$ ./mysqlplus.sh
use test;
create table t1(id int,name varchar(10));
insert into t1 values(1,'AAAAA');
commit;
select * from test.t1;
use test;
create table t2(id int,name varchar(10));
insert into t2 values(1,'AAAAA');
commit;
select * from test.t2;
69 (system@127.0.0.1:3306) [test]> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 | 558 | | | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
+------------------+----------+--------------+------------------+------------------------------------------+
(system@127.0.0.1:3306) [test]> show global variables like '%gtid%';
+--------------------------+------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+------------------------------------------+
73,74
(system@127.0.0.1:3306) [test]> show global variables like '%gtid%';
+--------------------------+------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+------------------------------------------+
4 mha 搭建
4.1 设置SSH公钥免密码登录
MHA环境需要三台主机需要相互信任:实现三台主机之间相互免密钥登录。
要是root登陆的话,需要修改ssh的配置文件
vim /etc/ssh/sshd_config 中的 PermitRootLogin ,设置为YES,再重启SSH
#root用户操作(/root),pwd
mkdir ~/.ssh
chmod 700 ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa #一直回车
ls -al
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ls -al
chmod 600 ~/.ssh/authorized_keys
cd ..
scp -r .ssh 192.168.19.73:/root/
scp -r .ssh 192.168.19.74:/root/
ssh mysql1
ssh mysql2
ssh mysql3
[root@mysql1 ~]# ssh mysql2 #这里不用输入密码,这打通了ssh
Last login: Thu Jul 12 05:55:46 2018 from mysql1
[root@mysql2 ~]# logout
Connection to mysql2 closed.
[root@mysql1 ~]# ssh mysql3
[root@mysql3 ~]# logout
Connection to mysql3 closed.
4.2 安装mha manager和node节点
软件下载路径
[root@mysql1 ~]# cd /data/soft/
[root@mysql1 soft]# ll
total 297404
-rw-r--r-- 1 root root 104700 Jun 8 02:34 mha4mysql-manager-0.56.tar.gz
-rw-r--r-- 1 root root 45947 Jun 8 02:34 mha4mysql-node-0.56.tar.gz
在三个节点(node 和 manager)安装perl-DBD-MySQL,用光盘作yum源,#包括了3个包:Perl-DBD-MySQL,mysql-libs,perl-DBI
[root@mysql1 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
[root@mysql2 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
[root@mysql3 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
# yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker -y
在三个节点安装node(包括管理节点)
tar xzvf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make && make install
[root@mysql1 mha4mysql-node-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node
在管理节点安装manager
#yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager \
perl-Config-IniFiles rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate perl-Time-HiRes \
perl-MIME-Types perl-Email-Date-Format perl-Mail-Sendmail perl-Mail-Sender perl-TimeDate\
perl-MailTools perl-MIME-Lite
# rpm -q perl-CPAN
# yum install -y perl-CPAN
# yum install -y 'perl(CGI)' 'perl(LWP::UserAgent)'
[root@mysql3 soft]# rpm -ivh perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm --nodeps
[root@mysql3 soft]# rpm -ivh perl-Parallel-ForkManager-1.05-1.el7.noarch.rpm
tar xzvf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
perl Makefile.PL
make && make install
[root@mysql3 mha4mysql-manager-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
- Time::HiRes ...loaded. (1.9725)
- Config::Tiny ...loaded. (2.23)
- Log::Dispatch ...missing.
- Parallel::ForkManager ...loaded. (1.05)
- MHA::NodeConst ...loaded. (0.56)
==> Auto-install the 1 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::manager
[root@mysql3 mha4mysql-manager-0.56]# which masterha_check_ssh
/usr/local/bin/masterha_check_ssh
[root@mysql3 ~]# ll /usr/local/bin/masterha_*
-r-xr-xr-x 1 root root 1995 Jul 12 06:22 /usr/local/bin/masterha_check_repl
-r-xr-xr-x 1 root root 1779 Jul 12 06:22 /usr/local/bin/masterha_check_ssh
-r-xr-xr-x 1 root root 1865 Jul 12 06:22 /usr/local/bin/masterha_check_status
-r-xr-xr-x 1 root root 3201 Jul 12 06:22 /usr/local/bin/masterha_conf_host
-r-xr-xr-x 1 root root 2517 Jul 12 06:22 /usr/local/bin/masterha_manager
-r-xr-xr-x 1 root root 2165 Jul 12 06:22 /usr/local/bin/masterha_master_monitor
-r-xr-xr-x 1 root root 2373 Jul 12 06:22 /usr/local/bin/masterha_master_switch
-r-xr-xr-x 1 root root 3879 Jul 12 06:22 /usr/local/bin/masterha_secondary_check
-r-xr-xr-x 1 root root 1739 Jul 12 06:22 /usr/local/bin/masterha_stop
===
rpm -ivh perl-Mail-Sender-0.8.21-2.el7.noarch.rpm
rpm -ivh perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
rpm -ivh perl-MIME-Types-1.38-2.el7.noarch.rpm
rpm -ivh perl-TimeDate-2.30-2.el7.noarch.rp
#Can't locate Class/Load.pm in @INC
rpm -ivh perl-Module-Runtime-0.013-4.el7.noarch.rpm
rpm -ivh perl-Config-Tiny-2.14-7.el7.noarch.rpm
rpm -ivh perl-Try-Tiny-0.12-2.el7.noarch.rpm
rpm -ivh perl-Module-Implementation-0.06-6.el7.noarch.rpm
rpm -ivh perl-Sub-Install-0.926-6.el7.noarch.rpm
rpm -ivh perl-Params-Util-1.07-6.el7.x86_64.rpm
rpm -ivh perl-Data-OptList-0.107-9.el7.noarch.rpm
rpm -ivh perl-Package-Stash-XS-0.26-3.el7.x86_64.rpm
rpm -ivh perl-List-MoreUtils-0.33-9.el7.x86_64.rpm
rpm -ivh perl-Package-DeprecationManager-0.13-7.el7.noarch.rpm
rpm -ivh perl-Package-Stash-0.34-2.el7.noarch.rpm
rpm -ivh perl-Class-Load-0.20-3.el7.noarch.rpm
#Can't locate Params/Validate.pm
rpm -ivh perl-Params-Validate-1.08-4.el7.x86_64.rpm
===
管理节点安装完毕后就应该去编辑主配文件了
[root@mysql3 soft]# mkdir -p /data/mha/etc
[root@mysql3 soft]# mkdir -p /data/mha/etc/app
[root@mysql3 soft]# mkdir -p /data/mha/log
[root@mysql3 soft]# cd /data/mha/etc/
[root@mysql3 etc]# touch app.cnf
[root@mysql3 etc]# chmod +x master_ip_failover
====
master_ip_failover #自动切换时vip管理的脚本,不是必须,如果我们使用keepalived的,我们可以自己编写脚本完成对vip的管理,比如监控mysql,如果mysql异常,我们停止keepalived就行,这样vip就会自动漂移
master_ip_online_change #在线切换时vip的管理,不是必须,同样可以可以自行编写简单的shell完成
power_manager #故障发生后关闭主机的脚本,不是必须
send_report #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成
masterha_secondary_check 一旦MHA到master的监控之间出现问题,MHA Manager将会判断其它两个slave是否能建立到master_ip 3306端口的连接
====
####################常用命令###############
/usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf &
/usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover > /data/mha/log/manager.log < /dev/null 2>&1 &
====
/usr/local/bin/save_binary_logs
show slave status;master_log_file, master_log_pos
mysqlbinlog save_binlog_xxx,
1.偿试读取故障的master日志
2.找binlog server上的日志差异(56引入binlog_server)
3.去slave上找relay-log
==把所有的slave数据补成一样的
/usr/local/bin/apply_diff_relay_logs
[root@mysql3 etc]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Can't locate Class/Load.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 12.
BEGIN failed--compilation aborted at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 12.
Compilation failed in require at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.
[root@mysql3 soft]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Can't locate Params/Validate.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 13.
BEGIN failed--compilation aborted at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 13.
Compilation failed in require at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.
[root@mysql3 soft]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:02:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
/data/mha/etc/app.cnf:
at /usr/local/share/perl5/MHA/SSHCheck.pm line 148.
==详细检查,是配置文件中的[server3]少了]。。。。。。
==ssh检查成功
[root@mysql3 etc]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:15:11 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 12 22:15:11 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:15:11 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:15:11 2018 - [info] Starting SSH connection tests..
Thu Jul 12 22:15:12 2018 - [debug]
Thu Jul 12 22:15:11 2018 - [debug] Connecting via SSH from root@192.168.19.69(192.168.19.69:22) to root@192.168.19.73(192.168.19.73:22)..
Warning: Permanently added '192.168.19.69' (ECDSA) to the list of known hosts.
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.69(192.168.19.69:22) to root@192.168.19.74(192.168.19.74:22)..
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:12 2018 - [debug]
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.73(192.168.19.73:22) to root@192.168.19.69(192.168.19.69:22)..
Warning: Permanently added '192.168.19.69' (ECDSA) to the list of known hosts.
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.73(192.168.19.73:22) to root@192.168.19.74(192.168.19.74:22)..
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:13 2018 - [debug]
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.74(192.168.19.74:22) to root@192.168.19.69(192.168.19.69:22)..
Thu Jul 12 22:15:13 2018 - [debug] ok.
Thu Jul 12 22:15:13 2018 - [debug] Connecting via SSH from root@192.168.19.74(192.168.19.74:22) to root@192.168.19.73(192.168.19.73:22)..
Thu Jul 12 22:15:13 2018 - [debug] ok.
Thu Jul 12 22:15:13 2018 - [info] All SSH connection tests passed successfully.
==检查复制状态,有提示slave [warning] relay_log_purge=0 is not set on slave 192.168.19.73(192.168.19.73:3306)
[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:17:53 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 12 22:17:53 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:17:53 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:17:53 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 12 22:17:53 2018 - [info] Dead Servers:
Thu Jul 12 22:17:53 2018 - [info] Alive Servers:
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Thu Jul 12 22:17:53 2018 - [info] Alive Slaves:
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Jul 12 22:17:53 2018 - [info] GTID ON
Thu Jul 12 22:17:53 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Jul 12 22:17:53 2018 - [info] GTID ON
Thu Jul 12 22:17:53 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] Current Alive Master: 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] Checking slave configurations..
Thu Jul 12 22:17:53 2018 - [info] read_only=1 is not set on slave 192.168.19.73(192.168.19.73:3306).
Thu Jul 12 22:17:53 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.19.73(192.168.19.73:3306).
Thu Jul 12 22:17:53 2018 - [info] read_only=1 is not set on slave 192.168.19.74(192.168.19.74:3306).
Thu Jul 12 22:17:53 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.19.74(192.168.19.74:3306).
Thu Jul 12 22:17:53 2018 - [info] Checking replication filtering settings..
Thu Jul 12 22:17:53 2018 - [info] binlog_do_db= , binlog_ignore_db=
Thu Jul 12 22:17:53 2018 - [info] Replication filtering check ok.
Thu Jul 12 22:17:53 2018 - [info] GTID is supported. Skipping all SSH and Node package checking.
Thu Jul 12 22:17:53 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 12 22:17:53 2018 - [info] HealthCheck: SSH to 192.168.19.69 is reachable.
Thu Jul 12 22:17:53 2018 - [info]
192.168.19.69 (current master)
+--192.168.19.73
+--192.168.19.74
Thu Jul 12 22:17:53 2018 - [info] Checking replication health on 192.168.19.73..
Thu Jul 12 22:17:53 2018 - [info] ok.
Thu Jul 12 22:17:53 2018 - [info] Checking replication health on 192.168.19.74..
Thu Jul 12 22:17:53 2018 - [info] ok.
Thu Jul 12 22:17:53 2018 - [info] Checking master_ip_failover_script status:
Thu Jul 12 22:17:53 2018 - [info] /data/mha/etc/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.19.69 --orig_master_ip=192.168.19.69 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Checking the Status of the script.. OK
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:bf:50:92 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.69/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:0
valid_lft forever preferred_lft forever
inet6 fe80::4bdb:f907:5f14:a62b/64 scope link
valid_lft forever preferred_lft forever
Sun Jul 15 21:53:58 2018 - [info] OK.
Sun Jul 15 21:53:58 2018 - [warning] shutdown_script is not defined.
Sun Jul 15 21:53:58 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
==检查mha状态
[root@mysql3 etc]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app is stopped(2:NOT_RUNNING). #表示未启动mha
#启动mha manager
--remove_dead_master_conf 当mha发生切换以后,老的主库的ip将会从配置文件中移除
--ignore_last_failover,在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。
默认情况,mha发生切换会产生app.failover.complete文件,下次再切换的时候如果发现目录下还有该文件,将不允许切换
,除非在第一次切换后手动rm 该文件,出于方便,每次启动加该参数
#nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[root@mysql3 etc]# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[1] 14550
[root@mysql3 etc]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app (pid:14550) is running(0:PING_OK), master:192.168.19.69
#master 增加虚拟ip(采用脚本管理vip)
在用虚拟IP的时候,需要在开启MHA程序之前要把虚拟IP先设置到主上去,否则MHA是不会自己的去设置VIP,第一次设置VIP之后,后续脚本的故障转移等功能会自动的对VIP进行切换。
### /sbin/ifconfig eth0:1 10.0.1.168 netmask 255.255.255.0 up
### ip addr del 192.168.19.68/24 dev ens160
设置:
ifconfig eth0:1 192.168.100.xxx
移除:
ifconfig eth0:1 down
备注:用ifconfig命令配置的网卡信息,在网卡重启后机器重启后,配置就不存在。
mha第一次启动的时候,需要手动加上vip
[root@mysql1 ~]# /sbin/ifconfig ens160:1 192.168.19.68/24
[root@mysql1 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:bf:50:92 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.69/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:0
valid_lft forever preferred_lft forever
inet6 fe80::4bdb:f907:5f14:a62b/64 scope link
valid_lft forever preferred_lft forever
#关闭mha
#/usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
use test;
insert into t1 values(2,'BBBBBB');
insert into t1 values(3,'444');
insert into t1 values(4,'455');
insert into t1 values(5,'75');
select * from test.t1;
insert into t1 values(12,'1BBBBBB');
insert into t1 values(13,'1444');
insert into t1 values(14,'1455');
insert into t1 values(15,'175');
select * from test.t1;
select @@server_id;
5故障切换
http://www.cnblogs.com/gomysql/p/3675429.html
http://www.cnblogs.com/zhoujinyi/p/3808673.html
https://blog.csdn.net/wjhhjl/article/details/53705157
--slave,两台slave服务器设置read_only(从库对外提供读服务,只所以没有写进配置文件,是因为随时slave会提升为master)
mysql -usystem -p*** -h127.0.0.1 -P3306 -e'set global relay_log_purge=OFF;'
mysql -usystem -p*** -h127.0.0.1 -P3306 -e'set global read_only=1;'
注意
MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF
采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。
但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。
MHA节点中包含了pure_relay_logs命令工具
#cat purge_relay_log.sh
#!/bin/bash
user=system
passwd=***
port=3306
log_dir='/data/mha/log/'
work_dir='/data/mha/'
purge='/usr/local/bin/purge_relay_logs'
if [ ! -d $log_dir ]
then
mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
定时任务
#crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
手动清除
# /usr/local/bin/purge_relay_logs --user=system --password=****--port=3306 --disable_relay_log_purge --workdir=/data/mha/
5.1 自动Failover
192.168.19.69 mysql1 master node
192.168.19.73 mysql2 slave node
192.168.19.74 mysql3 slave node/manager
必须先启动MHA Manager,否则无法自动切换,当然手动切换不需要开启MHA Manager监控
一旦发生切换管理进程(Manager)将会退出,app.cnf配置文件将删除server1的配置,无法进行再次测试,需将故障数据库解决掉之后,重新change加入到MHA环境中来,
并且要保证app1.failover.complete不存在或则加上--ignore_last_failover参数忽略,才能再次开启管理进程。
# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
[root@mysql3 ~]# /usr/local/bin/masterha_secondary_check -s 192.168.19.73 -s 192.168.19.74 --user=root --master_host=192.168.19.69 --master_ip=192.168.19.69 --master_port=3306
Master is reachable from 192.168.19.73!
[root@mysql3 sysbench]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app (pid:347) is running(0:PING_OK), master:192.168.19.69
use test;
insert into t2 values(2,'BBBBBB');
insert into t2 values(3,'444');
insert into t2 values(4,'455');
insert into t2 values(5,'75');
select * from test.t2;
commit;
[mysql@mysql2 scripts]$ mysql -usystem -p*** -h127.0.0.1 -P3306 -e'select * from test.t2;'
[mysql@mysql1 scripts]$ ./mysql_db_shutdown.sh
[root@mysql3 etc]# tail -f -n 100 /data/mha/log/manager.log
Mon Jul 16 05:37:09 2018 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Mon Jul 16 05:37:09 2018 - [info] Executing SSH check script: exit 0
Mon Jul 16 05:37:10 2018 - [info] HealthCheck: SSH to 192.168.19.69 is reachable.
Mon Jul 16 05:37:10 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.19.69' (111))
Mon Jul 16 05:37:10 2018 - [warning] Connection failed 1 time(s)..
Mon Jul 16 05:37:11 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.19.69' (111))
Mon Jul 16 05:37:11 2018 - [warning] Connection failed 2 time(s)..
Mon Jul 16 05:37:12 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.19.69' (111))
Mon Jul 16 05:37:12 2018 - [warning] Connection failed 3 time(s)..
Mon Jul 16 05:37:12 2018 - [warning] Master is not reachable from health checker!
Mon Jul 16 05:37:12 2018 - [warning] Master 192.168.19.69(192.168.19.69:3306) is not reachable!
Mon Jul 16 05:37:12 2018 - [warning] SSH is reachable.
Mon Jul 16 05:37:12 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /data/mha/etc/app.cnf again, and trying to connect to all servers to check server status..
Mon Jul 16 05:37:12 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 05:37:12 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:37:12 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:37:13 2018 - [info] Dead Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Alive Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Mon Jul 16 05:37:13 2018 - [info] Alive Slaves:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Checking slave configurations..
Mon Jul 16 05:37:13 2018 - [info] Checking replication filtering settings..
Mon Jul 16 05:37:13 2018 - [info] Replication filtering check ok.
Mon Jul 16 05:37:13 2018 - [info] Master is down!
Mon Jul 16 05:37:13 2018 - [info] Terminating monitoring script.
Mon Jul 16 05:37:13 2018 - [info] Got exit code 20 (Master dead).
Mon Jul 16 05:37:13 2018 - [info] MHA::MasterFailover version 0.56.
Mon Jul 16 05:37:13 2018 - [info] Starting master failover.
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Dead Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Checking master reachability via mysql(double check)..
Mon Jul 16 05:37:13 2018 - [info] ok.
Mon Jul 16 05:37:13 2018 - [info] Alive Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Mon Jul 16 05:37:13 2018 - [info] Alive Slaves:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Jul 16 05:37:13 2018 - [info] Executing master IP deactivatation script:
Mon Jul 16 05:37:13 2018 - [info] /data/mha/etc/master_ip_failover --orig_master_host=192.168.19.69 --orig_master_ip=192.168.19.69 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Disabling the VIP on old master: 192.168.19.69
SIOCSIFFLAGS: Cannot assign requested address
Mon Jul 16 05:37:13 2018 - [info] done.
Mon Jul 16 05:37:13 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Jul 16 05:37:13 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3: Master Recovery Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:1532
Mon Jul 16 05:37:13 2018 - [info] Retrieved Gtid Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-6
Mon Jul 16 05:37:13 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:1532
Mon Jul 16 05:37:13 2018 - [info] Retrieved Gtid Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-6
Mon Jul 16 05:37:13 2018 - [info] Oldest slaves:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3.3: Determining New Master Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Searching new master from slaves..
Mon Jul 16 05:37:13 2018 - [info] Candidate masters from the configuration file:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] Non-candidate masters:
Mon Jul 16 05:37:13 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Jul 16 05:37:13 2018 - [info] New master is 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:37:13 2018 - [info] Starting master failover..
Mon Jul 16 05:37:13 2018 - [info]
From:
192.168.19.69 (current master)
+--192.168.19.73
+--192.168.19.74
To:
192.168.19.73 (new master)
+--192.168.19.74
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Waiting all logs to be applied..
Mon Jul 16 05:37:13 2018 - [info] done.
Mon Jul 16 05:37:13 2018 - [info] Getting new master's binlog name and position..
Mon Jul 16 05:37:13 2018 - [info] mysql-bin.000001:1487
Mon Jul 16 05:37:13 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.19.73', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jul 16 05:37:13 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 1487, cab5ec92-85ab-11e8-852c-005056bf5092:1-6
Mon Jul 16 05:37:13 2018 - [info] Executing master IP activate script:
Mon Jul 16 05:37:13 2018 - [info] /data/mha/etc/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.19.69 --orig_master_ip=192.168.19.69 --orig_master_port=3306 --new_master_host=192.168.19.73 --new_master_ip=192.168.19.73 --new_master_port=3306 --new_master_user='system' --new_master_password='mysql'
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Enabling the VIP - 192.168.19.68/24 on the new master - 192.168.19.73
Mon Jul 16 05:37:15 2018 - [info] OK.
Mon Jul 16 05:37:15 2018 - [info] Setting read_only=0 on 192.168.19.73(192.168.19.73:3306)..
Mon Jul 16 05:37:15 2018 - [info] ok.
Mon Jul 16 05:37:15 2018 - [info] ** Finished master recovery successfully.
Mon Jul 16 05:37:15 2018 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] -- Slave recovery on host 192.168.19.74(192.168.19.74:3306) started, pid: 4557. Check tmp log /data/mha/etc/app/192.168.19.74_3306_20180716053713.log if it takes time..
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:16 2018 - [info] Log messages from 192.168.19.74 ...
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] Resetting slave 192.168.19.74(192.168.19.74:3306) and starting replication from the new master 192.168.19.73(192.168.19.73:3306)..
Mon Jul 16 05:37:15 2018 - [info] Executed CHANGE MASTER.
Mon Jul 16 05:37:16 2018 - [info] Slave started.
Mon Jul 16 05:37:16 2018 - [info] End of log messages from 192.168.19.74.
Mon Jul 16 05:37:16 2018 - [info] -- Slave on host 192.168.19.74(192.168.19.74:3306) started.
Mon Jul 16 05:37:16 2018 - [info] All new slave servers recovered successfully.
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:16 2018 - [info] * Phase 5: New master cleanup phase..
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:16 2018 - [info] Resetting slave info on the new master..
Mon Jul 16 05:37:16 2018 - [info] 192.168.19.73: Resetting slave info succeeded.
Mon Jul 16 05:37:16 2018 - [info] Master failover to 192.168.19.73(192.168.19.73:3306) completed successfully.
Mon Jul 16 05:37:16 2018 - [info] Deleted server1 entry from /data/mha/etc/app.cnf .
Mon Jul 16 05:37:16 2018 - [info]
----- Failover Report -----
app: MySQL Master failover 192.168.19.69 to 192.168.19.73 succeeded
Master 192.168.19.69 is down!
Check MHA Manager logs at mysql3:/data/mha/log/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.19.69.
Selected 192.168.19.73 as a new master.
192.168.19.73: OK: Applying all logs succeeded.
192.168.19.73: OK: Activated master IP address.
192.168.19.74: OK: Slave started, replicating from 192.168.19.73.
192.168.19.73: Resetting slave info succeeded.
Master failover to 192.168.19.73(192.168.19.73:3306) completed successfully.
[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
Mon Jul 16 05:38:25 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 05:38:25 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:38:25 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:38:25 2018 - [info] MHA::MasterMonitor version 0.56.
Mon Jul 16 05:38:25 2018 - [info] Dead Servers:
Mon Jul 16 05:38:25 2018 - [info] Alive Servers:
Mon Jul 16 05:38:25 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:38:25 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Mon Jul 16 05:38:25 2018 - [info] Alive Slaves:
Mon Jul 16 05:38:25 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:38:25 2018 - [info] GTID ON
Mon Jul 16 05:38:25 2018 - [info] Replicating from 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:38:25 2018 - [info] Current Alive Master: 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:38:25 2018 - [info] Checking slave configurations..
Mon Jul 16 05:38:25 2018 - [info] Checking replication filtering settings..
Mon Jul 16 05:38:25 2018 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jul 16 05:38:25 2018 - [info] Replication filtering check ok.
Mon Jul 16 05:38:25 2018 - [info] GTID is supported. Skipping all SSH and Node package checking.
Mon Jul 16 05:38:25 2018 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jul 16 05:38:25 2018 - [info] HealthCheck: SSH to 192.168.19.73 is reachable.
Mon Jul 16 05:38:25 2018 - [info]
192.168.19.73 (current master)
+--192.168.19.74
Mon Jul 16 05:38:25 2018 - [info] Checking replication health on 192.168.19.74..
Mon Jul 16 05:38:25 2018 - [info] ok.
Mon Jul 16 05:38:25 2018 - [info] Checking master_ip_failover_script status:
Mon Jul 16 05:38:25 2018 - [info] /data/mha/etc/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.19.73 --orig_master_ip=192.168.19.73 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Checking the Status of the script.. OK
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:45:02:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.73/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:1
valid_lft forever preferred_lft forever
inet6 fe80::8cc8:7141:6d5a:7686/64 scope link
valid_lft forever preferred_lft forever
Mon Jul 16 05:38:25 2018 - [info] OK.
Mon Jul 16 05:38:25 2018 - [warning] shutdown_script is not defined.
Mon Jul 16 05:38:25 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@mysql3 sysbench]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app is stopped(2:NOT_RUNNING).
[1]+ Done nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1
[root@mysql2 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:45:02:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.73/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:1
valid_lft forever preferred_lft forever
inet6 fe80::8cc8:7141:6d5a:7686/64 scope link
valid_lft forever preferred_lft forever
从上面的输出可以看出整个MHA的切换过程,共包括以下的步骤:
1.配置文件检查阶段,这个阶段会检查整个集群配置文件配置
2.宕机的master处理,这个阶段包括虚拟ip摘除操作,主机关机操作
3.复制dead maste和最新slave相差的relay log,并保存到MHA Manger具体的目录下
4.识别含有最新更新的slave
5.应用从master保存的二进制日志事件(binlog events)
6.提升一个slave为新的master进行复制
7.使其他的slave连接新的master进行复制
8 自动切换后,manager会删除app.cnf文件中的[server1]配置
9 old的master加入主从后,进行chk检查,还是发现不了new slave,因为配置文件里面已经删除掉
[root@mysql3 etc]# cat /data/mha/log/manager.log |grep -i vip
Disabling the VIP on old master: 192.168.19.69
Enabling the VIP - 192.168.19.68/24 on the new master - 192.168.19.73
[root@mysql3 etc]# ssh mysql2 "ifconfig |grep 19.68 -B1"
ens160:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.19.68 netmask 255.255.255.0 broadcast 192.168.19.255
#192.168.19.68 插入并查看数据
insert into t2 values(12,'1BBBBBB');
insert into t2 values(13,'1444');
insert into t2 values(14,'1455');
insert into t2 values(15,'175');
[mysql@mysql3 scripts]$ mysql -usystem -p*** -h127.0.0.1 -P3306 -e'select * from test.t2;'
[mysql@hongquan1 ~]$ mysql -usystem -p*** -h192.168.19.68 -P3306
(system@192.168.19.68:3306) [(none)]> show slave status\G;
Empty set (0.00 sec)
ERROR:
No query specified
删除manager上的app.failover.complete
[root@mysql3 app]# rm /data/mha/etc/app/app.failover.complete
启动old的master,并登陆查看
[mysql@mysql1 scripts]$ ./mysql_db_startup.sh
[mysql@mysql1 scripts]$ ./mysqlplus.sh
(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 | 191 | | | cab5ec92-85ab-11e8-852c-005056bf5092:1-6 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> select *from test.t2; #还是宕机前的数据
+------+--------+
| id | name |
+------+--------+
| 1 | AAAAA |
| 2 | BBBBBB |
| 3 | 444 |
| 4 | 455 |
| 5 | 75 |
+------+--------+
查看manager的log文件,找到change的语句
[root@mysql3 etc]# cat /data/mha/log/manager.log |grep -i change
Mon Jul 16 05:37:13 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.19.73', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jul 16 05:37:15 2018 - [info] Executed CHANGE MASTER.
在slave连接master
(system@127.0.0.1:3306) [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.19.73', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='***';
Query OK, 0 rows affected, 2 warnings (0.02 sec)
(system@127.0.0.1:3306) [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.19.73
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 2465
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 1386
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2465
Relay_Log_Space: 1583
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2018071202
Master_UUID: d4e95eb8-85ae-11e8-8540-000c29450242
Master_Info_File: /data/mysqldata/3306/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: d4e95eb8-85ae-11e8-8540-000c29450242:1-4
Executed_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-6,
d4e95eb8-85ae-11e8-8540-000c29450242:1-4
Auto_Position: 1
1 row in set (0.01 sec)
ERROR:
No query specified
(system@127.0.0.1:3306) [(none)]> select *from test.t2;
+------+---------+
| id | name |
+------+---------+
| 1 | AAAAA |
| 2 | BBBBBB |
| 3 | 444 |
| 4 | 455 |
| 5 | 75 |
| 12 | 1BBBBBB |
| 13 | 1444 |
| 14 | 1455 |
| 15 | 175 |
+------+---------+
9 rows in set (0.00 sec)
已经将old master作为新slave加入mha,修改app.cnf配置文件,加入相应的server
[root@mysql3 ~]# vim /data/mha/etc/app.cnf
[root@mysql3 ~]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
启动管理节点
# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[root@mysql3 ~]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app (pid:17328) is running(0:PING_OK), master:192.168.19.73
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
--3台服务器停电,重启后,74作为slave报错,
Last_SQL_Errno: 1050
Last_SQL_Error: Error 'Table 't2' already exists' on query. Default database: 'test'. Query: 'create table t2(id int,name varchar(10))'
(system@127.0.0.1:3306) [test]> reset master;
(system@127.0.0.1:3306) [test]> show master status;
(system@127.0.0.1:3306) [test]> set @@GLOBAL.GTID_PURGED='cab5ec92-85ab-11e8-852c-005056bf5092:1-6,d4e95eb8-85ae-11e8-8540-000c29450242:1-17';
(system@127.0.0.1:3306) [test]> start slave;
(system@127.0.0.1:3306) [test]> show slave status\G;
5.2 手动Failover
masterha_master_switch工具进行主从切换,在手动切换的同时需要保证没有启用MHA自动切换功能
MHA Manager必须没有运行
手动failover,这种场景意味着在业务上没有启用MHA自动切换功能,当主服务器故障时,人工手动调用MHA来进行故障切换操作,具体命令如下
注意:如果,MHA manager检测到没有dead的server,将报错,并结束failover
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
# /usr/local/bin/masterha_master_switch --master_state=dead --conf=/data/mha/etc/app.cnf --dead_master_host=192.168.19.73 --dead_master_port=3306 --new_master_host=192.168.19.69 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.19.73.
Mon Jul 16 22:11:01 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 22:11:01 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:11:01 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:11:01 2018 - [info] MHA::MasterFailover version 0.56.
Mon Jul 16 22:11:01 2018 - [info] Starting master failover.
Mon Jul 16 22:11:01 2018 - [info]
Mon Jul 16 22:11:01 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 16 22:11:01 2018 - [info]
Mon Jul 16 22:11:01 2018 - [info] Dead Servers:
Mon Jul 16 22:11:01 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln183] None of server is dead. Stop failover.
Mon Jul 16 22:11:01 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53
因为这里master没有问题,所以手动failover切换会失败,有的mha没有开启自动切换,手动切换(当主服务故障时)
5.3 在线进行切换
MHA在线切换的大概过程:
1.检测复制设置和确定当前主服务器
2.确定新的主服务器
3.阻塞写入到当前主服务器
4.等待所有从服务器赶上复制
5.授予写入到新的主服务器
6.重新设置从服务器
注意,在线切换的时候应用架构需要考虑以下两个问题:
1.自动识别master和slave的问题(master的机器可能会切换),如果采用了vip的方式,基本可以解决这个问题。
2.负载均衡的问题(可以定义大概的读写比例,每台机器可承担的负载比例,当有机器离开集群时,需要考虑这个问题)
为了保证数据完全一致性,在最快的时间内完成切换,MHA的在线切换必须满足以下条件才会切换成功,否则会切换失败。
1.所有slave的IO线程都在运行
2.所有slave的SQL线程都在运行
3.所有的show slave status的输出中Seconds_Behind_Master参数小于或者等于running_updates_limit秒,如果在切换过程中不指定running_updates_limit,那么默认情况下running_updates_limit为1秒。
4.在master端,通过show processlist输出,没有一个更新花费的时间大于running_updates_limit秒。
在线切换步骤如下:
首先,停掉MHA监控:
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
# /usr/local/bin/masterha_master_switch --conf=/data/mha/etc/app.cnf --master_state=alive --new_master_host=192.168.19.69 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Mon Jul 16 22:20:40 2018 - [info] MHA::MasterRotate version 0.56.
Mon Jul 16 22:20:40 2018 - [info] Starting online master switch..
Mon Jul 16 22:20:40 2018 - [info]
Mon Jul 16 22:20:40 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 16 22:20:40 2018 - [info]
Mon Jul 16 22:20:40 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 22:20:40 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:20:40 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:20:40 2018 - [info] Current Alive Master: 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 22:20:40 2018 - [info] Alive Slaves:
Mon Jul 16 22:20:40 2018 - [info] 192.168.19.69(192.168.19.69:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 22:20:40 2018 - [info] GTID ON
Mon Jul 16 22:20:40 2018 - [info] Replicating from 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 22:20:40 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 22:20:40 2018 - [info] GTID ON
Mon Jul 16 22:20:40 2018 - [info] Replicating from 192.168.19.73(192.168.19.73:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.19.73(192.168.19.73:3306)? (YES/no): yes
Mon Jul 16 22:20:46 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Jul 16 22:20:46 2018 - [info] ok.
Mon Jul 16 22:20:46 2018 - [info] Checking MHA is not monitoring or doing failover..
Mon Jul 16 22:20:46 2018 - [info] Checking replication health on 192.168.19.69..
Mon Jul 16 22:20:46 2018 - [info] ok.
Mon Jul 16 22:20:46 2018 - [info] Checking replication health on 192.168.19.74..
Mon Jul 16 22:20:46 2018 - [info] ok.
Mon Jul 16 22:20:46 2018 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln161] We should not start online master switch when one of connections are running long updates on the current master(192.168.19.73(192.168.19.73:3306)). Currently 3 update thread(s) are running.
Details:
{'Time' => '407522','db' => undef,'Id' => '1','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
{'Time' => '60211','db' => undef,'Id' => '85','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.74:50422'}
{'Time' => '59571','db' => undef,'Id' => '91','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.69:55086'}
Mon Jul 16 22:20:46 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.
master
[system@3306][(none)]> set global event_scheduler=off;
[system@3306][(none)]> Select @@event_scheduler;
+-------------------+
| @@event_scheduler |
+-------------------+
| OFF |
+-------------------+
Mon Jul 16 22:26:06 2018 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln161] We should not start online master switch when one of connections are running long updates on the current master(192.168.19.73(192.168.19.73:3306)). Currently 2 update thread(s) are running.
Details:
{'Time' => '60531','db' => undef,'Id' => '85','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.74:50422'}
{'Time' => '59891','db' => undef,'Id' => '91','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.69:55086'}
Mon Jul 16 22:26:06 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.
--orig_master_is_new_slave 切换时加上此参数是将原 master 变为 slave 节点,如果不加此参数,原来的 master 将不启动
--running_updates_limit=10000,故障切换时,候选master 如果有延迟的话, mha 切换不能成功,加上此参数表示延迟在此时间范围内都可切换(单位为s),但是切换的时间长短是由recover 时relay 日志的大小决定
注意:由于在线进行切换需要调用到master_ip_online_change这个脚本
修复宕机的Master
通常情况下自动切换以后,原master可能已经废弃掉,待原master主机修复后,如果数据完整的情况下,可能想把原来master重新作为新主库的slave
,这时我们可以借助当时自动切换时刻的MHA日志来完成对原master的修复。下面是提取相关日志的命令:
# grep -i "All other slaves" /data/mha/log/manager.log
=========
MHA 参数列表
Local : 指每一个配置块内部。 Local功能的参数需要放置在[server_xxx] 块下面
App : 参数作用于master/slave, 这些参数需要配置在[server_default]块的下面
Global : 作用于master/slave, Global级别的参数用于管理多组master/slave结构,可以统一化管理一些参数
candidate_master
你可能对于同一组slave中有着不同的规划,有的其望在Master故障时,提升为新的Master(如: Raid1的slave比Raid0的slave更适合做Master)
这个参数的作用是当设计candidate_master = 1时,这个服务器有较高的优先级提升为新的master(还要具备: 开启binlog, 复制没有延迟)。
所以当设置了candidate_master = 1的机器在master故障时必然成为新的master. 但这是很有用的设置优先级的一个参数。
no_master 当设置了no_master = 1的服务器,这个服务器永远不会提升为新的master. 这个参数据对于永远不期望成为master的机器很有用。
ignore_fail 在默认情况下,MHA manager不会在slave存在故障的情况下(已经不能通过SSH/MySQL进行连接,或是SQL Thread已经停止出错,其它原因)的情况下进行Master的故障切换
(当不存在slave时MHA manager也会退出)。 但有些情况下你期望在slave存在问题的情况下,也进行故障切换。 所以当设置了ignore_fail = 1时,MHA会在所有的机器有问题的时间也会进行故障切换。 默认是0.
skip_reset_slave (从MHA 0.56开始支持) Master故障切换后新的master上执行RESET SLAVE(ALL)
disable_log_bin 当设置了这个参数,在slave应用差异的relay log时不会产生二进制日志。 内部实现通过mysqlbinlog的disable-log-bin实现。
remote_workdir MHA node上工作目录的全路径名。如果不存在,MHA node会自动创建,如果不允许创建,MHA Node自动异常退出。
需要注意MHA manager 或是MHA node上需要检查空间是否可用,所以需要注意检查这个。 一般默认, remote_workdir是的”/var/tmp”
master_binlog_dir master上用于存储binary日志的全路径。这个参数用于当master上mysql死掉后,通过ssh连到mysql服务器上,找到需要binary日志事件。这个参数可以帮助用于无法找到master进程死掉后binary日志存储位置。
log_level 设置MHA manager记录LOG的级别。 默认是info级别而且大多数情况也是比较适合。 同样可以设置为: debug/info/warning/error.
manager_workdir 用于指定mha manager产生相关状态文件全路径。 如果没设置 默认是/var/tmp
check_repl_delay
check_repl_filter 在默认下情况,当master和slave设置了不同了binary log/replication 过滤规则时,MHA直接报错不会进行监控和故障切换。 这些将会导致出现一些异想不到的错误”Table not exists”。
如果你100%确认不同的过滤规则不会导致恢复中的错误,设置check_repl_filter=0。 需要注意: 当使用了check_repl_filter = 0时,MHA不会检查过滤规则在应用差异的relay日志,所以有可能会出现
”Table not exists”的错误。当你设置了这个参数请小心处理。
latest_priority 在默认情况下,和Master最接近的slave(一个slave从Master上获得了最一个binlog事件)是最有优先权成为新的master。
multi_tier_slave 从MHA 0.52开始, 多层复制可以支持了。在默认情况下,不支持三层或是更多层的复制配置。 如果: host2从host1上复制,host3从host2上复制。 在默认配置的情况下不支持写host{1,2,3},
因为这是一个三层的复制,MHA Manager会停止报错。 当设置了multi_tier_slave, MHA Manager就不会在三层复制报错停止。 但是会忽略第三层的机器。也就是如果host1挂了,host2将会成为新的master,host3还是从host2上进行复制。
ping_interval 这个参数设置MHA Manager多长时间去ping一下master(执行一些SQL语句). 当失去和master三次偿试,MHA Manager会认为MySQL Master死掉了。也就是说,最大的故障切换时间是4次ping_interval的时间,默认是3秒。
ping_type (从MHA 0.53后开始支持) 在默认情况下, MHA manager和MySQL创建一个连接执行”select 1″(ping_type=select)用于检查master是否健康。
但有一些情况: 每次检测都连接/然后断开会比较好一点,这样对于tcp方面的错误感知更快一点。设置ping_type=CONNECT 就行了。从MHA 0.56后pint_type=INSERT也被添加。
secondary_check_script 一般来讲, 非常推荐使用更多网络上机器是不同路由策略上的机器来检查MySQL Master是否存活。 默认情况下,只有MHA Manager通过一个路由来
检查Master是否存活。这个也是不推荐的。MHA可以通过外部secondary_check_script配置的脚本来做到从多个路由策略下进行检查。
master_ip_failover_script
master_ip_online_changes_script
shutdown_script
report_script
init_conf_load_script 这个参数用于不想设置明文在配置文件中(如:密码相关)。 只用返回”name=value”这样的值。 这个可以用来复盖global配置中的一些值。一个例子脚本如下
#!/usr/bin/perl
print “password=$ROOT_PASS\n”;
print “repl_password=$REPL_PASS\n”;
init_conf_load_script No App/Global null report_script= /usr/local/custom_script/init_conf_loader
[root@mysql3 log]# cat /etc/sysconfig/network-scripts/ifcfg-ens160
[root@mysql3 etc]# vim app.cnf
[server default]
user = system #mysql用戶名,#需要的权限:Super,select,create,insert,update,delete,drop,reload
password = *** #mysql密码
ssh_user = root #ssh免密钥登录的帐号名
repl_user = repl #mysql复制帐号
repl_password = ****#mysql复制密码
ping_interval = 1 #ping间隔,用来检测master是否正常,默认是3秒,尝试三次没有回应的时候自动进行failover
ping_type = SELECT
manager_workdir=/data/mha/etc/app #manager的工作目录,主要该目录的权限,需要有创建的权限
manager_log=/data/mha/log/manager.log #manager的日志
remote_workdir=/data/mha/etc/app #设置远端mysql在发生切换时binlog的保存位置 #另外2台机子在运行时候需要创建的目录
master_binlog_dir="/data/mysqldata/3306/binlog" #master 保存binlog的位置,以便MHA可以找到master的日志
master_ip_failover_script="/data/mha/etc/master_ip_failover" #自动failover时候的切换脚本
master_ip_online_change_script="/data/mha/etc/master_ip_failover" #手动切换时候的切换脚本
shutdown_script=""
report_script=""
#check_repl_delay=0
[server1]
hostname=192.168.19.69
port=3306
master_binlog_dir="/data/mysqldata/3306/binlog"
#candidate_master=1
#ignore_fail=1
[server2]
hostname=192.168.19.73
port=3306
master_binlog_dir="/data/mysqldata/3306/binlog/"
candidate_master=1 #master机宕掉后,优先启用这台作为新master
check_repl_delay=0 #默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master
#ignore_fail=1
[server3]
hostname=192.168.19.74
port=3306
master_binlog_dir="/data/mysqldata/3306/binlog/"
----master_ip_failover,虚拟ip配置等
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.19.68/24'; # Virtual IP
my $key = "1";
my $int = "ens160";
my $ssh_start_vip = "/sbin/ifconfig $int:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig $int:$key down";
my $arp_effect = "/sbin/arping -Uq -s192.168.19.68 -I $int 192.168.19.99 -c 3"; # Virtual IP and gateway
#my $test = "echo successfull >/tmp/test.txt";
$ssh_user = "root";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
#`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
&status();
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
`ssh $ssh_user\@$new_master_host \" $arp_effect \"`;
# `ssh $ssh_user\@$new_master_host \" $test \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub status() {
print `ssh $ssh_user\@$orig_master_host \" ip add show $int \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_maste
r_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
---master_ip_online_changes_script
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;
my $_tstart;
my $_running_interval = 0.1;
my (
$command, $orig_master_host, $orig_master_ip,
$orig_master_port, $orig_master_user,
$new_master_host, $new_master_ip, $new_master_port,
$new_master_user,
);
my $vip = '192.168.19.68/24'; # Virtual IP
my $key = "1";
my $int = "ens160";
my $ssh_start_vip = "/sbin/ifconfig $int:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig $int:$key down";
my $ssh_user = "root";
my $new_master_password='mysql';
my $orig_master_password='mysql';
GetOptions(
'command=s' => \$command,
#'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
#'orig_master_password=s' => \$orig_master_password,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
#'new_master_password=s' => \$new_master_password,
);
exit &main();
sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}
sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}
sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type = 0 unless ($type);
my @threads;
my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();
while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= 1 );
if ( $type >= 1 ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
}
if ( $type >= 2 ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
}
push @threads, $ref;
}
return @threads;
}
sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect();
# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 );
## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
#$orig_master_handler->disable_log_bin_local();
#print current_time_us() . " Drpping app user on the orig master..\n";
#FIXME_xxx_drop_app_user($orig_master_handler);
## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > 0 && $#threads >= 0 ) {
if ( $time_until_read_only % 5 == 0 ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_read_only * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
## Setting read_only=1 on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
if ( $time_until_kill_threads % 5 == 0 ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
print current_time_us() . " done.\n";
#$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect();
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master's ip to the catalog database
# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
## Set read_only=0 on the new master
#$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();
## Creating an app user on the new master
#print current_time_us() . " Creating app user on the new master..\n";
#FIXME_xxx_create_app_user($new_master_handler);
#$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();
## Update master ip on the catalog database, etc
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}
Mysql 5.6 MHA (gtid) on Kylin的更多相关文章
- MYSQL 5.7 MHA(GTID+ROW)部署及failover,online_change实战演练
文章结构如下: 1.MHA简介 Masterhigh availability manager and toolsfor mysql,是日本的一位mysql专家采用perl语言编写的一个脚本管理工具, ...
- 【MySql】——MHA+GTID+failover+binlog-server+Atlas
一.环境准备 1.mysql-db01 #系统版本 [root@mysql-db01 ~]# cat /etc/redhat-release CentOS release 6.7 (Final) #内 ...
- MySQL 5.7基于GTID复制的常见问题和修复步骤(二)
[问题二] 有一个集群(MySQL5.7.23)切换后复制slave报1236,其实是不小心在slave上执行了事务导致 Got fatal error 1236 from master when r ...
- Mysql 高可用(MHA)-读写分离(Atlas)-分布式架构(Mycat)
Mysql 高可用(MHA)-读写分离(Atlas) 1. 搭建主从复制(一主两从) 1.1 准备环境 1 主库:10.0.0.51/db01 2 从库:10.0.0.52/db02,10.0.0.5 ...
- MySQL HA方案之MySQL半复制+MHA+Keepalived+Atlas+LVS[转]
MySQL HA方案之MySQL半复制+MHA+Keepalived+Atlas+LVS 简介 目前Mysql高可用的方案有好多,比如MMM,heartbeat+drbd,Cluster等,还有per ...
- percona mysql server5.7基于gtid的主从复制
配置mysql基于gtid主从复制架构 一.二进制安装mysql [root@node5 data]# --Linux.x86_64.ssl101.tar.gz [root@node5 data]# ...
- mysql 5.6在gtid复制模式下复制错误,如何跳过??
mysql 5.6在gtid复制模式下复制错误,如何跳过?? http://www.xuchanggang.cn/archives/918.html
- MySQL 在线开启&关闭GTID模式
MySQL 在线开启&关闭GTID模式 目录 MySQL 在线开启&关闭GTID模式 基本概述 在线开启GTID 1. 设置GTID校验ENFORCE_GTID_CONSISTENCY ...
- Mysql MHA(GTID)配置(实操)
实现环境 centos6.7 MYSQL5.6.36 主:192.168.1.191 从1:192.168.1.145 从2:192.168.1.146 监测:放在从2上 192.168.1.146 ...
随机推荐
- Python日期时间函数
所有日期.时间的api都在datetime模块内. 1. 日期输出格式化 datetime => string import datetime now = datetime.datetime.n ...
- 【P2361】yyy棋(博弈论+贪心+模拟)
这个题看上去本来不好处理,然而善意的题面已经基本告诉你做法了,小时候玩的那个游戏就是代码的核心.动动脑子想想,如果长和宽的积是奇数,那么一定要先手,如果是偶数,那么后手就会获胜. 好了,那么怎么处理对 ...
- 生成一个ipa的包,使非开发机也能安装成功 (Xcode5.1)
for example: 想为com.apple.cloud的bundle identifier生成一个非开发机也能安装的ipa包.你需要一个apple的企业账号(apple有两种账号:开发者账号和企 ...
- R语言笔记006——分组获取描述性统计量
方法一:使用aggregate()分组获取描述性统计量 aggregate(mtcars[vars],by=list(am=mtcars$am),mean) aggregate(mtcars[vars ...
- ggplot笔记001——ggplot2安装
R3.2.2版安装ggplot2 今天安装ggplot2,开始用的是R3.2.1版本,但是一直报错.后面换了一个最新的R3.2.2,但安装时还是一样报错,原因是munsell这个包 ...
- hihoCoder挑战赛27题目一 福字 (dp)
题目: 一个n × n的矩阵,其中每个位置都是一个非负整数. 一个福字被定义成是大小为 k 的正方形,满足其中的每个位置上的数都恰好比他的左边的那个和上边的那个大1(如果左边或上边的那个不存在的话就无 ...
- Nginx虚拟主机配置模板
/////////////////////////////写在前头//////////////////////////////////////////Nginx 服务器中文文档:http://www. ...
- tomcat配置 启动
<Context docBase="E:\apache-tomcat-7.0.50\wtpwebapps\mycms" path="" reloadab ...
- asp.mvc 基本知识
(1)@Html.DisplayNameFor(model => model.Title)是显示列名, (2)@Html.DisplayFor(modelItem => item.Titl ...
- NAVICAT PREMIUM 初识
1 问题运行SQL取数语句出错?. 答案:因为运行SQL文件时没有这个表.新的数据库里面 解决方法: 获取需要建立的二维表创建语句 新建查询 输入语句,点击运行即可 解释注释语句: 主体为五个字段的主 ...