Mysql 5.6 MHA (gtid) on Kylin
mha on Kylin
ip hostname repl role mha role
192.168.19.69 mysql1 master node
192.168.19.73 mysql2 slave node
192.168.19.74 mysql3 slave node/manager
mysql version MySQL Community Server 5.6.15
1 系统配置
[root@localhost ~]# cat /etc/issue
\nKylin 3.3
Kernel \r on an \m
\KRelease (Trial)
[root@localhost ~]# cat /etc/redhat-release
Kylin Linux release 3.3.1707 (Core)
[root@localhost ~]# sysctl -a |grep swap
vm.swappiness = 30
[root@localhost ~]# vim /etc/sysctl.conf
vm.swappiness = 10
[root@localhost ~]# sysctl -p
vm.swappiness = 10
[root@localhost ~]# sysctl -a |grep swap
vm.swappiness = 10
[root@localhost ~]# cat /sys/block/sda/queue/rotational
1
[root@localhost ~]# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
[root@localhost ~]# vim /etc/profile
HISTTIMEFORMAT="%Y:%M:%D %H-%m-%s"
export=HISTTIMEFORMAT
[root@localhost ~]# date
Thu Jul 12 03:09:27 EDT 2018
[root@localhost ~]# hostname
localhost.localdomain
[root@localhost ~]# vim /etc/hosts
127.0.0.1 mysql1
[root@localhost ~]# vim /etc/sysconfig/network
HOSTNAME=mysql1
[root@localhost ~]# getenforce
Enforcing
[root@localhost ~]# vim /etc/selinux/config
SELINUX=disabled
[root@localhost ~]# systemctl stop firewalld.service
[root@localhost ~]# systemctl disable firewalld.service
[root@localhost ~]# reboot
[root@localhost ~]# hostnamectl set-hostname mysql1
[root@localhost ~]# hostname
mysql1
[root@mysql1 ~]# systemctl status firewalld.service
--# vim /etc/hosts
192.168.19.69 mysql1
192.168.19.73 mysql2
192.168.19.74 mysql3
2 安装mysql
[root@mysql1 ~]# groupadd mysql
[root@mysql1 ~]# useradd -g mysql mysql
[root@mysql1 ~]# passwd mysql
[root@mysql1 ~]# ulimit -n
1024
[root@mysql1 ~]# vim /etc/security/limits.conf
mysql soft nproc 65535
mysql hard nproc 65535
mysql soft nofile 65535
mysql hard nofile 65535
[root@mysql1 ~]# ulimit -n 65535
[root@mysql1 ~]# vim /home/mysql/.bash_profile
export LANG=en_US.UTF-8
export PATH=/usr/local/mysql/bin:$PATH
export MYSQL_PS1="(\u@\h:\p) [\d]> "
yum安装相应的依赖包
[root@mysql1 ~]# yum -y install lrzsz
[root@mysql1 ~]# yum install -y gcc gcc-* make cmake gcc-c++ libaio libaio-devel bison bison-devel autoconf automake zlib* libxml* ncurses-devel libmcrypt* libtool-ltdl-devel* --skip-broken
[root@mysql1 ~]# yum install -y openssl openssl-devel ncurses ncurses-devel
[root@mysql1 ~]# mkdir -p /data/mysqldata/{3306/{data,tmp,binlog,slave,log/iblog},backup,scripts}
[root@mysql1 soft]# yum list|grep mysql
Repository gcc-4.9 is listed more than once in the configuration
Repository other is listed more than once in the configuration
mysql-community-client.x86_64 5.6.15-4.ky3 other
mysql-community-common.x86_64 5.6.15-4.ky3 other
mysql-community-devel.x86_64 5.6.15-4.ky3 other
mysql-community-embedded.x86_64 5.6.15-4.ky3 other
mysql-community-embedded-devel.x86_64
mysql-community-libs.x86_64 5.6.15-4.ky3 other
mysql-community-server.x86_64 5.6.15-4.ky3 other
mysql-community-test.x86_64 5.6.15-4.ky3 other
mysql-connector-odbc.x86_64 5.2.5-6.ky3.kb2 base
[root@mysql1 soft]# tar -zxvf mysql-5.6.15-linux-glibc2.5-x86_64.tar.gz -C /usr/local/
[root@mysql1 soft]# cd /usr/local/
[root@mysql1 local]# ln -s mysql-5.6.15-linux-glibc2.5-x86_64 mysql
[root@mysql1 local]# chown -R mysql:mysql /usr/local/mysql/
[root@mysql1 local]# chown -R mysql:mysql /data/mysqldata/
--my.cnf
[mysql@mysql1 3306]$ /usr/local/mysql/scripts/mysql_install_db --defaults-file=/data/mysqldata/3306/my.cnf --datadir=/data/mysqldata/3306/data --basedir=/usr/local/mysql --user=mysql
[mysql@mysql1 3306]$ /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/mysqldata/3306/my.cnf &
[mysql@mysql1 3306]$ ps -ef|grep mysql
root 2673 2652 0 04:08 pts/1 00:00:00 su - mysql
mysql 2674 2673 0 04:08 pts/1 00:00:00 -bash
mysql 2880 2674 0 04:15 pts/1 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --defaults-file=/data/mysqldata/3306/my.cnf
mysql 3865 2880 11 04:15 pts/1 00:00:00 /usr/local/mysql/bin/mysqld --defaults-file=/data/mysqldata/3306/my.cnf --basedir=/usr/local/mysql --datadir=/data/mysqldata/3306/data --plugin-dir=/usr/local/mysql/lib/plugin --log-error=/data/mysqldata/3306/log/mysql-error.log --open-files-limit=65535 --pid-file=/data/mysqldata/3306/data/mysql1.pid --socket=/data/mysqldata/3306/mysql.sock --port=3306
mysql 3900 2674 0 04:15 pts/1 00:00:00 ps -ef
mysql 3901 2674 0 04:15 pts/1 00:00:00 grep --color=auto mysql
[mysql@mysql1 3306]$ mysql -S /data/mysqldata/3306/mysql.sock
(root@localhost:mysql.sock) [(none)]> select user,host,password from mysql.user;
+------+-----------+----------+
| user | host | password |
+------+-----------+----------+
| root | localhost | |
| root | mysql1 | |
| root | 127.0.0.1 | |
| root | ::1 | |
| | localhost | |
| | mysql1 | |
+------+-----------+----------+
(root@localhost:mysql.sock) [(none)]> delete from mysql.user where (user,host) not in(select 'root','localhost');
Query OK, 5 rows affected (0.01 sec)
(root@localhost:mysql.sock) [(none)]> select user,host,password from mysql.user;
+------+-----------+----------+
| user | host | password |
+------+-----------+----------+
| root | localhost | |
+------+-----------+----------+
1 row in set (0.00 sec)
(root@localhost:mysql.sock) [(none)]> update mysql.user set password=password('***');
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
(root@localhost:mysql.sock) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> truncate table mysql.db;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> grant ALL PRIVILEGES ON *.* TO 'system'@'%' IDENTIFIED BY "***";
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> grant ALL PRIVILEGES ON *.* TO 'system'@'127.0.0.1' IDENTIFIED BY "***";
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
(root@localhost:mysql.sock) [(none)]> select user,host,password from mysql.user;
+--------+-----------+-------------------------------------------+
| user | host | password |
+--------+-----------+-------------------------------------------+
| root | localhost | *E74858DB86EBA20BC33D0AECAE8A8108C56B17FA |
| system | 127.0.0.1 | *E74858DB86EBA20BC33D0AECAE8A8108C56B17FA |
| system | % | *E74858DB86EBA20BC33D0AECAE8A8108C56B17FA |
+--------+-----------+-------------------------------------------+
[mysql@mysql1 3306]$ mysql -uroot -p -S /data/mysqldata/3306/mysql.sock
Enter password:
--生产环境严格用户权限,密码控制
grant select ,insert ,update,delete on test.* to 'test_prog'@'%'identified by "***" ;
flush privileges;
show grants for test_prog@'%';
revoke insert ,update,delete on test.* from test_prog@'%';
grant insert ,update,delete on test.* to 'test_prog'@'%';
3 主从搭建
192.168.19.69 master
192.168.19.73 slave1
192.168.19.74 slave2
69
(root@localhost:mysql.sock) [(none)]> GRANT REPLICATION SLAVE,RELOAD,SUPER ON *.* to repl@'192.168.%' identified by "***";
--mysqldump
/usr/local/mysql/bin/mysqldump -h 127.0.0.1 -usystem -p'mysql' -A -R -S /data/mysqldata/3306/mysql.sock \
--single-transaction --default-character-set=utf8 --master-data=2 --set-gtid-purged=OFF | gzip > /data/mysqldata/backup/mysqldumpfullbk_20180712.sql.gz
scp mysqldumpfullbk_20180712.sql mysql@192.168.19.73:/data/mysqldata/backup/.
scp mysqldumpfullbk_20180712.sql mysql@192.168.19.74:/data/mysqldata/backup/.
--mysql -usystem -pmysql -h127.0.0.1 -P3306 <mysqldumpfullbk_20180712.sql
(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid%';
+--------------------------+-------+
| Variable_name | Value |
+--------------------------+-------+
| enforce_gtid_consistency | ON |
| gtid_executed | |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+-------+
5 rows in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 151 | | | |
+------------------+----------+--------------+------------------+-------------------+
73 (system@127.0.0.1:3306) [(none)]> show master status;
+------------------+-----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000001 | 191447949 | | | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+------------------+-----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid_executed%';
+---------------+--------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------+
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+---------------+--------------------------------------------+
[system@3306][(none)]> show global variables like '%gtid%';
+--------------------------+--------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | d4e95eb8-85ae-11e8-8540-000c29450242:1-362 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+--------------------------------------------+
show global variables like '%gtid_executed%';
reset master;
CHANGE MASTER TO
MASTER_HOST='192.168.19.69',
MASTER_USER='repl',
MASTER_PASSWORD='***',
MASTER_PORT=3306,
MASTER_AUTO_POSITION = 1;
74 (system@127.0.0.1:3306) [(none)]> show master status;
+------------------+-----------+--------------+------------------+--------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+--------------------------------------------+
| mysql-bin.000001 | 191447949 | | | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+------------------+-----------+--------------+------------------+--------------------------------------------+
1 row in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show global variables like 'server_uuid';
+---------------+--------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------+
| server_uuid | e68afff3-85b5-11e8-856e-005056bff00b |
(system@127.0.0.1:3306) [(none)]> show global variables like '%gtid_executed%';
+---------------+--------------------------------------------+
| Variable_name | Value |
+---------------+--------------------------------------------+
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
+---------------+--------------------------------------------+
(system@127.0.0.1:3306) [mysql]> show global variables like '%gtid%';
+--------------------------+--------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | e68afff3-85b5-11e8-856e-005056bff00b:1-362 |
| gtid_mode | ON |
| gtid_owned | |
[system@3306][(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.19.69
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 558
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 768
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 558
Relay_Log_Space: 965
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 201807121
Master_UUID: cab5ec92-85ab-11e8-852c-005056bf5092
Master_Info_File: /data/mysqldata/3306/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-2
Executed_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-2
Auto_Position: 1
69 master
[mysql@mysql1 scripts]$ ./mysqlplus.sh
use test;
create table t1(id int,name varchar(10));
insert into t1 values(1,'AAAAA');
commit;
select * from test.t1;
use test;
create table t2(id int,name varchar(10));
insert into t2 values(1,'AAAAA');
commit;
select * from test.t2;
69 (system@127.0.0.1:3306) [test]> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000001 | 558 | | | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
+------------------+----------+--------------+------------------+------------------------------------------+
(system@127.0.0.1:3306) [test]> show global variables like '%gtid%';
+--------------------------+------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+------------------------------------------+
73,74
(system@127.0.0.1:3306) [test]> show global variables like '%gtid%';
+--------------------------+------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------+
| enforce_gtid_consistency | ON |
| gtid_executed | cab5ec92-85ab-11e8-852c-005056bf5092:1-2 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | |
+--------------------------+------------------------------------------+
4 mha 搭建
4.1 设置SSH公钥免密码登录
MHA环境需要三台主机需要相互信任:实现三台主机之间相互免密钥登录。
要是root登陆的话,需要修改ssh的配置文件
vim /etc/ssh/sshd_config 中的 PermitRootLogin ,设置为YES,再重启SSH
#root用户操作(/root),pwd
mkdir ~/.ssh
chmod 700 ~/.ssh
cd ~/.ssh
ssh-keygen -t rsa #一直回车
ls -al
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ls -al
chmod 600 ~/.ssh/authorized_keys
cd ..
scp -r .ssh 192.168.19.73:/root/
scp -r .ssh 192.168.19.74:/root/
ssh mysql1
ssh mysql2
ssh mysql3
[root@mysql1 ~]# ssh mysql2 #这里不用输入密码,这打通了ssh
Last login: Thu Jul 12 05:55:46 2018 from mysql1
[root@mysql2 ~]# logout
Connection to mysql2 closed.
[root@mysql1 ~]# ssh mysql3
[root@mysql3 ~]# logout
Connection to mysql3 closed.
4.2 安装mha manager和node节点
软件下载路径
[root@mysql1 ~]# cd /data/soft/
[root@mysql1 soft]# ll
total 297404
-rw-r--r-- 1 root root 104700 Jun 8 02:34 mha4mysql-manager-0.56.tar.gz
-rw-r--r-- 1 root root 45947 Jun 8 02:34 mha4mysql-node-0.56.tar.gz
在三个节点(node 和 manager)安装perl-DBD-MySQL,用光盘作yum源,#包括了3个包:Perl-DBD-MySQL,mysql-libs,perl-DBI
[root@mysql1 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
[root@mysql2 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
[root@mysql3 soft]# yum install -y perl-DBD-MySQL perl-DBI mysql-libs
# yum install perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker -y
在三个节点安装node(包括管理节点)
tar xzvf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56
perl Makefile.PL
make && make install
[root@mysql1 mha4mysql-node-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::node
在管理节点安装manager
#yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager \
perl-Config-IniFiles rrdtool perl-rrdtool rrdtool-devel perl-Params-Validate perl-Time-HiRes \
perl-MIME-Types perl-Email-Date-Format perl-Mail-Sendmail perl-Mail-Sender perl-TimeDate\
perl-MailTools perl-MIME-Lite
# rpm -q perl-CPAN
# yum install -y perl-CPAN
# yum install -y 'perl(CGI)' 'perl(LWP::UserAgent)'
[root@mysql3 soft]# rpm -ivh perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm --nodeps
[root@mysql3 soft]# rpm -ivh perl-Parallel-ForkManager-1.05-1.el7.noarch.rpm
tar xzvf mha4mysql-manager-0.56.tar.gz
cd mha4mysql-manager-0.56
perl Makefile.PL
make && make install
[root@mysql3 mha4mysql-manager-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
- Time::HiRes ...loaded. (1.9725)
- Config::Tiny ...loaded. (2.23)
- Log::Dispatch ...missing.
- Parallel::ForkManager ...loaded. (1.05)
- MHA::NodeConst ...loaded. (0.56)
==> Auto-install the 1 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Looks good
Writing Makefile for mha4mysql::manager
[root@mysql3 mha4mysql-manager-0.56]# which masterha_check_ssh
/usr/local/bin/masterha_check_ssh
[root@mysql3 ~]# ll /usr/local/bin/masterha_*
-r-xr-xr-x 1 root root 1995 Jul 12 06:22 /usr/local/bin/masterha_check_repl
-r-xr-xr-x 1 root root 1779 Jul 12 06:22 /usr/local/bin/masterha_check_ssh
-r-xr-xr-x 1 root root 1865 Jul 12 06:22 /usr/local/bin/masterha_check_status
-r-xr-xr-x 1 root root 3201 Jul 12 06:22 /usr/local/bin/masterha_conf_host
-r-xr-xr-x 1 root root 2517 Jul 12 06:22 /usr/local/bin/masterha_manager
-r-xr-xr-x 1 root root 2165 Jul 12 06:22 /usr/local/bin/masterha_master_monitor
-r-xr-xr-x 1 root root 2373 Jul 12 06:22 /usr/local/bin/masterha_master_switch
-r-xr-xr-x 1 root root 3879 Jul 12 06:22 /usr/local/bin/masterha_secondary_check
-r-xr-xr-x 1 root root 1739 Jul 12 06:22 /usr/local/bin/masterha_stop
===
rpm -ivh perl-Mail-Sender-0.8.21-2.el7.noarch.rpm
rpm -ivh perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
rpm -ivh perl-MIME-Types-1.38-2.el7.noarch.rpm
rpm -ivh perl-TimeDate-2.30-2.el7.noarch.rp
#Can't locate Class/Load.pm in @INC
rpm -ivh perl-Module-Runtime-0.013-4.el7.noarch.rpm
rpm -ivh perl-Config-Tiny-2.14-7.el7.noarch.rpm
rpm -ivh perl-Try-Tiny-0.12-2.el7.noarch.rpm
rpm -ivh perl-Module-Implementation-0.06-6.el7.noarch.rpm
rpm -ivh perl-Sub-Install-0.926-6.el7.noarch.rpm
rpm -ivh perl-Params-Util-1.07-6.el7.x86_64.rpm
rpm -ivh perl-Data-OptList-0.107-9.el7.noarch.rpm
rpm -ivh perl-Package-Stash-XS-0.26-3.el7.x86_64.rpm
rpm -ivh perl-List-MoreUtils-0.33-9.el7.x86_64.rpm
rpm -ivh perl-Package-DeprecationManager-0.13-7.el7.noarch.rpm
rpm -ivh perl-Package-Stash-0.34-2.el7.noarch.rpm
rpm -ivh perl-Class-Load-0.20-3.el7.noarch.rpm
#Can't locate Params/Validate.pm
rpm -ivh perl-Params-Validate-1.08-4.el7.x86_64.rpm
===
管理节点安装完毕后就应该去编辑主配文件了
[root@mysql3 soft]# mkdir -p /data/mha/etc
[root@mysql3 soft]# mkdir -p /data/mha/etc/app
[root@mysql3 soft]# mkdir -p /data/mha/log
[root@mysql3 soft]# cd /data/mha/etc/
[root@mysql3 etc]# touch app.cnf
[root@mysql3 etc]# chmod +x master_ip_failover
====
master_ip_failover #自动切换时vip管理的脚本,不是必须,如果我们使用keepalived的,我们可以自己编写脚本完成对vip的管理,比如监控mysql,如果mysql异常,我们停止keepalived就行,这样vip就会自动漂移
master_ip_online_change #在线切换时vip的管理,不是必须,同样可以可以自行编写简单的shell完成
power_manager #故障发生后关闭主机的脚本,不是必须
send_report #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成
masterha_secondary_check 一旦MHA到master的监控之间出现问题,MHA Manager将会判断其它两个slave是否能建立到master_ip 3306端口的连接
====
####################常用命令###############
/usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf &
/usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
/usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover > /data/mha/log/manager.log < /dev/null 2>&1 &
====
/usr/local/bin/save_binary_logs
show slave status;master_log_file, master_log_pos
mysqlbinlog save_binlog_xxx,
1.偿试读取故障的master日志
2.找binlog server上的日志差异(56引入binlog_server)
3.去slave上找relay-log
==把所有的slave数据补成一样的
/usr/local/bin/apply_diff_relay_logs
[root@mysql3 etc]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Can't locate Class/Load.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 12.
BEGIN failed--compilation aborted at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 12.
Compilation failed in require at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.
[root@mysql3 soft]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Can't locate Params/Validate.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 13.
BEGIN failed--compilation aborted at /usr/share/perl5/vendor_perl/Log/Dispatch.pm line 13.
Compilation failed in require at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
BEGIN failed--compilation aborted at /usr/local/share/perl5/MHA/SSHCheck.pm line 29.
Compilation failed in require at /usr/local/bin/masterha_check_ssh line 25.
BEGIN failed--compilation aborted at /usr/local/bin/masterha_check_ssh line 25.
[root@mysql3 soft]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:02:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
/data/mha/etc/app.cnf:
at /usr/local/share/perl5/MHA/SSHCheck.pm line 148.
==详细检查,是配置文件中的[server3]少了]。。。。。。
==ssh检查成功
[root@mysql3 etc]# /usr/local/bin/masterha_check_ssh --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:15:11 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 12 22:15:11 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:15:11 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:15:11 2018 - [info] Starting SSH connection tests..
Thu Jul 12 22:15:12 2018 - [debug]
Thu Jul 12 22:15:11 2018 - [debug] Connecting via SSH from root@192.168.19.69(192.168.19.69:22) to root@192.168.19.73(192.168.19.73:22)..
Warning: Permanently added '192.168.19.69' (ECDSA) to the list of known hosts.
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.69(192.168.19.69:22) to root@192.168.19.74(192.168.19.74:22)..
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:12 2018 - [debug]
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.73(192.168.19.73:22) to root@192.168.19.69(192.168.19.69:22)..
Warning: Permanently added '192.168.19.69' (ECDSA) to the list of known hosts.
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.73(192.168.19.73:22) to root@192.168.19.74(192.168.19.74:22)..
Thu Jul 12 22:15:12 2018 - [debug] ok.
Thu Jul 12 22:15:13 2018 - [debug]
Thu Jul 12 22:15:12 2018 - [debug] Connecting via SSH from root@192.168.19.74(192.168.19.74:22) to root@192.168.19.69(192.168.19.69:22)..
Thu Jul 12 22:15:13 2018 - [debug] ok.
Thu Jul 12 22:15:13 2018 - [debug] Connecting via SSH from root@192.168.19.74(192.168.19.74:22) to root@192.168.19.73(192.168.19.73:22)..
Thu Jul 12 22:15:13 2018 - [debug] ok.
Thu Jul 12 22:15:13 2018 - [info] All SSH connection tests passed successfully.
==检查复制状态,有提示slave [warning] relay_log_purge=0 is not set on slave 192.168.19.73(192.168.19.73:3306)
[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
Thu Jul 12 22:17:53 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Jul 12 22:17:53 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:17:53 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Thu Jul 12 22:17:53 2018 - [info] MHA::MasterMonitor version 0.56.
Thu Jul 12 22:17:53 2018 - [info] Dead Servers:
Thu Jul 12 22:17:53 2018 - [info] Alive Servers:
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Thu Jul 12 22:17:53 2018 - [info] Alive Slaves:
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Jul 12 22:17:53 2018 - [info] GTID ON
Thu Jul 12 22:17:53 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Jul 12 22:17:53 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Thu Jul 12 22:17:53 2018 - [info] GTID ON
Thu Jul 12 22:17:53 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] Current Alive Master: 192.168.19.69(192.168.19.69:3306)
Thu Jul 12 22:17:53 2018 - [info] Checking slave configurations..
Thu Jul 12 22:17:53 2018 - [info] read_only=1 is not set on slave 192.168.19.73(192.168.19.73:3306).
Thu Jul 12 22:17:53 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.19.73(192.168.19.73:3306).
Thu Jul 12 22:17:53 2018 - [info] read_only=1 is not set on slave 192.168.19.74(192.168.19.74:3306).
Thu Jul 12 22:17:53 2018 - [warning] relay_log_purge=0 is not set on slave 192.168.19.74(192.168.19.74:3306).
Thu Jul 12 22:17:53 2018 - [info] Checking replication filtering settings..
Thu Jul 12 22:17:53 2018 - [info] binlog_do_db= , binlog_ignore_db=
Thu Jul 12 22:17:53 2018 - [info] Replication filtering check ok.
Thu Jul 12 22:17:53 2018 - [info] GTID is supported. Skipping all SSH and Node package checking.
Thu Jul 12 22:17:53 2018 - [info] Checking SSH publickey authentication settings on the current master..
Thu Jul 12 22:17:53 2018 - [info] HealthCheck: SSH to 192.168.19.69 is reachable.
Thu Jul 12 22:17:53 2018 - [info]
192.168.19.69 (current master)
+--192.168.19.73
+--192.168.19.74
Thu Jul 12 22:17:53 2018 - [info] Checking replication health on 192.168.19.73..
Thu Jul 12 22:17:53 2018 - [info] ok.
Thu Jul 12 22:17:53 2018 - [info] Checking replication health on 192.168.19.74..
Thu Jul 12 22:17:53 2018 - [info] ok.
Thu Jul 12 22:17:53 2018 - [info] Checking master_ip_failover_script status:
Thu Jul 12 22:17:53 2018 - [info] /data/mha/etc/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.19.69 --orig_master_ip=192.168.19.69 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Checking the Status of the script.. OK
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:bf:50:92 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.69/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:0
valid_lft forever preferred_lft forever
inet6 fe80::4bdb:f907:5f14:a62b/64 scope link
valid_lft forever preferred_lft forever
Sun Jul 15 21:53:58 2018 - [info] OK.
Sun Jul 15 21:53:58 2018 - [warning] shutdown_script is not defined.
Sun Jul 15 21:53:58 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
==检查mha状态
[root@mysql3 etc]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app is stopped(2:NOT_RUNNING). #表示未启动mha
#启动mha manager
--remove_dead_master_conf 当mha发生切换以后,老的主库的ip将会从配置文件中移除
--ignore_last_failover,在缺省情况下,如果MHA检测到连续发生宕机,且两次宕机间隔不足8小时的话,则不会进行Failover,之所以这样限制是为了避免ping-pong效应。
默认情况,mha发生切换会产生app.failover.complete文件,下次再切换的时候如果发现目录下还有该文件,将不允许切换
,除非在第一次切换后手动rm 该文件,出于方便,每次启动加该参数
#nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[root@mysql3 etc]# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[1] 14550
[root@mysql3 etc]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app (pid:14550) is running(0:PING_OK), master:192.168.19.69
#master 增加虚拟ip(采用脚本管理vip)
在用虚拟IP的时候,需要在开启MHA程序之前要把虚拟IP先设置到主上去,否则MHA是不会自己的去设置VIP,第一次设置VIP之后,后续脚本的故障转移等功能会自动的对VIP进行切换。
### /sbin/ifconfig eth0:1 10.0.1.168 netmask 255.255.255.0 up
### ip addr del 192.168.19.68/24 dev ens160
设置:
ifconfig eth0:1 192.168.100.xxx
移除:
ifconfig eth0:1 down
备注:用ifconfig命令配置的网卡信息,在网卡重启后机器重启后,配置就不存在。
mha第一次启动的时候,需要手动加上vip
[root@mysql1 ~]# /sbin/ifconfig ens160:1 192.168.19.68/24
[root@mysql1 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:50:56:bf:50:92 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.69/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:0
valid_lft forever preferred_lft forever
inet6 fe80::4bdb:f907:5f14:a62b/64 scope link
valid_lft forever preferred_lft forever
#关闭mha
#/usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
use test;
insert into t1 values(2,'BBBBBB');
insert into t1 values(3,'444');
insert into t1 values(4,'455');
insert into t1 values(5,'75');
select * from test.t1;
insert into t1 values(12,'1BBBBBB');
insert into t1 values(13,'1444');
insert into t1 values(14,'1455');
insert into t1 values(15,'175');
select * from test.t1;
select @@server_id;
5故障切换
http://www.cnblogs.com/gomysql/p/3675429.html
http://www.cnblogs.com/zhoujinyi/p/3808673.html
https://blog.csdn.net/wjhhjl/article/details/53705157
--slave,两台slave服务器设置read_only(从库对外提供读服务,只所以没有写进配置文件,是因为随时slave会提升为master)
mysql -usystem -p*** -h127.0.0.1 -P3306 -e'set global relay_log_purge=OFF;'
mysql -usystem -p*** -h127.0.0.1 -P3306 -e'set global read_only=1;'
注意
MHA在发生切换的过程中,从库的恢复过程中依赖于relay log的相关信息,所以这里要将relay log的自动清除设置为OFF
采用手动清除relay log的方式。在默认情况下,从服务器上的中继日志会在SQL线程执行完毕后被自动删除。
但是在MHA环境中,这些中继日志在恢复其他从服务器时可能会被用到,因此需要禁用中继日志的自动删除功能。
MHA节点中包含了pure_relay_logs命令工具
#cat purge_relay_log.sh
#!/bin/bash
user=system
passwd=***
port=3306
log_dir='/data/mha/log/'
work_dir='/data/mha/'
purge='/usr/local/bin/purge_relay_logs'
if [ ! -d $log_dir ]
then
mkdir $log_dir -p
fi
$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
定时任务
#crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh
手动清除
# /usr/local/bin/purge_relay_logs --user=system --password=****--port=3306 --disable_relay_log_purge --workdir=/data/mha/
5.1 自动Failover
192.168.19.69 mysql1 master node
192.168.19.73 mysql2 slave node
192.168.19.74 mysql3 slave node/manager
必须先启动MHA Manager,否则无法自动切换,当然手动切换不需要开启MHA Manager监控
一旦发生切换管理进程(Manager)将会退出,app.cnf配置文件将删除server1的配置,无法进行再次测试,需将故障数据库解决掉之后,重新change加入到MHA环境中来,
并且要保证app1.failover.complete不存在或则加上--ignore_last_failover参数忽略,才能再次开启管理进程。
# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
[root@mysql3 ~]# /usr/local/bin/masterha_secondary_check -s 192.168.19.73 -s 192.168.19.74 --user=root --master_host=192.168.19.69 --master_ip=192.168.19.69 --master_port=3306
Master is reachable from 192.168.19.73!
[root@mysql3 sysbench]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app (pid:347) is running(0:PING_OK), master:192.168.19.69
use test;
insert into t2 values(2,'BBBBBB');
insert into t2 values(3,'444');
insert into t2 values(4,'455');
insert into t2 values(5,'75');
select * from test.t2;
commit;
[mysql@mysql2 scripts]$ mysql -usystem -p*** -h127.0.0.1 -P3306 -e'select * from test.t2;'
[mysql@mysql1 scripts]$ ./mysql_db_shutdown.sh
[root@mysql3 etc]# tail -f -n 100 /data/mha/log/manager.log
Mon Jul 16 05:37:09 2018 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Mon Jul 16 05:37:09 2018 - [info] Executing SSH check script: exit 0
Mon Jul 16 05:37:10 2018 - [info] HealthCheck: SSH to 192.168.19.69 is reachable.
Mon Jul 16 05:37:10 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.19.69' (111))
Mon Jul 16 05:37:10 2018 - [warning] Connection failed 1 time(s)..
Mon Jul 16 05:37:11 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.19.69' (111))
Mon Jul 16 05:37:11 2018 - [warning] Connection failed 2 time(s)..
Mon Jul 16 05:37:12 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.19.69' (111))
Mon Jul 16 05:37:12 2018 - [warning] Connection failed 3 time(s)..
Mon Jul 16 05:37:12 2018 - [warning] Master is not reachable from health checker!
Mon Jul 16 05:37:12 2018 - [warning] Master 192.168.19.69(192.168.19.69:3306) is not reachable!
Mon Jul 16 05:37:12 2018 - [warning] SSH is reachable.
Mon Jul 16 05:37:12 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /data/mha/etc/app.cnf again, and trying to connect to all servers to check server status..
Mon Jul 16 05:37:12 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 05:37:12 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:37:12 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:37:13 2018 - [info] Dead Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Alive Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Mon Jul 16 05:37:13 2018 - [info] Alive Slaves:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Checking slave configurations..
Mon Jul 16 05:37:13 2018 - [info] Checking replication filtering settings..
Mon Jul 16 05:37:13 2018 - [info] Replication filtering check ok.
Mon Jul 16 05:37:13 2018 - [info] Master is down!
Mon Jul 16 05:37:13 2018 - [info] Terminating monitoring script.
Mon Jul 16 05:37:13 2018 - [info] Got exit code 20 (Master dead).
Mon Jul 16 05:37:13 2018 - [info] MHA::MasterFailover version 0.56.
Mon Jul 16 05:37:13 2018 - [info] Starting master failover.
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Dead Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Checking master reachability via mysql(double check)..
Mon Jul 16 05:37:13 2018 - [info] ok.
Mon Jul 16 05:37:13 2018 - [info] Alive Servers:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Mon Jul 16 05:37:13 2018 - [info] Alive Slaves:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Jul 16 05:37:13 2018 - [info] Executing master IP deactivatation script:
Mon Jul 16 05:37:13 2018 - [info] /data/mha/etc/master_ip_failover --orig_master_host=192.168.19.69 --orig_master_ip=192.168.19.69 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Disabling the VIP on old master: 192.168.19.69
SIOCSIFFLAGS: Cannot assign requested address
Mon Jul 16 05:37:13 2018 - [info] done.
Mon Jul 16 05:37:13 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Jul 16 05:37:13 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3: Master Recovery Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:1532
Mon Jul 16 05:37:13 2018 - [info] Retrieved Gtid Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-6
Mon Jul 16 05:37:13 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:1532
Mon Jul 16 05:37:13 2018 - [info] Retrieved Gtid Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-6
Mon Jul 16 05:37:13 2018 - [info] Oldest slaves:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3.3: Determining New Master Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Searching new master from slaves..
Mon Jul 16 05:37:13 2018 - [info] Candidate masters from the configuration file:
Mon Jul 16 05:37:13 2018 - [info] 192.168.19.73(192.168.19.73:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:37:13 2018 - [info] GTID ON
Mon Jul 16 05:37:13 2018 - [info] Replicating from 192.168.19.69(192.168.19.69:3306)
Mon Jul 16 05:37:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jul 16 05:37:13 2018 - [info] Non-candidate masters:
Mon Jul 16 05:37:13 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Mon Jul 16 05:37:13 2018 - [info] New master is 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:37:13 2018 - [info] Starting master failover..
Mon Jul 16 05:37:13 2018 - [info]
From:
192.168.19.69 (current master)
+--192.168.19.73
+--192.168.19.74
To:
192.168.19.73 (new master)
+--192.168.19.74
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Mon Jul 16 05:37:13 2018 - [info]
Mon Jul 16 05:37:13 2018 - [info] Waiting all logs to be applied..
Mon Jul 16 05:37:13 2018 - [info] done.
Mon Jul 16 05:37:13 2018 - [info] Getting new master's binlog name and position..
Mon Jul 16 05:37:13 2018 - [info] mysql-bin.000001:1487
Mon Jul 16 05:37:13 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.19.73', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jul 16 05:37:13 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: mysql-bin.000001, 1487, cab5ec92-85ab-11e8-852c-005056bf5092:1-6
Mon Jul 16 05:37:13 2018 - [info] Executing master IP activate script:
Mon Jul 16 05:37:13 2018 - [info] /data/mha/etc/master_ip_failover --command=start --ssh_user=root --orig_master_host=192.168.19.69 --orig_master_ip=192.168.19.69 --orig_master_port=3306 --new_master_host=192.168.19.73 --new_master_ip=192.168.19.73 --new_master_port=3306 --new_master_user='system' --new_master_password='mysql'
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Enabling the VIP - 192.168.19.68/24 on the new master - 192.168.19.73
Mon Jul 16 05:37:15 2018 - [info] OK.
Mon Jul 16 05:37:15 2018 - [info] Setting read_only=0 on 192.168.19.73(192.168.19.73:3306)..
Mon Jul 16 05:37:15 2018 - [info] ok.
Mon Jul 16 05:37:15 2018 - [info] ** Finished master recovery successfully.
Mon Jul 16 05:37:15 2018 - [info] * Phase 3: Master Recovery Phase completed.
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] * Phase 4: Slaves Recovery Phase..
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Mon Jul 16 05:37:15 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] -- Slave recovery on host 192.168.19.74(192.168.19.74:3306) started, pid: 4557. Check tmp log /data/mha/etc/app/192.168.19.74_3306_20180716053713.log if it takes time..
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:16 2018 - [info] Log messages from 192.168.19.74 ...
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:15 2018 - [info] Resetting slave 192.168.19.74(192.168.19.74:3306) and starting replication from the new master 192.168.19.73(192.168.19.73:3306)..
Mon Jul 16 05:37:15 2018 - [info] Executed CHANGE MASTER.
Mon Jul 16 05:37:16 2018 - [info] Slave started.
Mon Jul 16 05:37:16 2018 - [info] End of log messages from 192.168.19.74.
Mon Jul 16 05:37:16 2018 - [info] -- Slave on host 192.168.19.74(192.168.19.74:3306) started.
Mon Jul 16 05:37:16 2018 - [info] All new slave servers recovered successfully.
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:16 2018 - [info] * Phase 5: New master cleanup phase..
Mon Jul 16 05:37:16 2018 - [info]
Mon Jul 16 05:37:16 2018 - [info] Resetting slave info on the new master..
Mon Jul 16 05:37:16 2018 - [info] 192.168.19.73: Resetting slave info succeeded.
Mon Jul 16 05:37:16 2018 - [info] Master failover to 192.168.19.73(192.168.19.73:3306) completed successfully.
Mon Jul 16 05:37:16 2018 - [info] Deleted server1 entry from /data/mha/etc/app.cnf .
Mon Jul 16 05:37:16 2018 - [info]
----- Failover Report -----
app: MySQL Master failover 192.168.19.69 to 192.168.19.73 succeeded
Master 192.168.19.69 is down!
Check MHA Manager logs at mysql3:/data/mha/log/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.19.69.
Selected 192.168.19.73 as a new master.
192.168.19.73: OK: Applying all logs succeeded.
192.168.19.73: OK: Activated master IP address.
192.168.19.74: OK: Slave started, replicating from 192.168.19.73.
192.168.19.73: Resetting slave info succeeded.
Master failover to 192.168.19.73(192.168.19.73:3306) completed successfully.
[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
Mon Jul 16 05:38:25 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 05:38:25 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:38:25 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 05:38:25 2018 - [info] MHA::MasterMonitor version 0.56.
Mon Jul 16 05:38:25 2018 - [info] Dead Servers:
Mon Jul 16 05:38:25 2018 - [info] Alive Servers:
Mon Jul 16 05:38:25 2018 - [info] 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:38:25 2018 - [info] 192.168.19.74(192.168.19.74:3306)
Mon Jul 16 05:38:25 2018 - [info] Alive Slaves:
Mon Jul 16 05:38:25 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 05:38:25 2018 - [info] GTID ON
Mon Jul 16 05:38:25 2018 - [info] Replicating from 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:38:25 2018 - [info] Current Alive Master: 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 05:38:25 2018 - [info] Checking slave configurations..
Mon Jul 16 05:38:25 2018 - [info] Checking replication filtering settings..
Mon Jul 16 05:38:25 2018 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jul 16 05:38:25 2018 - [info] Replication filtering check ok.
Mon Jul 16 05:38:25 2018 - [info] GTID is supported. Skipping all SSH and Node package checking.
Mon Jul 16 05:38:25 2018 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jul 16 05:38:25 2018 - [info] HealthCheck: SSH to 192.168.19.73 is reachable.
Mon Jul 16 05:38:25 2018 - [info]
192.168.19.73 (current master)
+--192.168.19.74
Mon Jul 16 05:38:25 2018 - [info] Checking replication health on 192.168.19.74..
Mon Jul 16 05:38:25 2018 - [info] ok.
Mon Jul 16 05:38:25 2018 - [info] Checking master_ip_failover_script status:
Mon Jul 16 05:38:25 2018 - [info] /data/mha/etc/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.19.73 --orig_master_ip=192.168.19.73 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig ens160:1 down==/sbin/ifconfig ens160:1 192.168.19.68/24===
Checking the Status of the script.. OK
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:45:02:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.73/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:1
valid_lft forever preferred_lft forever
inet6 fe80::8cc8:7141:6d5a:7686/64 scope link
valid_lft forever preferred_lft forever
Mon Jul 16 05:38:25 2018 - [info] OK.
Mon Jul 16 05:38:25 2018 - [warning] shutdown_script is not defined.
Mon Jul 16 05:38:25 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
[root@mysql3 sysbench]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app is stopped(2:NOT_RUNNING).
[1]+ Done nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1
[root@mysql2 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:45:02:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.19.73/24 brd 192.168.19.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.19.68/24 brd 192.168.19.255 scope global secondary ens160:1
valid_lft forever preferred_lft forever
inet6 fe80::8cc8:7141:6d5a:7686/64 scope link
valid_lft forever preferred_lft forever
从上面的输出可以看出整个MHA的切换过程,共包括以下的步骤:
1.配置文件检查阶段,这个阶段会检查整个集群配置文件配置
2.宕机的master处理,这个阶段包括虚拟ip摘除操作,主机关机操作
3.复制dead maste和最新slave相差的relay log,并保存到MHA Manger具体的目录下
4.识别含有最新更新的slave
5.应用从master保存的二进制日志事件(binlog events)
6.提升一个slave为新的master进行复制
7.使其他的slave连接新的master进行复制
8 自动切换后,manager会删除app.cnf文件中的[server1]配置
9 old的master加入主从后,进行chk检查,还是发现不了new slave,因为配置文件里面已经删除掉
[root@mysql3 etc]# cat /data/mha/log/manager.log |grep -i vip
Disabling the VIP on old master: 192.168.19.69
Enabling the VIP - 192.168.19.68/24 on the new master - 192.168.19.73
[root@mysql3 etc]# ssh mysql2 "ifconfig |grep 19.68 -B1"
ens160:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.19.68 netmask 255.255.255.0 broadcast 192.168.19.255
#192.168.19.68 插入并查看数据
insert into t2 values(12,'1BBBBBB');
insert into t2 values(13,'1444');
insert into t2 values(14,'1455');
insert into t2 values(15,'175');
[mysql@mysql3 scripts]$ mysql -usystem -p*** -h127.0.0.1 -P3306 -e'select * from test.t2;'
[mysql@hongquan1 ~]$ mysql -usystem -p*** -h192.168.19.68 -P3306
(system@192.168.19.68:3306) [(none)]> show slave status\G;
Empty set (0.00 sec)
ERROR:
No query specified
删除manager上的app.failover.complete
[root@mysql3 app]# rm /data/mha/etc/app/app.failover.complete
启动old的master,并登陆查看
[mysql@mysql1 scripts]$ ./mysql_db_startup.sh
[mysql@mysql1 scripts]$ ./mysqlplus.sh
(system@127.0.0.1:3306) [(none)]> show master status;
+------------------+----------+--------------+------------------+------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+------------------------------------------+
| mysql-bin.000002 | 191 | | | cab5ec92-85ab-11e8-852c-005056bf5092:1-6 |
+------------------+----------+--------------+------------------+------------------------------------------+
1 row in set (0.00 sec)
(system@127.0.0.1:3306) [(none)]> select *from test.t2; #还是宕机前的数据
+------+--------+
| id | name |
+------+--------+
| 1 | AAAAA |
| 2 | BBBBBB |
| 3 | 444 |
| 4 | 455 |
| 5 | 75 |
+------+--------+
查看manager的log文件,找到change的语句
[root@mysql3 etc]# cat /data/mha/log/manager.log |grep -i change
Mon Jul 16 05:37:13 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.19.73', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Mon Jul 16 05:37:15 2018 - [info] Executed CHANGE MASTER.
在slave连接master
(system@127.0.0.1:3306) [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.19.73', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='***';
Query OK, 0 rows affected, 2 warnings (0.02 sec)
(system@127.0.0.1:3306) [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)
(system@127.0.0.1:3306) [(none)]> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.19.73
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 2465
Relay_Log_File: relaylog.000002
Relay_Log_Pos: 1386
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2465
Relay_Log_Space: 1583
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2018071202
Master_UUID: d4e95eb8-85ae-11e8-8540-000c29450242
Master_Info_File: /data/mysqldata/3306/data/master.info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: d4e95eb8-85ae-11e8-8540-000c29450242:1-4
Executed_Gtid_Set: cab5ec92-85ab-11e8-852c-005056bf5092:1-6,
d4e95eb8-85ae-11e8-8540-000c29450242:1-4
Auto_Position: 1
1 row in set (0.01 sec)
ERROR:
No query specified
(system@127.0.0.1:3306) [(none)]> select *from test.t2;
+------+---------+
| id | name |
+------+---------+
| 1 | AAAAA |
| 2 | BBBBBB |
| 3 | 444 |
| 4 | 455 |
| 5 | 75 |
| 12 | 1BBBBBB |
| 13 | 1444 |
| 14 | 1455 |
| 15 | 175 |
+------+---------+
9 rows in set (0.00 sec)
已经将old master作为新slave加入mha,修改app.cnf配置文件,加入相应的server
[root@mysql3 ~]# vim /data/mha/etc/app.cnf
[root@mysql3 ~]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
[root@mysql3 etc]# /usr/local/bin/masterha_check_repl --conf=/data/mha/etc/app.cnf
启动管理节点
# nohup /usr/local/bin/masterha_manager --conf=/data/mha/etc/app.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/mha/log/manager.log 2>&1 &
[root@mysql3 ~]# /usr/local/bin/masterha_check_status --conf=/data/mha/etc/app.cnf
app (pid:17328) is running(0:PING_OK), master:192.168.19.73
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
--3台服务器停电,重启后,74作为slave报错,
Last_SQL_Errno: 1050
Last_SQL_Error: Error 'Table 't2' already exists' on query. Default database: 'test'. Query: 'create table t2(id int,name varchar(10))'
(system@127.0.0.1:3306) [test]> reset master;
(system@127.0.0.1:3306) [test]> show master status;
(system@127.0.0.1:3306) [test]> set @@GLOBAL.GTID_PURGED='cab5ec92-85ab-11e8-852c-005056bf5092:1-6,d4e95eb8-85ae-11e8-8540-000c29450242:1-17';
(system@127.0.0.1:3306) [test]> start slave;
(system@127.0.0.1:3306) [test]> show slave status\G;
5.2 手动Failover
masterha_master_switch工具进行主从切换,在手动切换的同时需要保证没有启用MHA自动切换功能
MHA Manager必须没有运行
手动failover,这种场景意味着在业务上没有启用MHA自动切换功能,当主服务器故障时,人工手动调用MHA来进行故障切换操作,具体命令如下
注意:如果,MHA manager检测到没有dead的server,将报错,并结束failover
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
# /usr/local/bin/masterha_master_switch --master_state=dead --conf=/data/mha/etc/app.cnf --dead_master_host=192.168.19.73 --dead_master_port=3306 --new_master_host=192.168.19.69 --new_master_port=3306 --ignore_last_failover
--dead_master_ip=<dead_master_ip> is not set. Using 192.168.19.73.
Mon Jul 16 22:11:01 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 22:11:01 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:11:01 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:11:01 2018 - [info] MHA::MasterFailover version 0.56.
Mon Jul 16 22:11:01 2018 - [info] Starting master failover.
Mon Jul 16 22:11:01 2018 - [info]
Mon Jul 16 22:11:01 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 16 22:11:01 2018 - [info]
Mon Jul 16 22:11:01 2018 - [info] Dead Servers:
Mon Jul 16 22:11:01 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln183] None of server is dead. Stop failover.
Mon Jul 16 22:11:01 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53
因为这里master没有问题,所以手动failover切换会失败,有的mha没有开启自动切换,手动切换(当主服务故障时)
5.3 在线进行切换
MHA在线切换的大概过程:
1.检测复制设置和确定当前主服务器
2.确定新的主服务器
3.阻塞写入到当前主服务器
4.等待所有从服务器赶上复制
5.授予写入到新的主服务器
6.重新设置从服务器
注意,在线切换的时候应用架构需要考虑以下两个问题:
1.自动识别master和slave的问题(master的机器可能会切换),如果采用了vip的方式,基本可以解决这个问题。
2.负载均衡的问题(可以定义大概的读写比例,每台机器可承担的负载比例,当有机器离开集群时,需要考虑这个问题)
为了保证数据完全一致性,在最快的时间内完成切换,MHA的在线切换必须满足以下条件才会切换成功,否则会切换失败。
1.所有slave的IO线程都在运行
2.所有slave的SQL线程都在运行
3.所有的show slave status的输出中Seconds_Behind_Master参数小于或者等于running_updates_limit秒,如果在切换过程中不指定running_updates_limit,那么默认情况下running_updates_limit为1秒。
4.在master端,通过show processlist输出,没有一个更新花费的时间大于running_updates_limit秒。
在线切换步骤如下:
首先,停掉MHA监控:
# /usr/local/bin/masterha_stop --conf=/data/mha/etc/app.cnf
# /usr/local/bin/masterha_master_switch --conf=/data/mha/etc/app.cnf --master_state=alive --new_master_host=192.168.19.69 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Mon Jul 16 22:20:40 2018 - [info] MHA::MasterRotate version 0.56.
Mon Jul 16 22:20:40 2018 - [info] Starting online master switch..
Mon Jul 16 22:20:40 2018 - [info]
Mon Jul 16 22:20:40 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Jul 16 22:20:40 2018 - [info]
Mon Jul 16 22:20:40 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jul 16 22:20:40 2018 - [info] Reading application default configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:20:40 2018 - [info] Reading server configurations from /data/mha/etc/app.cnf..
Mon Jul 16 22:20:40 2018 - [info] Current Alive Master: 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 22:20:40 2018 - [info] Alive Slaves:
Mon Jul 16 22:20:40 2018 - [info] 192.168.19.69(192.168.19.69:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 22:20:40 2018 - [info] GTID ON
Mon Jul 16 22:20:40 2018 - [info] Replicating from 192.168.19.73(192.168.19.73:3306)
Mon Jul 16 22:20:40 2018 - [info] 192.168.19.74(192.168.19.74:3306) Version=5.6.15-log (oldest major version between slaves) log-bin:enabled
Mon Jul 16 22:20:40 2018 - [info] GTID ON
Mon Jul 16 22:20:40 2018 - [info] Replicating from 192.168.19.73(192.168.19.73:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 192.168.19.73(192.168.19.73:3306)? (YES/no): yes
Mon Jul 16 22:20:46 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Jul 16 22:20:46 2018 - [info] ok.
Mon Jul 16 22:20:46 2018 - [info] Checking MHA is not monitoring or doing failover..
Mon Jul 16 22:20:46 2018 - [info] Checking replication health on 192.168.19.69..
Mon Jul 16 22:20:46 2018 - [info] ok.
Mon Jul 16 22:20:46 2018 - [info] Checking replication health on 192.168.19.74..
Mon Jul 16 22:20:46 2018 - [info] ok.
Mon Jul 16 22:20:46 2018 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln161] We should not start online master switch when one of connections are running long updates on the current master(192.168.19.73(192.168.19.73:3306)). Currently 3 update thread(s) are running.
Details:
{'Time' => '407522','db' => undef,'Id' => '1','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
{'Time' => '60211','db' => undef,'Id' => '85','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.74:50422'}
{'Time' => '59571','db' => undef,'Id' => '91','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.69:55086'}
Mon Jul 16 22:20:46 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.
master
[system@3306][(none)]> set global event_scheduler=off;
[system@3306][(none)]> Select @@event_scheduler;
+-------------------+
| @@event_scheduler |
+-------------------+
| OFF |
+-------------------+
Mon Jul 16 22:26:06 2018 - [error][/usr/local/share/perl5/MHA/MasterRotate.pm, ln161] We should not start online master switch when one of connections are running long updates on the current master(192.168.19.73(192.168.19.73:3306)). Currently 2 update thread(s) are running.
Details:
{'Time' => '60531','db' => undef,'Id' => '85','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.74:50422'}
{'Time' => '59891','db' => undef,'Id' => '91','User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for binlog to be updated','Command' => 'Binlog Dump GTID','Info' => undef,'Host' => '192.168.19.69:55086'}
Mon Jul 16 22:26:06 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln178] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.
--orig_master_is_new_slave 切换时加上此参数是将原 master 变为 slave 节点,如果不加此参数,原来的 master 将不启动
--running_updates_limit=10000,故障切换时,候选master 如果有延迟的话, mha 切换不能成功,加上此参数表示延迟在此时间范围内都可切换(单位为s),但是切换的时间长短是由recover 时relay 日志的大小决定
注意:由于在线进行切换需要调用到master_ip_online_change这个脚本
修复宕机的Master
通常情况下自动切换以后,原master可能已经废弃掉,待原master主机修复后,如果数据完整的情况下,可能想把原来master重新作为新主库的slave
,这时我们可以借助当时自动切换时刻的MHA日志来完成对原master的修复。下面是提取相关日志的命令:
# grep -i "All other slaves" /data/mha/log/manager.log
=========
MHA 参数列表
Local : 指每一个配置块内部。 Local功能的参数需要放置在[server_xxx] 块下面
App : 参数作用于master/slave, 这些参数需要配置在[server_default]块的下面
Global : 作用于master/slave, Global级别的参数用于管理多组master/slave结构,可以统一化管理一些参数
candidate_master
你可能对于同一组slave中有着不同的规划,有的其望在Master故障时,提升为新的Master(如: Raid1的slave比Raid0的slave更适合做Master)
这个参数的作用是当设计candidate_master = 1时,这个服务器有较高的优先级提升为新的master(还要具备: 开启binlog, 复制没有延迟)。
所以当设置了candidate_master = 1的机器在master故障时必然成为新的master. 但这是很有用的设置优先级的一个参数。
no_master 当设置了no_master = 1的服务器,这个服务器永远不会提升为新的master. 这个参数据对于永远不期望成为master的机器很有用。
ignore_fail 在默认情况下,MHA manager不会在slave存在故障的情况下(已经不能通过SSH/MySQL进行连接,或是SQL Thread已经停止出错,其它原因)的情况下进行Master的故障切换
(当不存在slave时MHA manager也会退出)。 但有些情况下你期望在slave存在问题的情况下,也进行故障切换。 所以当设置了ignore_fail = 1时,MHA会在所有的机器有问题的时间也会进行故障切换。 默认是0.
skip_reset_slave (从MHA 0.56开始支持) Master故障切换后新的master上执行RESET SLAVE(ALL)
disable_log_bin 当设置了这个参数,在slave应用差异的relay log时不会产生二进制日志。 内部实现通过mysqlbinlog的disable-log-bin实现。
remote_workdir MHA node上工作目录的全路径名。如果不存在,MHA node会自动创建,如果不允许创建,MHA Node自动异常退出。
需要注意MHA manager 或是MHA node上需要检查空间是否可用,所以需要注意检查这个。 一般默认, remote_workdir是的”/var/tmp”
master_binlog_dir master上用于存储binary日志的全路径。这个参数用于当master上mysql死掉后,通过ssh连到mysql服务器上,找到需要binary日志事件。这个参数可以帮助用于无法找到master进程死掉后binary日志存储位置。
log_level 设置MHA manager记录LOG的级别。 默认是info级别而且大多数情况也是比较适合。 同样可以设置为: debug/info/warning/error.
manager_workdir 用于指定mha manager产生相关状态文件全路径。 如果没设置 默认是/var/tmp
check_repl_delay
check_repl_filter 在默认下情况,当master和slave设置了不同了binary log/replication 过滤规则时,MHA直接报错不会进行监控和故障切换。 这些将会导致出现一些异想不到的错误”Table not exists”。
如果你100%确认不同的过滤规则不会导致恢复中的错误,设置check_repl_filter=0。 需要注意: 当使用了check_repl_filter = 0时,MHA不会检查过滤规则在应用差异的relay日志,所以有可能会出现
”Table not exists”的错误。当你设置了这个参数请小心处理。
latest_priority 在默认情况下,和Master最接近的slave(一个slave从Master上获得了最一个binlog事件)是最有优先权成为新的master。
multi_tier_slave 从MHA 0.52开始, 多层复制可以支持了。在默认情况下,不支持三层或是更多层的复制配置。 如果: host2从host1上复制,host3从host2上复制。 在默认配置的情况下不支持写host{1,2,3},
因为这是一个三层的复制,MHA Manager会停止报错。 当设置了multi_tier_slave, MHA Manager就不会在三层复制报错停止。 但是会忽略第三层的机器。也就是如果host1挂了,host2将会成为新的master,host3还是从host2上进行复制。
ping_interval 这个参数设置MHA Manager多长时间去ping一下master(执行一些SQL语句). 当失去和master三次偿试,MHA Manager会认为MySQL Master死掉了。也就是说,最大的故障切换时间是4次ping_interval的时间,默认是3秒。
ping_type (从MHA 0.53后开始支持) 在默认情况下, MHA manager和MySQL创建一个连接执行”select 1″(ping_type=select)用于检查master是否健康。
但有一些情况: 每次检测都连接/然后断开会比较好一点,这样对于tcp方面的错误感知更快一点。设置ping_type=CONNECT 就行了。从MHA 0.56后pint_type=INSERT也被添加。
secondary_check_script 一般来讲, 非常推荐使用更多网络上机器是不同路由策略上的机器来检查MySQL Master是否存活。 默认情况下,只有MHA Manager通过一个路由来
检查Master是否存活。这个也是不推荐的。MHA可以通过外部secondary_check_script配置的脚本来做到从多个路由策略下进行检查。
master_ip_failover_script
master_ip_online_changes_script
shutdown_script
report_script
init_conf_load_script 这个参数用于不想设置明文在配置文件中(如:密码相关)。 只用返回”name=value”这样的值。 这个可以用来复盖global配置中的一些值。一个例子脚本如下
#!/usr/bin/perl
print “password=$ROOT_PASS\n”;
print “repl_password=$REPL_PASS\n”;
init_conf_load_script No App/Global null report_script= /usr/local/custom_script/init_conf_loader
[root@mysql3 log]# cat /etc/sysconfig/network-scripts/ifcfg-ens160
[root@mysql3 etc]# vim app.cnf
[server default]
user = system #mysql用戶名,#需要的权限:Super,select,create,insert,update,delete,drop,reload
password = *** #mysql密码
ssh_user = root #ssh免密钥登录的帐号名
repl_user = repl #mysql复制帐号
repl_password = ****#mysql复制密码
ping_interval = 1 #ping间隔,用来检测master是否正常,默认是3秒,尝试三次没有回应的时候自动进行failover
ping_type = SELECT
manager_workdir=/data/mha/etc/app #manager的工作目录,主要该目录的权限,需要有创建的权限
manager_log=/data/mha/log/manager.log #manager的日志
remote_workdir=/data/mha/etc/app #设置远端mysql在发生切换时binlog的保存位置 #另外2台机子在运行时候需要创建的目录
master_binlog_dir="/data/mysqldata/3306/binlog" #master 保存binlog的位置,以便MHA可以找到master的日志
master_ip_failover_script="/data/mha/etc/master_ip_failover" #自动failover时候的切换脚本
master_ip_online_change_script="/data/mha/etc/master_ip_failover" #手动切换时候的切换脚本
shutdown_script=""
report_script=""
#check_repl_delay=0
[server1]
hostname=192.168.19.69
port=3306
master_binlog_dir="/data/mysqldata/3306/binlog"
#candidate_master=1
#ignore_fail=1
[server2]
hostname=192.168.19.73
port=3306
master_binlog_dir="/data/mysqldata/3306/binlog/"
candidate_master=1 #master机宕掉后,优先启用这台作为新master
check_repl_delay=0 #默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该slave作为一个新的master
#ignore_fail=1
[server3]
hostname=192.168.19.74
port=3306
master_binlog_dir="/data/mysqldata/3306/binlog/"
----master_ip_failover,虚拟ip配置等
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.19.68/24'; # Virtual IP
my $key = "1";
my $int = "ens160";
my $ssh_start_vip = "/sbin/ifconfig $int:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig $int:$key down";
my $arp_effect = "/sbin/arping -Uq -s192.168.19.68 -I $int 192.168.19.99 -c 3"; # Virtual IP and gateway
#my $test = "echo successfull >/tmp/test.txt";
$ssh_user = "root";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
#`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
&status();
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
`ssh $ssh_user\@$new_master_host \" $arp_effect \"`;
# `ssh $ssh_user\@$new_master_host \" $test \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub status() {
print `ssh $ssh_user\@$orig_master_host \" ip add show $int \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_maste
r_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
---master_ip_online_changes_script
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;
my $_tstart;
my $_running_interval = 0.1;
my (
$command, $orig_master_host, $orig_master_ip,
$orig_master_port, $orig_master_user,
$new_master_host, $new_master_ip, $new_master_port,
$new_master_user,
);
my $vip = '192.168.19.68/24'; # Virtual IP
my $key = "1";
my $int = "ens160";
my $ssh_start_vip = "/sbin/ifconfig $int:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig $int:$key down";
my $ssh_user = "root";
my $new_master_password='mysql';
my $orig_master_password='mysql';
GetOptions(
'command=s' => \$command,
#'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'orig_master_user=s' => \$orig_master_user,
#'orig_master_password=s' => \$orig_master_password,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
#'new_master_password=s' => \$new_master_password,
);
exit &main();
sub current_time_us {
my ( $sec, $microsec ) = gettimeofday();
my $curdate = localtime($sec);
return $curdate . " " . sprintf( "%06d", $microsec );
}
sub sleep_until {
my $elapsed = tv_interval($_tstart);
if ( $_running_interval > $elapsed ) {
sleep( $_running_interval - $elapsed );
}
}
sub get_threads_util {
my $dbh = shift;
my $my_connection_id = shift;
my $running_time_threshold = shift;
my $type = shift;
$running_time_threshold = 0 unless ($running_time_threshold);
$type = 0 unless ($type);
my @threads;
my $sth = $dbh->prepare("SHOW PROCESSLIST");
$sth->execute();
while ( my $ref = $sth->fetchrow_hashref() ) {
my $id = $ref->{Id};
my $user = $ref->{User};
my $host = $ref->{Host};
my $command = $ref->{Command};
my $state = $ref->{State};
my $query_time = $ref->{Time};
my $info = $ref->{Info};
$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
next if ( $my_connection_id == $id );
next if ( defined($query_time) && $query_time < $running_time_threshold );
next if ( defined($command) && $command eq "Binlog Dump" );
next if ( defined($user) && $user eq "system user" );
next
if ( defined($command)
&& $command eq "Sleep"
&& defined($query_time)
&& $query_time >= 1 );
if ( $type >= 1 ) {
next if ( defined($command) && $command eq "Sleep" );
next if ( defined($command) && $command eq "Connect" );
}
if ( $type >= 2 ) {
next if ( defined($info) && $info =~ m/^select/i );
next if ( defined($info) && $info =~ m/^show/i );
}
push @threads, $ref;
}
return @threads;
}
sub main {
if ( $command eq "stop" ) {
## Gracefully killing connections on the current master
# 1. Set read_only= 1 on the new master
# 2. DROP USER so that no app user can establish new connections
# 3. Set read_only= 1 on the current master
# 4. Kill current queries
# * Any database access failure will result in script die.
my $exit_code = 1;
eval {
## Setting read_only=1 on the new master (to avoid accident)
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error(die_on_error)_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
print current_time_us() . " Set read_only on the new master.. ";
$new_master_handler->enable_read_only();
if ( $new_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
$new_master_handler->disconnect();
# Connecting to the orig master, die if any database error happens
my $orig_master_handler = new MHA::DBHelper();
$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
$orig_master_user, $orig_master_password, 1 );
## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
#$orig_master_handler->disable_log_bin_local();
#print current_time_us() . " Drpping app user on the orig master..\n";
#FIXME_xxx_drop_app_user($orig_master_handler);
## Waiting for N * 100 milliseconds so that current connections can exit
my $time_until_read_only = 15;
$_tstart = [gettimeofday];
my @threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_read_only > 0 && $#threads >= 0 ) {
if ( $time_until_read_only % 5 == 0 ) {
printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_read_only * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_read_only--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
## Setting read_only=1 on the current master so that nobody(except SUPER) can write
print current_time_us() . " Set read_only=1 on the orig master.. ";
$orig_master_handler->enable_read_only();
if ( $orig_master_handler->is_read_only() ) {
print "ok.\n";
}
else {
die "Failed!\n";
}
## Waiting for M * 100 milliseconds so that current update queries can complete
my $time_until_kill_threads = 5;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
if ( $time_until_kill_threads % 5 == 0 ) {
printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
if ( $#threads < 5 ) {
print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
foreach (@threads);
}
}
sleep_until();
$_tstart = [gettimeofday];
$time_until_kill_threads--;
@threads = get_threads_util( $orig_master_handler->{dbh},
$orig_master_handler->{connection_id} );
}
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
## Terminating all threads
print current_time_us() . " Killing all application threads..\n";
$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
print current_time_us() . " done.\n";
#$orig_master_handler->enable_log_bin_local();
$orig_master_handler->disconnect();
## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
## Activating master ip on the new master
# 1. Create app user with write privileges
# 2. Moving backup script if needed
# 3. Register new master's ip to the catalog database
# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
# If exit code is 0 or 10, MHA does not abort
my $exit_code = 10;
eval {
my $new_master_handler = new MHA::DBHelper();
# args: hostname, port, user, password, raise_error_or_not
$new_master_handler->connect( $new_master_ip, $new_master_port,
$new_master_user, $new_master_password, 1 );
## Set read_only=0 on the new master
#$new_master_handler->disable_log_bin_local();
print current_time_us() . " Set read_only=0 on the new master.\n";
$new_master_handler->disable_read_only();
## Creating an app user on the new master
#print current_time_us() . " Creating app user on the new master..\n";
#FIXME_xxx_create_app_user($new_master_handler);
#$new_master_handler->enable_log_bin_local();
$new_master_handler->disconnect();
## Update master ip on the catalog database, etc
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
# do nothing
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
die;
}
Mysql 5.6 MHA (gtid) on Kylin的更多相关文章
- MYSQL 5.7 MHA(GTID+ROW)部署及failover,online_change实战演练
文章结构如下: 1.MHA简介 Masterhigh availability manager and toolsfor mysql,是日本的一位mysql专家采用perl语言编写的一个脚本管理工具, ...
- 【MySql】——MHA+GTID+failover+binlog-server+Atlas
一.环境准备 1.mysql-db01 #系统版本 [root@mysql-db01 ~]# cat /etc/redhat-release CentOS release 6.7 (Final) #内 ...
- MySQL 5.7基于GTID复制的常见问题和修复步骤(二)
[问题二] 有一个集群(MySQL5.7.23)切换后复制slave报1236,其实是不小心在slave上执行了事务导致 Got fatal error 1236 from master when r ...
- Mysql 高可用(MHA)-读写分离(Atlas)-分布式架构(Mycat)
Mysql 高可用(MHA)-读写分离(Atlas) 1. 搭建主从复制(一主两从) 1.1 准备环境 1 主库:10.0.0.51/db01 2 从库:10.0.0.52/db02,10.0.0.5 ...
- MySQL HA方案之MySQL半复制+MHA+Keepalived+Atlas+LVS[转]
MySQL HA方案之MySQL半复制+MHA+Keepalived+Atlas+LVS 简介 目前Mysql高可用的方案有好多,比如MMM,heartbeat+drbd,Cluster等,还有per ...
- percona mysql server5.7基于gtid的主从复制
配置mysql基于gtid主从复制架构 一.二进制安装mysql [root@node5 data]# --Linux.x86_64.ssl101.tar.gz [root@node5 data]# ...
- mysql 5.6在gtid复制模式下复制错误,如何跳过??
mysql 5.6在gtid复制模式下复制错误,如何跳过?? http://www.xuchanggang.cn/archives/918.html
- MySQL 在线开启&关闭GTID模式
MySQL 在线开启&关闭GTID模式 目录 MySQL 在线开启&关闭GTID模式 基本概述 在线开启GTID 1. 设置GTID校验ENFORCE_GTID_CONSISTENCY ...
- Mysql MHA(GTID)配置(实操)
实现环境 centos6.7 MYSQL5.6.36 主:192.168.1.191 从1:192.168.1.145 从2:192.168.1.146 监测:放在从2上 192.168.1.146 ...
随机推荐
- [Android]开源中国源码分析之二---DrawerLayout
从启动界面到主界面之后的效果如图所示,采用的是v4包下的DrawerLayout, activity_main.xml文件如下: <!-- A DrawerLayout is intended ...
- VisualStudio卸载后无法再次安装的解决方法
解决方法如下: 1.删除 Visual Studio 2013 安装目录文件夹 Visual Studio 12.0 2.win+R 输入 %UserProfile%\Appdata\Local\Mi ...
- 分布式任务调度平台XXL-Job集群版搭建
如果集群: 保存后: 启动两个 job 第二个job的配置: # web port server.port=8082 # log config logging.config=classpath:log ...
- 【NOIP2013 普及组】车站分级
[NOIP2013 普及组]车站分级 一.题目 [NOIP2013 普及组]车站分级 时间限制: 1 Sec 内存限制: 128 MB 提交: 3 解决: 0 [提交][状态][讨论版] 题目描述 ...
- unity调用C++ dll文件
首先建立Plugins文件夹,把dll文件放在里面 一一对应,我踩的坑是文件名加了后缀.dll,虽然不知道网上为什么都加了我这加了就报找不到dll文件错误,反正解决啦
- ES6-Set和Map数据结构学习笔记
Set和Map数据结构 Set 基本用法 ES6提供了新的数据结构--Set,类似于数组,但是成员的值是唯一的,没有重复的值,Set本身是一种构造函数,用来生成Set数据结构 var s = new ...
- 卸载全部appx应用(包括应用商店)
在PowerShell中粘贴: Get-AppXPackage | Remove-AppxPackage
- 格式化SQL和逆格式SQL
上篇说过SQLyog中的计算合适数据类型的功能,现在说下Navicat格式化SQL和逆格式化SQL 啥也不说了,一看图,一目了然,Beautify SQL(格式化,即变成最美的语句)和Minify S ...
- NSObject头文件解析 / 消息机制 / Runtime解读 (二)
本章接着NSObject头文件解析 / 消息机制 / Runtime解读(一)写 给类添加属性: BOOL class_addProperty(Class cls, const char *name, ...
- 【C#】62. 异步读写文件的几种方法: Task.Factory.FromAsync,WriteAsync
一.这里主要说明2种异步写入文件的方法: 1)异步编程模型API转为Task——使用Task.Factory.FromAsync方法 2)对于StreamWriter使用WriteAsync方法 请记 ...