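MHA SSH check, replication check, and online switch log analysis
I. SSH check log analysis
Execution steps and the corresponding log:
1. Read the configuration file on the MHA Manager node.
2. Get each host's information from the configuration file and run the SSH check host by host.
3. Each host connects over SSH to every other host except itself.
4. The SSH check passes once all hosts can log in to one another over SSH without a password.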
[root@A2 app1]# masterha_check_ssh --conf=/etc/masterha/app1.conf
Sun Jun :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun :: - [info] Reading application default configuration from /etc/masterha/app1.conf..
Sun Jun :: - [info] Reading server configuration from /etc/masterha/app1.conf..
Sun Jun :: - [info] Starting SSH connection tests..
Sun Jun :: - [debug]
Sun Jun :: - [debug] Connecting via SSH from root@172.16.13.15(172.16.13.15:) to root@172.16.15.3(172.16.15.3:)..
Sun Jun :: - [debug] ok.
Sun Jun :: - [debug] Connecting via SSH from root@172.16.13.15(172.16.13.15:) to root@172.16.15.2(172.16.15.2:)..
Sun Jun :: - [debug] ok.
Sun Jun :: - [debug]
Sun Jun :: - [debug] Connecting via SSH from root@172.16.15.3(172.16.15.3:) to root@172.16.13.15(172.16.13.15:)..
Sun Jun :: - [debug] ok.
Sun Jun :: - [debug] Connecting via SSH from root@172.16.15.3(172.16.15.3:) to root@172.16.15.2(172.16.15.2:)..
Sun Jun :: - [debug] ok.
Sun Jun :: - [debug]
Sun Jun :: - [debug] Connecting via SSH from root@172.16.15.2(172.16.15.2:) to root@172.16.13.15(172.16.13.15:)..
Sun Jun :: - [debug] ok.
Sun Jun :: - [debug] Connecting via SSH from root@172.16.15.2(172.16.15.2:) to root@172.16.15.3(172.16.15.3:)..
Sun Jun :: - [debug] ok.
Sun Jun :: - [info] All SSH connection tests passed successfully.
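The pairwise pattern in the log above (every host connects to every other host, never to itself) can be sketched as follows. This is only an illustration of the connection matrix, not MHA's implementation; the function name `ssh_check_pairs` is hypothetical, and the host list is copied from the log rather than read from a configuration file.

```python
from itertools import permutations

# Hosts copied from the log above; a real check would read them from the
# server entries of the MHA configuration file.
hosts = ["172.16.13.15", "172.16.15.3", "172.16.15.2"]


def ssh_check_pairs(hosts):
    """Return every ordered (source, target) pair where source != target."""
    return list(permutations(hosts, 2))


# For n hosts this yields n * (n - 1) connections: 6 for the 3 hosts here.
for src, dst in ssh_check_pairs(hosts):
    print(f"Connecting via SSH from root@{src} to root@{dst}")
```

With three hosts this gives the same six connections that appear in the log.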
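II. Replication check log analysis
1. Read the configuration file, then check the state of every host listed in it, the MHA version (MHA::MasterMonitor in the log), and whether GTID-based replication is supported, to derive the current replication topology.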
[root@A2 app1]# masterha_check_repl --conf=/etc/masterha/app1.conf
Sun Jun :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun :: - [info] Reading application default configuration from /etc/masterha/app1.conf..
Sun Jun :: - [info] Reading server configuration from /etc/masterha/app1.conf..
Sun Jun :: - [info] MHA::MasterMonitor version 0.56.
Sun Jun :: - [info] GTID failover mode =
Sun Jun :: - [info] Dead Servers:
Sun Jun :: - [info] Alive Servers:
Sun Jun :: - [info] 172.16.13.15(172.16.13.15:)
Sun Jun :: - [info] 172.16.15.3(172.16.15.3:)
Sun Jun :: - [info] 172.16.15.2(172.16.15.2:)
Sun Jun :: - [info] Alive Slaves:
Sun Jun :: - [info] 172.16.13.15(172.16.13.15:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Sun Jun :: - [info] Replicating from 172.16.15.3(172.16.15.3:)
Sun Jun :: - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jun :: - [info] 172.16.15.2(172.16.15.2:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Sun Jun :: - [info] Replicating from 172.16.15.3(172.16.15.3:)
Sun Jun :: - [info] Current Alive Master: 172.16.15.3(172.16.15.3:)
2. Check the slave configuration: whether replication filters are set and whether GTID replication is supported.
Sun Jun :: - [info] Checking slave configurations..
Sun Jun :: - [info] Checking replication filtering settings..
Sun Jun :: - [info] binlog_do_db= , binlog_ignore_db=
Sun Jun :: - [info] Replication filtering check ok.
Sun Jun :: - [info] GTID (with auto-pos) is not supported
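The idea behind the filtering check can be sketched as a comparison of filter settings across servers: the check is only meaningful if all servers report the same rules. The sketch below is an assumption about the shape of that comparison, not MHA's code; `filters_consistent` and the sample dictionary are hypothetical, with the empty values taken from the `binlog_do_db= , binlog_ignore_db=` line in the log.

```python
def filters_consistent(filters_by_host):
    """Return True when every host reports identical filter settings."""
    distinct = {tuple(sorted(f.items())) for f in filters_by_host.values()}
    return len(distinct) <= 1


# Hypothetical sample: all three servers report empty binlog filters,
# as in the log line above.
sample = {
    "172.16.15.3": {"binlog_do_db": "", "binlog_ignore_db": ""},
    "172.16.13.15": {"binlog_do_db": "", "binlog_ignore_db": ""},
    "172.16.15.2": {"binlog_do_db": "", "binlog_ignore_db": ""},
}
print(filters_consistent(sample))
```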
3. Run the SSH connection tests and check the MHA Node version.
Sun Jun :: - [info] Starting SSH connection tests..
Sun Jun :: - [info] All SSH connection tests passed successfully.
Sun Jun :: - [info] Checking MHA Node version..
Sun Jun :: - [info] Version check ok.
4. Check the SSH configuration on the master, test that the recovery script (save_binary_logs) is usable, and check the binlog settings.
Sun Jun :: - [info] Checking SSH publickey authentication settings on the current master..
Sun Jun :: - [info] HealthCheck: SSH to 172.16.15.3 is reachable.
Sun Jun :: - [info] Master MHA Node version is 0.56.
Sun Jun :: - [info] Checking recovery script configurations on 172.16.15.3(172.16.15.3:)..
Sun Jun :: - [info] Executing command: save_binary_logs --command=test --start_pos= --binlog_dir=/usr/local/mysql/data --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=mysql_bin.
Sun Jun :: - [info] Connecting to root@172.16.15.3(172.16.15.3:)..
Creating /tmp if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /usr/local/mysql/data, up to mysql_bin.
Sun Jun :: - [info] Binlog setting check done.
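5. Check the SSH configuration on the slaves, test that the script for applying differential relay logs (apply_diff_relay_logs) is usable, check each slave's recovery environment and relay logs, test the MySQL connection and privileges, and clean up the test files.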
Sun Jun :: - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jun :: - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.13.15 --slave_ip=172.16.13.15 --slave_port= --workdir=/tmp --target_version=5.7.-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Sun Jun :: - [info] Connecting to root@172.16.13.15(172.16.13.15:)..
Checking slave recovery environment settings..
Opening /usr/local/mysql/data/relay-log.info ... ok.
Relay log found at /usr/local/mysql/data, up to mysqlserver-relay-bin.
Temporary relay log file is /usr/local/mysql/data/mysqlserver-relay-bin.
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Jun :: - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=172.16.15.2 --slave_ip=172.16.15.2 --slave_port= --workdir=/tmp --target_version=5.7.-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info --relay_dir=/usr/local/mysql/data/ --slave_pass=xxx
Sun Jun :: - [info] Connecting to root@172.16.15.2(172.16.15.2:)..
Checking slave recovery environment settings..
Opening /usr/local/mysql/data/relay-log.info ... ok.
Relay log found at /usr/local/mysql/data, up to A2-relay-bin.
Temporary relay log file is /usr/local/mysql/data/A2-relay-bin.
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Jun :: - [info] Slaves settings check done.
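6. Print the current replication topology and check the replication status of each slave. Check the status of the failover scripts, which completes the replication check.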
Sun Jun :: - [info]
172.16.15.3(172.16.15.3:) (current master)
+--172.16.13.15(172.16.13.15:)
+--172.16.15.2(172.16.15.2:)
Sun Jun :: - [info] Checking replication health on 172.16.13.15..
Sun Jun :: - [info] ok.
Sun Jun :: - [info] Checking replication health on 172.16.15.2..
Sun Jun :: - [info] ok.
Sun Jun :: - [info] Checking master_ip_failover_script status:
Sun Jun :: - [info] /var/log/masterha/scripts/master_ip_failover --command=status --ssh_user=root --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port=
Checking the Status of the script.. OK
Sun Jun :: - [info] OK.
Sun Jun :: - [warning] shutdown_script is not defined.
Sun Jun :: - [info] Got exit code (Not master dead). MySQL Replication Health is OK.
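When the check is run from a script, the final line of the output is what matters. A minimal sketch of testing for it, assuming the output has been captured as text; `replication_check_passed` is a hypothetical helper, the success string is copied from the log above, and the failure line in the second sample is an assumed wording used only for illustration.

```python
SUCCESS_MARKER = "MySQL Replication Health is OK."


def replication_check_passed(log_text):
    """Return True if any line of the captured output ends with the success marker."""
    return any(
        line.rstrip().endswith(SUCCESS_MARKER)
        for line in log_text.splitlines()
    )


ok_output = (
    "Sun Jun :: - [warning] shutdown_script is not defined.\n"
    "Sun Jun :: - [info] Got exit code (Not master dead). "
    "MySQL Replication Health is OK.\n"
)
# Assumed failure wording, for illustration only.
bad_output = "MySQL Replication Health is NOT OK!\n"

print(replication_check_passed(ok_output))
print(replication_check_passed(bad_output))
```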
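III. Online switch log analysis
1. Read the configuration file, check whether GTID replication is supported, and derive the current replication topology.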
[root@A2 app1]# masterha_master_switch --conf=/etc/masterha/app1.conf --master_state=alive --new_master_host=172.16.13.15 --new_master_port= --orig_master_is_new_slave --running_updates_limit=
Sun Jun :: - [info] MHA::MasterRotate version 0.56.
Sun Jun :: - [info] Starting online master switch..
Sun Jun :: - [info]
Sun Jun :: - [info] * Phase : Configuration Check Phase..
Sun Jun :: - [info]
Sun Jun :: - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Jun :: - [info] Reading application default configuration from /etc/masterha/app1.conf..
Sun Jun :: - [info] Reading server configuration from /etc/masterha/app1.conf..
Sun Jun :: - [info] GTID failover mode =
Sun Jun :: - [info] Current Alive Master: 172.16.15.3(172.16.15.3:)
Sun Jun :: - [info] Alive Slaves:
Sun Jun :: - [info] 172.16.13.15(172.16.13.15:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Sun Jun :: - [info] Replicating from 172.16.15.3(172.16.15.3:)
Sun Jun :: - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jun :: - [info] 172.16.15.2(172.16.15.2:) Version=5.7.-log (oldest major version between slaves) log-bin:enabled
Sun Jun :: - [info] Replicating from 172.16.15.3(172.16.15.3:)
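2. Confirm and execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master to close the open tables (NO_WRITE_TO_BINLOG keeps the FLUSH statement itself out of the binary log), then check replication health and identify the new master.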
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.16.15.3(172.16.15.3:)? (YES/no): yes
Sun Jun :: - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sun Jun :: - [info] ok.
Sun Jun :: - [info] Checking MHA is not monitoring or doing failover..
Sun Jun :: - [info] Checking replication health on 172.16.13.15..
Sun Jun :: - [info] ok.
Sun Jun :: - [info] Checking replication health on 172.16.15.2..
Sun Jun :: - [info] ok.
Sun Jun :: - [info] 172.16.13.15 can be new master.
Sun Jun :: - [info]
From:
172.16.15.3(172.16.15.3:) (current master)
+--172.16.13.15(172.16.13.15:)
+--172.16.15.2(172.16.15.2:)
To:
172.16.13.15(172.16.13.15:) (new master)
+--172.16.15.2(172.16.15.2:)
+--172.16.15.3(172.16.15.3:)
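The From/To topology printed above follows directly from the command-line options: the new master leaves the slave list, and with --orig_master_is_new_slave the old master joins it. A minimal sketch of that transformation, assuming a single-level topology; the function `switch_topology` is hypothetical and is not part of MHA.

```python
def switch_topology(master, slaves, new_master, orig_master_is_new_slave=True):
    """Return (new_master, new_slaves) after an online switch.

    The new master must be one of the current slaves. The old master is
    appended to the slave list only when orig_master_is_new_slave is set.
    """
    if new_master not in slaves:
        raise ValueError(f"{new_master} is not a slave of {master}")
    new_slaves = [s for s in slaves if s != new_master]
    if orig_master_is_new_slave:
        new_slaves.append(master)
    return new_master, new_slaves


# Hosts copied from the log above.
result = switch_topology(
    "172.16.15.3", ["172.16.13.15", "172.16.15.2"], "172.16.13.15"
)
print(result)
```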
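3. Start switching from the old master to the new master and check whether the new master is eligible. To check the replication filtering rules, temporarily run CHANGE MASTER TO on the old master, pointing it at a dummy host.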
Starting master switch from 172.16.15.3(172.16.15.3:) to 172.16.13.15(172.16.13.15:)? (yes/NO): yes
Sun Jun :: - [info] Checking whether 172.16.13.15(172.16.13.15:) is ok for the new master..
Sun Jun :: - [info] ok.
Sun Jun :: - [info] 172.16.15.3(172.16.15.3:): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sun Jun :: - [info] 172.16.15.3(172.16.15.3:): Resetting slave pointing to the dummy host.
Sun Jun :: - [info] ** Phase : Configuration Check Phase completed.
Sun Jun :: - [info]
4. Run master_ip_online_change against the old master to take down the virtual IP, then execute FLUSH TABLES WITH READ LOCK on the old master to take a global read lock.
Sun Jun :: - [info] * Phase : Rejecting updates Phase..
Sun Jun :: - [info]
Sun Jun :: - [info] Executing master ip online change script to disable write on the current master:
Sun Jun :: - [info] /var/log/masterha/scripts/master_ip_online_change --command=stop --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port= --orig_master_user='root' --orig_master_password='' --new_master_host=172.16.13.15 --new_master_ip=172.16.13.15 --new_master_port= --new_master_user='root' --new_master_password='' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
***************************************************************
Disabling the VIP - 172.16.13.141/ on old master: 172.16.15.3
Disabled the VIP successfully
***************************************************************
Sun Jun :: - [info] ok.
Sun Jun :: - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sun Jun :: - [info] Executing FLUSH TABLES WITH READ LOCK..
Sun Jun :: - [info] ok.
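5. Get the old master's binlog position and wait for the new master to apply its relay logs up to that position. Get the new master's binlog position, which is used later for CHANGE MASTER TO on the other slaves. Bring up the virtual IP on the new master and set read_only=0.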
Sun Jun :: - [info] Orig master binlog:pos is mysql_bin.:.
Sun Jun :: - [info] Waiting to execute all relay logs on 172.16.13.15(172.16.13.15:)..
Sun Jun :: - [info] master_pos_wait(mysql_bin.:) completed on 172.16.13.15(172.16.13.15:). Executed events.
Sun Jun :: - [info] done.
Sun Jun :: - [info] Getting new master's binlog name and position..
Sun Jun :: - [info] mysql_bin.:
Sun Jun :: - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.16.13.15', MASTER_PORT=, MASTER_LOG_FILE='mysql_bin.000058', MASTER_LOG_POS=, MASTER_USER='root', MASTER_PASSWORD='xxx';
Sun Jun :: - [info] Executing master ip online change script to allow write on the new master:
Sun Jun :: - [info] /var/log/masterha/scripts/master_ip_online_change --command=start --orig_master_host=172.16.15.3 --orig_master_ip=172.16.15.3 --orig_master_port= --orig_master_user='root' --orig_master_password='' --new_master_host=172.16.13.15 --new_master_ip=172.16.13.15 --new_master_port= --new_master_user='root' --new_master_password='' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
***************************************************************
Enabling the VIP - 172.16.13.141/ on new master: 172.16.13.15
Enabled the VIP successfully
***************************************************************
Sun Jun :: - [info] ok.
Sun Jun :: - [info] Setting read_only= on 172.16.13.15(172.16.13.15:)..
Sun Jun :: - [info] ok.
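The CHANGE MASTER TO statement printed in this step is assembled from the new master's address and its binlog coordinates. The sketch below rebuilds a statement of the same shape for display purposes; `change_master_statement` is a hypothetical helper, and the port and position values are placeholders because the log above does not show them. It does no SQL escaping, so it should not be fed untrusted input.

```python
def change_master_statement(host, port, log_file, log_pos, user, password):
    """Format a CHANGE MASTER TO statement like the one printed in the log."""
    return (
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={log_pos}, "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}';"
    )


stmt = change_master_statement(
    host="172.16.13.15",
    port=3306,                 # placeholder: the port is not shown in the log
    log_file="mysql_bin.000058",
    log_pos=154,               # placeholder: the position is not shown in the log
    user="root",
    password="xxx",            # masked in the log as well
)
print(stmt)
```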
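6. Switch the slaves in parallel: each slave applies its relay logs up to the old master's binlog position and then runs CHANGE MASTER TO. The old master is unlocked with UNLOCK TABLES and also runs CHANGE MASTER TO to become a slave of the new master. At this point the slave switch is complete.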
Sun Jun :: - [info]
Sun Jun :: - [info] * Switching slaves in parallel..
Sun Jun :: - [info]
Sun Jun :: - [info] -- Slave switch on host 172.16.15.2(172.16.15.2:) started, pid:
Sun Jun :: - [info]
Sun Jun :: - [info] Log messages from 172.16.15.2 ...
Sun Jun :: - [info]
Sun Jun :: - [info] Waiting to execute all relay logs on 172.16.15.2(172.16.15.2:)..
Sun Jun :: - [info] master_pos_wait(mysql_bin.:) completed on 172.16.15.2(172.16.15.2:). Executed events.
Sun Jun :: - [info] done.
Sun Jun :: - [info] Resetting slave 172.16.15.2(172.16.15.2:) and starting replication from the new master 172.16.13.15(172.16.13.15:)..
Sun Jun :: - [info] Executed CHANGE MASTER.
Sun Jun :: - [info] Slave started.
Sun Jun :: - [info] End of log messages from 172.16.15.2 ...
Sun Jun :: - [info]
Sun Jun :: - [info] -- Slave switch on host 172.16.15.2(172.16.15.2:) succeeded.
Sun Jun :: - [info] Unlocking all tables on the orig master:
Sun Jun :: - [info] Executing UNLOCK TABLES..
Sun Jun :: - [info] ok.
Sun Jun :: - [info] Starting orig master as a new slave..
Sun Jun :: - [info] Resetting slave 172.16.15.3(172.16.15.3:) and starting replication from the new master 172.16.13.15(172.16.13.15:)..
Sun Jun :: - [info] Executed CHANGE MASTER.
Sun Jun :: - [info] Slave started.
Sun Jun :: - [info] All new slave servers switched successfully.
Sun Jun :: - [info]
7. Clean up the new master and reset its slave information.
Sun Jun :: - [info] * Phase : New master cleanup phase..
Sun Jun :: - [info]
Sun Jun :: - [info] 172.16.13.15: Resetting slave info succeeded.
Sun Jun :: - [info] Switching master to 172.16.13.15(172.16.13.15:) completed successfully.
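To follow a long switch log, it can help to pull out only the lines that open a phase. A minimal sketch, assuming the line format seen in the log above; `phase_starts` is a hypothetical helper, and the pattern tolerates an optional phase number because the log reproduced here shows none.

```python
import re

# Matches lines such as "... [info] * Phase : Rejecting updates Phase.."
# Lines beginning with "** Phase" (phase completed) are deliberately skipped.
PHASE_START = re.compile(r"\[info\] \* Phase\s*\d*\s*:\s*(.+?)\.\.$")


def phase_starts(log_text):
    """Return the names of the phases that the log reports as started."""
    names = []
    for line in log_text.splitlines():
        match = PHASE_START.search(line.rstrip())
        if match:
            names.append(match.group(1))
    return names


sample_log = "\n".join([
    "Sun Jun :: - [info] * Phase : Configuration Check Phase..",
    "Sun Jun :: - [info] ** Phase : Configuration Check Phase completed.",
    "Sun Jun :: - [info] * Phase : Rejecting updates Phase..",
    "Sun Jun :: - [info] * Switching slaves in parallel..",
    "Sun Jun :: - [info] * Phase : New master cleanup phase..",
])
print(phase_starts(sample_log))
```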