Pitfalls You May Hit with MHA Failover
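1. Master-Slave Data Consistency
1.1 How to guarantee master-slave data consistency
Reference: 叶师傅's article, FAQ系列 | 如何保证主从复制数据一致性 (FAQ series: how to guarantee data consistency in master-slave replication)
In MySQL, committing a transaction involves writing undo, writing redo, writing the binlog, writing the data files, and so on. A crash at any of these steps can leave the master and slave with inconsistent data. To avoid this, adjust the relevant options on both the master and the slave so that even a crash does not lose replicated data.
On the MASTER, set:
innodb_flush_log_at_trx_commit = 1 --> redo log: 1 writes to disk; 2 writes to the OS cache (data may be lost if the operating system crashes); 0 writes to the redo log buffer (data may be lost if mysql crashes)
sync_binlog = 1 --> binlog: 1 writes to disk; 0 writes to the OS cache
This ensures that every committed transaction is flushed to disk right away, and in particular that the binlog for each transaction reaches disk promptly.
On the SLAVE, set:
master_info_repository = "TABLE"
relay_log_info_repository = "TABLE"
relay_log_recovery = 1
This ensures that the replication metadata tables on the slave also use the InnoDB engine and are protected by InnoDB's transactional safety. It also enables automatic relay-log recovery: after a crash, the slave fetches events from the master again, starting at the executed binlog position recorded in relay_log_info, and applies them again, which avoids losing part of the data.
With these settings, master and slave data should be consistent under normal circumstances.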
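The settings above can be checked mechanically. The following is a minimal sketch, assuming the variables have already been read into a dictionary (for example from SHOW GLOBAL VARIABLES); the function name audit and the two EXPECTED tables are hypothetical, and the value 1 for the numeric options is an assumption taken from the description that 1 writes to disk and that relay-log recovery is enabled.

```python
# Crash-safe values discussed in the text. The value "1" for the numeric
# options is an assumption inferred from the descriptions ("1 writes to disk").
MASTER_EXPECTED = {
    "innodb_flush_log_at_trx_commit": "1",
    "sync_binlog": "1",
}
SLAVE_EXPECTED = {
    "master_info_repository": "TABLE",
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": "1",
}


def audit(variables, expected):
    """Return {name: (actual, wanted)} for every setting that deviates."""
    problems = {}
    for name, wanted in expected.items():
        actual = str(variables.get(name, "<unset>")).strip().strip('"').upper()
        # MySQL reports booleans as ON/OFF; treat ON as 1 and OFF as 0.
        actual = {"ON": "1", "OFF": "0"}.get(actual, actual)
        if actual != wanted:
            problems[name] = (actual, wanted)
    return problems


# Hypothetical values as they might come back from a slave.
slave_vars = {
    "master_info_repository": "FILE",
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": "ON",
}
print(audit(slave_vars, SLAVE_EXPECTED))
```

An empty result means the server matches the table; anything else lists the actual value next to the wanted one.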
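1.2 Master-slave data inconsistent while replication status looks normal
• binlog_format='STATEMENT'
As long as the table structure targeted by the replicated statement matches, whether the master and slave data agree has no effect on the replication status.
• binlog_format='ROW'
1. When the table has a primary key or unique index, the slave only needs to match on that key while applying the relay-log; it does not check whether the other columns match the original values on the master.
2. The slave has updated or deleted rows that the master never touches afterwards.
To keep the data consistent, check and resync it regularly with the pt tools (Percona Toolkit).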
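The ROW case can be illustrated with a toy model. This is only a sketch of the behaviour described above, not MySQL's actual apply code: tables are plain dictionaries keyed by primary key, and apply_row_update is a hypothetical helper.

```python
def apply_row_update(table, pk, after_image):
    """Apply a row-based UPDATE the way the text describes it: find the row
    by primary key only, then overwrite it with the after image. The other
    columns of the existing row are never compared with the master's values."""
    if pk not in table:
        # A missing row is what would actually break replication.
        raise KeyError(f"row {pk} not found")
    table[pk] = dict(after_image)


master = {1: {"score": 10}, 2: {"score": 20}}
slave = {1: {"score": 7}, 2: {"score": 99}}  # both rows have already drifted

# The master updates row 1; the event is replayed on the slave without error,
# even though the slave's old value (7) differs from the master's (10).
master[1] = {"score": 11}
apply_row_update(slave, 1, master[1])

# Row 2 is never touched by the master, so its drift goes unnoticed.
drift = {pk for pk in master if master[pk] != slave.get(pk)}
print(drift)
```

No exception is raised at any point, which corresponds to a healthy-looking replication status while row 2 still differs.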
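2. relay_log_recovery && relay_log_purge
Reference article: MySQL relay_log_purge=0 时的风险 (The risks of relay_log_purge=0 in MySQL)
Sometimes we want to keep MySQL relay-logs around longer, for example to fill in missing data after a high-availability switchover, so we set relay_log_purge=0, which stops the SQL_Thread from automatically deleting a relay-log once it has finished executing it.
What pitfalls come with relay_log_recovery=1 && relay_log_purge=0?
• When MySQL crashes or is stopped, the SQL_Thread may not have executed all of the relay-log, so part of the last relay-log is fetched again into a new file. In other words, that data appears twice.
• If the SQL_Thread is following closely, it may read and execute events that the IO_Thread has written to the relay-log but not yet synced to disk. In that case, part of the data is missing from both the new file and the old file.
Replication itself is not affected, but anything that reads the relay-log to obtain data must take this into account, otherwise the data will end up inconsistent.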
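The first pitfall can be sketched with a small model. Integers stand in for binlog events, the function recover is hypothetical, and the model covers only the duplication case, under the assumption that the old relay files are kept and a new file is fetched starting right after the executed position.

```python
def recover(relay_files, master_binlog, exec_pos):
    """Model of relay_log_recovery=1 with relay_log_purge=0: the old relay
    files stay on disk and a new file is fetched again from the master,
    starting right after the last executed position."""
    new_file = [ev for ev in master_binlog if ev > exec_pos]
    return relay_files + [new_file]


master_binlog = [1, 2, 3, 4]  # event ids standing in for binlog positions
relay_files = [[1, 2, 3]]     # the IO thread had fetched 1..3 before shutdown
exec_pos = 2                  # the SQL thread had executed only up to 2

after = recover(relay_files, master_binlog, exec_pos)
all_events = [ev for f in after for ev in f]
duplicates = sorted({ev for ev in all_events if all_events.count(ev) > 1})
print(after, duplicates)
```

Event 3 ends up in both the old and the new file, which is harmless for replication but not for a tool that concatenates relay files.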
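3. Pitfalls You May Hit with MHA Failover
In a traditional replication setup, MHA uses the Latest Slave's relay-log to fill in the differences between the other Slaves and the Latest Slave. In a GTID setup, it fills in the data from the binlog via change master to and no longer depends on the relay-log.
To make this easy to simulate, this article uses manual Failover to see what MHA does when it hits the pitfalls described above. It uses the basic environment from "MHA-手动Failover流程(传统复制&GTID复制)" (MHA manual Failover procedure, traditional replication & GTID replication).
3.1 Duplicated relay-log
Manually pause the SQL_Thread, then shut down the MySQL instance, to simulate the SQL_Thread not having executed all of the relay-log.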
relay_log_recovery= && relay_log_purge=
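# Test data, abbreviated:
Node1 writes the first record -> Node3 stops io_thread
Node1 writes the second record ->
  1. Node2 (slave): stop slave sql_thread;
  2. Node1 (master): write a new row, row_new
  3. Node2 (slave): shutdown;
  4. Node2 (slave): start mysql, then start slave;
  Node2 stops io_thread
Node1 writes the third record
With the slave's SQL_Thread paused, the master writes new data, which the IO_Thread fetches and writes to the relay-log. After the slave's mysql instance is restarted, the IO_Thread fetches from the master again, starting at the executed binlog position recorded in relay_log_info, and the events are applied again. Parsing the relay-log therefore shows row_new fetched twice.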
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover
...
Fri Apr :: - [info] * Phase 3.4: Master Log Apply Phase..
Fri Apr :: - [info]
Fri Apr :: - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Fri Apr :: - [info] Starting recovery on 192.168.85.134(192.168.85.134:)..
Fri Apr :: - [info] Generating diffs succeeded.
Fri Apr :: - [info] Waiting until all relay logs are applied.
Fri Apr :: - [info] done.
Fri Apr :: - [debug] Stopping SQL thread on 192.168.85.134(192.168.85.134:)..
Fri Apr :: - [debug] done.
Fri Apr :: - [info] Getting slave status..
Fri Apr :: - [info] This slave(192.168.85.134)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.:). No need to recover from Exec_Master_Log_Pos.
Fri Apr :: - [debug] Current max_allowed_packet is .
Fri Apr :: - [debug] Tentatively setting max_allowed_packet to 1GB succeeded.
Fri Apr :: - [info] Connecting to the target slave host 192.168.85.134, running recover script..
Fri Apr :: - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mydba' --slave_host=192.168.85.134 --slave_ip=192.168.85.134 --slave_port= --apply_files=/var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog --workdir=/var/log/masterha/app1 --target_version=5.7.-log --timestamp= --handle_raw_binlog= --disable_log_bin= --manager_version=0.56 --debug --slave_pass=xxx
Fri Apr :: - [info]
Concat all apply files to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog ..
Copying the first binlog file /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog to /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog.. ok.
Dumping binlog head events (rotate events), skipping format description events from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog.. parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Binlog Checksum enabled
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Got previous gtids log event: .
parse_init_headers: file=saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
dumped up to pos . ok.
/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog has effective binlog events from pos .
Dumping effective binlog data from /var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog position to tail().. ok.
Concat succeeded.
All apply target binary logs are concatinated at /var/log/masterha/app1/total_binlog_for_192.168.85.134_3307..binlog .
MySQL client version is 5.7.. Using --binary-mode.
Applying differential binary/relay log files /var/log/masterha/app1/relay_from_read_to_latest_192.168.85.134_3307_20180413180912.binlog,/var/log/masterha/app1/saved_master_binlog_from_192.168.85.132_3307_20180413180912.binlog on 192.168.85.134:. This may take long time...
FATAL: applying log files failed with rc :!
Error logs from ZST3:/var/log/masterha/app1/relay_log_apply_for_192.168.85.134_3307_20180413180912_err.log (the last lines)..
mysql: [Warning] Using a password on the command line interface can be insecure.
...
ERROR () at line : Duplicate entry '' for key 'PRIMARY'
--------------
BINLOG '
NoDQWhMrMRQAPwAAAAMEAAAAAG4AAAAAAAEACXJlcGxjcmFzaAAHcHlfdXNlcgAEAw8SDwVgAAAe
AA7JJu9M
NoDQWh4rMRQAVgAAAFkEAAAAAG4AAAAAAAEAAgAE//ADAAAAIGM3MzExZWQ0LTNmMDEtMTFlOC05
ODg4LTAwMGMyOWMxmZ+bIJ4HMTMyMzMwN3PJaGg=
'
--------------
Bye
at /usr/bin/apply_diff_relay_logs line
eval {...} called at /usr/bin/apply_diff_relay_logs line
main::main() called at /usr/bin/apply_diff_relay_logs line
Fri Apr :: - [debug] Setting max_allowed_packet back to succeeded.
Fri Apr :: - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1398] Applying diffs failed with return code :.
Fri Apr :: - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln1561] Recovering master server failed.
Fri Apr :: - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line
Fri Apr :: - [debug] Disconnected from 192.168.85.133(192.168.85.133:)
Fri Apr :: - [debug] Disconnected from 192.168.85.134(192.168.85.134:)
Fri Apr :: - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.85.132(192.168.85.132:)
Master 192.168.85.132(192.168.85.132:) is down!
Check MHA Manager logs at ZST3 for details.
Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:)
The latest slave 192.168.85.133(192.168.85.133:) has all relay logs for recovery.
Selected 192.168.85.134(192.168.85.134:) as a new master.
Recovering master server failed.
Got Error so could not continue failover from here.
[root@ZST3 app1]#
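The MHA switchover fails! The reason is that when Node3 pulls the data from the Latest Slave, it contains duplicate records, so applying the differential logs raises an error. The duplicated data can also be seen in relay_from_read_to_latest_**.
3.2 Missing relay-log
A SQL_Thread that follows the IO_Thread closely is hard to simulate, but a slave that is missing relay-logs can be simulated in a roundabout way.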
relay_log_recovery= && relay_log_purge=
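# Test data, abbreviated:
Node1 writes the first record -> Node3 stops io_thread
Node1 writes the second record -> Node2 runs flush relay logs; twice -> Node2 stops io_thread
Node1 writes the third record
The goal is to get the relay-log holding the second record purged, so that the Latest Slave no longer has enough relay-logs to recover the other Slaves.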
[root@ZST3 app1]# masterha_master_switch --global_conf=/etc/masterha/masterha_default.conf --conf=/etc/masterha/app1.conf --dead_master_host=192.168.85.132 --dead_master_port= --master_state=dead --new_master_host=192.168.85.134 --new_master_port= --ignore_last_failover
...
Fri Apr :: - [info] * Phase 3.3: Determining New Master Phase..
Fri Apr :: - [info]
Fri Apr :: - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Fri Apr :: - [info] Checking whether 192.168.85.133 has relay logs from the oldest position..
Fri Apr :: - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin. --latest_rmlp= --target_mlf=mysql-bin. --target_rmlp= --server_id= --workdir=/var/log/masterha/app1 --timestamp= --manager_version=0.56 --relay_log_info=/data/mysql/mysql3307/data/relay-log.info --relay_dir=/data/mysql/mysql3307/data/ --debug :
Opening /data/mysql/mysql3307/data/relay-log.info ... ok.
Relay log found at /data/mysql/mysql3307/data, up to relay-bin.
Fast relay log position search failed. Reading relay logs to find..
Reading relay-bin.
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Binlog Checksum enabled
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Got previous gtids log event: .
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Master Version is 5.7.-log
Binlog Checksum enabled
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
get_starting_mlp: file=relay-bin. event_type= server_id= length= next=
relay-bin. contains master mysql-bin. from position
Reading relay-bin.
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Binlog Checksum enabled
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Got previous gtids log event: .
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
parse_init_headers: file=relay-bin. event_type= server_id= length= nextmpos= prevrelay= cur(post)relay=
Reading relay-bin.
No such file or directory:/data/mysql/mysql3307/data/relay-bin. at /usr/share/perl5/vendor_perl/MHA/BinlogPosFindManager.pm line
Fri Apr :: - [warning] 192.168.85.133 does not have all relay logs. Maybe some logs were purged.
Fri Apr :: - [warning] None of latest servers have enough relay logs from oldest position. We can not recover oldest slaves.
Fri Apr :: - [error][/usr/share/perl5/vendor_perl/MHA/MasterFailover.pm, ln947] None of the latest slaves has enough relay logs for recovery.
Fri Apr :: - [error][/usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/bin/masterha_master_switch line
Fri Apr :: - [debug] Disconnected from 192.168.85.133(192.168.85.133:)
Fri Apr :: - [debug] Disconnected from 192.168.85.134(192.168.85.134:)
Fri Apr :: - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.85.132(192.168.85.132:)
Master 192.168.85.132(192.168.85.132:) is down!
Check MHA Manager logs at ZST3 for details.
Started manual(interactive) failover.
Invalidated master IP address on 192.168.85.132(192.168.85.132:)
None of the latest slaves has enough relay logs for recovery.
Got Error so could not continue failover from here.
[root@ZST3 app1]#
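The MHA switchover fails! The reason is that the Latest Slave does not hold enough relay-logs to recover the other Slaves.
So whenever MHA needs the relay-log to recover data, a duplicated or missing relay-log makes it error out and the switchover fails.
Automatic failover first finds every [server] configured with candidate_master=1, then picks the one with the most recent logs among them; if several are equally recent, the new master is chosen by the order of the [server] sections.
In a traditional replication setup, if a "problem Slave" ends up as the Latest Slave, the switchover fails whether Failover is manual or automatic. So use GTID wherever you can.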
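The check that fails here can be pictured as a coverage test. This is a rough sketch and not MHA's implementation: has_all_relay_logs is a hypothetical function, and each relay file still on disk is reduced to the (start, end) range of master positions it covers, assuming a single master binlog file.

```python
def has_all_relay_logs(relay_ranges, oldest_pos, latest_pos):
    """relay_ranges: (start, end) master positions covered by each relay
    file still on disk. Returns True when the files cover everything from
    the oldest slave's position up to the latest slave's position."""
    pos = oldest_pos
    for start, end in sorted(relay_ranges):
        if start > pos:
            break  # gap: the file that covered `pos` is no longer there
        pos = max(pos, end)
    return pos >= latest_pos


# All files present: positions 150..300 can be rebuilt.
print(has_all_relay_logs([(100, 200), (200, 300)], 150, 300))
# The file covering 150..250 was purged: recovery is impossible.
print(has_all_relay_logs([(250, 300)], 150, 300))
```

The second call corresponds to the situation in the log above, where a relay file needed for the oldest position no longer exists.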
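The selection rule for automatic failover stated above can be written down as a short sketch. It encodes only the rule as described in this article, not MHA's source code; the Server class, its fields, and pick_new_master are hypothetical, and returning None when no candidate exists is an assumption because the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    candidate_master: bool
    log_pos: tuple  # (binlog file index, position) received from the master


def pick_new_master(servers):
    """servers must be listed in the order of the [server] sections."""
    candidates = [s for s in servers if s.candidate_master]
    if not candidates:
        return None  # assumption: the text does not describe this case
    newest = max(s.log_pos for s in candidates)
    # The list is in config order, so the first match wins a tie.
    return next(s for s in candidates if s.log_pos == newest)


servers = [
    Server("node2", True, (9, 500)),
    Server("node3", True, (9, 500)),
    Server("node4", False, (9, 900)),  # newest logs, but not a candidate
]
print(pick_new_master(servers).name)
```

node4 is ignored despite having the newest logs, and node2 wins the tie with node3 because it comes first.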
3.3 default-character-set
Added 15:07 2018/7/26
In a GTID setup, running save_binary_logs --command=save to save the differences between the Dead Master/Binlog Server and the Latest Slave fails with:
mysqlbinlog:[ERROR] unknown variable 'default-character-set=utf8'
/etc/my.cnf contains:
[client]
default-character-set=utf8
Commenting out that line lets the switchover go through (no restart needed). Why?
In a GTID setup, save_binary_logs runs a command like this:
Executing command: mysqlbinlog --start-position= /data/mysql/mysql3307/logs/mysql-bin.
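Like mysqladmin, mysqlbinlog reads option files: it looks in /etc/my.cnf /etc/mysql/my.cnf /usr/local/mysql/etc/my.cnf ~/.my.cnf and reads the [mysqlbinlog] and [client] groups.
With the character-set setting above present in one of those files, print mysqlbinlog's default arguments:
[root@ZST1 ~]# mysqlbinlog --print-defaults
mysqlbinlog would have been started with the following arguments:
--default-character-set=utf8
In other words, mysqlbinlog --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009 is equivalent to:
mysqlbinlog --default-character-set=utf8 --start-position=1013 /data/mysql/mysql3307/logs/mysql-bin.000009
But mysqlbinlog does not support the default-character-set variable, hence the error.
To fix it, either comment out the character-set setting in the option file, or give mysqlbinlog an alias: alias mysqlbinlog='mysqlbinlog --no-defaults'. Note that shells normally expand aliases only in interactive sessions, so check that the alias actually applies to the mysqlbinlog call made by save_binary_logs.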
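When mysqlbinlog is called from your own scripts rather than from MHA, the same workaround can be applied without an alias. A minimal sketch, assuming a hypothetical helper mysqlbinlog_argv; the only MySQL-specific fact relied on is that --no-defaults must be the first option on the command line.

```python
import shlex


def mysqlbinlog_argv(binlog_path, start_position, binary="mysqlbinlog"):
    """Build an argument list that skips all option files.
    --no-defaults has to come first, right after the program name."""
    return [
        binary,
        "--no-defaults",
        f"--start-position={start_position}",
        binlog_path,
    ]


argv = mysqlbinlog_argv("/data/mysql/mysql3307/logs/mysql-bin.000009", 1013)
print(shlex.join(argv))
```

The list can be handed to subprocess.run(argv) as is, so the [client] group is never read and the character-set option never reaches mysqlbinlog.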