Preface
 
    In my last test of pt-heartbeat,both of master and slave were out of disk.And the mysql client was hang.In order to resolve the issue,I've tryed to fix the replicaiton environment without using mysqldump to reconfigure the slave.Let's see the details.
 
Procedure
 
I dropped test tables in database "sysbench" to release the disk space on master.
 [root@zlm2 :: /data/mysql/mysql3306/logs]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb cleanup
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2) Dropping table 'sbtest1'... (zlm@192.168.1.101 )[(none)]>use sysbench;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A Database changed
(zlm@192.168.1.101 )[sysbench]>show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| hb |
| sbtest2 |
| sbtest3 |
| sbtest4 |
| sbtest5 |
+--------------------+
rows in set (0.00 sec) //Only sbtest1 was deleted.It's not enough. [root@zlm2 :: ~/sysbench-1.0/src/lua]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb cleanup
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2) Dropping table 'sbtest1'...
Dropping table 'sbtest2'...
Dropping table 'sbtest3'...
Dropping table 'sbtest4'...
Dropping table 'sbtest5'... [root@zlm2 :: ~/sysbench-1.0/src/lua]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .9G .5G % / //I'd got 27% free space.
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.6M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant (zlm@192.168.1.101 )[(none)]>drop database sysbench;
Query OK, row affected (0.04 sec) //Further more,I dropped the "sysbench".
The slave hung still and disk space was full.
 [root@zlm3 :: ~]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .4G 20K % /
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.5M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant (zlm@192.168.1.102 )[(none)]>show slave status\G
^C^C -- query aborted ^Z
[]+ Stopped mysql [root@zlm3 :: ~]
#pkill mysqld [root@zlm3 :: ~]
#./mysqld.sh [root@zlm3 :: ~]
#mysql
ERROR (HY000): Can't connect to MySQL server on '192.168.1.102' (111) [root@zlm3 :: ~]
#cd /data/mysql/mysql3306/data [root@zlm3 :: /data/mysql/mysql3306/data]
#cat error.log |tail -n
--19T08::02.581937+: [Note] InnoDB: Log scan progressed past the checkpoint lsn
--19T08::02.581958+: [Note] InnoDB: Doing recovery: scanned up to log sequence number
--19T08::02.581963+: [Note] InnoDB: Database was not shutdown normally!
--19T08::02.581965+: [Note] InnoDB: Starting crash recovery.
--19T08::02.696292+: [Note] InnoDB: Transaction was in the XA prepared state.
--19T08::02.700688+: [Note] InnoDB: Transaction was in the XA prepared state.
--19T08::02.700814+: [Note] InnoDB: transaction(s) which must be rolled back or cleaned up in total row operations to undo
--19T08::02.700821+: [Note] InnoDB: Trx id counter is
--19T08::02.701719+: [Note] InnoDB: Last MySQL binlog file position , file name mysql-bin.
--19T08::02.805965+: [Note] InnoDB: Ignoring tablespace `zlm`.`sbtest2` because the DISCARD flag is set .
--19T08::02.806462+: [Note] InnoDB: Creating shared tablespace for temporary tables
--19T08::02.807316+: [Note] InnoDB: Setting file './ibtmp1' size to MB. Physically writing the file full; Please wait ...
--19T08::02.807568+: [Note] InnoDB: Starting in background the rollback of uncommitted transactions
--19T08::02.807594+: [Note] InnoDB: Rollback of non-prepared transactions completed
--19T08::02.871396+: [Warning] InnoDB: Retry attempts for writing partial data failed.
--19T08::02.871423+: [ERROR] InnoDB: Write to file ./ibtmp1failed at offset , bytes should have been written, only were written. Operating system error number . Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
--19T08::02.871441+: [ERROR] InnoDB: Error number means 'No space left on device'
--19T08::02.871446+: [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
--19T08::02.871451+: [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space
--19T08::02.871456+: [ERROR] InnoDB: Unable to create the shared innodb_temporary
--19T08::02.871459+: [ERROR] InnoDB: Plugin initialization aborted with error Generic error
--19T08::03.273011+: [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
--19T08::03.273029+: [ERROR] Plugin 'InnoDB' init function returned error.
--19T08::03.273033+: [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
--19T08::03.273037+: [ERROR] Failed to initialize builtin plugins.
--19T08::03.273040+: [ERROR] Aborting --19T08::03.273046+: [Note] Binlog end
--19T08::03.273389+: [Note] mysqld: Shutdown complete //The mysqld process could not run again because of no free disk space.
I decided to drop all the binlogs on slave to release the disk space.
 [root@zlm3 :: /data/mysql/mysql3306]
#cd logs [root@zlm3 :: /data/mysql/mysql3306/logs]
#ls -l
total
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.index [root@zlm3 :: /data/mysql/mysql3306/logs]
#rm -f * [root@zlm3 :: /data/mysql/mysql3306/logs]
#ls -l
total [root@zlm3 :: ~]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .5G .0G % / //The free disk space had been reduced to 47%.
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.5M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant
Ran the mysqld again and dropped the database "sysbench" on slave.
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#sh /root/mysqld.sh [root@zlm3 :: /data/mysql/mysql3306/logs]
#ps aux|grep mysqld
mysql 7.0 17.8 pts/ Sl : : mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root 0.0 0.0 pts/ R+ : : grep --color=auto mysqld [root@zlm3 :: /data/mysql/mysql3306/logs]
#mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is
Server version: 5.7.-log MySQL Community Server (GPL) Copyright (c) , , Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. (zlm@192.168.1.102 )[(none)]>drop database sysbench;
Query OK, rows affected (0.11 sec)

Started the replication threads of slave.

 (zlm@192.168.1.102 )[(none)]>start slave;
Query OK, rows affected (0.00 sec) (zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist' //Since the database had been droppted.This error was notable.
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: ::
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:- //It was stuck on transaction 3714549(which contained error).
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-,
5c77c31b-4add-11e8-81e2-080027de0e0e:-
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec) [root@zlm3 :: ~]
#perror
MySQL error code (ER_NO_SUCH_TABLE): Table '%-.192s.%-.192s' doesn't exist //Error 1146 indicated the absence of table "sbtest1" in "sysbench" database.
//Obviously,the slave was replaying the operations relevant to this table on master.The table even the database had been dropped.
//How could I do next step?Do I have to generate a new mysqldump file and reconfigure the slave again?
//There's One thing I'm rather sure that there were no other transactions generated in the whole course except the operations on "sysbench" database.
//Since I'd drop "sysbentch" database on both master and slave.Maybe I can fix the issue easily.

Checked the Executed_Gtid_Set on master.

 (zlm@192.168.1.101 )[(none)]>show master status;
+------------------+-----------+--------------+------------------+------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+------------------------------------------------+
| mysql-bin. | | | | 1b7181ee-6eaf-11e8-998e-080027de0e0e:- |
+------------------+-----------+--------------+------------------+------------------------------------------------+
row in set (0.00 sec) //The executed gtid was upto "3730021".

Tryed to fix the replica of master.

 (zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: ::
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-
Executed_Gtid_Set:
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec) (zlm@192.168.1.102 )[(none)]>reset master;
Query OK, rows affected (0.02 sec) (zlm@192.168.1.102 )[(none)]>set @@global.gtid_purged='1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021';
Query OK, rows affected (0.00 sec) //On account of surely knowing there were no other transactions at all.I set the "gtid_purged" variable to the value of "gtid_executed" on master.
//It means I guised that all the transactions generated on master had been replayed on slave already.The slave could retrieve new GTID at the moment. (zlm@192.168.1.102 )[(none)]>start slave sql_thread;
Query OK, rows affected (0.02 sec) (zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: Yes //The sql_thread became "Yes".
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error:
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master:
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:- //The slave had skipped those GTID(which contained error 1146) of master and waited for newer GTID.The replica had been fixed up.
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec)

Summary

  • The variable "gtid_purged" cannot be set if "gtid_executed" is not empty.
  • Caution,"reset master" can only be used on slave.Keep in mind that don't do it on master anytime.
  • This case can be followed only in test environment 'cause you cannot guarantee whether all the transactions are really replayed on slave.

GTID环境中手动修复主从故障一例(Error 1146)的更多相关文章

  1. GTID环境中手动修复主从故障一例(Error 1236/Error 1396)

      Preface       I got an replication error 1236 when I modified the password of a user without start ...

  2. SqlServer 禁止架构更改的复制中手动修复使发布和订阅中分别增加的字段同步

    原文:SqlServer 禁止架构更改的复制中手动修复使发布和订阅中分别增加的字段同步 由于之前的需要,禁止了复制架构更改,以至在发布中添加一个字段,并不会同步到订阅中,而现在又在订阅中添加了一个同名 ...

  3. 企业运维 | MySQL关系型数据库在Docker与Kubernetes容器环境中快速搭建部署主从实践

    [点击 关注「 WeiyiGeek」公众号 ] 设为「️ 星标」每天带你玩转网络安全运维.应用开发.物联网IOT学习! 希望各位看友[关注.点赞.评论.收藏.投币],助力每一个梦想. 本章目录 目录 ...

  4. 业务零影响!如何在Online环境中巧用MySQL传统复制技术【转】

    业务零影响!如何在Online环境中巧用MySQL传统复制技术 这篇文章我并不会介绍如何部署一个MySQL复制环境或keepalived+双主环境,因为此类安装搭建的文章已经很多,大家也很熟悉.在这篇 ...

  5. .NET 环境中使用RabbitMQ RabbitMQ与Redis队列对比 RabbitMQ入门与使用篇

    .NET 环境中使用RabbitMQ   在企业应用系统领域,会面对不同系统之间的通信.集成与整合,尤其当面临异构系统时,这种分布式的调用与通信变得越发重要.其次,系统中一般会有很多对实时性要求不高的 ...

  6. 5.7 并行复制配置 基于GTID 搭建中从 基于GTID的备份与恢复,同步中断处理

    5.7 并行复制配置 基于GTID 搭建中从 基于GTID的备份与恢复,同步中断处理 这个文章包含三个部分 1:gtid的多线程复制2:同步中断处理3:GTID的备份与恢复 下面文字相关的东西 大部分 ...

  7. 生产环境中的kubernetes 优先级与抢占

    kubernetes 中的抢占功能是调度器比较重要的feature,但是真正使用起来还是比较危险,否则很容易把低优先级的pod给无辜kill.为了提高GPU集群的资源利用率,决定勇于尝试一番该feat ...

  8. Redis 哨兵模式实现主从故障互切换

    200 ? "200px" : this.width)!important;} --> 介绍 Redis Sentinel 是一个分布式系统, 你可以在一个架构中运行多个 S ...

  9. 生产环境中nginx既做web服务又做反向代理

    一.写对于初入博客园的感想 众所周知,nginx是一个高性能的HTTP和反向代理服务器,在以前工作中要么实现http要么做反向代理或者负载均衡.尚未在同一台nginx或者集群上同时既实现HTTP又实现 ...

随机推荐

  1. ERP和C4C中的function location

    SAP ERP里的Functional Locations,下载到SAP Cloud for Customer后成为类型为'Functional Location'的Installation Poin ...

  2. UI5 Source code map机制的细节介绍

    在我的博客A debugging issue caused by source code mapping里我介绍了在我做SAP C4C开发时遇到的一个曾经困扰我很久的问题,最后结论是这个问题由于Jav ...

  3. selenium使用谷歌浏览器自带手机模拟器运行H5网页

    背景:最开始用手机模拟H5页面跑自动化,发现经常因为app连接或者网络原因等一系列情况,导致M版(H5页面)用例跑不通,想通过浏览器自带的手机模拟器运行,保证稳定性 浏览器自带的模拟器如下图: 代码实 ...

  4. POJ-2513 Colored Sticks---欧拉回路+并查集+字典树

    题目链接: https://vjudge.net/problem/POJ-2513 题目大意: 给一些木棍,两端都有颜色,只有两根对应的端点颜色相同才能相接,问能不能把它们接成一根木棍 解题思路: 题 ...

  5. HDU(1698),线段树区间更新

    题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=1698 区间更新重点在于懒惰标记. 当你更新的区间就是整个区间的时候,直接sum[rt] = c*(r- ...

  6. 数黑格有多少个,模拟题,POJ(1656)

    题目链接:http://poj.org/problem?id=1656 #include <stdio.h> #include <iostream> #include < ...

  7. 大白话讲解BP算法(转载)

    最近在看深度学习的东西,一开始看的吴恩达的UFLDL教程,有中文版就直接看了,后来发现有些地方总是不是很明确,又去看英文版,然后又找了些资料看,才发现,中文版的译者在翻译的时候会对省略的公式推导过程进 ...

  8. 一篇RxJava友好的文章(三)

    组合操作符 继上一篇讲述了过滤操作符,这一篇讲述组合操作符,组合操作符可用于组合多个Observable.组合操作符相对于过滤操作符要复杂很多,也较难以理解,需要花费时间去看文档查资料,写demo才能 ...

  9. java mysql多次事务 模拟依据汇率转账,并存储转账信息 分层完成 dao层 service 层 client层 连接池使用C3p0 写入库使用DBUtils

    Jar包使用,及层的划分 c3p0-config.xml <?xml version="1.0" encoding="UTF-8"?> <c3 ...

  10. SVProgressHUD–比MBProgressHUD更好用的 iOS进度提示组件

    简介 SVProgressHUD是简单易用的显示器,用于指示一个持续进行的任务的进度. 项目主页: SVProgressHUD 最新示例: 点击下载 快速入门 安装 通过Cocoapods pod ' ...