Preface
 
    In my last test of pt-heartbeat, both the master and the slave ran out of disk space, and the mysql client hung. To resolve the issue, I tried to repair the replication environment without using mysqldump to rebuild the slave. Let's look at the details.
 
Procedure
 
I dropped the test tables in the "sysbench" database to free disk space on the master.
 [root@zlm2 :: /data/mysql/mysql3306/logs]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb cleanup
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2)
Dropping table 'sbtest1'...
(zlm@192.168.1.101 )[(none)]>use sysbench;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
(zlm@192.168.1.101 )[sysbench]>show tables;
+--------------------+
| Tables_in_sysbench |
+--------------------+
| hb |
| sbtest2 |
| sbtest3 |
| sbtest4 |
| sbtest5 |
+--------------------+
rows in set (0.00 sec) //Only sbtest1 was dropped, which was not enough.
 [root@zlm2 :: ~/sysbench-1.0/src/lua]
#sysbench oltp_read_write.lua --mysql-host=192.168.1.101 --mysql-port= --mysql-user=zlm --mysql-password=zlmzlm --mysql-db=sysbench --tables= --table-size= --mysql-storage-engine=innodb cleanup
sysbench 1.0. (using bundled LuaJIT 2.1.-beta2) Dropping table 'sbtest1'...
Dropping table 'sbtest2'...
Dropping table 'sbtest3'...
Dropping table 'sbtest4'...
Dropping table 'sbtest5'...
 [root@zlm2 :: ~/sysbench-1.0/src/lua]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .9G .5G % / //I now had 27% free space.
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.6M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant
(zlm@192.168.1.101 )[(none)]>drop database sysbench;
Query OK, row affected (0.04 sec) //Furthermore, I dropped the "sysbench" database itself.

The slave was still hung and its disk was full.
 [root@zlm3 :: ~]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .4G 20K % /
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.5M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant
(zlm@192.168.1.102 )[(none)]>show slave status\G
^C^C -- query aborted
^Z
[]+ Stopped mysql
 [root@zlm3 :: ~]
#pkill mysqld
 [root@zlm3 :: ~]
#./mysqld.sh
 [root@zlm3 :: ~]
#mysql
ERROR (HY000): Can't connect to MySQL server on '192.168.1.102' (111)
 [root@zlm3 :: ~]
#cd /data/mysql/mysql3306/data
 [root@zlm3 :: /data/mysql/mysql3306/data]
#cat error.log |tail -n
--19T08::02.581937+: [Note] InnoDB: Log scan progressed past the checkpoint lsn
--19T08::02.581958+: [Note] InnoDB: Doing recovery: scanned up to log sequence number
--19T08::02.581963+: [Note] InnoDB: Database was not shutdown normally!
--19T08::02.581965+: [Note] InnoDB: Starting crash recovery.
--19T08::02.696292+: [Note] InnoDB: Transaction was in the XA prepared state.
--19T08::02.700688+: [Note] InnoDB: Transaction was in the XA prepared state.
--19T08::02.700814+: [Note] InnoDB: transaction(s) which must be rolled back or cleaned up in total row operations to undo
--19T08::02.700821+: [Note] InnoDB: Trx id counter is
--19T08::02.701719+: [Note] InnoDB: Last MySQL binlog file position , file name mysql-bin.
--19T08::02.805965+: [Note] InnoDB: Ignoring tablespace `zlm`.`sbtest2` because the DISCARD flag is set .
--19T08::02.806462+: [Note] InnoDB: Creating shared tablespace for temporary tables
--19T08::02.807316+: [Note] InnoDB: Setting file './ibtmp1' size to MB. Physically writing the file full; Please wait ...
--19T08::02.807568+: [Note] InnoDB: Starting in background the rollback of uncommitted transactions
--19T08::02.807594+: [Note] InnoDB: Rollback of non-prepared transactions completed
--19T08::02.871396+: [Warning] InnoDB: Retry attempts for writing partial data failed.
--19T08::02.871423+: [ERROR] InnoDB: Write to file ./ibtmp1failed at offset , bytes should have been written, only were written. Operating system error number . Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
--19T08::02.871441+: [ERROR] InnoDB: Error number means 'No space left on device'
--19T08::02.871446+: [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
--19T08::02.871451+: [ERROR] InnoDB: Could not set the file size of './ibtmp1'. Probably out of disk space
--19T08::02.871456+: [ERROR] InnoDB: Unable to create the shared innodb_temporary
--19T08::02.871459+: [ERROR] InnoDB: Plugin initialization aborted with error Generic error
--19T08::03.273011+: [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
--19T08::03.273029+: [ERROR] Plugin 'InnoDB' init function returned error.
--19T08::03.273033+: [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
--19T08::03.273037+: [ERROR] Failed to initialize builtin plugins.
--19T08::03.273040+: [ERROR] Aborting
--19T08::03.273046+: [Note] Binlog end
--19T08::03.273389+: [Note] mysqld: Shutdown complete //mysqld could not start again because there was no free disk space.

I decided to delete all the binlogs on the slave to free disk space.
 [root@zlm3 :: /data/mysql/mysql3306]
#cd logs
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#ls -l
total
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.
-rw-r----- mysql mysql Jul : mysql-bin.index
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#rm -f *
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#ls -l
total
 [root@zlm3 :: ~]
#df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root .4G .5G .0G % / //The free disk space was now 47%.
devtmpfs 488M 488M % /dev
tmpfs 497M 497M % /dev/shm
tmpfs 497M 6.5M 491M % /run
tmpfs 497M 497M % /sys/fs/cgroup
/dev/sda1 497M 118M 379M % /boot
none 87G 80G .6G % /vagrant
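
The error log above shows mysqld aborting because ibtmp1 could not be created on a full disk, so it is worth confirming that there is room before restarting. A minimal sketch, assuming a POSIX `df`; the helper names `avail_kb` and `has_free_space` and the 100 MB threshold in the usage comment are illustrative and not part of the original procedure:

```shell
# Print the available space, in kilobytes, on the filesystem holding a path.
# -P forces the one-line-per-filesystem POSIX format, -k forces 1 KB blocks.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least $2 kilobytes are free under path $1.
has_free_space() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Usage (assumed threshold of about 100 MB):
#   has_free_space /data/mysql/mysql3306/data 102400 && sh /root/mysqld.sh
```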
I started mysqld again and dropped the "sysbench" database on the slave.
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#sh /root/mysqld.sh
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#ps aux|grep mysqld
mysql 7.0 17.8 pts/ Sl : : mysqld --defaults-file=/data/mysql/mysql3306/my.cnf
root 0.0 0.0 pts/ R+ : : grep --color=auto mysqld
 [root@zlm3 :: /data/mysql/mysql3306/logs]
#mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is
Server version: 5.7.-log MySQL Community Server (GPL)

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

(zlm@192.168.1.102 )[(none)]>drop database sysbench;
Query OK, rows affected (0.11 sec)

I started the replication threads on the slave.

 (zlm@192.168.1.102 )[(none)]>start slave;
Query OK, rows affected (0.00 sec)
(zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist' //This error was to be expected, since the database had been dropped.
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: ::
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:- //It was stuck on transaction 3714549 (the one that raised the error).
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-,
5c77c31b-4add-11e8-81e2-080027de0e0e:-
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec)
 [root@zlm3 :: ~]
#perror
MySQL error code (ER_NO_SUCH_TABLE): Table '%-.192s.%-.192s' doesn't exist //Error 1146 indicates that table "sbtest1" is missing from the "sysbench" database.
//Obviously, the slave was replaying the master's operations on this table, but the table, and indeed the whole database, had been dropped.
//What should I do next? Did I have to generate a new mysqldump file and rebuild the slave?
//One thing I was quite sure of: no transactions had been generated during the whole test other than the operations on the "sysbench" database.
//Since I had dropped the "sysbench" database on both the master and the slave, maybe I could fix the issue easily.
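
For a single failing transaction, a commonly used alternative on GTID replicas is to commit an empty transaction under the failing GTID, so that the slave records it as executed without applying its changes. The sketch below only prints the SQL for review; the helper name `skip_gtid_sql` is hypothetical, and the UUID and transaction number in the usage comment are the ones shown in the output above. In this case every later sysbench transaction would presumably fail the same way, so the step would have to be repeated for each GTID.

```shell
# Print SQL that commits an empty transaction under one GTID.
# $1: UUID of the server that generated the transaction
# $2: transaction number within that UUID
skip_gtid_sql() {
    printf "SET GTID_NEXT='%s:%s';\n" "$1" "$2"
    printf "BEGIN; COMMIT;\n"
    printf "SET GTID_NEXT='AUTOMATIC';\n"
}

# Usage, with the SQL thread stopped:
#   skip_gtid_sql 1b7181ee-6eaf-11e8-998e-080027de0e0e 3714549 | mysql
#   mysql -e 'start slave sql_thread'
```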

I checked the Executed_Gtid_Set on the master.

 (zlm@192.168.1.101 )[(none)]>show master status;
+------------------+-----------+--------------+------------------+------------------------------------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+------------------------------------------------+
| mysql-bin. | | | | 1b7181ee-6eaf-11e8-998e-080027de0e0e:- |
+------------------+-----------+--------------+------------------+------------------------------------------------+
row in set (0.00 sec) //The executed GTIDs went up to "3730021".
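
The Executed_Gtid_Set reported by the master is the value that gets written to gtid_purged on the slave in the next step. As a sketch, the statements can be generated from that value so they can be reviewed before being run; `build_gtid_purged_sql` is a hypothetical helper name, and its input check is deliberately loose:

```shell
# Print the statements that mark a GTID set as already applied on a slave.
# $1: GTID set copied from the master's Executed_Gtid_Set
build_gtid_purged_sql() {
    # Reject input that cannot be a GTID set ("uuid:interval[,...]").
    case $1 in
        *:*) ;;
        *) echo "not a GTID set: $1" >&2; return 1 ;;
    esac
    printf "RESET MASTER;\n"
    printf "SET @@GLOBAL.gtid_purged='%s';\n" "$1"
    printf "START SLAVE SQL_THREAD;\n"
}

# Usage (run on the slave only, after reviewing the output):
#   build_gtid_purged_sql '1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021'
```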

I tried to fix the master's replica.

 (zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error: Error executing row event: 'Table 'sysbench.sbtest1' doesn't exist'
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: ::
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-
Executed_Gtid_Set:
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec)
(zlm@192.168.1.102 )[(none)]>reset master;
Query OK, rows affected (0.02 sec)
(zlm@192.168.1.102 )[(none)]>set @@global.gtid_purged='1b7181ee-6eaf-11e8-998e-080027de0e0e:1-3730021';
Query OK, rows affected (0.00 sec) //Because I knew for sure that there were no other transactions at all, I set the "gtid_purged" variable to the value of "gtid_executed" on the master.
//In effect I pretended that all the transactions generated on the master had already been replayed on the slave, so the slave would only fetch newer GTIDs from then on.
(zlm@192.168.1.102 )[(none)]>start slave sql_thread;
Query OK, rows affected (0.02 sec)
(zlm@192.168.1.102 )[(none)]>show slave status\G
*************************** . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.101
Master_User: repl
Master_Port:
Connect_Retry:
Master_Log_File: mysql-bin.
Read_Master_Log_Pos:
Relay_Log_File: relay-bin.
Relay_Log_Pos:
Relay_Master_Log_File: mysql-bin.
Slave_IO_Running: Yes
Slave_SQL_Running: Yes //The SQL thread was now "Yes".
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno:
Last_Error:
Skip_Counter:
Exec_Master_Log_Pos:
Relay_Log_Space:
Until_Condition: None
Until_Log_File:
Until_Log_Pos:
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master:
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno:
Last_IO_Error:
Last_SQL_Errno:
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id:
Master_UUID: 1b7181ee-6eaf-11e8-998e-080027de0e0e
Master_Info_File: mysql.slave_master_info
SQL_Delay:
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Master_Retry_Count:
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:-
Executed_Gtid_Set: 1b7181ee-6eaf-11e8-998e-080027de0e0e:- //The slave had skipped the master's GTIDs that raised error 1146 and was waiting for newer GTIDs. The replica had been fixed.
Auto_Position:
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
row in set (0.00 sec)
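
The check done by eye above can also be scripted. The sketch below reads `show slave status\G` output on standard input and succeeds only when both replication threads report Yes; `replication_running` is a hypothetical helper name:

```shell
# Read "show slave status\G" output on stdin.
# Exit status 0 only if Slave_IO_Running and Slave_SQL_Running are both Yes.
replication_running() {
    awk '
        $1 == "Slave_IO_Running:"  { io  = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Usage:
#   mysql -e 'show slave status\G' | replication_running && echo "replication OK"
```

Both threads running only shows that replication has resumed; it says nothing about whether the data on the two servers is identical.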

Summary

  • In MySQL 5.7, the variable "gtid_purged" can only be set when "gtid_executed" is empty, which is why "reset master" was run on the slave first.
  • Caution: only run "reset master" on a slave. Never run it on a master, because it deletes all the binary logs and clears the GTID execution history.
  • Follow this approach only in a test environment, because you cannot guarantee that all the transactions have really been replayed on the slave.
