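I. MySQL Group Replication has to deal with two problems from the outset:

  1. How to recover when the primary node goes down.

  2. How the remaining nodes can keep serving traffic when a majority of the nodes are offline.

  This article covers only the first problem: once the primary node has gone down, how do we add it back into the high-availability group? The problem splits into two cases:

    1. Soft failure: the primary's data is intact, and the binlogs that the other nodes wrote while it was down are still available.

          In this case, restarting MySQL Group Replication on the node is enough to fix the problem.

    2. Destructive failure: the primary's data is gone.

          In this case, restore the failed node from a backup taken on one of the surviving nodes, then restart MySQL Group Replication.

  Detailed repair steps are shown in the examples below.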

II. Environment:

  Overview

Hostname      IP address        MGR role

mtls17        10.186.19.17      primary

mtls18        10.186.19.18      secondary

mtls19        10.186.19.19      secondary

  Group status:

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

  Note:

  The output above shows that the MySQL instance on mtls17 is the group's current primary and that every member of the group is in the ONLINE state.
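
  The same health check can be scripted. The sketch below is only an illustration: the function names online_member_count and has_majority are hypothetical, and it assumes the mysql client can log in without prompting (for example through credentials in ~/.my.cnf). It counts the members reported as ONLINE and tests whether they form a strict majority of the expected group size.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes the mysql client can authenticate
# non-interactively (e.g. credentials stored in ~/.my.cnf).

# Print the number of members this server currently sees as ONLINE.
online_member_count() {
  mysql -N -B -e "SELECT COUNT(*)
                    FROM performance_schema.replication_group_members
                   WHERE MEMBER_STATE = 'ONLINE'"
}

# has_majority ONLINE TOTAL
# Succeeds when ONLINE is a strict majority of TOTAL (e.g. 2 of 3).
has_majority() {
  local online=$1 total=$2
  [ "$online" -gt $(( total / 2 )) ]
}
```

  For the three-node group used here, a call such as has_majority "$(online_member_count)" 3 would succeed while at least two members are ONLINE.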

III. Case 1: simulating the failure and fixing it:

  1. Simulate a crash of the mtls17 node

ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtls17 data]# kill -
[root@mtls17 data]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql

  

  2. Check the state of the two remaining nodes

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

  The output above shows that after the mysqld process on mtls17 was killed, the two remaining nodes formed a new group, and the MySQL instance on mtls18 became the primary.
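
  Mapping the primary's MEMBER_ID back to a host name by eye is error-prone, so the two queries can be combined. The following is a minimal sketch for MySQL 5.7 in single-primary mode; the function names primary_host and is_primary are hypothetical, and the mysql client is again assumed to authenticate non-interactively. In MySQL 8.0 the MEMBER_ROLE column of replication_group_members would be the more direct source.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for MySQL 5.7, single-primary mode.

# Print the host name of the member that is currently the primary.
primary_host() {
  mysql -N -B -e "SELECT MEMBER_HOST
                    FROM performance_schema.replication_group_members
                   WHERE MEMBER_ID = (SELECT VARIABLE_VALUE
                                        FROM performance_schema.global_status
                                       WHERE VARIABLE_NAME = 'group_replication_primary_member')"
}

# is_primary HOST: succeeds when HOST is the current primary.
is_primary() {
  [ "$(primary_host)" = "$1" ]
}
```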

  

  3. Bring the failed primary back into the group

systemctl start mysql
[root@mtls17 data]# mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is
Server version: 5.7.-log MySQL Community Server (GPL)

Copyright (c) , , Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> start group_replication;
Query OK, rows affected (4.03 sec)

mysql>

  4. Verify that the problem is resolved

select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 12bfe200-d655-11e7-a264-1e1b3511358e | mtsl18 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12bfe200-d655-11e7-a264-1e1b3511358e |
+----------------------------------+--------------------------------------+
1 row in set (0.00 sec)

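  Conclusion: after the former primary went down, restarting the MySQL service and then restarting Group Replication on it was enough to bring it back into the group. As the output above shows, it rejoins as a secondary; mtls18 remains the primary.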
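
  The two manual steps of case 1 can be wrapped in one function. This is a sketch under assumptions, not the exact procedure used above: the function name rejoin_group is hypothetical, the systemd unit is assumed to be called mysql as in the transcript, and both mysqladmin and mysql are assumed to authenticate non-interactively. If group_replication_start_on_boot were enabled, the server would attempt the rejoin by itself and the last statement would be unnecessary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start mysqld, wait until it answers, then rejoin.

# rejoin_group [TRIES]
# TRIES is the number of ping attempts (one second apart) before giving up.
rejoin_group() {
  local tries=${1:-30}
  systemctl start mysql || return 1
  # mysqladmin ping exits with 0 once the server accepts connections.
  while ! mysqladmin ping --silent; do
    tries=$(( tries - 1 ))
    [ "$tries" -gt 0 ] || { echo "mysqld did not come up" >&2; return 1; }
    sleep 1
  done
  mysql -e "START GROUP_REPLICATION"
}
```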

IV. Case 2: recovering the node when the data on the primary has been lost:

  1. Stop the service and delete the data

[root@mtsl18 ~]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql
[root@mtsl18 ~]# kill -
[root@mtsl18 ~]# rm -rf /database/mysql/data/
[root@mtsl18 ~]# ps -ef | grep mysql
root : pts/ :: grep --color=auto mysql

  This experiment continues from case 1, so the primary is now on mtls18; that is why the service is stopped and the data deleted on mtls18.

  2. Check the group status:

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
2 rows in set (0.00 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)

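  Note: after mtls18 went down, the primary role moved from mtls18 to mtls17.

  3. Back up mtls19 with MEB (MySQL Enterprise Backup); the backup is used to restore the failed mtls18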

mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp \
--host=localhost --user=root --password=mtls0352 \
--backup-dir=/tmp/ --backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging \
backup-to-image
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf --with-timestamp --host=localhost
--user=root --password=xxxxxxxx --backup-dir=/tmp/
--backup-image=/tmp/2017-12-01T12:30:00.mbi --no-history-logging
backup-to-image

:: MAIN INFO:
:: MAIN INFO: MySQL server version is '5.7.20-log'
.......
........
:: MAIN INFO: Full Image Backup operation completed successfully.
:: MAIN INFO: Backup image created successfully.
:: MAIN INFO: Image Path = /tmp/2017-12-01T12:30:00.mbi
:: MAIN INFO: MySQL binlog position: filename mysql-bin., position
-------------------------------------------------------------
Parameters Summary
-------------------------------------------------------------
Start LSN :
End LSN :
-------------------------------------------------------------

mysqlbackup completed OK!

  4. Copy the backup to mtls18

scp /tmp/2017-12-01T12:30:00.mbi mtls18:/tmp/

  5. Restore the backup

mysqlbackup --defaults-file=/etc/my.cnf --backup-image=/tmp/2017-12-01T12:30:00.mbi \
> --backup-dir=/tmp/ --datadir=/database/mysql/data/3306/ \
> copy-back-and-apply-log
MySQL Enterprise Backup version 4.1. Linux-2.6.-400.215..el5uek-x86_64 [//]
Copyright (c) , , Oracle and/or its affiliates. All Rights Reserved.

:: MAIN INFO: A thread created with Id ''
:: MAIN INFO: Starting with following command line ...
mysqlbackup --defaults-file=/etc/my.cnf
--backup-image=/tmp/2017-12-01T12:30:00.mbi --backup-dir=/tmp/
--datadir=/database/mysql/data/3306/ copy-back-and-apply-log

:: MAIN INFO:
IMPORTANT: Please check that mysqlbackup run completes successfully.
.....
.....
:: PCR1 INFO: The first data file is '/database/mysql/data/3306/ibdata1'
and the new created log files are at '/database/mysql/data/3306/'
:: MAIN INFO: MySQL server version is '5.7.20-log'
:: MAIN INFO: Restoring ...5.7.-log version
:: MAIN INFO: Apply-log operation completed successfully.
:: MAIN INFO: Full Backup has been restored successfully.

mysqlbackup completed OK!

  6. Fix ownership of the restored files and start MySQL on mtls18

[root@mtsl18 tmp]# chown -R mysql:mysql /database/mysql/data/
[root@mtsl18 tmp]# systemctl start mysql
[root@mtsl18 tmp]# ps -ef | grep mysql
mysql : ? :: /usr/local/mysql/bin/mysqld --defaults-file=/etc/my.cnf
root : pts/ :: grep --color=auto mysql

  7. Restart MySQL Group Replication

mysql -uroot -pmtls0352
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.20-log MySQL Community Server (GPL)

Copyright (c) 2000, 2017, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> reset master;
Query OK, 0 rows affected (0.10 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> source /database/mysql/data/3306/backup_gtid_executed.sql ;
Query OK, 0 rows affected (0.10 sec)

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

mysql> change master to
    -> master_user='mgr_usr',
    -> master_password='mgr10352'
    -> for channel 'group_replication_recovery';
Query OK, 0 rows affected, 2 warnings (0.21 sec)

mysql> start group_replication;
Query OK, 0 rows affected (3.46 sec)
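
  The order of these statements matters: in MySQL 5.7, GTID_PURGED can only be set while GTID_EXECUTED is empty, which is why reset master comes before sourcing backup_gtid_executed.sql. The sketch below generates the same sequence so it can be reviewed or piped into the client. The function name rejoin_after_restore_sql is hypothetical, and the values are interpolated verbatim, so a password containing a single quote would need escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the statements that re-seed the GTID state
# from an MEB restore and rejoin the group.

# rejoin_after_restore_sql GTID_FILE RECOVERY_USER RECOVERY_PASSWORD
rejoin_after_restore_sql() {
  local gtid_file=$1 user=$2 password=$3
  cat <<EOF
RESET MASTER;
RESET SLAVE;
SET sql_log_bin = 0;
source ${gtid_file}
SET sql_log_bin = 1;
CHANGE MASTER TO MASTER_USER='${user}', MASTER_PASSWORD='${password}'
  FOR CHANNEL 'group_replication_recovery';
START GROUP_REPLICATION;
EOF
}
```

  With the values from the transcript, the call would look like: rejoin_after_restore_sql /database/mysql/data/3306/backup_gtid_executed.sql mgr_usr mgr10352 | mysql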

  8. Check that the group is healthy again

mysql> select * from replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 12b6f8d9-d655-11e7-936a-9a17854b700d | mtls17 | 3306 | ONLINE |
| group_replication_applier | 1453bcac-d655-11e7-a503-8a7c439b72d9 | mtls19 | 3306 | ONLINE |
| group_replication_applier | 85f82fce-d65e-11e7-9e92-1e1b3511358e | mtsl18 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
3 rows in set (0.01 sec)

mysql> show global status like 'group_replication_primary_member';
+----------------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------------+--------------------------------------+
| group_replication_primary_member | 12b6f8d9-d655-11e7-936a-9a17854b700d |
+----------------------------------+--------------------------------------+
1 row in set (0.01 sec)
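
  A rejoining member normally passes through the RECOVERING state before it is reported as ONLINE, so a single check right after start group_replication can be misleading. The following sketch polls until the member is ONLINE or a retry budget runs out. The function name wait_until_online and the default of 60 one-second attempts are assumptions for illustration, and the mysql client is assumed to authenticate non-interactively.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the member state until it is ONLINE.

# wait_until_online HOST [TRIES]
# HOST must match MEMBER_HOST as shown in replication_group_members.
wait_until_online() {
  local host=$1 tries=${2:-60} state
  while [ "$tries" -gt 0 ]; do
    state=$(mysql -N -B -e "SELECT MEMBER_STATE
                              FROM performance_schema.replication_group_members
                             WHERE MEMBER_HOST = '${host}'")
    [ "$state" = "ONLINE" ] && return 0
    tries=$(( tries - 1 ))
    sleep 1
  done
  echo "${host} is still '${state}'" >&2
  return 1
}
```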

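V. Summary:

  Repairing the two kinds of primary failure:

    1. Data intact and binlogs intact: simply restart MySQL Group Replication on the node; it catches up with the group automatically.

    2. Data lost: restore the node from a backup first, then restart MySQL Group Replication.

  On the operational complexity of MySQL Group Replication:

    Overall, MySQL Group Replication is fairly DBA-friendly: a handful of small operations is enough to repair a failed group.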

VI. Other articles I have written about MySQL Group Replication

  1. MySQL Group Replication installation and configuration in detail: http://www.cnblogs.com/JiangLe/p/6727281.html#3849996

  2. Availability report for MySQL Group Replication on mysql-5.7.20: http://www.cnblogs.com/JiangLe/p/7809229.html

  3. MySQL Group Replication: recovering a crashed primary node https://i.cnblogs.com/EditPosts.aspx?postid=7941929

  4. MySQL Group Replication: recovery when a majority of the nodes are lost

  5. My open-source tool for fully automated installation of mysql-group-replication https://github.com/Neeky/mysqltools
