Auth: Jin
Date: 20140414

1.master-slave同步问题
1)故障描述和错误代码:
监控报警slave故障
登录slave服务器查看
mysql> show slave status\G
发现有Errno
详细的记录在另外电脑,电脑硬盘坏了,没有具体的报错了
2)排查步骤:
(1)查看错误日志
tail -n 100 /var/log/mysql/mysql_7.error
没有发现问题
cat /mysql/data7/master.info
18
mysql-bin.000106
847011590
查看master信息
# cat /mysql/data7/relay-log.info
binlogs/mysql-relay-bin.000022
846786796
mysql-bin.000106
846786650
8
从master读取到slave的信息
ll /mysql/data7/binlogs/
total 826996
-rw-rw----. 1 mysql mysql 199 Apr 8 17:39 mysql-relay-bin.000021
-rw-rw----. 1 mysql mysql 846828651 Apr 14 14:31 mysql-relay-bin.000022
-rw-rw----. 1 mysql mysql 62 Apr 8 17:39 mysql-relay-bin.index
tail -f /mysql/data7/binlogs/mysql-relay-bin.000022
都正常
初步判断IO线程正常
(2)仔细看slave status
mysql> show slave status\G
可以看到IO线程是正常的,SQL线程有问题
3)解决办法
重新按最后的position重新同步

4)总结:
(1)需要熟悉repliction的原理
(2)一般情况之需要看slave的status即可,info文件信息都读取进status了,根据status判断是IO的问题,还是SQL的问题

2,master当机,无法自动恢复问题
1)故障秒素和错误代码
监控报警master服务器宕了
登录mmm_control查看,发现writer切换到backup master上去了。原来的master处于正在恢复总
2)排查步骤
查看master服务-正常
slave status也OK,没有找到自动切换的原因
后查资料说一个bug,把检测时间修改为了10可避免,我设置的是60
3)解决办法
(1)手动把“出故障的”master为online
(2)将writer角色mv给master

Auth: Jin
Date: 20140505
Title: SQL注入攻击锁表,引起MMM slave无法复制

故障描述:slave复制不正常
root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/REPLICATION_DELAY. Roles:
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.22), reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
db3(192.168.201.3) master/ONLINE. Roles:

[root@JQ-pdt-host-2 ~]# mysql -h 127.0.0.1 -P 3307 -u root -ppassword
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2155233
Server version: 5.5.30-log MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.201.216
Master_User: dbslave
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: mysql-bin.000109
Read_Master_Log_Pos: 671221507
Relay_Log_File: mysql-relay-bin.000031
Relay_Log_Pos: 668578820
Relay_Master_Log_File: mysql-bin.000109
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test,performance_schema
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 668578674
Relay_Log_Space: 671221852
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1503
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 33071

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.201.216
Master_User: dbslave
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: mysql-bin.000109
Read_Master_Log_Pos: 671759015
Relay_Log_File: mysql-relay-bin.000031
Relay_Log_Pos: 668578820
Relay_Master_Log_File: mysql-bin.000109
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test,performance_schema
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 668578674
Relay_Log_Space: 671759360
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 1724
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 33071
1 row in set (0.00 sec)

原因分析1:SQL线程阻塞了
Seconds_Behind_Master: 在增加
Exec_Master_Log_Pos: 668578674 没有变化
根据网上的资料,
http://blog.csdn.net/zbszhangbosen/article/details/8494921
Seconds_Behind_Master表示slave上SQL thread与IO thread之间的延迟,我们都知道在MySQL的复制环境中,slave先从master上将

binlog拉取到本地(通过IO thread),
然后通过SQLthread将binlog重放,而Seconds_Behind_Master表示本地relaylog中未被执行完的那部分的差值

原因分析2:查询slave没有将relaylog应用到数据库的原因

操作
1.mmm剔除db2
[root@JQ-pdt-host-3 ~]# mmm_control @3307 set_offline db2
无法关闭

2.查看错误日志
[root@JQ-pdt-host-217 ~]# tail -f /var/log/mysql/mysql_7.error
140501 19:32:05 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 117120783, Error_code: 1032
140502 17:02:17 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 233682298, Error_code: 1032
140502 17:55:03 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 244489088, Error_code: 1032
140503 3:23:02 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 308850228, Error_code: 1032
140503 4:15:10 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 313199736, Error_code: 1032
140503 8:29:47 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 332919503, Error_code: 1032
140503 18:47:15 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 391309758, Error_code: 1032
140504 1:38:20 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 431859737, Error_code: 1032
140505 9:30:08 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 606849321, Error_code: 1032
140505 14:17:00 [Warning] Slave SQL: Could not execute Update_rows event on table ask.ask_question; Can't find record

in 'ask_question', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000109,

end_log_pos 638173055, Error_code: 1032
master
140328 9:32:19 [Note] Event Scheduler: Loaded 0 events
140328 9:32:19 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.5.30-log' socket: '/mysql/data7/mysql.sock' port: 3307 MySQL Community Server (GPL)
140328 9:44:30 [Note] Error reading relay log event: slave SQL thread was killed
140328 9:44:30 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (

server_errno=2013)
140328 9:44:30 [Note] Slave I/O thread killed while reading event
140328 9:44:30 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000153', position 496435278
140328 9:44:43 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000153' at position

496435278, relay log 'binlogs/mysql-relay-bin.000254' position: 959809
140328 9:44:43 [Note] Slave I/O thread: connected to master 'dbslave@192.168.201.3:3307',replication started in log

'mysql-bin.000153' at position 496435278
140328 15:04:47 [Note] Slave: received end packet from server, apparent master shutdown:
140328 15:04:47 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000153' at

position 532390356
140328 15:04:47 [ERROR] Slave I/O: error reconnecting to master 'dbslave@192.168.201.3:3307' - retry-time: 60

retries: 86400, Error_code: 2003
140328 15:07:47 [Note] Slave: connected to master 'dbslave@192.168.201.3:3307',replication resumed in log 'mysql-

bin.000153' at position 532390356
140328 15:44:47 [Note] Error reading relay log event: slave SQL thread was killed
140328 15:44:47 [ERROR] Error reading packet from server: Lost connection to MySQL server during query (

server_errno=2013)
140328 15:44:47 [Note] Slave I/O thread killed while reading event
140328 15:44:47 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000154', position 4451889
140328 15:45:02 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000154' at position

4451889, relay log 'binlogs/mysql-relay-bin.000257' position: 253
140328 15:45:02 [Note] Slave I/O thread: connected to master 'dbslave@192.168.201.3:3307',replication started in log

'mysql-bin.000154' at position 4451889
报错信息都是比较老的

3.查看线程
mysql> show processlist |
| 2 | system user | | NULL | Connect | 2438 | Waiting for table level lock | NULL

|
| 30 | dbproxy | 192.168.201.218:36994 | ask | Query | 2431 | Waiting for table level lock |

SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 33 | dbproxy | 192.168.201.218:37000 | ask | Query | 2435 | Waiting for table level lock |

SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 37 | dbproxy | 192.168.201.243:55050 | ask | Query | 2420 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 41 | dbproxy | 192.168.201.243:55053 | ask | Query | 2408 | Waiting for table level lock |

select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 42 | dbproxy | 192.168.201.243:55059 | ask | Query | 2390 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 46 | dbproxy | 192.168.201.243:55062 | ask | Query | 2437 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 50 | dbproxy | 192.168.201.243:55065 | ask | Query | 2407 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 51 | dbproxy | 192.168.201.243:55069 | ask | Query | 2393 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 55 | dbproxy | 192.168.201.218:37714 | ask | Query | 2434 | Waiting for table level lock |

SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 82 | dbproxy | 192.168.201.243:55224 | ask | Query | 2425 | Waiting for table level lock |

select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
| 120 | dbproxy | 192.168.201.243:55455 | ask | Query | 2439 | User sleep |

select * from ask_question where question_id=''+(select(sleep(3)))+'' |
| 258 | dbproxy | 192.168.201.243:54612 | ask | Query | 2409 | Waiting for table level lock |

select * from ask_question where question_id=''+(select(sleep(3)))+'' |
| 259 | dbproxy | 192.168.201.243:54615 | ask | Query | 2436 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1269 | dbproxy | 192.168.201.243:39716 | ask | Query | 2412 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1270 | dbproxy | 192.168.201.243:39720 | ask | Query | 2416 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1610 | dbproxy | 192.168.201.243:59369 | ask | Query | 2435 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 1614 | dbproxy | 192.168.201.243:59372 | ask | Query | 2414 | Waiting for table level lock |

select * from `ask_question` where question_id = '618401' |
| 2080 | dbproxy | 192.168.201.243:37535 | ask | Query | 2406 | Waiting for table level lock |

select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2081 | dbproxy | 192.168.201.243:37538 | ask | Query | 2407 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4866 | dbproxy | 192.168.201.243:37556 | ask | Query | 2436 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4867 | dbproxy | 192.168.201.243:37559 | ask | Query | 2430 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4868 | dbproxy | 192.168.201.243:37562 | ask | Query | 2426 | Waiting for table level lock |

SELECT COUNT(*) AS `numrows`
FROM (`ask_question`)
WHERE `question_uid` = '1112844184'
AND `question |
| 4869 | dbproxy | 192.168.201.243:37565 | ask | Query | 2398 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4870 | dbproxy | 192.168.201.243:37568 | ask | Query | 2435 | Waiting for table level lock |

select q.question_id tid,q.question_username author,q.question_uid authorid,q.question_title subject |
| 4871 | dbproxy | 192.168.201.243:37571 | ask | Query | 2413 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4872 | dbproxy | 192.168.201.243:37574 | ask | Query | 2415 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 4873 | dbproxy | 192.168.201.243:37577 | ask | Query | 2404 | Waiting for table level lock |

SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 4874 | dbproxy | 192.168.201.243:43949 | ask | Query | 2421 | Waiting for table level lock |

SELECT COUNT(1) AS count FROM ask_question WHERE (question_uid=793 OR question_expert_id=43) AND que |
| 4876 | dbproxy | 192.168.201.243:43971 | ask | Query | 2419 | Waiting for table level lock |

SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 4877 | dbproxy | 192.168.201.243:43974 | ask | Query | 2400 | Waiting for table level lock |

select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 4878 | dbproxy | 192.168.201.243:43977 | ask | Query | 2393 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 9496 | dbproxy | 192.168.201.218:34009 | ask | Query | 2430 | Waiting for table level lock |

SELECT q.question_id AS id, q.question_content AS content,a.answer_content,a.answer_isexpert,a.answe |
| 15948 | dbproxy | 192.168.201.243:45947 | ask | Query | 2433 | Waiting for table level lock |

SELECT COUNT(*) AS `numrows`
FROM (`ask_question`)
WHERE `question_uid` = '43606'
AND `question_cate |
| 15949 | dbproxy | 192.168.201.243:45952 | ask | Query | 2438 | Waiting for table level lock |

select q.*,c.category_name,c.category_parent_id from ask_question as q left join ask_category as c o |
| 15950 | dbproxy | 192.168.201.243:45956 | ask | Query | 2425 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 932995 | dbproxy | 192.168.201.243:33088 | ask | Query | 2401 | Waiting for table level lock |

select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
| 2055023 | dbproxy | 192.168.201.243:41700 | ask | Query | 2390 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055027 | dbproxy | 192.168.201.243:41717 | ask | Query | 2392 | Waiting for table level lock |

SELECT COUNT(1) AS count FROM ask_question WHERE question_uid=745 |
| 2055028 | dbproxy | 192.168.201.243:41721 | ask | Query | 2392 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055029 | dbproxy | 192.168.201.243:41724 | ask | Query | 2392 | Waiting for table level lock |

select q.question_id,q.question_cate_id,q.question_title,q.question_subtitle,q.question_addtime from |
| 2055030 | dbproxy | 192.168.201.243:41727 | ask | Query | 2390 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055031 | dbproxy | 192.168.201.243:41730 | ask | Query | 2378 | Waiting for table level lock |

select * from ask_question where question_id=''+(select(sleep(3)))+'' |
| 2055032 | dbproxy | 192.168.201.243:41733 | ask | Query | 2389 | Waiting for table level lock |

select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2055033 | dbproxy | 192.168.201.243:41736 | ask | Query | 2378 | Waiting for table level lock |

SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2055034 | dbproxy | 192.168.201.243:41739 | ask | Query | 2388 | Waiting for table level lock |

select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2055036 | dbproxy | 192.168.201.243:41742 | ask | Query | 2388 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055039 | dbproxy | 192.168.201.243:41747 | ask | Query | 2383 | Waiting for table level lock |

select count(q.question_id) as total from `ask_question` as q left join `ask_category` as c on q.que |
| 2055040 | dbproxy | 192.168.201.243:41752 | ask | Query | 2381 | Waiting for table level lock |

SELECT COUNT(*) AS `numrows`
FROM (`ask_question` as q)
LEFT JOIN `ask_question_attention` as qa ON |
| 2055044 | dbproxy | 192.168.201.243:41764 | ask | Query | 2379 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055047 | dbproxy | 192.168.201.243:41789 | ask | Query | 2377 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055051 | dbproxy | 192.168.201.243:41792 | ask | Query | 2371 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055052 | dbproxy | 192.168.201.243:41799 | ask | Query | 2376 | Waiting for table level lock |

SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2055053 | dbproxy | 192.168.201.243:41804 | ask | Query | 2367 | Waiting for table level lock |

SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2055054 | dbproxy | 192.168.201.243:41810 | ask | Query | 2373 | Waiting for table level lock |

SELECT q.*, c.category_name, c.category_parent_id, e.id as expertid, e.picpath FROM ask_question q L |
| 2055055 | dbproxy | 192.168.201.243:41819 | ask | Query | 2368 | Waiting for table level lock |

select * from `ask_question` where question_id = '626326' |
| 2055063 | dbproxy | 192.168.201.243:41843 | ask | Query | 2367 | Waiting for table level lock |

SELECT COUNT(distinct q.question_id) AS `numrows` FROM (`ask_question` as q) LEFT JOIN `ask_answer` |
| 2056431 | root | 127.0.0.1:60233 | NULL | Query | 0 | NULL | show

processlist |
+---------+-------------+-----------------------+------+---------+---------+----------------------------------

引起原因:锁表(全局表读锁)

4.kill线程
kill不掉

5.重启mysql
[root@JQ-pdt-host-2 ~]# mysqld_multi stop 7 --password=password
[root@JQ-pdt-host-2 ~]# mysqld_multi report 7
Reporting MySQL servers
MySQL server from group: mysqld7 is not running
[root@JQ-pdt-host-2 ~]# mysqld_multi start 7
[root@JQ-pdt-host-2 ~]# mysqld_multi report 7
Reporting MySQL servers
MySQL server from group: mysqld7 is running

6.确认复制
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.201.216
Master_User: dbslave
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: mysql-bin.000109
Read_Master_Log_Pos: 674890825
Relay_Log_File: mysql-relay-bin.000033
Relay_Log_Pos: 66526
Relay_Master_Log_File: mysql-bin.000109
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql,test,performance_schema
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 674890825
Relay_Log_Space: 66682
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 33071
1 row in set (0.00 sec)
正常

7.确认mmm
[root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/ADMIN_OFFLINE. Roles:
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.22), reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
db3(192.168.201.3) master/ONLINE. Roles:

[root@JQ-pdt-host-3 ~]# mmm_control @3307 set_online db2
OK: State of 'db2' changed to ONLINE. Now you can wait some time and check its new roles!
[root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/ONLINE. Roles: reader(192.168.201.22)
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.21), reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/REPLICATION_DELAY. Roles:
db3(192.168.201.3) master/ONLINE. Roles:

8.另外一台场一样操(不用se offline)
[root@JQ-pdt-host-217 ~]# mysqld_multi stop 7 --password=password
[root@JQ-pdt-host-217 ~]# mysqld_multi report 7
Reporting MySQL servers
MySQL server from group: mysqld7 is not running
[root@JQ-pdt-host-217 ~]# mysqld_multi start 7
[root@JQ-pdt-host-3 ~]# mmm_control @3307 show
db2(192.168.201.2) slave/ONLINE. Roles: reader(192.168.201.22)
db215(192.168.201.215) slave/ONLINE. Roles: reader(192.168.201.23)
db216(192.168.201.216) master/ONLINE. Roles: writer(192.168.201.20)
db217(192.168.201.217) slave/ONLINE. Roles: reader(192.168.201.21)
db3(192.168.201.3) master/ONLINE. Roles:

9.联系开发 提交slow-log

在这个过程中后三台slave搞锁了

10.查询前端错误和访问日志
grep 'share.html' /var/log/nginx/ask_access.log
183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?

question_id=2000000000%27%28%22%29%29%28%28%29%27%27 HTTP/1.1" 500 5 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"

183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?question_id=2000000000%29%20AND

%20993%3D1067%20AND%20%289822%3D9822 HTTP/1.1" 500 1293 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"

183.60.196.29 - - [05/May/2014:16:47:19 +0800] "GET /question/share.html?question_id=2000000000%20AND%205341%3D5341

HTTP/1.1" 500 1293 "-" "sqlmap/0.9 (http://sqlmap.sourceforge.net)"
14.17.87.22 - - [05/May/2014:16:49:39 +0800] "GET /question/share.html?question_id=999999.9+union+all+select

+0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830303536%2C0x31303235343830

303536-- HTTP/1.1" 500 1293 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
14.17.87.22 - - [05/May/2014:16:49:45 +0800] "GET /question/share.html?question_id=2000000000%3B+if+

%281%3D1%29+waitfor+delay+%2700%3A00%3A01%27-- HTTP/1.1" 500 5 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT

5.1; SV1; .NET CLR 2.0.50727) Havij"

14.17.87.22 - - [05/May/2014:16:49:45 +0800] "GET /question/share.html?question_id=2000000000+and+if

%281%3D1%2CBENCHMARK%28218400%2CMD5%280x41%29%29%2C0%29 HTTP/1.1" 500 1293 "-" "Mozilla/4.0 (compatible; MSIE 7.0;

Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij
14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length

%28%28database%28%29%29%29%3C32%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293

"-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length

%28%28database%28%29%29%29%3C16%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293

"-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"

14.17.87.22 - - [05/May/2014:16:49:46 +0800] "GET /question/share.html?question_id=2000000000%27+and+if%28Length

%28%28database%28%29%29%29%3C4%2CBENCHMARK%28105380%2CMD5%280x41%29%29%2C0%29+and+%27x%27%3D%27x HTTP/1.1" 500 1293

"-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727) Havij"
121.8.210.24 - - [05/May/2014:18:43:53 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,0)) and '

HTTP/1.1" 500 1293 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)

Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:43:57 +0800] "GET /question/share.html?

question_id='and 1=(if(2>1,1,')) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64)

AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:01

+0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,')) and ' HTTP/1.1" 500 5 "http://ask.qq.com"

"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:03 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,1,)) and '

HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)

Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:25 +0800] "GET /question/share.html?

question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;

WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -

[05/May/2014:18:44:26 +0800] "GET /question/share.html?question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1"

500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)

Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:27 +0800] "GET /question/share.html?

question_id='and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;

WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -

[05/May/2014:18:44:34 +0800] "GET /question/share.html?question_id=' and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1"

500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)

Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:44:35 +0800] "GET /question/share.html?

question_id=' and 1=(if(2>1,sleep(5),sleep(5)) and ' HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT

6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -

[05/May/2014:18:46:58 +0800] "GET /question/share.html?question_id=' union select(0)from(select(sleep(9))) HTTP/1.1"

500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)

Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:46:58 +0800] "GET /question/share.html?

question_id=' union select(0)from(select(sleep(9))) HTTP/1.1" 500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1;

WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - -

[05/May/2014:18:46:59 +0800] "GET /question/share.html?question_id=' union select(0)from(select(sleep(9))) HTTP/1.1"

500 5 "http://ask.qq.com" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)

Chrome/28.0.1500.63 Safari/537.36"
121.8.210.24 - - [05/May/2014:18:47:37 +0800] "GET /question/share.html?

question_id='%2b(select(0)from(select(sleep(0)))v)%2b' HTTP/1.1" 500 1293 "http://ask.qq.com" "Mozilla/5.0 (Windows

NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.63 Safari/537.36"

11.确定原因
SQL注入引起的, 日志内可见如下访问:
"GET /question/share.html?question_id=620779%20AND%20100%3E(SELECT%20COUNT(*)%20FROM

%20information_schema.columns,information_schema.columns%20T2,information_schema.columns

%20T3,information_schema.columns%20T4,information_schema.columns%20T5,information_schema.columns

%20T6,information_schema.columns%20T7,information_schema.columns%20T8,information_schema.columns

%20T9,information_schema.columns%20T10%20WHERE%207551=7551)%20AND%20(SELECT%20LENGTH(CURRENT_USER))%20%3C

%202147483647 HTTP/1.1" 200 5 "-" "-"
"GET /question/share.html?question_id='%2b(select(0)from(select(sleep(3)))v)%2b'%22%2b(select(0)from(select(sleep

(0)))v)%2b%22*/ HTTP/1.1" 500

排查 web_code/application/controllers/quesiton.php::share 对question_id未做SQL注入防范,已修正

12.解决办法
开发patch补丁

总结:
本次故障原因
1.开发未做SQL注入防范,漏洞被别人抓住
2.SQL注入引起slave锁表,进一步造成slave SQL线程无法正常工作,脱离MMM

MMM故障排查总结
1.查看MMM状态
2.查看MYSQL服务状态
3.查看单个mysql 同步状态和proceslist状态
4.查看单个mysql errorlog和slow log
5.前端access 和 error log

mysql-mmm故障整理的更多相关文章

  1. MySQL -MMM 学习整理

    一. 规划 1.主机规划 服务器 IP 作用 monitor 10.0.0.10 监控服务器 master-01 10.0.0.5 读写主机01 master-02 10.0.0.6 读写主机02 s ...

  2. mysql 故障整理(2)

    导入备份数据时报错. mysql> system mysql -uroot -p < /root/mingongge_bak.sqlEnter password: ERROR 1840 ( ...

  3. 从 MySQL+MMM 到 MariaDB+Galera Cluster : 一个高可用性系统改造

    很少有事情比推出高可用性(HA)系统之后便经常看到的系统崩溃更糟糕.对于我们这个Rails运行机的团队来说,这个失效的HA系统是MySQL多主复制管理器(MMM). 我们已经找寻MMM的替代品有一段时 ...

  4. Slave_SQL_Running: No mysql同步故障解决方法

    Slave_SQL_Running: No mysql同步故障解决      今天检查数据库发现一台MySQL Slave未和主机同步,查看Slave状态:mysql> show slave s ...

  5. keepalive配置mysql自动故障转移

    keepalive配置mysql自动故障转移 原创 2016年02月29日 02:16:52 2640 本文先配置了一个双master环境,互为主从,然后通过Keepalive配置了一个虚拟IP,客户 ...

  6. (4.1)mysql备份还原——mysql常见故障

    (4.1)mysql备份还原——mysql常见故障 1.常见故障类型 在数据库环境中,常见故障类型: 语句失败,用户进程失败,用户错误 实例失败,介质故障,网络故障 其中最严重的故障主要是用户错误和介 ...

  7. Window下mysql环境配置问题整理

    Window下mysql环境配置问题整理 参考如下链接. 无需安装解压版mysql包 创建选项配置 首次启动服务 用mysqld初始化目录 安装后设置和测试 启动服务错误信息 管理员模式打开cmd m ...

  8. MySQL - Show Processlist 整理

    MySQL - Show Processlist 整理   原文来源:MySQL 5.5 Reference Manual 部分翻译取自:<MySQL_5.1中文参考手册> 转载请注明原文 ...

  9. mysql常见故障问题汇总

    Auth: JinDate: 20140414 UpdateDate: 继续更新 导库字符集的问题http://www.cnblogs.com/diege/p/3640618.htmlmysql-pr ...

随机推荐

  1. Linux 入门记录:十九、Linux 包管理工具 RPM

    一.源代码管理 绝大多数开源软件都是直接以源代码形式发布的,一般会被打包为 tar.gz 的归档压缩文件.程序源代码需要编译为二进制可执行文件后才能够运行使用.源代码的基本编译流程为: ./confi ...

  2. ogre 3d游戏开发框架指南

    ogre 3d游戏开发框架指南pdf 附光盘代码 http://www.ddooo.com/softdown/74228.htm OGRE3D游戏开发框架指南.pdf http://vdisk.wei ...

  3. PHP下载APK文件

    PHP下载APK文件(代码如下) /** * //这里不要随便打印文字,否则会影响输出的文件的 * (例如下载没问题,但是apk安装时候提醒解析安装包错误) * @return array */ pu ...

  4. sicily 1020. Big Integer

    Description Long long ago, there was a super computer that could deal with VeryLongIntegers(no VeryL ...

  5. 【uva10779】收集者的难题

    按照题意建模就行了. #include<bits/stdc++.h> #define naive 0 #define inf 1000000007 using namespace std; ...

  6. 【bzoj1026】windy数

    江泽OJ好,远离bzoj保平安. 仍然沿用之前的记忆化搜索的办法即可. #include<bits/stdc++.h> #define N 10010 using namespace st ...

  7. JSP(3) - 9个JSP内置对象 - 小易Java笔记

    1.9个JSP内置对象 内置对象引用名称     对应的类型 request HttpServletRequest response HttpServletResponse config Servle ...

  8. HTML5 一篇就够的中文教程

    HTML5 是近十年来 Web 开发标准最巨大的飞跃.HTML5 并非仅仅用来表示 Web 内容,它将 Web 带入一个成熟的应用平台,在 HTML5 平台上,视频.音频.图象.动画,以及同电脑的交互 ...

  9. elasticsearch简单查询

    elasticsearch简单查询示例: { "from": "0", //分页,从第一页开始 "size": "10" ...

  10. 201. Non Absorbing DFA

    题意好难看懂的说... 有限状态自动机DFA是这么一个有序组<Σ, U, s, T, phi>:Σ代表输入字符集,表示此自动机的工作范围:U代表所有的状态集合:s是初始状态:T是最终状态: ...