




  • Why did the primary fail over?
  • Why was the new primary still unavailable after the failover?



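2. The per-minute thread snapshots show that a large number of threads were in the Sending data and statistics states when the fault occurred, yet the snapshot taken one minute earlier shows no blocking or lock waits.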





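After the meeting we logged on to the machine and found 20 SQL statements in the User sleep state on the instance, all roughly of the form where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+''. This looked suspicious, because the developers would never call the sleep function from application code.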






3.1 Prepare the data

CREATE TABLE `test` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO test SELECT 1;

3.2 Start 24 threads that query this table


[root@VM_42_63_centos ~]# for i in `seq 1 24`; do nohup mysql -S /tmp/mysql.sock3306 -e "select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+''" & done

3.3 Observe the result

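After running the script from the previous step, open another session and query the test table. The thread hangs without returning and stays in the Sending data state, as shown below: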

(root@localhost) [test]> show processlist;
| Id | User | Host | db | Command | Time | State | Info |
| 2 | root | localhost:38204 | NULL | Sleep | 0 | | NULL |
| 3 | root | localhost | test | Query | 0 | starting | show processlist |
| 4 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 5 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 6 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 7 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 8 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 9 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 10 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 11 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 12 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 13 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 14 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 15 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 16 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 17 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 18 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 19 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 20 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 21 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 22 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 23 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 24 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 25 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 26 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 27 | root | localhost | NULL | Query | 23 | User sleep | select id from test.test where id=''+(select 'rbzd' where 6910=6910 and sleep(300))+'' |
| 28 | root | localhost | NULL | Query | 11 | Sending data | select id from test.test |
27 rows in set (0.00 sec)


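1. SQL injection: once the number of statements in the User sleep state reached 24, no other SQL could enter InnoDB, including the queries from the high-availability probe. Because the thread snapshot filtered out "sleep", the injected statements were never captured, which made the later investigation harder.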
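The check described above can be sketched as a small shell function. This is only an illustration: `check_user_sleep` is a hypothetical helper, it expects the tab-separated output that the `mysql` client produces in batch mode, and the limit is passed in as an argument. The text does not name the setting behind the figure of 24; it is presumably `innodb_thread_concurrency`, but that is an assumption.

```shell
# Count statements whose State column is "User sleep" and compare the
# count with a limit. Input on stdin: tab-separated processlist rows
# (Id, User, Host, db, Command, Time, State, Info) with a header line.
# $1: the limit (24 is the figure reported in this incident).
check_user_sleep() {
    awk -F'\t' -v limit="$1" '
        NR > 1 && $7 == "User sleep" { n++ }
        END {
            status = (n + 0 >= limit + 0) ? "SATURATED" : "OK"
            printf "%s %d/%d\n", status, n, limit
        }'
}

# Possible usage against the test instance from section 3 (not run here):
#   mysql -S /tmp/mysql.sock3306 -B -e "show processlist" | check_user_sleep 24
```

A monitor built this way would have flagged the instance before the probe itself got stuck, since it only reads the processlist and does not touch an InnoDB table.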



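4. After the primary was restarted, only 20 injected statements entered InnoDB, and they stayed in the User sleep state. Because the threshold of 24 was not reached, the application appeared to work normally, but performance was worse than on the old primary, since 20 threads were sitting inside InnoDB and never left. Once these threads were killed, the application returned to normal.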
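Killing the stuck threads by hand is tedious when there are twenty of them. A minimal sketch of generating the statements instead, assuming the same tab-separated processlist input as the `mysql` client emits in batch mode; `gen_kill_user_sleep` is a hypothetical helper name, and the output should be reviewed before it is executed:

```shell
# Print one "KILL <id>;" statement per thread whose State is "User sleep".
# Input on stdin: tab-separated processlist rows
# (Id, User, Host, db, Command, Time, State, Info) with a header line.
gen_kill_user_sleep() {
    awk -F'\t' '
        NR > 1 && $7 == "User sleep" && $1 ~ /^[0-9]+$/ {
            printf "KILL %s;\n", $1
        }'
}

# Possible usage (not run here): review kill.sql, then feed it back to mysql.
#   mysql -S /tmp/mysql.sock3306 -B -e "show processlist" | gen_kill_user_sleep > kill.sql
```

Writing the statements to a file first keeps a human in the loop, which matters because a legitimate job may also call sleep.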


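  • Because the thread snapshot filtered on the keyword "sleep", the investigation took a long time. Going forward, threads in the User sleep state must be kept in the thread snapshot.
  • The developers should fix the application to prevent SQL injection.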
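One way to keep User sleep threads in the snapshot is to filter on the Command column rather than on the word "sleep". The sketch below is an assumption about how such a filter could look, not the snapshot script used in this incident (which is not shown here); `filter_snapshot` is a hypothetical name, and the input is the tab-separated processlist that the `mysql` client emits in batch mode.

```shell
# Drop idle connections (Command == "Sleep") from a processlist snapshot
# while keeping the header and every running statement, including those
# whose State is "User sleep" (their Command is "Query").
# Input on stdin: tab-separated rows
# (Id, User, Host, db, Command, Time, State, Info) with a header line.
filter_snapshot() {
    awk -F'\t' 'NR == 1 || $5 != "Sleep"'
}

# Possible usage (not run here):
#   mysql -S /tmp/mysql.sock3306 -B -e "show full processlist" | filter_snapshot
```

A case-insensitive keyword match on the whole line would discard both kinds of row, because "User sleep" and a statement text containing sleep(300) match just as an idle "Sleep" connection does. Comparing the Command field exactly avoids that.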


