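[Repost] Does MySQL 8.0 hash join have a major flaw?
Xu Chunyang published an article claiming that hash join in MySQL 8.0 has a major flaw.
1. Preparing the test environment with the TPC-H tools
The TPC-H tools can be downloaded from http://www.tpc.org/tpch/default5.asp. They do not support MySQL out of the box, so some manual adjustments are needed; see https://imysql.com/2012/12/21/tpch-for-mysql-manual.html.
In this case I set the Scale Factor parameter to 10, that is: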
[root@yejr.run dbgen]# ./dbgen -s 10 && ls -l *tbl
-rw-r--r-- 1 root root 244847642 Apr 14 09:52 customer.tbl
-rw-r--r-- 1 root root 7775727688 Apr 14 09:52 lineitem.tbl
-rw-r--r-- 1 root root 2224 Apr 14 09:52 nation.tbl
-rw-r--r-- 1 root root 1749195031 Apr 14 09:52 orders.tbl
-rw-r--r-- 1 root root 243336157 Apr 14 09:52 part.tbl
-rw-r--r-- 1 root root 1204850769 Apr 14 09:52 partsupp.tbl
-rw-r--r-- 1 root root 389 Apr 14 09:52 region.tbl
-rw-r--r-- 1 root root 14176368 Apr 14 09:52 supplier.tbl
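As a quick sanity check on the listing above, the file sizes can be summed to confirm that Scale Factor 10 yields on the order of 10GB of raw data, with lineitem.tbl accounting for most of it. This is only a small sketch: the byte counts are copied from the `ls -l` output, and the variable names are made up for illustration.

```python
# File sizes in bytes, copied from the `ls -l *tbl` listing above.
tbl_sizes = {
    "customer.tbl": 244847642,
    "lineitem.tbl": 7775727688,
    "nation.tbl": 2224,
    "orders.tbl": 1749195031,
    "part.tbl": 243336157,
    "partsupp.tbl": 1204850769,
    "region.tbl": 389,
    "supplier.tbl": 14176368,
}

total_bytes = sum(tbl_sizes.values())
total_gib = total_bytes / 2**30  # bytes -> GiB

# The file that dominates the data set.
largest = max(tbl_sizes, key=tbl_sizes.get)
largest_share = tbl_sizes[largest] / total_bytes

print(f"total: {total_bytes} bytes ({total_gib:.2f} GiB)")
print(f"largest: {largest} ({largest_share:.0%} of the total)")
```

The total comes to roughly 10.5 GiB, which is why the later joins against lineitem are the expensive part of the test.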
2. Create the test tables and import the test data.
| Name | Row_format | Rows | Avg_row_length | Data_length | Index_length |
| customer | Dynamic | 1476605 | 197 | 291258368 | 0 |
| lineitem | Dynamic | 59431418 | 152 | 9035579392 | 0 |
| nation | Dynamic | 25 | 655 | 16384 | 0 |
| orders | Dynamic | 14442405 | 137 | 1992294400 | 0 |
| part | Dynamic | 1980917 | 165 | 327991296 | 0 |
| partsupp | Dynamic | 9464104 | 199 | 1885339648 | 0 |
| region | Dynamic | 5 | 3276 | 16384 | 0 |
| supplier | Dynamic | 99517 | 184 | 18366464 | 0 |
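Note: do not add any indexes, primary keys included, to any of the test tables; Index_length is 0 for every table in the listing above.
3. Running the test SQL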
[root@yejr.run]> \s
Server version: 8.0.19-commercial MySQL Enterprise Server - Commercial
That said, this case is mainly about testing Hash Join, so the GROUP BY and ORDER BY clauses have been removed from the query.
[root@yejr.run]> desc select count(*)
    -> from
    ->     customer,
    ->     orders,
    ->     lineitem,
    ->     supplier,
    ->     nation,
    ->     region
    -> where
    ->     c_custkey = o_custkey
    ->     and l_orderkey = o_orderkey
    ->     and l_suppkey = s_suppkey
    ->     and c_nationkey = s_nationkey
    ->     and s_nationkey = n_nationkey
    ->     and n_regionkey = r_regionkey
    ->     and r_name = 'AMERICA'
    ->     and o_orderdate >= date '1993-01-01'
    ->     and o_orderdate < date '1993-01-01' + interval '1' year;
+----------+------+----------+----------+----------------------------------------------------+
| table    | type | rows     | filtered | Extra                                              |
+----------+------+----------+----------+----------------------------------------------------+
| region   | ALL  |        5 |    20.00 | Using where                                        |
| nation   | ALL  |       25 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
| supplier | ALL  |    98705 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
| customer | ALL  |  1485216 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
| orders   | ALL  | 14932433 |     1.11 | Using where; Using join buffer (Block Nested Loop) |
| lineitem | ALL  | 59386314 |     1.00 | Using where; Using join buffer (Block Nested Loop) |
+----------+------+----------+----------+----------------------------------------------------+
Now add format=tree and look again (quite a sight...)
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)
    -> Inner hash join (lineitem.L_SUPPKEY = supplier.S_SUPPKEY), (lineitem.L_ORDERKEY = orders.O_ORDERKEY)  (cost=40107736685515472896.00 rows=4010763818487343104)
        -> Table scan on lineitem  (cost=0.07 rows=59386314)
        -> Hash
            -> Inner hash join (orders.O_CUSTKEY = customer.C_CUSTKEY)  (cost=60799566599072.12 rows=6753683238538)
                -> Filter: ((orders.O_ORDERDATE >= DATE'1993-01-01') and (orders.O_ORDERDATE < <cache>((DATE'1993-01-01' + interval '1' year))))  (cost=0.16 rows=165883)
                    -> Table scan on orders  (cost=0.16 rows=14932433)
                -> Hash
                    -> Inner hash join (customer.C_NATIONKEY = nation.N_NATIONKEY)  (cost=3664985889.79 rows=3664956624)
                        -> Table scan on customer  (cost=0.79 rows=1485216)
                        -> Hash
                            -> Inner hash join (supplier.S_NATIONKEY = nation.N_NATIONKEY)  (cost=24976.50 rows=24676)
                                -> Table scan on supplier  (cost=513.52 rows=98705)
                                -> Hash
                                    -> Inner hash join (nation.N_REGIONKEY = region.R_REGIONKEY)  (cost=3.50 rows=3)
                                        -> Table scan on nation  (cost=0.50 rows=25)
                                        -> Hash
                                            -> Filter: (region.R_NAME = 'AMERICA')  (cost=0.75 rows=1)
                                                -> Table scan on region  (cost=0.75 rows=5)
Before starting the run, let's take a look at how the manual describes Hash Join. One passage reads:
Memory usage by hash joins can be controlled using the join_buffer_size
system variable; a hash join cannot use more memory than this amount.
When the memory required for a hash join exceeds the amount available,
MySQL handles this by using files on disk. If this happens, you should
be aware that the join may not succeed if a hash join cannot fit into
memory and it creates more files than set for open_files_limit. To avoid
such problems, make either of the following changes:
- Increase join_buffer_size so that the hash join does not spill over to disk.
- Increase open_files_limit.
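In short, when join_buffer_size is too small, the hash join spills a large amount of data to disk (the hash table is split into many small files on disk, which are then read back into memory one at a time to perform the hash join), so the manual recommends raising join_buffer_size or raising the open_files_limit ceiling.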
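The spill-to-disk behaviour described above can be illustrated with a generic, textbook-style partitioned hash join. To be clear, this is a minimal sketch and not MySQL's actual implementation: the function names, the pickle-based chunk files, and the `num_parts` parameter are all assumptions made for illustration. It does show why many chunk files end up open at once, which is where open_files_limit comes into play.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def partition_to_disk(rows, key, num_parts, tmpdir, prefix):
    """Split rows into num_parts chunk files according to the hash of the join key."""
    paths = [os.path.join(tmpdir, f"{prefix}_{i}.chunk") for i in range(num_parts)]
    files = [open(p, "wb") for p in paths]  # num_parts files are open at the same time
    try:
        for row in rows:
            pickle.dump(row, files[hash(row[key]) % num_parts])
    finally:
        for f in files:
            f.close()
    return paths


def read_chunk(path):
    """Yield the rows stored in one chunk file, one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def spilling_hash_join(build_rows, probe_rows, build_key, probe_key, num_parts=4):
    """Join two row sets by partitioning both to disk, then joining chunk by chunk."""
    matches = []
    with tempfile.TemporaryDirectory() as tmpdir:
        build_paths = partition_to_disk(build_rows, build_key, num_parts, tmpdir, "build")
        probe_paths = partition_to_disk(probe_rows, probe_key, num_parts, tmpdir, "probe")
        # Rows with equal keys land in chunks with the same index,
        # so only one build chunk has to fit in memory at a time.
        for build_path, probe_path in zip(build_paths, probe_paths):
            table = defaultdict(list)
            for row in read_chunk(build_path):
                table[row[build_key]].append(row)
            for row in read_chunk(probe_path):
                for match in table.get(row[probe_key], ()):
                    matches.append((match, row))
    return matches
```

In this sketch, a smaller in-memory budget would mean a larger `num_parts`, hence more files on disk and more file handles open during partitioning. That is the trade-off the manual's two suggestions address.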
[root@yejr.run]> select @@join_buffer_size, @@tmp_table_size, @@innodb_buffer_pool_size;
| @@join_buffer_size | @@tmp_table_size | @@innodb_buffer_pool_size |
| 1073741824 | 16777216 | 10737418240 |
To be on the safe side, I also set join_buffer_size with SET_VAR (a new feature in 8.0) when executing the SQL. Off we go.
# Query_time: 2911.426483 Lock_time: 0.000251 Rows_sent: 1 Rows_examined: 76586082
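The /data partition had 373GB of free space at the outset, and at its peak this one SQL statement ate up roughly 178GB of it (going by the df readings below), which is truly frightening.
# At the start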
/dev/vdb 524032000 132967368 391064632 26% /data
# At the peak
/dev/vdb 524032000 319732288 204299712 62% /data
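The disk consumption can be worked out from the two df readings above. This is a small sketch under the assumption that df reports 1K blocks (its usual default) and that the three numeric columns shown are total, used, and available. The variable names are illustrative.

```python
KIB_PER_GIB = 1024 * 1024

# "Used" and "Available" columns from the two df snapshots, in 1K blocks (assumed).
used_start_kib = 132967368
avail_start_kib = 391064632
used_peak_kib = 319732288


def kib_to_gib(kib):
    """Convert a count of 1K blocks to GiB."""
    return kib / KIB_PER_GIB


# Extra space consumed between the start of the query and its peak.
spill_kib = used_peak_kib - used_start_kib
spill_gib = kib_to_gib(spill_kib)
free_at_start_gib = kib_to_gib(avail_start_kib)

print(f"free at start: {free_at_start_gib:.1f} GiB")
print(f"consumed at peak: {spill_gib:.1f} GiB")
```

Under those assumptions the query's temporary files grew to about 178 GiB, nearly half of the space that was free at the start and many times the size of the raw data set.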
4. Additional tests
[root@yejr.run]> EXPLAIN STRAIGHT_JOIN select count(*)
    from customer
    straight_join orders
    straight_join lineitem
    straight_join supplier
    straight_join nation
    straight_join region
    where c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'AMERICA'
    and o_orderdate >= date '1993-01-01'
    and o_orderdate < date '1993-01-01' + interval '1' year;
+----------+----------+----------+----------------------------------------------------+
| table    | rows     | filtered | Extra                                              |
+----------+----------+----------+----------------------------------------------------+
| customer |  1485216 |   100.00 | NULL                                               |
| orders   | 14932433 |     1.11 | Using where; Using join buffer (Block Nested Loop) |
| lineitem | 59386314 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
| supplier |    98705 |     1.00 | Using where; Using join buffer (Block Nested Loop) |
| nation   |       25 |    10.00 | Using where; Using join buffer (Block Nested Loop) |
| region   |        5 |    20.00 | Using where; Using join buffer (Block Nested Loop) |
+----------+----------+----------+----------------------------------------------------+
# In format=tree mode
-> Aggregate: count(0)
    -> Inner hash join (region.R_REGIONKEY = nation.N_REGIONKEY)  (cost=204565289351994015744.00 rows=8021527039324357632)
        -> Filter: (region.R_NAME = 'AMERICA')  (cost=0.00 rows=1)
            -> Table scan on region  (cost=0.00 rows=5)
        -> Hash
            -> Inner hash join (nation.N_NATIONKEY = customer.C_NATIONKEY)  (cost=200554431911464173568.00 rows=-9223372036854775808)
                -> Table scan on nation  (cost=0.00 rows=25)
                -> Hash
                    -> Inner hash join (supplier.S_NATIONKEY = customer.C_NATIONKEY), (supplier.S_SUPPKEY = lineitem.L_SUPPKEY)  (cost=160446786739199049728.00 rows=-9223372036854775808)
                        -> Table scan on supplier  (cost=0.00 rows=98705)
                        -> Hash
                            -> Inner hash join (lineitem.L_ORDERKEY = orders.O_ORDERKEY)  (cost=16253562153466286.00 rows=16253535510797654)
                                -> Table scan on lineitem  (cost=0.01 rows=59386314)
                                -> Hash
                                    -> Inner hash join (orders.O_CUSTKEY = customer.C_CUSTKEY)  (cost=24638698342.46 rows=2736915995)
                                        -> Filter: ((orders.O_ORDERDATE >= DATE'1993-01-01') and (orders.O_ORDERDATE < <cache>((DATE'1993-01-01' + interval '1' year))))  (cost=0.94 rows=165883)
                                            -> Table scan on orders  (cost=0.94 rows=14932433)
                                        -> Hash
                                            -> Table scan on customer  (cost=153126.35 rows=1485216)
[root@yejr.run]> mysql> select /*+ set_var(join_buffer_size=1073741824) */
| count(*) |
| 72033 |
1 row in set (4 min 12.31 sec)
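This run is faster because the orders table is now processed second and carries an extra WHERE condition, so far less data is left after filtering (about 15 million rows in the full table, 2.27 million after filtering), which shortens the overall execution time.
straight_join saved the day.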
[root@yejr.run]> desc select count(*) from orders o , lineitem l, partsupp ps where
| table | rows | filtered | Extra |
| ps | 7697248 | 100.00 | NULL |
| l | 59386314 | 10.00 | Using where; Using join buffer (Block Nested Loop) |
| o | 14932433 | 10.00 | Using where; Using join buffer (Block Nested Loop) |
# Query_time: 304.889654 Lock_time: 0.000178 Rows_sent: 1 Rows_examined: 82986052
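In my article from a few days ago, "Is MySQL doomed?" (《MySQL没前途了吗?》), I already said that MySQL is currently not suited to OLAP workloads. Hash Join does not change that, since the scenarios where it applies are quite limited.
In addition, when it is already clear that a query needs to use Hash Join, you should intervene manually and raise join_buffer_size in advance, to reduce the temporary files generated during execution.
That said, MySQL's performance in OLAP-leaning scenarios does still have a lot of room for improvement, and I remain cautiously optimistic about it. What if it simply absorbed ClickHouse, for instance :)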
