PostgreSQL 修改执行计划 GroupAggregate 为 HashAggregate

1、前言

PostgreSQL 聚合算法有两种，HashAggregate and GroupAggregate 。我们知道GroupAggregate 需要对记录进行排序，而 HashAggregate 则无需进行排序，通常 HashAggregate 要快很多。但是，我们经常会看到优化器使用 GroupAggregate，即使 enable_sort=off 也不能阻止排序操作。那么有什么办法让优化器使用 HashAggregate 了？

2、优化例子

创建测试数据：数据每个列有100个不同值，但是列值之间是相关的，也就是 count(distinct *) 也就100 个。

create table t1

(

  id1 integer,  id2 integer,  id3 integer,  id4 integer,

  id5 integer,  id6 integer,  id7 integer,  id8 integer,

  status1 char(1),  status2 char(1),  status3 char(1),  status4 char(1),

  status5 char(1),  status6 char(1),  status7 char(1),  status8 char(1),

  id9 integer, id10 integer,

  name text

);

insert into t1

select

  generate_series(1,100),  generate_series(1,100),  generate_series(1,100),  generate_series(1,100),

  generate_series(1,100),  generate_series(1,100),  generate_series(1,100),  generate_series(1,100),

  round(random()),  round(random()),  round(random()),  round(random()),

  round(random()),  round(random()),  round(random()),  round(random()),

  generate_series(1,100),  generate_series(1,100),

  repeat('a',100);

begin

  for i in 1..16 loop

    insert into t1 select * from t1;

  end loop;

end;

test=# select count(*) from t1;

  count

---------

 6553600

(1 row)

不同的group by 走不同 aggregrate 算法。

test=# explain analyze select id1,count(*) from t1 group by id1;

                                                     QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=224335.18..224336.18 rows=100 width=12) (actual time=2084.369..2084.387 rows=100 loops=1)

   Group Key: id1

   ->  Seq Scan on t1  (cost=0.00..191567.12 rows=6553612 width=4) (actual time=1.043..876.510 rows=6553600 loops=1)

 Planning Time: 0.051 ms

 Execution Time: 2084.422 ms

(5 rows)

test=# explain analyze select id1,id2,count(*) from t1 group by id1,id2;

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=240719.21..240819.21 rows=10000 width=16) (actual time=4801.232..4801.302 rows=100 loops=1)

   Group Key: id1, id2

   ->  Seq Scan on t1  (cost=0.00..191567.12 rows=6553612 width=8) (actual time=0.017..3457.207 rows=6553600 loops=1)

 Planning Time: 0.076 ms

 Execution Time: 4801.430 ms

(5 rows)

--随着 group by 列的数量增加，优化器开始走 GroupAggregate

test=# explain analyze select id1,id2,id3,count(*) from t1 group by id1,id2,id3;

                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=1157569.44..1246043.20 rows=655361 width=20) (actual time=4519.386..6399.677 rows=100 loops=1)

   Group Key: id1, id2, id3

   ->  Sort  (cost=1157569.44..1173953.47 rows=6553612 width=12) (actual time=4501.004..5619.469 rows=6553600 loops=1)

         Sort Key: id1, id2, id3

         Sort Method: external merge  Disk: 141144kB

         ->  Seq Scan on t1  (cost=0.00..191567.12 rows=6553612 width=12) (actual time=0.492..1274.615 rows=6553600 loops=1)

 Planning Time: 0.072 ms

 Execution Time: 6415.101 ms

(8 rows)

test=# explain analyze select id1,id2,id3,id4,count(*) from t1 group by id1,id2,id3,id4;

                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=1157569.44..1262427.23 rows=655361 width=24) (actual time=4873.150..7008.342 rows=100 loops=1)

   Group Key: id1, id2, id3, id4

   ->  Sort  (cost=1157569.44..1173953.47 rows=6553612 width=16) (actual time=4855.042..6153.156 rows=6553600 loops=1)

         Sort Key: id1, id2, id3, id4

         Sort Method: external merge  Disk: 166792kB

         ->  Seq Scan on t1  (cost=0.00..191567.12 rows=6553612 width=16) (actual time=0.403..1320.904 rows=6553600 loops=1)

 Planning Time: 0.208 ms

 Execution Time: 7023.674 ms

(8 rows)

问题解析：根据优化器的估算，如果 group by 只有一列，有 100 个不同值，2列 100*100 个， 3列 100*100*100 个，也就是group by 后面的列越多，优化器会认为返回的不同值越多，优化器更偏向于采用GroupAggregate算法。基于以上思路，我们考虑对表创建多列的统计数据。

test=# create statistics s1(ndistinct) on id1,id2,id3,id4,id5,id6,id7,id8 from t1;

CREATE STATISTICS

test=# analyze t1;

ANALYZE

test=# explain analyze select id1,id2,id3,id4,count(*) from t1 group by id1,id2,id3,id4;

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=273487.27..273488.27 rows=100 width=24) (actual time=2787.681..2787.695 rows=100 loops=1)

   Group Key: id1, id2, id3, id4

   ->  Seq Scan on t1  (cost=0.00..191567.12 rows=6553612 width=16) (actual time=0.024..926.445 rows=6553600 loops=1)

 Planning Time: 0.243 ms

 Execution Time: 2787.764 ms

(5 rows)

test=# explain analyze select id1,id2,id3,id4,id5,id6,id7,id8,count(*) from t1 group by id1,id2,id3,id4,id5,id6,id7,id8;

                                                      QUERY PLAN

----------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=358628.65..358629.65 rows=100 width=40) (actual time=3304.199..3304.218 rows=100 loops=1)

   Group Key: id1, id2, id3, id4, id5, id6, id7, id8

   ->  Seq Scan on t1  (cost=0.00..211172.20 rows=6553620 width=32) (actual time=0.027..887.569 rows=6553600 loops=1)

 Planning Time: 0.196 ms

 Execution Time: 3304.256 ms

可以看到，对表创建多列统计数据后，优化器采用了 HashAggregate 算法，性能也得到了很大提升。

3、group by 超过8列的情况

test=# create statistics s2(ndistinct) on id1,id2,id3,id4,id5,id6,id7,id8,status1 from t1;

ERROR:  cannot have more than 8 columns in statistics

test=# explain analyze select id1,id2,id3,id4,id5,id6,id7,id8,status1,status2,status3,status4,status5,status6,status7,status8,count(*)

test-# from t1 group by id1,id2,id3,id4,id5,id6,id7,id8,status1,status2,status3,status4,status5,status6,status7,status8;

                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=489701.05..489957.05 rows=25600 width=56) (actual time=7595.743..7595.849 rows=100 loops=1)

   Group Key: id1, id2, id3, id4, id5, id6, id7, id8, status1, status2, status3, status4, status5, status6, status7, status8

   ->  Seq Scan on t1  (cost=0.00..211172.20 rows=6553620 width=48) (actual time=0.500..945.677 rows=6553600 loops=1)

 Planning Time: 0.887 ms

 Execution Time: 7596.065 ms

(5 rows)

test-# select id1,id2,id3,id4,id5,id6,id7,id8,status1,status2,status3,status4,status5,status6,status7,status8,id9,count(*)

test-# from t1 group by id1,id2,id3,id4,id5,id6,id7,id8,status1,status2,status3,status4,status5,status6,status7,status8,id9;

                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=1407916.97..1725760.65 rows=655348 width=60) (actual time=48694.618..68534.965 rows=100 loops=1)

   Group Key: id1, id2, id3, id4, id5, id6, id7, id8, status1, status2, status3, status4, status5, status6, status7, status8, id9

   ->  Sort  (cost=1407916.97..1424300.66 rows=6553478 width=52) (actual time=48542.144..64549.985 rows=6553600 loops=1)

         Sort Key: id1, id2, id3, id4, id5, id6, id7, id8, status1, status2, status3, status4, status5, status6, status7, status8, id9

         Sort Method: external merge  Disk: 397656kB

         ->  Seq Scan on t1  (cost=0.00..217944.78 rows=6553478 width=52) (actual time=0.018..2029.935 rows=6553600 loops=1)

 Planning Time: 0.203 ms

 Execution Time: 68559.728 ms

(8 rows)

结论：

例子1：虽然无法创建超过8列的statistics，但由于后面的status 列的 distinct 值较少，8 个列优化器估算的distinct 值也只有 100 * 2^8 ，因此，优化器还是走 HashAggregate。

例子2：多了1列，估算的 distinct 值 100 * 2^8 *100 ，优化器走GroupAggregate

test=# explain analyze

test-# select id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,status1,status2,status3,status4,status5,status6,count(*)

test-# from t1 group by id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,status1,status2,status3,status4,status5,status6;

                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

 GroupAggregate  (cost=1407950.79..1709417.77 rows=655363 width=60) (actual time=36920.441..52070.524 rows=100 loops=1)

   Group Key: id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, status1, status2, status3, status4, status5, status6

   ->  Sort  (cost=1407950.79..1424334.87 rows=6553630 width=52) (actual time=36801.599..48682.412 rows=6553600 loops=1)

         Sort Key: id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, status1, status2, status3, status4, status5, status6

         Sort Method: external merge  Disk: 397656kB

         ->  Seq Scan on t1  (cost=0.00..217946.30 rows=6553630 width=52) (actual time=0.098..2682.113 rows=6553600 loops=1)

 Planning Time: 0.323 ms

 Execution Time: 52096.144 ms

(8 rows)

test=# create statistics s2(ndistinct) on id9,id10,status1,status2,status3,status4,status5,status6 from t1;

CREATE STATISTICS

test=# analyze t1;

ANALYZE

test-# select id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,status1,status2,status3,status4,status5,status6,status7,count(*)

test-# from t1 group by id1,id2,id3,id4,id5,id6,id7,id8,id9,id10,status1,status2,status3,status4,status5,status6,status7;

                                                          QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=512859.65..513059.65 rows=20000 width=62) (actual time=7189.342..7189.445 rows=100 loops=1)

   Group Key: id1, id2, id3, id4, id5, id6, id7, id8, id9, id10, status1, status2, status3, status4, status5, status6, status7

   ->  Seq Scan on t1  (cost=0.00..217946.30 rows=6553630 width=54) (actual time=0.017..931.265 rows=6553600 loops=1)

 Planning Time: 0.135 ms

 Execution Time: 7189.517 ms

(5 rows)

结论：

例子1：由于distinct 值过多，优化器还是走GroupAggregate；

例子2：通过创建额外的 Statistics s2 ，可以走HashAggregate的情况，也就是s1 , s2 可以同时被优化器所使用。

4、结论

优化器在决定采用哪种 group by 算法时，会估算结果分组个数，如果结果分组个数过多，优化器会偏向于 GroupAggregate。但是我们知道，PostgreSQL 对于多列值的合并估算并不准确，这就需要手动创建多列统计信息。实际优化过程中，可能经常需要创建多列统计信息，以促使优化器采用HashAggregate。