I. What Are Extended Statistics

An extended statistics object tracks data for a specified table, foreign table, or materialized view. The currently supported kinds are:

  • ndistinct, which enables n-distinct statistics
  • dependencies, which enables functional dependency statistics
  • mcv, which enables most-common-values lists

This article discusses only n-distinct statistics and their role in the optimizer, and shows how manually modifying statistics can change the execution plan.
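
For reference, extended statistics objects are created with CREATE STATISTICS. A minimal sketch of the syntax, where stats_name, col_a, col_b, and some_table are placeholders rather than names used later in this article:

-- Sketch: general CREATE STATISTICS syntax (the kinds list is optional).
create statistics stats_name (ndistinct, dependencies, mcv)
on col_a, col_b
from some_table;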

II. Data Preparation

Create a large table that simulates detailed commercial transaction records. It has not only a huge number of rows but also a large number of dimension columns.

create table t_order as
select id,
'dim01_' || (random() * 5)::int as dim01,
'dim02_' || (random() * 5)::int as dim02,
'dim03_' || (random() * 5)::int as dim03,
'dim04_' || (random() * 5)::int as dim04,
'dim05_' || (random() * 5)::int as dim05,
'dim06_' || (random() * 5)::int as dim06,
'dim07_' || (random() * 5)::int as dim07,
'dim08_' || (random() * 5)::int as dim08,
'dim09_' || (random() * 5)::int as dim09,
'dim10_' || (random() * 5)::int as dim10,
'dim11_' || (random() * 5)::int as dim11,
'dim12_' || (random() * 5)::int as dim12,
'dim13_' || (random() * 5)::int as dim13,
'dim14_' || (random() * 5)::int as dim14,
'dim15_' || (random() * 5)::int as dim15,
'dim16_' || (random() * 5)::int as dim16,
'dim17_' || (random() * 5)::int as dim17,
'dim18_' || (random() * 5)::int as dim18,
'dim19_' || (random() * 5)::int as dim19,
'dim20_' || (random() * 5)::int as dim20,
'dim21_' || (random() * 5)::int as dim21,
'dim22_' || (random() * 5)::int as dim22,
'dim23_' || (random() * 5)::int as dim23,
'dim24_' || (random() * 5)::int as dim24,
'dim25_' || (random() * 5)::int as dim25,
'dim26_' || (random() * 5)::int as dim26,
'dim27_' || (random() * 5)::int as dim27,
'dim28_' || (random() * 5)::int as dim28,
'dim29_' || (random() * 5)::int as dim29,
'dim30_' || (random() * 5)::int as dim30,
(random() * 100)::numeric(20, 2) as amount,
(now() - (random() * 10)::numeric(10, 2))::date as created
from (select generate_series(1, 10000000) id) t;

10000000 rows affected in 1 m 8 s 747 ms

select pg_table_size('t_order')/1024/1024;

 ?column?
----------
     2893
(1 row)

The test data set has 10 million rows, 30 dimension columns, and occupies 2893 MB. In a real commercial system an order table can reach 10 TB, growing by up to 100 GB per day, far larger than this test set.

III. The Query Requirement

In a reporting system, the transaction detail data must be further rolled up into summary tables covering all dimensions. A summary table aggregates over combinations of each dimension at its base granularity; individual reports then take that result, pick one or a few dimensions at the desired granularity, and aggregate again. When the dimensions are coarse, the huge volume of transaction detail compresses into a small number of aggregated dimension rows. In the query plan, two aggregation operators can implement this: HashAggregate and GroupAggregate.

  • HashAggregate

    For hash aggregation, the database computes a hash from the values of the GROUP BY columns and maintains the per-group state of the aggregate functions in an in-memory table. The memory parameter work_mem and the table's statistics determine whether HashAggregate is chosen.

  • GroupAggregate

    For ordinary aggregation, group aggregation is used: the rows are first sorted by the GROUP BY columns so that rows of the same group are adjacent, and then one full pass over the sorted data produces the aggregate result.

select dim01, count(*) as cnt, sum(amount) as amount, ....
from t_order
group by dim01, .....

To highlight how the optimizer chooses between HashAggregate and GroupAggregate for otherwise similar statements, the test cases set the memory parameter work_mem to a small value (work_mem is expressed in KB by default, so 1024 means 1 MB).

set work_mem = 1024;

IV. Test Procedure

1. Before Analyzing the Table

Automatic statistics collection is turned off. With no statistics on the table, the estimated number of distinct values for each grouping column is 200, and the total row estimate is 200 to the power of the number of grouping columns, capped at the table's total row count.
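
As a quick sanity check, the estimates in the plans below can be derived by hand:

-- With no statistics, the planner assumes 200 distinct values per column:
--   1 grouping column : rows = 200
--   2 grouping columns: rows = 200^2 = 40000
--   n grouping columns: rows = min(200^n, total row count)
-- The plans below show rows=200 and rows=40000, matching this rule.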

Grouping and aggregating by a single column, the plan uses HashAggregate.

explain
select count(*) , sum(amount) ,count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount),count(*) , sum(amount)
from t_order
group by dim01;

HashAggregate  (cost=1245372.49..1245381.99 rows=200 width=632)
  Group Key: dim01
  ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=52)

Grouping and aggregating by multiple columns, the plan uses GroupAggregate.

explain
select count(*) , sum(amount)
from t_order
group by dim01, dim02;

GroupAggregate  (cost=3068597.60..3194097.81 rows=40000 width=104)
  Group Key: dim01, dim02
  ->  Sort  (cost=3068597.60..3093597.64 rows=10000017 width=84)
        Sort Key: dim01, dim02
        ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=84)

2. After Analyzing the Table

Once the table has been analyzed, the total row estimate is the product of the per-column distinct counts, capped at the total row count. Here rows = 7776 = 6^5.
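
The base 6 and the exponent can be read straight off the data-generation SQL:

-- (random() * 5)::int rounds to an integer in {0, 1, 2, 3, 4, 5},
-- so every dimNN column has 6 distinct values:
--   5 grouping columns: rows = 6^5 = 7776
--   6 grouping columns: rows = 6^6 = 46656
-- Both estimates appear in the plans below.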

  • HashAggregate

explain
select count(*), sum(amount), count(*), sum(amount), count(*)
from t_order
group by dim01, dim02, dim03, dim04, dim05;

HashAggregate  (cost=720371.59..720488.23 rows=7776 width=128)
  Group Key: dim01, dim02, dim03, dim04, dim05
  ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=46)
  • GroupAggregate

-- add more aggregate function columns
explain
select count(*), sum(amount), count(*), sum(amount), count(*), sum(amount)
from t_order
group by dim01, dim02, dim03, dim04, dim05;

GroupAggregate  (cost=2248285.10..2548421.69 rows=7776 width=160)
  Group Key: dim01, dim02, dim03, dim04, dim05
  ->  Sort  (cost=2248285.10..2273285.14 rows=10000017 width=46)
        Sort Key: dim01, dim02, dim03, dim04, dim05
        ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=46)

-- add another grouping column
explain
select count(*)
from t_order
group by dim01, dim02, dim03, dim04, dim05, dim06;

GroupAggregate  (cost=2248285.10..2448752.00 rows=46656 width=56)
  Group Key: dim01, dim02, dim03, dim04, dim05, dim06
  ->  Sort  (cost=2248285.10..2273285.14 rows=10000017 width=48)
        Sort Key: dim01, dim02, dim03, dim04, dim05, dim06
        ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=48)

3. Analysis via DISTINCT

A DISTINCT clause has the same behavior. For a query that merely selects DISTINCT over several dimension columns, the differing row estimates drive the plan choice.
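
This is because DISTINCT is planned like an equivalent GROUP BY, so the same distinct-count estimates apply; a minimal illustration:

select distinct dim01, dim02 from t_order;
-- is planned the same way as:
select dim01, dim02 from t_order group by dim01, dim02;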

  • HashAggregate

explain analyse
select distinct dim01, dim02, dim03, dim04, dim05, dim06
from t_order;

HashAggregate  (cost=620371.43..620837.99 rows=46656 width=48) (actual time=4422.376..4427.546 rows=46656 loops=1)
  Group Key: dim01, dim02, dim03, dim04, dim05, dim06
  ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=48) (actual time=0.013..710.242 rows=10000000 loops=1)
Planning Time: 0.081 ms
Execution Time: 4428.778 ms
  • GroupAggregate

explain analyse
select distinct dim01, dim02, dim03, dim04, dim05, dim06, dim07
from t_order;

Unique  (cost=2316647.10..2516647.44 rows=279936 width=56) (actual time=64027.276..74826.618 rows=279456 loops=1)
  ->  Sort  (cost=2316647.10..2341647.14 rows=10000017 width=56) (actual time=64027.274..72741.372 rows=10000000 loops=1)
        Sort Key: dim01, dim02, dim03, dim04, dim05, dim06, dim07
        Sort Method: external merge  Disk: 645840kB
        ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=56) (actual time=0.014..1469.598 rows=10000000 loops=1)
Planning Time: 0.080 ms
Execution Time: 74872.438 ms

4. Interim Analysis

In the optimizer's cost formulas, once the combined estimate crosses a threshold, meaning work_mem cannot hold the memory HashAggregate needs, the optimizer chooses GroupAggregate instead. GroupAggregate sorts first and then aggregates, which takes far more CPU time.
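
The decision can be sketched roughly as follows; this is a simplification of the planner's cost logic, not its exact formula:

-- Simplified decision rule (illustrative assumption only):
--   est_hash_size ≈ est_groups * (hash entry overhead
--                                 + grouping key width
--                                 + aggregate transition state)
--   if est_hash_size > work_mem  ->  GroupAggregate (sort, then aggregate)
--   else                         ->  HashAggregate
-- Note: work_mem is in KB by default, so "set work_mem = 10240" means 10 MB.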

  • work_mem too small for HashAggregate

    set work_mem = 10240;
    explain (analyse,buffers)
    select count(*)
    from t_order
    group by dim01, dim02;

    GroupAggregate  (cost=2385009.10..2485409.27 rows=40000 width=72) (actual time=8648.437..11283.800 rows=36 loops=1)
      Group Key: dim01, dim02
      Buffers: shared hit=370371, temp read=87632 written=87812
      ->  Sort  (cost=2385009.10..2410009.14 rows=10000017 width=64) (actual time=8620.699..10353.186 rows=10000000 loops=1)
            Sort Key: dim01, dim02
            Sort Method: external merge  Disk: 254472kB
            Buffers: shared hit=370371, temp read=87632 written=87812
            ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=64) (actual time=0.024..939.992 rows=10000000 loops=1)
                  Buffers: shared hit=370371
    Planning Time: 0.178 ms
    Execution Time: 11300.646 ms
  • work_mem large enough for HashAggregate

    set work_mem = 10240000;
    explain (analyse,buffers)
    select count(*)
    from t_order
    group by dim01, dim02;

    HashAggregate  (cost=545371.30..545771.30 rows=40000 width=72) (actual time=2211.028..2211.127 rows=36 loops=1)
      Group Key: dim01, dim02
      Buffers: shared hit=370371
      ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=64) (actual time=0.013..606.175 rows=10000000 loops=1)
            Buffers: shared hit=370371
    Planning Time: 0.127 ms
    Execution Time: 2212.227 ms
  • Small work_mem, but single-column statistics collected

    With accurate per-column statistics, the plan can use HashAggregate even with several grouping columns.

    set work_mem = 10240;
    analyse t_order;

    explain analyse
    select count(*), sum(amount)
    from t_order
    group by dim01, dim02, dim03, dim04, dim05;

    HashAggregate  (cost=645371.47..645468.67 rows=7776 width=80) (actual time=5684.939..5686.491 rows=7776 loops=1)
      Group Key: dim01, dim02, dim03, dim04, dim05
      ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=46) (actual time=0.015..666.171 rows=10000000 loops=1)
    Planning Time: 0.177 ms
    Execution Time: 5687.117 ms

    With too many grouping columns, the optimizer no longer chooses HashAggregate.

    explain analyse
    select count(*), sum(amount)
    from t_order
    group by dim01, dim02, dim03, dim04, dim05, dim06;

    GroupAggregate  (cost=2316647.10..2542230.68 rows=46656 width=88) (actual time=47971.812..58032.974 rows=46656 loops=1)
      Group Key: dim01, dim02, dim03, dim04, dim05, dim06
      ->  Sort  (cost=2316647.10..2341647.14 rows=10000017 width=54) (actual time=47971.777..55725.887 rows=10000000 loops=1)
            Sort Key: dim01, dim02, dim03, dim04, dim05, dim06
            Sort Method: external merge  Disk: 635984kB
            ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=54) (actual time=0.014..2853.353 rows=10000000 loops=1)
    Planning Time: 0.158 ms
    Execution Time: 58071.107 ms

V. Extended Statistics: Multi-Column Statistics

By creating an extended statistics object, the optimizer obtains accurate multi-column distinct-count estimates, and it chooses the better-performing HashAggregate.

create statistics t_order_01 (ndistinct) on dim01, dim02, dim03, dim04, dim05, dim06 from t_order;
analyse t_order;

explain analyse
select count(*), sum(amount)
from t_order
group by dim01, dim02, dim03, dim04, dim05, dim06;

HashAggregate  (cost=670365.21..670792.91 rows=34216 width=88) (actual time=6810.690..6822.648 rows=46656 loops=1)
  Group Key: dim01, dim02, dim03, dim04, dim05, dim06
  ->  Seq Scan on t_order  (cost=0.00..470369.07 rows=9999807 width=54) (actual time=0.008..693.920 rows=10000000 loops=1)
Planning Time: 0.332 ms
Execution Time: 6824.820 ms

1. Extended Statistics: Inspecting the Metadata

select * from pg_statistic_ext where stxname = 't_order_01';

-[ RECORD 1 ]+------------
oid          | 670835
stxrelid     | 670816
stxname      | t_order_01
stxnamespace | 18629
stxowner     | 16384
stxkeys      | 2 3 4 5 6 7
stxkind      | {d}

The stxkeys value holds the attribute numbers of the columns dim01, dim02, dim03, dim04, dim05, dim06.
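
To map those attribute numbers back to column names, a query like this against the system catalogs works:

select attnum, attname
from pg_attribute
where attrelid = 't_order'::regclass
  and attnum in (2, 3, 4, 5, 6, 7)
order by attnum;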


kingbase=# select stxname, stxdndistinct
from pg_statistic_ext_data, pg_statistic_ext
where stxoid = oid and stxname = 't_order_01';

stxname       | t_order_01
stxdndistinct | {"2, 3": 36, "2, 4": 36, ... "3, 4": 36, ... "6, 7": 36, "2, 3, 4": 216, "2, 3, 5": 216, ... "5, 6, 7": 216, "2, 3, 4, 5": 1296, ... "2, 3, 4, 5, 6, 7": 827260}

The stxdndistinct value is the set of distinct counts for every column combination, sum(C(n, k)) for k = 2..n entries in total. If n is large, analyzing the table takes a very long time.
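
The entry count follows directly from the combination formula:

-- Entries in stxdndistinct for an n-column statistics object:
--   sum over k = 2..n of C(n, k) = 2^n - n - 1
--   n = 6 -> 2^6 - 6 - 1 = 57 entries (the set shown above)
--   n = 8 -> 2^8 - 8 - 1 = 247 entries
-- ANALYZE estimates a distinct count for every entry, so its cost
-- climbs steeply with n.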

2. Extended Statistics: Limits

An extended statistics object is limited to at most 8 columns. Table analysis time grows at an accelerating rate with the number of columns.


create statistics t_order_01 (ndistinct) on dim01, dim02, dim03, dim04, dim05, dim06, dim07, dim08, dim09 from t_order;

ERROR: cannot use more than 8 columns in a statistics object

Analyze time by column count:

Columns   Time       Growth %
2         649 ms     -
3         702 ms     8
4         872 ms     24
5         1338 ms    53
6         2452 ms    83
7         5185 ms    111
8         11659 ms   125
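
These timings can be reproduced per column count with psql's client-side timer; a sketch for the 3-column round (the object is dropped and recreated for each measurement):

drop statistics if exists t_order_01;
create statistics t_order_01 (ndistinct) on dim01, dim02, dim03 from t_order;
\timing on
analyse t_order;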

3. Extended Statistics: Beyond the Limit

If work_mem cannot be increased and the grouping uses more than 8 columns, the following method can work around the limit.

  • Create a sample copy of the data table

    Create a sample table holding a single row.

    create table t_order_mini
    as
    select * from t_order limit 1;
  • Create the extended statistics objects

    Disable autovacuum on both the data table and the sample table, create the extended statistics objects, and then modify the stxkeys value.

    -- disable autovacuum on both tables
    ALTER TABLE t_order SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);
    ALTER TABLE t_order_mini SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);

    -- create the extended statistics objects
    -- (t_order_sta attached to the data table, t_order_mini_sta to the sample)
    create statistics t_order_sta (ndistinct) on dim01, dim02 from t_order;
    create statistics t_order_mini_sta (ndistinct) on dim01, dim02 from t_order_mini;

    -- manually rewrite stxkeys with the attnum values of the grouping columns
    update pg_statistic_ext
    set stxkeys = (select attnums::int2vector
                   from (select string_agg(attnum::text, ' ' order by attnum) as attnums
                         from pg_attribute
                         where attrelid = 't_order'::regclass
                           and attname in (
                               'dim01', 'dim02', 'dim03', 'dim04', 'dim05', 'dim06', 'dim07', 'dim08',
                               'dim09', 'dim10', 'dim11', 'dim12', 'dim13', 'dim14', 'dim15', 'dim16',
                               'dim17', 'dim18', 'dim19', 'dim20'
                           )) t)
    where stxname in ('t_order_sta', 't_order_mini_sta');
  • Analyze the sample table

    analyze t_order_mini;
    ANALYZE

    select e.stxname, length(d.stxdndistinct), substr(stxdndistinct, 1, 100) stxdndistinct
    from pg_statistic_ext_data as d, pg_statistic_ext as e
    where d.stxoid = e.oid and e.stxname = 't_order_mini_sta';

    -[ RECORD 1 ]-+-----------------------------------------------------------------------------------------------------
    stxname       | t_order_mini_sta
    length        | 90177350
    stxdndistinct | {"2, 3": 1, "2, 4": 1, "2, 5": 1, "2, 6": 1, "2, 7": 1, "2, 8": 1, "2, 9": 1, "2, 10": 1, "2, 11": 1

    Because a backend's single memory allocation is capped at 1 GB, and physical memory is also finite, the errors below appear. It is best to keep a statistics object to no more than about 20 columns: the serialized result holds 2^n - n - 1 entries for n columns, so 20 columns already yield 1,048,555 entries (the roughly 90 MB length above), and each additional column roughly doubles the size.

    analyze t_order_mini;
    ERROR: invalid memory alloc request size 2147483216

    analyze t_order_mini;
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
  • Update the data table's extended statistics values

    Update the data table's extended statistics values with those from the sample table.

    update pg_statistic_ext_data
    set stxdndistinct = (select d.stxdndistinct
                         from pg_statistic_ext_data as d,
                              pg_statistic_ext as e
                         where d.stxoid = e.oid
                           and e.stxname = 't_order_mini_sta')
    from pg_statistic_ext as e
    where stxoid = e.oid
      and e.stxname = 't_order_sta';

    select e.stxname, length(d.stxdndistinct), substr(stxdndistinct, 1, 100) stxdndistinct
    from pg_statistic_ext_data as d, pg_statistic_ext as e
    where d.stxoid = e.oid and e.stxname = 't_order_sta';

    -[ RECORD 1 ]-+-----------------------------------------------------------------------------------------------------
    stxname       | t_order_sta
    length        | 90177350
    stxdndistinct | {"2, 3": 1, "2, 4": 1, "2, 5": 1, "2, 6": 1, "2, 7": 1, "2, 8": 1, "2, 9": 1, "2, 10": 1, "2, 11": 1
  • Query the data table

    Re-running the earlier kind of query, now with as many as 20 grouping columns, the plan uses HashAggregate. OK!

    explain (analyse,buffers)
    select distinct dim01, dim02, dim03, dim04, dim05, dim06, dim07, dim08, dim09, dim10
         , dim11, dim12, dim13, dim14, dim15, dim16, dim17, dim18, dim19, dim20
    from t_order;

    HashAggregate  (cost=970372.02..970372.03 rows=1 width=160) (actual time=9036.895..12889.744 rows=10000000 loops=1)
      Group Key: dim01, dim02, dim03, dim04, dim05, dim06, dim07, dim08, dim09, dim10, dim11, dim12, dim13, dim14, dim15, dim16, dim17, dim18, dim19, dim20
      Buffers: shared hit=370371
      ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=160) (actual time=0.010..699.935 rows=10000000 loops=1)
            Buffers: shared hit=370371
    Planning Time: 124.327 ms
    Execution Time: 13232.085 ms
  • Query the data table: pushing further

    The optimizer's estimate is rows=1, so several more grouping columns can be added and HashAggregate is still used.

    explain (analyse,buffers)
    select distinct dim01, dim02, dim03, dim04, dim05, dim06, dim07, dim08, dim09, dim10
         , dim11, dim12, dim13, dim14, dim15, dim16, dim17, dim18, dim19, dim20
         , dim21, dim22, dim23, dim24, dim25, dim26
    from t_order;

    HashAggregate  (cost=1120372.27..1120450.03 rows=7776 width=208) (actual time=11161.952..15726.475 rows=10000000 loops=1)
      Group Key: dim01, dim02, dim03, dim04, dim05, dim06, dim07, dim08, dim09, dim10, dim11, dim12, dim13, dim14, dim15, dim16, dim17, dim18, dim19, dim20, dim21, dim22, dim23, dim24, dim25, dim26
      Buffers: shared hit=370371
      ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=208) (actual time=0.009..702.706 rows=10000000 loops=1)
            Buffers: shared hit=370371
    Planning Time: 114.548 ms
    Execution Time: 16070.944 ms

4. Extended Statistics: Beyond the Limit, at Scale

If the number of grouping columns is very large, split them into several subsets and create a separate extended statistics object for each subset, then apply the "beyond the limit" method above to produce the statistics values, as sketched below. Within memory limits, this supports grouped aggregation over any number of dimension columns.
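
For example, the 30 dimension columns could be covered by three 10-column objects; the names below are illustrative, and each object is seeded with two columns and then widened exactly as in section 3:

-- Hypothetical object names; after creation, rewrite each object's stxkeys
-- to its full 10-column attnum list, analyze the one-row sample table, and
-- copy stxdndistinct into pg_statistic_ext_data as shown in section 3.
create statistics t_order_sta_p1 (ndistinct) on dim01, dim02 from t_order;
create statistics t_order_sta_p2 (ndistinct) on dim11, dim12 from t_order;
create statistics t_order_sta_p3 (ndistinct) on dim21, dim22 from t_order;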

-- three extended statistics objects were created for t_order, with their
-- values updated via the "beyond the limit" method
explain (analyse,buffers)
select distinct dim01, dim02, dim03, dim04, dim05, dim06, dim07, dim08, dim09, dim10
     , dim11, dim12, dim13, dim14, dim15, dim16, dim17, dim18, dim19, dim20
     , dim21, dim22, dim23, dim24, dim25, dim26, dim17, dim28, dim29, dim30
from t_order;

HashAggregate  (cost=1220372.45..1220372.46 rows=1 width=240) (actual time=12720.356..17644.089 rows=10000000 loops=1)
  Group Key: dim01, dim02, dim03, dim04, dim05, dim06, dim07, dim08, dim09, dim10, dim11, dim12, dim13, dim14, dim15, dim16, dim17, dim18, dim19, dim20, dim21, dim22, dim23, dim24, dim25, dim26, dim17, dim28, dim29, dim30
  Buffers: shared hit=370371
  ->  Seq Scan on t_order  (cost=0.00..470371.17 rows=10000017 width=240) (actual time=0.013..3613.716 rows=10000000 loops=1)
        Buffers: shared hit=370371
Planning Time: 1.353 ms
Execution Time: 17917.662 ms

VI. Why This Is Feasible

Breaking the extended-statistics column limit

This limit applies only when the object is created. Its purpose is to prevent analyze time from growing exponentially as the column count grows.

A suggestion: the statement syntax could offer optimized groupings that reduce the number of combinations, for example using "()" to merge several columns into a single unit.

Reusing statistics values

Because the number of combination entries grows so large, commonly used combinations can be precomputed during idle periods and the results stored in a user table; when a new table or partition appears, this method applies them quickly.
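
A minimal sketch of such a cache, under the untested assumption that the values stay in their native pg_ndistinct form so they can later be copied back into pg_statistic_ext_data:

-- Cache precomputed multi-column distinct values in a user table.
create table stats_cache as
select e.stxname, d.stxdndistinct
from pg_statistic_ext e
join pg_statistic_ext_data d on d.stxoid = e.oid;

-- Later, push a cached value onto a new table's statistics object
-- (both names here are hypothetical).
update pg_statistic_ext_data d
set stxdndistinct = c.stxdndistinct
from pg_statistic_ext e, stats_cache c
where d.stxoid = e.oid
  and c.stxname = 'precomputed_entry'
  and e.stxname = 'new_table_sta';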

VII. Final Words

The optimizer's job is to choose the best execution plan from the results of its cost-estimation formulas, which need single-column and multi-column statistics. When that data is missing, the optimizer produces a conservative plan, and performance suffers. Hopefully the optimizer will someday offer an aggressive mode, along with the ability to pin a query's execution plan.
