PostgreSQL bind variable peeking

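Today we look at custom plans and generic plans. In Oracle the related technique is called bind variable peeking. Kingbase has no such term; strictly speaking, Kingbase talks about custom plans and generic plans.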

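What is a custom plan, and what is a generic plan? Let's start with an example. I created a table with 100011 rows and two columns, id and name. The name column holds only two distinct values: 'aaa' in exactly 100000 rows and 'bbb' in only 11 rows. This is what we usually call data skew. In Oracle, bind variable peeking usually goes hand in hand with collecting a histogram on the skewed column.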

The following tests were run on this version:

KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit

create table a(id numeric,name varchar(40));

insert into a select i, 'aaa' from generate_series (1,100000) i;

insert into a select i, 'bbb' from generate_series (100001,100011) i;

create index idx_a1 on a(name);

analyze a;

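The next step is to use a PREPARE statement, which avoids parsing the statement over and over. This is similar to bind variables in Oracle: after one hard parse, the execution plan stored in the library cache can be reused by later executions of the same SQL, which avoids repeated hard parses. Finding and reusing an existing plan (same plan hash value) is called a soft parse. There is also the "soft soft parse", which is skipped here.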

test=# prepare test_stmt as select * from a where name = $1;

PREPARE

 select * from pg_prepared_statements;

We run the following statement six times in a row, each time querying the rows whose name is 'aaa'. Note: six times.

test=# explain (analyze) execute test_stmt ('aaa');

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..25.862 rows=100000 loops=1)

   Filter: ((name)::text = 'aaa'::text)

   Rows Removed by Filter: 11

 Planning Time: 0.217 ms

 Execution Time: 34.710 ms

(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..16.401 rows=100000 loops=1)

   Filter: ((name)::text = 'aaa'::text)

   Rows Removed by Filter: 11

 Planning Time: 0.073 ms

 Execution Time: 23.340 ms

(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..30.001 rows=100000 loops=1)

   Filter: ((name)::text = 'aaa'::text)

   Rows Removed by Filter: 11

 Planning Time: 0.093 ms

 Execution Time: 39.383 ms

(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..23.365 rows=100000 loops=1)

   Filter: ((name)::text = 'aaa'::text)

   Rows Removed by Filter: 11

 Planning Time: 0.073 ms

 Execution Time: 32.397 ms

(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

 Seq Scan on a  (cost=0.00..1794.14 rows=99994 width=10) (actual time=0.009..19.287 rows=100000 loops=1)

   Filter: ((name)::text = 'aaa'::text)

   Rows Removed by Filter: 11

 Planning Time: 0.099 ms

 Execution Time: 27.462 ms

(5 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.082..35.540 rows=100000 loops=1)

   Index Cond: ((name)::text = $1)

 Planning Time: 0.114 ms

 Execution Time: 45.546 ms

(4 rows)

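Because 'aaa' accounts for almost all rows in the table, the optimizer chooses a sequential scan for the first five executions, which is reasonable given its cost model. On the sixth execution, note that the condition changes from Filter: ((name)::text = 'aaa'::text) to Index Cond: ((name)::text = $1). At this point the optimizer has produced a generic plan, which refers to the bind parameter $1 instead of the literal value. The previous five plans are custom plans. Why does the generic plan appear only on the sixth execution? The explanation is in plancache.c in the PostgreSQL source: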

The logic for choosing generic or custom plans is in choose_custom_plan

In the choose_custom_plan function we can see:

/* Generate custom plans until we have done at least 5 (arbitrary) */

if (plansource->num_custom_plans < 5) return true;

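Note the threshold: while fewer than five custom plans have been built, the function returns true and a custom plan is chosen. From the sixth execution on, the planner builds a generic plan and keeps using it as long as its estimated cost is lower than the average cost of the custom plans; otherwise it continues with custom plans. Here the generic plan costs 1710.52, which is below the 1794.14 of the custom plans, so the plan becomes fixed after five executions. This also explains why the sixth execution switches to an index scan: the generic plan is costed without knowing the parameter value. The behavior is controlled by the parameter plan_cache_mode, whose current value is auto.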

test=# show plan_cache_mode;

 plan_cache_mode

-----------------

 auto

(1 row)
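The decision described above can be sketched as a small Python model. This is a simplified illustration, not the PostgreSQL code: the PlanSource class and the execute helper are hypothetical names, the costs are the ones shown in the EXPLAIN output of this test, and the model ignores the planning-cost surcharge that PostgreSQL adds to custom-plan costs as well as the other early exits in choose_custom_plan.

```python
from dataclasses import dataclass


@dataclass
class PlanSource:
    """Hypothetical stand-in for the per-statement bookkeeping in plancache.c."""
    num_custom_plans: int = 0
    total_custom_cost: float = 0.0
    generic_cost: float = -1.0  # -1 means "generic plan not costed yet"


def choose_custom_plan(ps: PlanSource, mode: str = "auto") -> bool:
    """Return True if a custom plan should be used for this execution."""
    if mode == "force_generic_plan":
        return False
    if mode == "force_custom_plan":
        return True
    # Always build custom plans for the first five executions.
    if ps.num_custom_plans < 5:
        return True
    avg_custom_cost = ps.total_custom_cost / ps.num_custom_plans
    # Prefer the generic plan if it is cheaper than the average custom plan.
    return not (ps.generic_cost < avg_custom_cost)


def execute(ps: PlanSource, custom_cost: float, generic_cost: float,
            mode: str = "auto") -> str:
    """Simulate one EXECUTE and report which kind of plan was used."""
    if not choose_custom_plan(ps, mode):
        ps.generic_cost = generic_cost  # generic plan is built (or reused)
        if not choose_custom_plan(ps, mode):  # re-check with the real cost
            return "generic"
    ps.num_custom_plans += 1
    ps.total_custom_cost += custom_cost
    return "custom"


# Costs taken from the test above: Seq Scan 1794.14, generic Index Scan 1710.52.
ps = PlanSource()
history = [execute(ps, 1794.14, 1710.52) for _ in range(6)]
print(history)  # five 'custom' entries followed by 'generic'
```

With these numbers the sixth call returns "generic" because 1710.52 is lower than the average custom cost of 1794.14. If the custom plans had been much cheaper than the generic plan (for example the 8.65 index scan for 'bbb'), the same model would keep returning "custom".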

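With the parameter set to auto, the plan is the same whether I execute with 'aaa' or 'bbb': it is fixed. Choosing an index scan every time, whatever the parameter value, is clearly not what we want. With skewed data, a plan that never changes is unwise and leads to inefficient execution.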

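As shown below, once the generic plan is in use, the estimated row count (rows) is always 50006 and the estimated cost stays at 0.42..1710.52, whatever value is passed. Interestingly, 50006 is half of the table's row count: without a known parameter value, the planner assumes that each of the two distinct values of name is equally likely. From the sixth execution on this estimate never changes, which is clearly unreasonable for skewed data. The optimizer accepts the generic plan because its estimated cost is not higher than that of the custom plans, and in this test the Execution Time of the two scan methods for 'aaa' does not differ much.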
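The arithmetic behind the 50006 estimate can be checked with a few lines of Python. This is only a sketch under stated assumptions: it assumes the planner's row count for table a is 100011 and that the statistics report two distinct values for name; generic_rows_estimate is a hypothetical helper, not a PostgreSQL function.

```python
# Assumed inputs: planner row count for table a and number of distinct name values.
table_rows = 100011
n_distinct = 2

# Actual row counts per value, taken from the test data in this article.
actual_rows = {"aaa": 100000, "bbb": 11}


def generic_rows_estimate(table_rows: int, n_distinct: int) -> int:
    """Row estimate for "name = $1" when the parameter value is unknown."""
    selectivity = 1.0 / n_distinct  # every distinct value treated as equally likely
    return round(table_rows * selectivity)


estimate = generic_rows_estimate(table_rows, n_distinct)
print(estimate)  # 50006

for value, rows in actual_rows.items():
    # How far the single generic estimate is from each real row count.
    print(value, rows, estimate - rows)
```

The same estimate is used for both values, so it is about 50000 rows too low for 'aaa' and about 50000 rows too high for 'bbb'.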

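Another key point is that Planning Time is very small once the generic plan is in use. Doesn't this show the "soft parse" effect? The time spent producing a plan drops sharply because the cached plan is reused.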



test=# explain (analyze) execute test_stmt ('bbb');

                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.021..0.024 rows=11 loops=1)

   Index Cond: ((name)::text = $1)

 Planning Time: 0.015 ms

 Execution Time: 0.048 ms

(4 rows)

test=# explain (analyze) execute test_stmt ('bbb');

                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.020..0.022 rows=11 loops=1)

   Index Cond: ((name)::text = $1)

 Planning Time: 0.014 ms

 Execution Time: 0.041 ms

(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..23.592 rows=100000 loops=1)

   Index Cond: ((name)::text = $1)

 Planning Time: 0.013 ms

 Execution Time: 31.333 ms

(4 rows)

Setting plan_cache_mode=force_custom_plan

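Next, test another case: set plan_cache_mode to force_custom_plan. The plan now changes with the distribution of the bind value, which is reasonable. The price is that the statement is planned again on every execution (Planning Time is back to about 0.08 ms). In Oracle terms this resembles a hard parse, and as the saying goes, hard parsing is the root of all evil. Correspondingly, in Kingbase, when the data is skewed and the predicate value changes often, custom plans are the better choice.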

set plan_cache_mode=force_custom_plan;

test=# explain (analyze) execute test_stmt ('bbb');

                                                 QUERY PLAN

-------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..8.65 rows=13 width=10) (actual time=0.020..0.022 rows=11 loops=1)

   Index Cond: ((name)::text = 'bbb'::text)

 Planning Time: 0.079 ms

 Execution Time: 0.036 ms

(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                               QUERY PLAN

---------------------------------------------------------------------------------------------------------

 Seq Scan on a  (cost=0.00..1794.14 rows=99998 width=10) (actual time=0.010..16.897 rows=100000 loops=1)

   Filter: ((name)::text = 'aaa'::text)

   Rows Removed by Filter: 11

 Planning Time: 0.077 ms

 Execution Time: 24.020 ms

(5 rows)

Setting plan_cache_mode=force_generic_plan

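As you can see, the plan is fixed in this case. It is the same plan as the one produced on the sixth execution at the beginning: however the condition changes, the optimizer uses the generic plan.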

test=# set plan_cache_mode =force_generic_plan ;

test=# explain (analyze) execute test_stmt ('bbb');

                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.022..0.024 rows=11 loops=1)

   Index Cond: ((name)::text = $1)

 Planning Time: 0.016 ms

 Execution Time: 0.044 ms

(4 rows)

test=# explain (analyze) execute test_stmt ('bbb');

                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.032..0.035 rows=11 loops=1)

   Index Cond: ((name)::text = $1)

 Planning Time: 0.015 ms

 Execution Time: 0.055 ms

(4 rows)

test=# explain (analyze) execute test_stmt ('aaa');

                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------

 Index Scan using idx_a1 on a  (cost=0.42..1710.52 rows=50006 width=10) (actual time=0.037..23.191 rows=100000 loops=1)

   Index Cond: ((name)::text = $1)

 Planning Time: 0.016 ms

 Execution Time: 30.997 ms

(4 rows)

Deallocating the prepared statements

deallocate all;

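Conclusion:

If you use PREPARE statements in Kingbase (a feature similar to bind variables):

They are a good fit when the data is evenly distributed and the parameter value changes often.

For skewed data, set plan_cache_mode to force_custom_plan, or do not use prepared statements at all.

Either way, test thoroughly before relying on any feature.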