LATERAL Joins in KingbaseES

I. What Is a LATERAL Join

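According to the documentation, it works as follows:

The LATERAL keyword can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)

A table function appearing in FROM can also be preceded by the LATERAL keyword, but for functions LATERAL is optional: the function's arguments can refer to columns of preceding FROM items in any case.

In essence, for each row of the main "table", the subquery is evaluated with that row's values as its parameters. This is very similar to a for loop iterating over the rows returned by a query.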
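The per-row evaluation described above can be illustrated with a small self-contained sketch. The sample data and the names d, e, best, t and g are made up for illustration, and the queries assume PostgreSQL-compatible syntax.

```sql
-- For each row of d, the LATERAL subquery is evaluated with d.dept as its parameter.
SELECT d.dept, best.emp, best.salary
FROM (VALUES ('dev'), ('ops')) AS d(dept)
CROSS JOIN LATERAL (
    SELECT e.emp, e.salary
    FROM (VALUES ('dev', 'ann', 120),
                 ('dev', 'bob', 100),
                 ('ops', 'cid', 90)) AS e(dept, emp, salary)
    WHERE e.dept = d.dept          -- reference to the preceding FROM item d
    ORDER BY e.salary DESC
    LIMIT 1
) AS best
ORDER BY d.dept;
-- rows: (dev, ann, 120), (ops, cid, 90)

-- For a function in FROM the keyword is optional:
-- the argument may reference t.n with or without LATERAL.
SELECT t.n, g.i
FROM (VALUES (2), (3)) AS t(n),
     generate_series(1, t.n) AS g(i)
ORDER BY t.n, g.i;
-- rows: (2,1), (2,2), (3,1), (3,2), (3,3)
```

Removing LATERAL from the first query makes the reference to d.dept an error, because the subquery would then be evaluated independently of d.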

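II. Uses of LATERAL

1. Syntactic sugar

Syntactic sugar is syntax added to a language that does not change what the language can do but makes it more convenient for programmers to use. Using it generally improves readability and so reduces the chance of errors.

Lateral joins let you reuse computed columns, keeping queries tidy and readable. Let's see how by rewriting a clumsy query together.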

select

    (pledged / fx_rate) as pledged_usd,

    (pledged / fx_rate) / backers_count as avg_pledge_usd,

    (goal / fx_rate) - (pledged / fx_rate) as amt_from_goal,

    (deadline - launched_at) / 86400.00 as duration,

    ((goal / fx_rate) - (pledged / fx_rate)) / ((deadline - launched_at) / 86400.00) as usd_needed_daily

from kickstarter_data;

With lateral joins, each computed column is defined only once and can then be referenced elsewhere in the query.

select

    pledged_usd,

    avg_pledge_usd,

    usd_from_goal as amt_from_goal,

    duration,

    (usd_from_goal / duration) as usd_needed_daily

from kickstarter_data,

    lateral (select pledged / fx_rate as pledged_usd) pu,

    lateral (select pledged_usd / backers_count as avg_pledge_usd) apu,

    lateral (select goal / fx_rate as goal_usd) gu,

    lateral (select goal_usd - pledged_usd as usd_from_goal) ufg,

    lateral (select (deadline - launched_at)/86400.00 as duration) dr;

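2. An enhanced form of subquery

A lateral join is more like a correlated subquery than a plain subquery: the expression on the right side of a lateral join is evaluated once for each row on its left, just as a correlated subquery is, whereas a plain (uncorrelated) subquery is evaluated only once. (The query planner has ways to optimize the performance of both.)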

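Also keep in mind that the equivalent of a correlated subquery in the select list is LEFT JOIN LATERAL ... ON true, because the outer row must be kept even when the subquery returns no rows.

CROSS JOIN LATERAL corresponds to CROSS APPLY in other databases such as SQL Server, and LEFT JOIN LATERAL ... ON true corresponds to OUTER APPLY.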
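The difference between the two join forms shows up only when the subquery finds no matching row. A minimal sketch, with made-up tables a and b defined inline as CTEs:

```sql
-- Correlated scalar subquery: every row of a is returned.
WITH a(pk) AS (VALUES (1), (2)),
     b(fk1, val) AS (VALUES (1, 10), (1, 30))
SELECT a.pk,
       (SELECT b.val FROM b WHERE b.fk1 = a.pk ORDER BY b.val DESC LIMIT 1) AS val
FROM a
ORDER BY a.pk;
-- rows: (1, 30), (2, NULL)

-- LEFT JOIN LATERAL ... ON true: same result as the scalar subquery.
WITH a(pk) AS (VALUES (1), (2)),
     b(fk1, val) AS (VALUES (1, 10), (1, 30))
SELECT a.pk, x.val
FROM a
LEFT JOIN LATERAL (
    SELECT b.val FROM b WHERE b.fk1 = a.pk ORDER BY b.val DESC LIMIT 1
) x ON true
ORDER BY a.pk;
-- rows: (1, 30), (2, NULL)

-- CROSS JOIN LATERAL: the row with pk = 2 disappears because the subquery is empty.
WITH a(pk) AS (VALUES (1), (2)),
     b(fk1, val) AS (VALUES (1, 10), (1, 30))
SELECT a.pk, x.val
FROM a
CROSS JOIN LATERAL (
    SELECT b.val FROM b WHERE b.fk1 = a.pk ORDER BY b.val DESC LIMIT 1
) x
ORDER BY a.pk;
-- rows: (1, 30)
```

When the subquery is an aggregate without GROUP BY, such as min or max, it always returns exactly one row, so the cross and left forms give the same result.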

Consider the following query.

Select A.*

, (Select min(B.val) Column1 from B where B.Fk1 = A.PK  )

, (Select max(B.val) Column2 from B where B.Fk1 = A.PK  )

FROM A ;

In this case, LATERAL can be used.

Select A.*

     , x.Column1

     , x.Column2

FROM A LEFT JOIN LATERAL

   (Select  min(B.val) Column1, max(B.val) Column2 from B where B.Fk1 = A.PK ) x ON true;

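In this query a plain join cannot be used, because the WHERE clause inside the derived table refers to A; LATERAL (CROSS APPLY in other databases) is needed.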

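3. Avoiding repeated execution of a subquery or function

Goal: show the optdate of every row in user-action table A side by side with the maximum optdate from user-action table B.

(1) Data preparation

A user dictionary table, user-action table A, and user-action table B.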

create table users (user_id int PRIMARY KEY, username text);

create table optA (opta_id int PRIMARY KEY, user_id int, optdate timestamp , note text) ;

create index optA_i1 on optA(user_id,optdate desc );

create table optB (optb_id int PRIMARY KEY, user_id int, optdate timestamp , note text) ;

create index optB_i1 on optB(user_id,optdate desc );

(2) Plain statement

The query against user-action table B is executed for every row, which consumes a lot of CPU time.

select optA.*,(select max(optdate) from optB where optB.user_id=optA.user_id)

from optA

where opta.user_id=88;

                                           QUERY PLAN

------------------------------------------------------------------------------------------------

 Bitmap Heap Scan on opta  (cost=4.19..33.58 rows=5 width=56)

   Recheck Cond: (user_id = 88)

   ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)

         Index Cond: (user_id = 88)

   SubPlan 2

     ->  Result  (cost=4.17..4.18 rows=1 width=8)

           InitPlan 1 (returns $1)

             ->  Limit  (cost=0.15..4.17 rows=1 width=8)

                   ->  Index Only Scan using optb_i1 on optb  (cost=0.15..20.25 rows=5 width=8)

                         Index Cond: ((user_id = opta.user_id) AND (optdate IS NOT NULL))

(10 rows)

(3) Basic optimization

Use a subquery in FROM to avoid executing the query against user-action table B repeatedly.

select optA.*, optB.*

from optA

         join (select user_id, max(optdate) from optB group by user_id) optB on optB.user_id = optA.user_id

where opta.user_id = 88;

                                       QUERY PLAN

----------------------------------------------------------------------------------------

 Nested Loop  (cost=8.38..25.78 rows=25 width=60)

   ->  Bitmap Heap Scan on opta  (cost=4.19..12.66 rows=5 width=48)

         Recheck Cond: (user_id = 88)

         ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)

               Index Cond: (user_id = 88)

   ->  Materialize  (cost=4.19..12.81 rows=5 width=12)

         ->  GroupAggregate  (cost=4.19..12.74 rows=5 width=12)

               Group Key: optb.user_id

               ->  Bitmap Heap Scan on optb  (cost=4.19..12.66 rows=5 width=12)

                     Recheck Cond: (user_id = 88)

                     ->  Bitmap Index Scan on optb_i1  (cost=0.00..4.19 rows=5 width=0)

                           Index Cond: (user_id = 88)

(4) LATERAL join

The subquery is still executed row by row.

select optA.*, optB.*

from optA

         cross join lateral (select max(optdate)

                             from optB

                             where optB.user_id = optA.user_id) optB

where opta.user_id = 88;

                                          QUERY PLAN

----------------------------------------------------------------------------------------------

 Nested Loop  (cost=8.36..33.68 rows=5 width=56)

   ->  Bitmap Heap Scan on opta  (cost=4.19..12.66 rows=5 width=48)

         Recheck Cond: (user_id = 88)

         ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)

               Index Cond: (user_id = 88)

   ->  Result  (cost=4.17..4.18 rows=1 width=8)

         InitPlan 1 (returns $1)

           ->  Limit  (cost=0.15..4.17 rows=1 width=8)

                 ->  Index Only Scan using optb_i1 on optb  (cost=0.15..20.25 rows=5 width=8)

                       Index Cond: ((user_id = $0) AND (optdate IS NOT NULL))

(10 rows)

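(5) Dictionary table plus LATERAL join

Using the dictionary table as a bridge between the two fact tables avoids executing the subquery repeatedly: the lateral subquery is evaluated once per users row (a single row for user_id = 88) instead of once per optA row.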

select optA.*, optB.*

from optA

    join users on users.user_id=optA.user_id

         cross join lateral (select max(optdate)

                             from optB

                             where optB.user_id = users.user_id) optB

where opta.user_id = 88;

                                             QUERY PLAN

----------------------------------------------------------------------------------------------------

 Nested Loop  (cost=8.52..25.09 rows=5 width=56)

   ->  Nested Loop  (cost=4.33..12.37 rows=1 width=12)

         ->  Index Only Scan using users_pkey on users  (cost=0.15..8.17 rows=1 width=4)

               Index Cond: (user_id = 88)

         ->  Result  (cost=4.17..4.18 rows=1 width=8)

               InitPlan 1 (returns $1)

                 ->  Limit  (cost=0.15..4.17 rows=1 width=8)

                       ->  Index Only Scan using optb_i1 on optb  (cost=0.15..20.25 rows=5 width=8)

                             Index Cond: ((user_id = $0) AND (optdate IS NOT NULL))

   ->  Bitmap Heap Scan on opta  (cost=4.19..12.66 rows=5 width=48)

         Recheck Cond: (user_id = 88)

         ->  Bitmap Index Scan on opta_i1  (cost=0.00..4.19 rows=5 width=0)

               Index Cond: (user_id = 88)

(13 rows)
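The plans above show only estimated costs. One way to check how many times the subquery actually runs is EXPLAIN ANALYZE, whose output reports a loops= counter for each plan node. The sketch below uses scratch tables demo_a and demo_b, which are hypothetical names and made-up data rather than the tables from the data-preparation step.

```sql
-- Hypothetical scratch tables shaped like optA / optB.
create temp table demo_a (a_id int primary key, user_id int, optdate timestamp);
create temp table demo_b (b_id int primary key, user_id int, optdate timestamp);
create index demo_b_i1 on demo_b (user_id, optdate desc);

-- 1000 rows each, spread over 10 users.
insert into demo_a
select g, g % 10, now() - g * interval '1 minute'
from generate_series(1, 1000) g;

insert into demo_b
select g, g % 10, now() - g * interval '1 minute'
from generate_series(1, 1000) g;

analyze demo_a;
analyze demo_b;

-- Read loops= on the node under the lateral subquery:
-- it is the number of times that subquery was executed.
explain analyze
select a.*, m.max_dt
from demo_a a
         cross join lateral (select max(b.optdate) as max_dt
                             from demo_b b
                             where b.user_id = a.user_id) m
where a.user_id = 8;
```

Running the same check on the other variants in this section makes it possible to compare execution counts directly instead of inferring them from cost estimates.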

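4. CTEs or views containing a GROUP BY clause and aggregate functions

(1) The filter column is not the grouping column

In this case the index on the grouping column has no Index Cond to locate the data; its entries are simply traversed. The number of rows scanned equals the rows that satisfy the condition plus all rows that precede them, so execution time depends on where the grouping column value sits in the index.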

with optA as (select user_id, max(optdate) max_dt

              from optA

              group by user_id)

select users.*, optA.*

from users

         cross join lateral (select *

                             from optA

                             where optA.user_id = users.user_id) optA

where users.username = 'ABC';

Nested Loop  (cost=0.85..60792.44 rows=9 width=32)

  Join Filter: (optA.user_id = users.user_id)

  ->  Index Scan using users_username on users

        Index Cond: (users.username = 'ABC')

  ->  GroupAggregate  (cost=0.42..58714.70 rows=91969 width=20)

        Group Key: optA.user_id

        ->  Index Scan using optA_user_id on optA  (cost=0.42..47795.01 rows=1000000 width=8)

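(2) The join condition uses any (subquery)

The index on the grouping column locates the data through an Index Cond, so the number of rows scanned equals the number of rows that satisfy the condition.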

with optA as (select user_id, max(optdate) max_dt

              from optA

              group by user_id)

select users.*, optA.*

from users

         cross join lateral (select *

                             from optA

                             where optA.user_id = any (select users.user_id)) optA

where users.username = 'ABC';

Nested Loop  (cost=0.85..57.61 rows=11 width=32)

  ->  Index Scan using users_username on users

        Index Cond: (users.username = 'ABC')

  ->  GroupAggregate  (cost=0.42..48.83 rows=11 width=20)

        Group Key: optA.user_id

        ->  Index Scan using optA_user_id on optA  (cost=0.42..48.61 rows=11 width=8)

              Index Cond: (optA.user_id = users.user_id)

5. Grouped queries: getting the latest time, or the latest row, for each user

(1) Data preparation

A user log table containing user_id and log_date.

create table log

(

    log_date timestamp,

    user_id  int,

    note     text

);

insert into log

select now() - ((100000 * random())::numeric(20, 3)::text)::interval log_date,

       (random() * 1000000)::int % 100                               id,

       md5(id::text)

from generate_series(1, 100000) id

order by random();

-- Keep the index column order and sort direction consistent with the ORDER BY

create index log_i1 on log (user_id, log_date DESC NULLS LAST);

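(2) Plain statements

These sequentially scan the whole log table, or all rows that match the condition, using the aggregate function max or the window function row_number.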

explain analyse

select user_id, max(log_date)

from log

group by user_id;

explain analyse

select *

from (select *, row_number() over (partition by user_id order by log_date desc ) sn from log) l

where sn = 1;

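(3) Recursive CTE statement

This form is convenient for retrieving either a single column or the whole row, using the table's whole-row type. Only the latest record of each user is read, so the total number of data blocks touched and the execution time are far lower than for the plain statements.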

WITH RECURSIVE cte AS (

    ( -- parentheses required

        SELECT l AS my_row -- whole row

        FROM log l

        ORDER BY user_id, log_date DESC NULLS LAST

        LIMIT 1

    )

    UNION ALL

    SELECT (SELECT l -- whole row

            FROM log l

            WHERE l.user_id > (c.my_row).user_id

            ORDER BY l.user_id, l.log_date DESC NULLS LAST

            LIMIT 1)

    FROM cte c

    WHERE (c.my_row).user_id IS NOT NULL

)

SELECT (my_row).* -- decompose the row

FROM cte

WHERE (my_row).user_id IS NOT NULL

ORDER BY (my_row).user_id;

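(4) Recursive CTE statement using a LATERAL join

The recursive CTE above is logically complex and hard to follow, and for every row the columns are packed into a row value and then decomposed again.

With a LATERAL join the statement is easier to read and also saves about 10% of CPU time.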

WITH RECURSIVE cte AS (

    ( -- parentheses required

        SELECT *

        FROM log

        WHERE 1 = 1

        ORDER BY user_id, log_date DESC NULLS LAST

        LIMIT 1

    )

    UNION ALL

    SELECT l.*

    FROM cte c

             CROSS JOIN LATERAL (

        SELECT l.*

        FROM log l

        WHERE l.user_id > c.user_id -- lateral reference condition

        ORDER BY l.user_id, l.log_date DESC NULLS LAST

        LIMIT 1

        ) l

)

    TABLE cte

        ORDER BY user_id;

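(5) users dictionary table plus LATERAL join

As long as the users table is guaranteed to hold exactly one row per relevant user_id, its layout hardly matters; ideally it is physically sorted in sync with the log table.

The query combines the dictionary table with a LATERAL join. Thanks to the simpler query tree, execution time is about 10% lower than with the recursive CTE.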

CREATE TABLE users (

   user_id  INT PRIMARY KEY

 , username text NOT NULL

);

insert into users select id, md5(id::text) from generate_series(1, 100) id;

SELECT u.user_id, l.*

FROM users u

         cross join  LATERAL (

    SELECT l.*

    FROM log l

    WHERE l.user_id = u.user_id -- lateral reference

    ORDER BY l.log_date DESC NULLS LAST

    LIMIT 1

    ) l ;
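The same query shape extends from the latest row to the latest N rows per user by raising the LIMIT. A minimal sketch with made-up inline data; the CTEs users and log stand in for the real tables, and N = 2 is an arbitrary choice:

```sql
WITH users(user_id) AS (VALUES (1), (2)),
     log(user_id, log_date) AS (
         VALUES (1, timestamp '2024-01-01 10:00'),
                (1, timestamp '2024-01-02 10:00'),
                (1, timestamp '2024-01-03 10:00'),
                (2, timestamp '2024-01-05 10:00'))
SELECT u.user_id, l.log_date
FROM users u
         CROSS JOIN LATERAL (
    SELECT l.log_date
    FROM log l
    WHERE l.user_id = u.user_id -- lateral reference
    ORDER BY l.log_date DESC NULLS LAST
    LIMIT 2                     -- N = 2
    ) l
ORDER BY u.user_id, l.log_date DESC;
-- rows: (1, 2024-01-03 10:00), (1, 2024-01-02 10:00), (2, 2024-01-05 10:00)
```

Against the real log table, each per-user lookup can still be served by an index on (user_id, log_date DESC NULLS LAST), since only the LIMIT changes.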

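(6) SELECT subquery without a LATERAL join

When a users dictionary table exists, a query that reads no superfluous rows can also be written without a LATERAL join.

Because each row value has to be decomposed into columns, which costs CPU time, this form takes about 10% longer than the LATERAL join, and the overhead grows with the number of columns.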

SELECT  (combo1).*

FROM (

   SELECT u.user_id

        , (SELECT (l.*)::log

           FROM   log l

           WHERE  l.user_id = u.user_id

           ORDER  BY l.log_date DESC NULLS LAST

           LIMIT  1) AS combo1

   FROM   users u

   ) sub;

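III. Limitations of LATERAL

Type conversion: CAST vs. ::.

Both syntaxes are explicit type casts and are fully equivalent. In certain special positions in SQL, however, only the function-style notation is allowed.

-- Valid, option 1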

SELECT elem[1], elem[2]

FROM   ( VALUES ('1,2'::TEXT) ) AS q(arr),

       LATERAL CAST(String_To_Array(q.arr, ',') AS INT[]) AS elem

;

-- Valid, option 2

SELECT elem[1], elem[2]

FROM   ( VALUES ('1,2'::TEXT) ) AS q(arr),

       LATERAL  (select String_To_Array(q.arr, ',')::INT[] AS elem) as t

;

-- Invalid

SELECT elem[1], elem[2]

FROM   ( VALUES ('1,2'::TEXT) ) AS q(arr),

       LATERAL String_To_Array(q.arr, ',')::int[] AS elem;

ERROR:  syntax error at or near "::"

LINE 3:        LATERAL String_To_Array(q.arr, ',')::int[] AS elem;

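Another SQL statement, CREATE INDEX, triggers the same error. The statement is accepted if the cast is written with CAST(...) or as a type-name function call, or if the :: expression is wrapped in an extra pair of parentheses.

-- Invalid

create index t02_i1 on t02 (id::int);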

ERROR:  syntax error at or near "::"

LINE 1: create index t02_i1 on t02 (id::int);

-- Valid rewrite of the CREATE INDEX statement

create index t02_i1 on t02 ((id::int));
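The CAST and type-name function forms mentioned above can be sketched as follows. The table t02_demo and its numeric id column are assumptions, since the definition of t02 is not shown here; int4 is the internal name of the int type.

```sql
-- Assumed table definition (hypothetical; stands in for t02).
create table t02_demo (id numeric);

-- CAST syntax is accepted directly as an index expression.
create index t02_demo_i1 on t02_demo (cast(id as int));

-- Function-style cast using the type's internal name.
create index t02_demo_i2 on t02_demo (int4(id));

-- The :: form works once it is wrapped in an extra pair of parentheses.
create index t02_demo_i3 on t02_demo ((id::int));
```

All three statements index the same expression; only the spelling of the cast differs.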