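Column Order in GROUP BY and DISTINCT Affects Query Performance

I. Overview
II. When work_mem is sufficient for the sort
- 1. DISTINCT statement
- 2. GROUP BY statement
III. When work_mem is insufficient for the sort
- 1. DISTINCT statement
- 2. GROUP BY statement
IV. Summary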

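I. Overview

When tuning SQL that groups or deduplicates on many columns, the order of those columns is itself something that can be optimized, because the same order is used as the sort key whenever the query has to sort.

Test data profile

kingbase=# select count(distinct txt1 ) txt1, avg(length(txt1))::int ln1, count(distinct txt3 ) txt3 ,avg(length(txt3))::int ln3 from txt01;
 txt1 | ln1  |  txt3   | ln3
------+------+---------+-----
 1000 | 1000 | 1000000 |  10
(1 row)

txt1 is long with few distinct values (1000 values averaging 1000 characters), while txt3 is short with many distinct values (1000000 values averaging 10 characters).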

II. When work_mem is sufficient for the sort

1. DISTINCT statement

Order: txt1, txt3

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 100MB) */ distinct txt1 ,txt3 from txt01 ;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=269287.33..269687.33 rows=40000 width=64) (actual time=1543.995..1877.527 rows=1000000 loops=1)
   Group Key: txt1, txt3
   Buffers: shared hit=142858
   ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.008..159.858 rows=1000000 loops=1)
         Buffers: shared hit=142858
 Planning Time: 0.081 ms
 Execution Time: 1947.951 ms
(7 rows)

Order: txt3, txt1

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 100MB) */ distinct txt3 ,txt1 from txt01 ;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=269287.33..269687.33 rows=40000 width=64) (actual time=1596.040..1812.380 rows=1000000 loops=1)
   Group Key: txt3, txt1
   Buffers: shared hit=142858
   ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.007..163.399 rows=1000000 loops=1)
         Buffers: shared hit=142858
 Planning Time: 0.075 ms
 Execution Time: 1884.907 ms
(7 rows)

2. GROUP BY statement

Order: txt1, txt3

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 100MB) */  txt1 ,txt3 from txt01 group by txt1 ,txt3 ;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=269287.33..269687.33 rows=40000 width=64) (actual time=1540.948..1875.917 rows=1000000 loops=1)
   Group Key: txt1, txt3
   Buffers: shared hit=142858
   ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.006..160.419 rows=1000000 loops=1)
         Buffers: shared hit=142858
 Planning Time: 0.084 ms
 Execution Time: 1939.103 ms
(7 rows)

Order: txt3, txt1

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 100MB) */ txt1 ,txt3 from txt01 group by txt3 ,txt1 ;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=269287.33..269687.33 rows=40000 width=64) (actual time=1557.257..1780.662 rows=1000000 loops=1)
   Group Key: txt3, txt1
   Buffers: shared hit=142858
   ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.018..165.221 rows=1000000 loops=1)
         Buffers: shared hit=142858
 Planning Time: 0.330 ms
 Execution Time: 1844.664 ms
(7 rows)
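
None of the four plans in this section contains a Sort node; they all use HashAggregate. The following is a minimal Python sketch of why column order would be expected to matter little for a hash-based approach: every key column of every row is hashed regardless of order. The function name hash_distinct and the sample rows are hypothetical, and this is a simplified model, not the database engine's implementation.

```python
def hash_distinct(rows, key_order):
    """Deduplicate rows by hashing the key columns in the given order."""
    seen = set()
    for row in rows:
        # The whole key tuple is hashed, so every column is read once per row
        # no matter which column comes first.
        seen.add(tuple(row[col] for col in key_order))
    return seen


# Hypothetical sample rows: a long repetitive txt1 and a short txt3.
sample_rows = [
    {"txt1": "a" * 20, "txt3": "0001"},
    {"txt1": "a" * 20, "txt3": "0002"},
    {"txt1": "b" * 20, "txt3": "0001"},
    {"txt1": "a" * 20, "txt3": "0001"},  # duplicate of the first row
]

groups_txt1_first = hash_distinct(sample_rows, ("txt1", "txt3"))
groups_txt3_first = hash_distinct(sample_rows, ("txt3", "txt1"))
print(len(groups_txt1_first), len(groups_txt3_first))
```

Both orders find the same number of groups and do the same amount of hashing work per row, which is consistent with the small timing differences measured above.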

III. When work_mem is insufficient for the sort

1. DISTINCT statement

Order: txt1, txt3

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 1MB) */ distinct txt1 ,txt3 from txt01  ;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2464313.08..2527527.74 rows=40000 width=64) (actual time=21031.092..22131.259 rows=1000000 loops=1)
   Buffers: shared hit=142858, temp read=125368 written=125369
   ->  Sort  (cost=2464313.08..2485384.63 rows=8428622 width=64) (actual time=21031.089..22002.850 rows=1000000 loops=1)
         Sort Key: txt1, txt3
         Sort Method: external merge  Disk: 1002944kB
         Buffers: shared hit=142858, temp read=125368 written=125369
         ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.039..272.327 rows=1000000 loops=1)
               Buffers: shared hit=142858
 Planning Time: 0.272 ms
 Execution Time: 23648.185 ms
(10 rows)

Order: txt3, txt1

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 1MB) */ distinct txt3 ,txt1 from txt01  ;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2464313.08..2527527.74 rows=40000 width=64) (actual time=4004.641..4367.218 rows=1000000 loops=1)
   Buffers: shared hit=142858, temp read=125491 written=125492
   ->  Sort  (cost=2464313.08..2485384.63 rows=8428622 width=64) (actual time=4004.639..4239.599 rows=1000000 loops=1)
         Sort Key: txt3, txt1
         Sort Method: external merge  Disk: 1003928kB
         Buffers: shared hit=142858, temp read=125491 written=125492
         ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.011..271.572 rows=1000000 loops=1)
               Buffers: shared hit=142858
 Planning Time: 0.086 ms
 Execution Time: 4457.751 ms
(10 rows)
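
Both sorts spill about 1 GB to disk. As a rough cross-check, the arithmetic below estimates the sort payload from the test data profile and compares it with the reported Disk figure. It assumes one byte per character and ignores per-row overhead, so it is a back-of-envelope sketch rather than an exact accounting; the function name estimated_sort_bytes is hypothetical.

```python
def estimated_sort_bytes(row_count, avg_lengths):
    """Payload bytes only: row count times the sum of average column lengths."""
    return row_count * sum(avg_lengths)


ROWS = 1_000_000              # actual rows reported by the Seq Scan nodes
AVG_LENGTHS = (1000, 10)      # ln1 and ln3 from the test data query
REPORTED_DISK_KB = 1_002_944  # "Disk:" value of the txt1, txt3 sort
WORK_MEM_KB = 1024            # work_mem 1MB from the hint

# Assumption: one byte per character, no tuple headers or alignment padding.
payload_kb = estimated_sort_bytes(ROWS, AVG_LENGTHS) / 1024

print(f"estimated payload:  {payload_kb:,.0f} kB")
print(f"reported spill:     {REPORTED_DISK_KB:,} kB")
print(f"payload / work_mem: {payload_kb / WORK_MEM_KB:,.0f}x")
```

The estimate lands within a few percent of the reported spill, and the payload is several hundred times larger than the 1MB work_mem, so an external merge sort is expected here.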

2. GROUP BY statement

Order: txt1, txt3

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 1MB) */ txt1 ,txt3 from txt01 group by txt1 ,txt3 ;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Group  (cost=2464313.08..2527527.74 rows=40000 width=64) (actual time=21715.770..22796.166 rows=1000000 loops=1)
   Group Key: txt1, txt3
   Buffers: shared hit=142858, temp read=125368 written=125369
   ->  Sort  (cost=2464313.08..2485384.63 rows=8428622 width=64) (actual time=21715.764..22658.413 rows=1000000 loops=1)
         Sort Key: txt1, txt3
         Sort Method: external merge  Disk: 1002944kB
         Buffers: shared hit=142858, temp read=125368 written=125369
         ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.029..271.335 rows=1000000 loops=1)
               Buffers: shared hit=142858
 Planning Time: 0.285 ms
 Execution Time: 25365.012 ms
(11 rows)

Order: txt3, txt1

kingbase=# explain (analyse ,buffers ) select /*+ set(work_mem 1MB) */ txt1 ,txt3 from txt01 group by txt3 ,txt1 ;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Group  (cost=2464313.08..2527527.74 rows=40000 width=64) (actual time=4156.296..4541.315 rows=1000000 loops=1)
   Group Key: txt3, txt1
   Buffers: shared hit=142858, temp read=125368 written=125369
   ->  Sort  (cost=2464313.08..2485384.63 rows=8428622 width=64) (actual time=4156.291..4402.265 rows=1000000 loops=1)
         Sort Key: txt3, txt1
         Sort Method: external merge  Disk: 1002944kB
         Buffers: shared hit=142858, temp read=125368 written=125369
         ->  Seq Scan on txt01  (cost=0.00..227144.22 rows=8428622 width=64) (actual time=0.008..270.872 rows=1000000 loops=1)
               Buffers: shared hit=142858
 Planning Time: 0.081 ms
 Execution Time: 4632.567 ms
(11 rows)

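IV. Summary

Execution time by column order:

Statement   work_mem               txt1,txt3       txt3,txt1
DISTINCT    sufficient (100MB)     1947.951 ms     1884.907 ms
GROUP BY    sufficient (100MB)     1939.103 ms     1844.664 ms
DISTINCT    insufficient (1MB)     23648.185 ms    4457.751 ms
GROUP BY    insufficient (1MB)     25365.012 ms    4632.567 ms

Placing the column with fewer bytes and more distinct values first in the sort key improves performance. When work_mem is sufficient, the plans use HashAggregate with no Sort node and the difference is small (about 3% to 5%). When work_mem is insufficient, the plans fall back to an external merge sort and the txt3, txt1 order is more than 5 times faster.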
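
One plausible explanation for the gap in the sorted plans is comparison cost. With txt3 first, almost every comparison is decided inside the short, unique txt3 value. With txt1 first, rows that share a txt1 value must compare the whole long string before txt3 is consulted. The Python sketch below counts the characters a comparator inspects under each order. The helper names (make_rows, chars_examined_by_sort) and the scaled-down sizes are hypothetical, and the model ignores collation rules and external-merge I/O, so it illustrates the idea rather than reproducing the engine's behavior.

```python
import random
import string
from functools import cmp_to_key


def make_rows(n_rows=1000, n_long=10, long_len=200, seed=0):
    """Build rows with a long, repetitive txt1 and a short, unique txt3."""
    rng = random.Random(seed)
    long_values = [
        "".join(rng.choices(string.ascii_lowercase, k=long_len))
        for _ in range(n_long)
    ]
    short_ids = list(range(n_rows))
    rng.shuffle(short_ids)
    return [
        {"txt1": rng.choice(long_values), "txt3": f"{i:010d}"}
        for i in short_ids
    ]


def chars_examined_by_sort(rows, key_order):
    """Sort rows on key_order and count characters the comparator inspects."""
    examined = 0

    def compare(row_a, row_b):
        nonlocal examined
        for col in key_order:
            a, b = row_a[col], row_b[col]
            # Walk the shared prefix, as a character-wise comparison would.
            shared = 0
            limit = min(len(a), len(b))
            while shared < limit and a[shared] == b[shared]:
                shared += 1
            if a != b:
                examined += shared + 1  # the first differing position decides
                return -1 if a < b else 1
            examined += shared  # equal values: the whole string was scanned
        return 0

    sorted(rows, key=cmp_to_key(compare))
    return examined


if __name__ == "__main__":
    rows = make_rows()
    long_first = chars_examined_by_sort(rows, ("txt1", "txt3"))
    short_first = chars_examined_by_sort(rows, ("txt3", "txt1"))
    print("txt1, txt3 characters examined:", long_first)
    print("txt3, txt1 characters examined:", short_first)
```

In this model the txt1, txt3 order inspects more characters than the txt3, txt1 order, which points in the same direction as the measured timings.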
