Indexing GROUP BY

SQL databases use two entirely different group by algorithms. The first one, the hash algorithm, aggregates the input records in a temporary hash table. Once all input records are processed, the hash table is returned as the result. The second algorithm, the sort/group algorithm, first sorts the input data by the grouping key so that the rows of each group follow each other in immediate succession. Afterwards, the database just needs to aggregate them. In general, both algorithms need to materialize an intermediate state, so they are not executed in a pipelined manner. Nevertheless the sort/group algorithm can use an index to avoid the sort operation, thus enabling a pipelined group by.

Consider the following query. It delivers yesterday's revenue grouped by PRODUCT_ID:

SELECT product_id, sum(eur_value)

  FROM sales

 WHERE sale_date = TRUNC(sysdate) - INTERVAL '' DAY

 GROUP BY product_id

Knowing the index on SALE_DATE and PRODUCT_ID from the previous section, the sort/group algorithm is more appropriate because an INDEX RANGE SCAN automatically delivers the rows in the required order. That means the database avoids materialization because it does not need an explicit sort operation—the group by is executed in a pipelined manner.

oracle:

---------------------------------------------------------------

|Id |Operation                    | Name        | Rows | Cost |

---------------------------------------------------------------

| 0 |SELECT STATEMENT             |             |   17 |  192 |

| 1 | SORT GROUP BY NOSORT        |             |   17 |  192 |

| 2 |  TABLE ACCESS BY INDEX ROWID| SALES       |  321 |  192 |

|*3 |   INDEX RANGE SCAN          | SALES_DT_PR |  321 |    3 |

---------------------------------------------------------------

The Oracle database's execution plan marks a pipelined SORT GROUP BY operation with the NOSORT addendum. The execution plan of other databases does not mention any sort operation at all.

The pipelined group by has the same prerequisites as the pipelined order by, except there are no ASC and DESC modifiers. That means that defining an index with ASC/DESC modifiers should not affect pipelined group by execution. The same is true for NULLS FIRST/LAST. Nevertheless there are databases that cannot properly use an ASC/DESC index for a pipelined group by.

For PostgreSQL, you must add an order by clause to make an index with NULLS LAST sorting usable for a pipelined group by. The Oracle database cannot read an index backwards in order to execute a pipelined group by that is followed by an order by. More details are available in the respective appendices: PostgreSQL, Oracle.

If we extend the query to consider all sales since yesterday, as we did in the example for the pipelined order by, it prevents the pipelined group by for the same reason as before: the INDEX RANGE SCAN does not deliver the rows ordered by the grouping key.

SELECT product_id, sum(eur_value)

  FROM sales

 WHERE sale_date >= TRUNC(sysdate) - INTERVAL '' DAY

 GROUP BY product_id

Oracle:

---------------------------------------------------------------

|Id |Operation                    | Name        | Rows | Cost |

---------------------------------------------------------------

| 0 |SELECT STATEMENT             |             |   24 |  356 |

| 1 | HASH GROUP BY               |             |   24 |  356 |

| 2 |  TABLE ACCESS BY INDEX ROWID| SALES       |  596 |  355 |

|*3 |   INDEX RANGE SCAN          | SALES_DT_PR |  596 |    4 |

---------------------------------------------------------------

Instead, the Oracle database uses the hash algorithm. The advantage of the hash algorithm is that it only needs to buffer the aggregated result, whereas the sort/group algorithm materializes the complete input set. In other words: the hash algorithm needs less memory.

As with pipelined order by, a fast execution is not the most important aspect of the pipelined group by execution. It is more important that the database executes it in a pipelined manner and delivers the first result before reading the entire input.

参考：

http://use-the-index-luke.com/sql/sorting-grouping/indexed-group-by

Indexing GROUP BY的更多相关文章

Elasticsearch: Indexing SQL databases. The easy way
Elasticsearchis a great search engine, flexible, fast and fun. So how can I get started with it? Thi ...
Indexing Sensor Data
In particular embodiments, a method includes, from an indexer in a sensor network, accessing a set o ...
pandas 之 group by 过程
import numpy as np import pandas as pd Categorizing a dataset and applying a function to each group ...
LINQ Group By操作
在上篇文章 .NET应用程序与数据库交互的若干问题这篇文章中,讨论了一个计算热门商圈的问题,现在在这里扩展一下,假设我们需要从两张表中统计出热门商圈,这两张表内容如下: 上表是所有政区,商圈中的餐饮 ...
Kafka消费组(consumer group)
一直以来都想写一点关于kafka consumer的东西,特别是关于新版consumer的中文资料很少.最近Kafka社区邮件组已经在讨论是否应该正式使用新版本consumer替换老版本,笔者也觉得时 ...
LINQ to SQL语句(6)之Group By/Having
适用场景:分组数据,为我们查找数据缩小范围. 说明:分配并返回对传入参数进行分组操作后的可枚举对象.分组:延迟 1.简单形式: var q = from p in db.Products group ...
学习笔记 MYSQL报错注入(count()、rand()、group by)
首先看下常见的攻击载荷,如下: select count(*),(floor(rand(0)*2))x from table group by x; 然后对于攻击载荷进行解释, floor(rand( ...
[备查]使用 SPQuery 查询 "Person or Group" 字段
原文地址:http://www.stum.de/2008/02/06/querying-the-person-or-group-field-using-spquery/ Querying the “P ...
order by 与 group by 区别
order by 排序查询.asc升序.desc降序示例: select * from 学生表 order by 年龄 ---查询学生表信息.按年龄的升序(默认.可缺省.从低到高)排列显示也可以多 ...

随机推荐

fizzbuzz Python很有意思的解法
写一个程序,打印数字1到100,3的倍数打印“Fizz”来替换这个数,5的倍数打印“Buzz”,对于既是3的倍数又是5的倍数的数字打印“FizzBuzz” 题目不难,解起来容易,用for循环做if,e ...
TW实习日记：第19天
今天一早上改完信息门户的代码之后,发现接口又出了问题,查了半天都不知道,原来又是网端的问题...真是心累啊,调整了一些细节样式,以及终于把企业微信的消息推送功能做完了.关键就在于有个表存放微信id的字 ...
搜索二维矩阵 II
描述写出一个高效的算法来搜索m×n矩阵中的值,返回这个值出现的次数. 这个矩阵具有以下特性: 每行中的整数从左到右是排序的. 每一列的整数从上到下是排序的. 在每一行或每一列中没有重复的整数. 样例 ...
初涉 JavaScript
网页是什么网页 = Html+CSS+JavaScriptHtml:网页元素内容CSS:控制网页样式JavaScript:操作网页内容,实现功能或者效果 JavaScirpt 发展历史参考使用 ...
NOIP2012 普及组真题 4.13校模拟
考试状态: 我今天抽签看了洛谷的… 这我能怂???凶中带吉,我怕考试??我!不!怕! 看着整个机房的男同学们,我明白我是不会触发我的忌了.很好,开刷. A. [NOIP2012普及组真题] 质因数分解 ...
SIG蓝牙mesh笔记3_网络结构
目录 3. Mesh Networking 3.1 Bearers 承载层 3.2 Network Layer 网络层 3.2.3 Address validity 地址有效性 3.2.4 Netwo ...
HDU 2492 Ping pong（数学+树状数组）（2008 Asia Regional Beijing）
Description N(3<=N<=20000) ping pong players live along a west-east street(consider the street ...
C中的除法，商和余数的大小、符号如何确定
对于C中的除法,商和余数的大小.符号是如何确定的呢?在C89中,只规定了如果两个数为正整数,那么余数的符号为正,并且商的值是接近真实值的最大整数.比如5 / 2,那么商就是2,余数就是1.但是,C89 ...
lintcode-167-链表求和
167-链表求和你有两个用链表代表的整数,其中每个节点包含一个数字.数字存储按照在原来整数中相反的顺序,使得第一个数字位于链表的开头.写出一个函数将两个整数相加,用链表形式返回和. 样例给出两个链 ...
grid++json页面数据传入
最近遇到一个问题,就是要用Grid++做页面数据报表打印,但是翻了Grid++文档就是没有直接从页面上传数据的,都是要加载txt文档,填写txt文档的url.自己尝试直接页面上传JSON数据到Grid ...

Indexing GROUP BY

Indexing GROUP BY的更多相关文章

随机推荐

热门专题