MySQL 8.0.2 introduces SQL window functions, also sometimes called analytic functions. Together with CTEs (available since 8.0.1), they are two of our most requested features, long awaited and powerful. This is the first in a series of posts describing the details. Let’s get started!

Introduction

Similar to grouped aggregate functions, window functions perform some calculation on a set of rows, e.g. COUNT or SUM. But where a grouped aggregate collapses this set of rows into a single row, a window function performs the aggregation for each row in the result set, letting each row retain its identity.

Given this simple table, notice the difference between the two SELECTs:

mysql> CREATE TABLE t(i INT);
mysql> INSERT INTO t VALUES (1),(2),(3),(4);
 
mysql> SELECT SUM(i) AS sum FROM t;
+------+
| sum  |
+------+
|   10 |
+------+
 
mysql> SELECT i, SUM(i) OVER () AS sum FROM t;
+------+------+
| i    | sum  |
+------+------+
|    1 |   10 |
|    2 |   10 |
|    3 |   10 |
|    4 |   10 |
+------+------+

In the first SELECT, we have a grouped aggregate; since there is no GROUP BY clause, we have an implicit group containing all rows. The values of i get summed up for the group, and we get a single result row with the value 10.

For the second SELECT, as you can see, every row from t appears in the output, but each row has the value of the sum of all the rows.

The crucial difference is the addition of the OVER () syntax after the SUM(i). The keyword OVER signals that this is a window function, as opposed to a grouped aggregate function. The parentheses after OVER contain the window specification. In this simple example it is empty, which means the window function aggregates over all rows in the result set; so, as for the grouped aggregate, we get the value 10 returned from the window function calls.

In this sense, a window function can be thought of as just another SQL function, except that its value is based on the values of other rows in addition to the values of the row for which it is called, i.e. it functions as a window into other rows.

Now, it is possible to do this calculation without window functions, but it is more complex and/or less efficient, e.g.:

mysql> SELECT i, (SELECT SUM(i) FROM t) FROM t;
+------+------------------------+
| i    | (SELECT SUM(i) FROM t) |
+------+------------------------+
|    1 |                     10 |
|    2 |                     10 |
|    3 |                     10 |
|    4 |                     10 |
+------+------------------------+

That is, we use an explicit subquery here to calculate the SUM for each row in “t”. It turns out that the functionality of window functions can be expressed with other SQL constructs, but usually at the expense of clarity, performance, or both. We will show other examples later where the difference in clarity becomes starker.
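
As a small further illustration of a window function used inside an ordinary expression, the sketch below computes each row's distance from the mean, once with a window function and once with a scalar subquery. To stay self-contained it rebuilds the four rows of t as an inline CTE; that CTE and the aliases diff_window and diff_subquery are assumptions made for this example only.

```sql
-- Inline stand-in for table t (values 1..4), so the statement runs on its own.
WITH t (i) AS (
  SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
)
SELECT i,
       i - AVG(i) OVER ()         AS diff_window,    -- window function form
       i - (SELECT AVG(i) FROM t) AS diff_subquery   -- scalar subquery form
FROM t;
```

The mean of the four values is 2.5, so both columns should agree on every row.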

Window functions come in two flavors: SQL aggregate functions used as window functions and specialized window functions. This is the set of aggregate functions in MySQL that support windowing:

COUNT, SUM, AVG, MIN, MAX, BIT_OR, BIT_AND, BIT_XOR,
STDDEV_POP (and its synonyms STD, STDDEV), STDDEV_SAMP,
VAR_POP (and its synonym VARIANCE) and VAR_SAMP.

The set of specialized window functions is:

RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE,
ROW_NUMBER, FIRST_VALUE, LAST_VALUE, NTH_VALUE, LEAD
and LAG.

We will discuss all of these in due course; but after reading this blog, you should be able to start experimenting with all of these right away by consulting the SQL reference manual.
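
To get a first feel for the specialized functions, here is a minimal sketch comparing three of them on a tiny inline data set containing one tie. The CTE name d and the aliases row_num, rnk and dense_rnk are made up for this example (RANK and ROW_NUMBER are reserved words from 8.0.2 on, hence the abbreviated aliases).

```sql
-- Four values with a tie on 2, built inline so the statement runs on its own.
WITH d (i) AS (
  SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 2 UNION ALL SELECT 3
)
SELECT i,
       ROW_NUMBER() OVER (ORDER BY i) AS row_num,    -- always 1, 2, 3, ...
       RANK()       OVER (ORDER BY i) AS rnk,        -- ties share a rank, then a gap
       DENSE_RANK() OVER (ORDER BY i) AS dense_rnk   -- ties share a rank, no gap
FROM d
ORDER BY i, row_num;
-- i | row_num | rnk | dense_rnk
-- 1 |       1 |   1 |         1
-- 2 |       2 |   2 |         2
-- 2 |       3 |   2 |         2
-- 3 |       4 |   4 |         3
```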

But before we do that, we need to discuss the window specification a little. The following concepts are central: the partition, row ordering, determinacy, the window frame, row peers, and physical and logical window frame bounds.

The partition

Again, let us contrast with grouped aggregates. Below, the employees record their sales of the previous month on the first day of the next:

CREATE TABLE sales(employee VARCHAR(50), `date` DATE, sale INT);
 
INSERT INTO sales VALUES ('odin', '2017-03-01', 200),
                         ('odin', '2017-04-01', 300),
                         ('odin', '2017-05-01', 400),
                         ('thor', '2017-03-01', 400),
                         ('thor', '2017-04-01', 300),
                         ('thor', '2017-05-01', 500);
 
mysql> SELECT employee, SUM(sale) FROM sales GROUP BY employee;
+----------+-----------+
| employee | SUM(sale) |
+----------+-----------+
| odin     |       900 |
| thor     |      1200 |
+----------+-----------+
 
mysql> SELECT employee, date, sale, SUM(sale) OVER (PARTITION BY employee) AS sum FROM sales;
+----------+------------+------+------+
| employee | date       | sale | sum  |
+----------+------------+------+------+
| odin     | 2017-03-01 |  200 |  900 |
| odin     | 2017-04-01 |  300 |  900 |
| odin     | 2017-05-01 |  400 |  900 |
| thor     | 2017-03-01 |  400 | 1200 |
| thor     | 2017-04-01 |  300 | 1200 |
| thor     | 2017-05-01 |  500 | 1200 |
+----------+------------+------+------+

In the first SELECT, we group the rows on employee and sum the sales figures of that employee. Since we have two employees in this Norse outfit, we get two result rows.

Similarly, we can let a window function only see the rows of a subset of the total set of rows; this is called a partition, which is similar to a grouping: as you can see the sums for Odin and Thor are different. This illustrates an important property of window functions: they can never see rows outside the partition of the row for which they are invoked.
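
Because each row keeps its identity, a partition total can be combined with the row's own value in the same expression. The following sketch, which is not from the original post, shows each sale as a percentage of that employee's total. The inline CTE s (a copy of the six sales rows) and the aliases emp_total and pct_of_emp_total are assumptions for illustration.

```sql
-- Inline copy of the six sales rows, so the statement runs on its own.
WITH s (employee, `date`, sale) AS (
  SELECT 'odin', DATE '2017-03-01', 200 UNION ALL
  SELECT 'odin', DATE '2017-04-01', 300 UNION ALL
  SELECT 'odin', DATE '2017-05-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-03-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-04-01', 300 UNION ALL
  SELECT 'thor', DATE '2017-05-01', 500
)
SELECT employee, `date`, sale,
       -- total over the current row's partition only
       SUM(sale) OVER (PARTITION BY employee) AS emp_total,
       -- the row's own value divided by that partition total
       ROUND(100 * sale / SUM(sale) OVER (PARTITION BY employee), 1) AS pct_of_emp_total
FROM s
ORDER BY employee, `date`;
```

Odin's percentages are computed against 900 and Thor's against 1200, since neither partition can see the other's rows.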

We can partition in more ways, of course:

mysql> SELECT employee, MONTHNAME(date), sale, SUM(sale) OVER (PARTITION BY MONTH(date)) AS sum FROM sales;
+----------+-----------------+------+------+
| employee | MONTHNAME(date) | sale | sum  |
+----------+-----------------+------+------+
| odin     | March           |  200 |  600 |
| thor     | March           |  400 |  600 |
| odin     | April           |  300 |  600 |
| thor     | April           |  300 |  600 |
| odin     | May             |  400 |  900 |
| thor     | May             |  500 |  900 |
+----------+-----------------+------+------+

Here we see the sales for the different months, and how much each of our intrepid salesmen contributes. Now, what if we want to show the cumulative sales? It’s time to introduce ORDER BY.

ORDER BY, peers, window frames, logical and physical bounds

The window specification will often contain an ordering clause for the rows in a partition:

mysql> SELECT employee, sale, date, SUM(sale) OVER (PARTITION by employee ORDER BY date) AS cum_sales FROM sales;
+----------+------+------------+-----------+
| employee | sale | date       | cum_sales |
+----------+------+------------+-----------+
| odin     |  200 | 2017-03-01 |       200 |
| odin     |  300 | 2017-04-01 |       500 |
| odin     |  400 | 2017-05-01 |       900 |
| thor     |  400 | 2017-03-01 |       400 |
| thor     |  300 | 2017-04-01 |       700 |
| thor     |  500 | 2017-05-01 |      1200 |
+----------+------+------------+-----------+

We are summing up sales ordered by date. As we can see in the ‘cum_sales’ column, row two contains the sum of the sales of rows one and two, and for each employee the final sum is, as before, 900 for Odin and 1200 for Thor (a big hammer is presumably a persuasive sales argument).

So what happened here? Why did adding an ordering of the partition’s rows lead to partial sums instead of the total for each row, as we had before?

The answer is that the SQL standard prescribes a different default window frame when ORDER BY is present. The above window specification is equivalent to the explicit:

(PARTITION by employee ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

that is, for each sorted row, the SUM should see all rows before it (UNBOUNDED), and up to and including the current row. This represents an expanding window frame, anchored in the first row of the ordering.
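
The equivalence can be checked side by side. The sketch below, under the assumption of an inline CTE s holding the six sales rows, puts the implicit frame, the explicit RANGE frame, and a frame spanning the whole partition next to each other; the column aliases are invented for this example.

```sql
-- Inline copy of the six sales rows, so the statement runs on its own.
WITH s (employee, `date`, sale) AS (
  SELECT 'odin', DATE '2017-03-01', 200 UNION ALL
  SELECT 'odin', DATE '2017-04-01', 300 UNION ALL
  SELECT 'odin', DATE '2017-05-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-03-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-04-01', 300 UNION ALL
  SELECT 'thor', DATE '2017-05-01', 500
)
SELECT employee, `date`, sale,
       -- default frame once ORDER BY is present
       SUM(sale) OVER (PARTITION BY employee ORDER BY `date`) AS implicit_frame,
       -- the same frame written out
       SUM(sale) OVER (PARTITION BY employee ORDER BY `date`
                       RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS explicit_frame,
       -- a frame covering the whole partition, despite the ORDER BY
       SUM(sale) OVER (PARTITION BY employee ORDER BY `date`
                       ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole_partition
FROM s
ORDER BY employee, `date`;
```

The first two columns should match row for row (200, 500, 900 for Odin and 400, 700, 1200 for Thor), while the third shows 900 or 1200 on every row of the respective partition.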

So far, so good. However, there is a subtle point lurking. Watch what happens here:

mysql> SELECT employee, sale, date, SUM(sale) OVER (ORDER BY date) AS cum_sales FROM sales;
+----------+------+------------+-----------+
| employee | sale | date       | cum_sales |
+----------+------+------------+-----------+
| odin     |  200 | 2017-03-01 |       600 |
| thor     |  400 | 2017-03-01 |       600 |
| odin     |  300 | 2017-04-01 |      1200 |
| thor     |  300 | 2017-04-01 |      1200 |
| odin     |  400 | 2017-05-01 |      2100 |
| thor     |  500 | 2017-05-01 |      2100 |
+----------+------+------------+-----------+

We removed the partitioning so that we get the accumulated sales over all our salesmen, and sure enough, the total is 2100 on the last row as expected. But so it is on the second-to-last row! Similarly, rows one and two have the same value, as do rows three and four. But hold on, you’d be forgiven for asking: I thought you just said we’d get a window up to and including the current row?

The clue lies in the keyword RANGE above: window frames can be specified by physical (ROWS) or logical (RANGE) boundaries. Since we order on date here, rows having the same date get the same sum. That is, our window specification actually logically reads:

(PARTITION by employee ORDER BY date
  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW /* and its peers */)

That is, the window frame here consists of all rows earlier in the ordering up to and including the current row and its peers, i.e. any other rows that sort the same given the ORDER BY expression. Two rows with equal dates obviously sort the same, hence they are peers.

If we wanted the cumulative sum to increase per row, we’d need to specify a physical bound:

mysql> SELECT employee, sale, date,
       SUM(sale) OVER (ORDER BY date ROWS
                       BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_sales
       FROM sales;
+----------+------+------------+-----------+
| employee | sale | date       | cum_sales |
+----------+------+------------+-----------+
| odin     |  200 | 2017-03-01 |       200 |
| thor     |  400 | 2017-03-01 |       600 |
| odin     |  300 | 2017-04-01 |       900 |
| thor     |  300 | 2017-04-01 |      1200 |
| odin     |  400 | 2017-05-01 |      1600 |
| thor     |  500 | 2017-05-01 |      2100 |
+----------+------+------------+-----------+

Note: the upper bound of the window frame (CURRENT ROW) is also the default and can be omitted, as we do here:

mysql> SELECT employee, sale, date,
              SUM(sale) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS cum_sales
              FROM sales;  
+----------+------+------------+-----------+
| employee | sale | date       | cum_sales |
+----------+------+------------+-----------+
| odin     |  200 | 2017-03-01 |       200 |
| thor     |  400 | 2017-03-01 |       600 |
| odin     |  300 | 2017-04-01 |       900 |
| thor     |  300 | 2017-04-01 |      1200 |
| odin     |  400 | 2017-05-01 |      1600 |
| thor     |  500 | 2017-05-01 |      2100 |
+----------+------+------------+-----------+

The keyword ROWS signals physical boundaries (rows), whereas RANGE indicates logical bounds, which can create peer rows. Which one you need depends on your application.
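
One way to make the difference visible is to count the rows in each frame rather than sum them. This is a sketch under the assumption of an inline CTE s holding the six sales rows; the aliases rows_in_range_frame and rows_in_rows_frame are made up for this example.

```sql
-- Inline copy of the six sales rows, so the statement runs on its own.
WITH s (employee, `date`, sale) AS (
  SELECT 'odin', DATE '2017-03-01', 200 UNION ALL
  SELECT 'odin', DATE '2017-04-01', 300 UNION ALL
  SELECT 'odin', DATE '2017-05-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-03-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-04-01', 300 UNION ALL
  SELECT 'thor', DATE '2017-05-01', 500
)
SELECT employee, `date`, sale,
       -- logical frame: includes the current row's peers (same date)
       COUNT(*) OVER (ORDER BY `date` RANGE UNBOUNDED PRECEDING) AS rows_in_range_frame,
       -- physical frame: stops exactly at the current row
       COUNT(*) OVER (ORDER BY `date` ROWS  UNBOUNDED PRECEDING) AS rows_in_rows_frame
FROM s
ORDER BY `date`, rows_in_rows_frame;
```

With two rows per date, the RANGE frame grows in steps of two (2, 2, 4, 4, 6, 6), whereas the ROWS frame grows one row at a time (1 through 6).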

Digression: if you omit ORDER BY, there is no way to determine which row comes before another row, so all of the rows in the partition can be considered peers, and hence the frame degenerates to the whole partition:

(PARTITION by employee /* ORDER BY <nothing> */
  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW /* and its peers */)

End of digression.

Determinacy

Let’s look again at the previous query, this time run against the longer sales history (March through August) that is inserted in the “Movable window frames” section below:

mysql> SELECT employee, sale, date, SUM(sale) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS cum_sales FROM sales;
+----------+------+------------+-----------+
| employee | sale | date       | cum_sales |
+----------+------+------------+-----------+
| odin     |  200 | 2017-03-01 |       200 |
| thor     |  400 | 2017-03-01 |       600 |
| odin     |  300 | 2017-04-01 |       900 |
| thor     |  300 | 2017-04-01 |      1200 |
| odin     |  400 | 2017-05-01 |      1600 |
| thor     |  500 | 2017-05-01 |      2100 |
| odin     |  200 | 2017-06-01 |      2300 |
| thor     |  400 | 2017-06-01 |      2700 |
| odin     |  600 | 2017-07-01 |      3300 |
| thor     |  600 | 2017-07-01 |      3900 |
| odin     |  100 | 2017-08-01 |      4000 |
| thor     |  150 | 2017-08-01 |      4150 |
+----------+------+------------+-----------+

We order the rows by date, but every row shares its date with another row, so what is the order between those? In the above, it is not determined. An equally valid result would be:

+----------+------+------------+-----------+
| employee | sale | date       | cum_sales |
+----------+------+------------+-----------+
| thor     |  400 | 2017-03-01 |       400 |
| odin     |  200 | 2017-03-01 |       600 |
| thor     |  300 | 2017-04-01 |       900 |
| odin     |  300 | 2017-04-01 |      1200 |
| thor     |  500 | 2017-05-01 |      1700 |
| odin     |  400 | 2017-05-01 |      2100 |
| thor     |  400 | 2017-06-01 |      2500 |
| odin     |  200 | 2017-06-01 |      2700 |
| thor     |  600 | 2017-07-01 |      3300 |
| odin     |  600 | 2017-07-01 |      3900 |
| thor     |  150 | 2017-08-01 |      4050 |
| odin     |  100 | 2017-08-01 |      4150 |
+----------+------+------------+-----------+

This time, Thor’s rows precede Odin’s rows, which is OK since we didn’t say anything about this in the window specification. Since we had a window frame with a physical bound (“ROWS”), both results are valid.

It is usually a good idea to make sure windowing queries are deterministic. In this case, we can ensure this by adding employee to the ORDER BY clause:

(ORDER BY date, employee ROWS UNBOUNDED PRECEDING)
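
Put into a complete statement, the tie-breaker might look like the sketch below. It uses the original six sales rows as an inline CTE s (an assumption made to keep the example self-contained) and also orders the final result the same way, so that both the values and the row order are fixed.

```sql
-- Inline copy of the six sales rows, so the statement runs on its own.
WITH s (employee, `date`, sale) AS (
  SELECT 'odin', DATE '2017-03-01', 200 UNION ALL
  SELECT 'odin', DATE '2017-04-01', 300 UNION ALL
  SELECT 'odin', DATE '2017-05-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-03-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-04-01', 300 UNION ALL
  SELECT 'thor', DATE '2017-05-01', 500
)
SELECT employee, sale, `date`,
       -- employee breaks the tie between rows that share a date
       SUM(sale) OVER (ORDER BY `date`, employee ROWS UNBOUNDED PRECEDING) AS cum_sales
FROM s
ORDER BY `date`, employee;
-- employee | sale | date       | cum_sales
-- odin     |  200 | 2017-03-01 |       200
-- thor     |  400 | 2017-03-01 |       600
-- odin     |  300 | 2017-04-01 |       900
-- thor     |  300 | 2017-04-01 |      1200
-- odin     |  400 | 2017-05-01 |      1600
-- thor     |  500 | 2017-05-01 |      2100
```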

Movable window frames

One doesn’t always want to aggregate over all values in a partition, for example when using moving averages (or moving mean).  To quote from Wikipedia:

“Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by “shifting forward”; that is, excluding the first number of the series and including the next value in the subset. A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles.”

This is easily accomplished using window functions. For example, let’s see some more of the sales data for Odin and Thor:

TRUNCATE sales;
INSERT INTO sales VALUES ('odin', '2017-03-01', 200),
                         ('odin', '2017-04-01', 300),
                         ('odin', '2017-05-01', 400),
                         ('odin', '2017-06-01', 200),
                         ('odin', '2017-07-01', 600),
                         ('odin', '2017-08-01', 100),
                         ('thor', '2017-03-01', 400),
                         ('thor', '2017-04-01', 300),
                         ('thor', '2017-05-01', 500),
                         ('thor', '2017-06-01', 400),
                         ('thor', '2017-07-01', 600),
                         ('thor', '2017-08-01', 150);                        

By averaging the current month with the previous and the next we get a smoother curve:

mysql> SELECT MONTH(date), SUM(sale),
       AVG(SUM(sale)) OVER (ORDER BY MONTH(date)
                            RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sliding_avg
       FROM sales GROUP BY MONTH(date);
 
+-------------+-----------+-------------+
| month(date) | SUM(sale) | sliding_avg |
+-------------+-----------+-------------+
|           3 |       600 |    600.0000 |
|           4 |       600 |    700.0000 |
|           5 |       900 |    700.0000 |
|           6 |       600 |    900.0000 |
|           7 |      1200 |    683.3333 |
|           8 |       250 |    725.0000 |
+-------------+-----------+-------------+

Or as shown in this graphic:

This can be expressed without window functions too:

WITH sums AS (
   SELECT SUM(t2.sale) AS sum, t2.date FROM sales AS t2 GROUP BY t2.date )
SELECT t1.date,
      ( SELECT SUM(sums.sum) / COUNT(sums.sum) FROM sums
        WHERE MONTH(sums.date) - MONTH(t1.date) BETWEEN -1 AND 1
      ) AS sliding_avg
   FROM sales AS t1
   GROUP BY t1.date
   ORDER BY t1.date;

and even without CTEs:

SELECT t1.date, SUM(sale),
      ( SELECT SUM(sums.sum) / COUNT(sums.sum) FROM
           (SELECT SUM(t2.sale) AS sum, t2.date FROM sales AS t2
            GROUP BY t2.date) sums
        WHERE MONTH(sums.date) - MONTH(t1.date)
              BETWEEN -1 AND 1
      ) AS sliding_avg
   FROM sales AS t1
   GROUP BY date
   ORDER BY t1.date;

but it is rather more complex.
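
Returning to the window-function form, a movable frame combines naturally with a partition. The sketch below is a variation that is not in the original post: it computes a three-month sliding average per employee using a physical ROWS frame. The inline CTE s mirrors the twelve rows inserted above, and the alias sliding_avg is reused for illustration.

```sql
-- Inline copy of the twelve sales rows, so the statement runs on its own.
WITH s (employee, `date`, sale) AS (
  SELECT 'odin', DATE '2017-03-01', 200 UNION ALL
  SELECT 'odin', DATE '2017-04-01', 300 UNION ALL
  SELECT 'odin', DATE '2017-05-01', 400 UNION ALL
  SELECT 'odin', DATE '2017-06-01', 200 UNION ALL
  SELECT 'odin', DATE '2017-07-01', 600 UNION ALL
  SELECT 'odin', DATE '2017-08-01', 100 UNION ALL
  SELECT 'thor', DATE '2017-03-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-04-01', 300 UNION ALL
  SELECT 'thor', DATE '2017-05-01', 500 UNION ALL
  SELECT 'thor', DATE '2017-06-01', 400 UNION ALL
  SELECT 'thor', DATE '2017-07-01', 600 UNION ALL
  SELECT 'thor', DATE '2017-08-01', 150
)
SELECT employee, `date`, sale,
       -- frame: the previous row, the current row and the next row of the same employee
       AVG(sale) OVER (PARTITION BY employee ORDER BY `date`
                       ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sliding_avg
FROM s
ORDER BY employee, `date`;
```

At the edges of a partition the frame simply shrinks: for Odin's March row it holds only March and April, so the average is (200 + 300) / 2 = 250. Since dates are unique within each employee's partition, the result is deterministic without a further tie-breaker.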

Conclusion

That’s all for now. Hopefully this gives you an idea of what window functions can be used for.  In the next installment we’ll delve more into all the ways you can specify window frames, and introduce some of the specialized window functions. See you then!
