转自:http://lxw1234.com/archives/2015/04/176.htm,Hive分析窗口函数(一) SUM,AVG,MIN,MAX

之前看到大数据田地有关于max()over(partition by)的用法,今天恰好工作中用到了它,但是使用中遇到了一个问题:在max(rsrp)over(partition by buildingid,height) as max_rsrp返回的结果不是分组中的最大值。最中找到了问题的原因:max_rsrp数据类型为string而不是double类型,导致的一个bug问题。

再处理的过程中也再次把大数据田地的中关于sum,avg,max,min的函数用法做了demo,因此有了该参考后的文章。

数据准备:

echo ''>data_file.txt
vim data_file.txt
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
hadoop fs -rm -r /user/jrf/test_data
hadoop fs -mkdir /user/jrf/test_data
hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
drop table if exists test_data;
create EXTERNAL TABLE test_data (
cookieid string,
createtime string, --day
pv INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jrf/test_data/';
select * from test_data;
+---------------------+-----------------------+---------------+--+
| test_data.cookieid | test_data.createtime | test_data.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2015-04-10 | 1 |
| cookie1 | 2015-04-11 | 5 |
| cookie1 | 2015-04-12 | 7 |
| cookie1 | 2015-04-13 | 3 |
| cookie1 | 2015-04-14 | 2 |
| cookie1 | 2015-04-15 | 4 |
| cookie1 | 2015-04-16 | 4 |
| cookie2 | 2015-04-10 | 6 |
| cookie2 | 2015-04-11 | 5 |
| cookie2 | 2015-04-12 | 7 |
| cookie2 | 2015-04-13 | 4 |
| cookie2 | 2015-04-14 | 3 |
| cookie2 | 2015-04-15 | 5 |
| cookie2 | 2015-04-16 | 5 |
+---------------------+-----------------------+---------------+--+

SUM — 注意,结果和ORDER BY相关,默认为升序

SELECT cookieid,createtime,pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
SUM(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 26 | 1 | 6 | 26 |
| cookie1 | 2015-04-11 | 5 | 6 | 6 | 26 | 6 | 13 | 25 |
| cookie1 | 2015-04-12 | 7 | 13 | 13 | 26 | 13 | 16 | 20 |
| cookie1 | 2015-04-13 | 3 | 16 | 16 | 26 | 16 | 18 | 13 |
| cookie1 | 2015-04-14 | 2 | 18 | 18 | 26 | 17 | 21 | 10 |
| cookie1 | 2015-04-15 | 4 | 22 | 22 | 26 | 16 | 20 | 8 |
| cookie1 | 2015-04-16 | 4 | 26 | 26 | 26 | 13 | 13 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 35 | 6 | 11 | 35 |
| cookie2 | 2015-04-11 | 5 | 11 | 11 | 35 | 11 | 18 | 29 |
| cookie2 | 2015-04-12 | 7 | 18 | 18 | 35 | 18 | 22 | 24 |
| cookie2 | 2015-04-13 | 4 | 22 | 22 | 35 | 22 | 25 | 17 |
| cookie2 | 2015-04-14 | 3 | 25 | 25 | 35 | 19 | 24 | 13 |
| cookie2 | 2015-04-15 | 5 | 30 | 30 | 35 | 19 | 24 | 10 |
| cookie2 | 2015-04-16 | 5 | 35 | 35 | 35 | 17 | 17 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
pv2: 同pv1
pv3: 分组内(cookie1)所有的pv累加
pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点

–其他AVG,MIN,MAX,和SUM用法一样。

--AVG
SELECT cookieid,createtime,pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookie1 | 2015-04-10 | 1 | 1.0 | 1.0 | 3.7142857142857144 | 1.0 | 3.0 | 3.7142857142857144 |
| cookie1 | 2015-04-11 | 5 | 3.0 | 3.0 | 3.7142857142857144 | 3.0 | 4.333333333333333 | 4.166666666666667 |
| cookie1 | 2015-04-12 | 7 | 4.333333333333333 | 4.333333333333333 | 3.7142857142857144 | 4.333333333333333 | 4.0 | 4.0 |
| cookie1 | 2015-04-13 | 3 | 4.0 | 4.0 | 3.7142857142857144 | 4.0 | 3.6 | 3.25 |
| cookie1 | 2015-04-14 | 2 | 3.6 | 3.6 | 3.7142857142857144 | 4.25 | 4.2 | 3.3333333333333335 |
| cookie1 | 2015-04-15 | 4 | 3.6666666666666665 | 3.6666666666666665 | 3.7142857142857144 | 4.0 | 4.0 | 4.0 |
| cookie1 | 2015-04-16 | 4 | 3.7142857142857144 | 3.7142857142857144 | 3.7142857142857144 | 3.25 | 3.25 | 4.0 |
| cookie2 | 2015-04-10 | 6 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 5.0 |
| cookie2 | 2015-04-11 | 5 | 5.5 | 5.5 | 5.0 | 5.5 | 6.0 | 4.833333333333333 |
| cookie2 | 2015-04-12 | 7 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 4.8 |
| cookie2 | 2015-04-13 | 4 | 5.5 | 5.5 | 5.0 | 5.5 | 5.0 | 4.25 |
| cookie2 | 2015-04-14 | 3 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 4.333333333333333 |
| cookie2 | 2015-04-15 | 5 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 5.0 |
| cookie2 | 2015-04-16 | 5 | 5.0 | 5.0 | 5.0 | 4.25 | 4.25 | 5.0 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
--MIN
SELECT cookieid,createtime,pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,--从起点到当前行,结果同pv1
MIN(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| cookie1 | 2015-04-11 | 5 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-12 | 7 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-13 | 3 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-14 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| cookie1 | 2015-04-15 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie1 | 2015-04-16 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 3 | 6 | 5 | 3 |
| cookie2 | 2015-04-11 | 5 | 5 | 5 | 3 | 5 | 5 | 3 |
| cookie2 | 2015-04-12 | 7 | 5 | 5 | 3 | 5 | 4 | 3 |
| cookie2 | 2015-04-13 | 4 | 4 | 4 | 3 | 4 | 3 | 3 |
| cookie2 | 2015-04-14 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| cookie2 | 2015-04-15 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
| cookie2 | 2015-04-16 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
--MAX
SELECT cookieid,createtime,pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 7 | 1 | 5 | 7 |
| cookie1 | 2015-04-11 | 5 | 5 | 5 | 7 | 5 | 7 | 7 |
| cookie1 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie1 | 2015-04-13 | 3 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-14 | 2 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-15 | 4 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-16 | 4 | 7 | 7 | 7 | 4 | 4 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 7 | 6 | 6 | 7 |
| cookie2 | 2015-04-11 | 5 | 6 | 6 | 7 | 6 | 7 | 7 |
| cookie2 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie2 | 2015-04-13 | 4 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-14 | 3 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-15 | 5 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-16 | 5 | 7 | 7 | 7 | 5 | 5 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+ SELECT cookieid,
createtime,
pv,
min(pv) OVER(PARTITION BY cookieid) AS min_pv,
max(pv) OVER(PARTITION BY cookieid) AS max_pv
FROM test_data;
+-----------+-------------+-----+---------+---------+--+
| cookieid | createtime | pv | min_pv | max_pv |
+-----------+-------------+-----+---------+---------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 7 |
| cookie1 | 2015-04-16 | 4 | 1 | 7 |
| cookie1 | 2015-04-15 | 4 | 1 | 7 |
| cookie1 | 2015-04-14 | 2 | 1 | 7 |
| cookie1 | 2015-04-13 | 3 | 1 | 7 |
| cookie1 | 2015-04-12 | 7 | 1 | 7 |
| cookie1 | 2015-04-11 | 5 | 1 | 7 |
| cookie2 | 2015-04-16 | 5 | 3 | 7 |
| cookie2 | 2015-04-15 | 5 | 3 | 7 |
| cookie2 | 2015-04-14 | 3 | 3 | 7 |
| cookie2 | 2015-04-13 | 4 | 3 | 7 |
| cookie2 | 2015-04-12 | 7 | 3 | 7 |
| cookie2 | 2015-04-11 | 5 | 3 | 7 |
| cookie2 | 2015-04-10 | 6 | 3 | 7 |
+-----------+-------------+-----+---------+---------+--+

Hive函数:SUM,AVG,MIN,MAX的更多相关文章

  1. Hive分析窗口函数(一) SUM,AVG,MIN,MAX

    Hive分析窗口函数(一) SUM,AVG,MIN,MAX Hive分析窗口函数(一) SUM,AVG,MIN,MAX Hive中提供了越来越多的分析函数,用于完成负责的统计分析.抽时间将所有的分析窗 ...

  2. Hive学习之路 (十三)Hive分析窗口函数(一) SUM,AVG,MIN,MAX

    数据准备 数据格式 cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, 创建数据库及表 create datab ...

  3. MybatisPlus Lambda表达式 聚合查询 分组查询 COUNT SUM AVG MIN MAX GroupBy

    一.序言 众所周知,MybatisPlus在处理单表DAO操作时非常的方便.在处理多表连接连接查询也有优雅的解决方案.今天分享MybatisPlus基于Lambda表达式优雅实现聚合分组查询. 由于视 ...

  4. C# 中奇妙的函数–6. 五个序列聚合运算(Sum, Average, Min, Max,Aggregate)

    今天,我们将着眼于五个用于序列的聚合运算.很多时候当我们在对序列进行操作时,我们想要做基于这些序列执行某种汇总然后,计算结果. Enumerable 静态类的LINQ扩展方法可以做到这一点 .就像之前 ...

  5. SQL模糊查询,sum,AVG,MAX,min函数

    cmd mysql -hlocalhost -uroot -p select * from emp where ename like '___' -- 三个横线, - 代表字符,可以查询 三个enam ...

  6. 三、函数 (SUM、MIN、MAX、COUNT、AVG)

    第八章 使用数据处理函数 8.1 函数 SQL支持利用函数来处理数据.函数一般是在数据上执行的,给数据的转换和处理提供了方便. 每一个DBMS都有特定的函数.只有少数几个函数被所有主要的DBMS等同的 ...

  7. LINQ to SQL Count/Sum/Min/Max/Avg Join

    public class Linq { MXSICEDataContext Db = new MXSICEDataContext(); // LINQ to SQL // Count/Sum/Min/ ...

  8. LINQ to SQL 语句(3) 之 Count/Sum/Min/Max/Avg

    LINQ  to SQL 语句(3) 之  Count/Sum/Min/Max/Avg [1] Count/Sum 讲解 [2] Min 讲解 [3] Max 讲解 [4] Average 和 Agg ...

  9. [转]LINQ语句之Select/Distinct和Count/Sum/Min/Max/Avg

    在讲述了LINQ,顺便说了一下Where操作,这篇开始我们继续说LINQ语句,目的让大家从语句的角度了解LINQ,LINQ包括LINQ to Objects.LINQ to DataSets.LINQ ...

随机推荐

  1. javascript中的Promise使用

    参考自: http://m.jb51.net/article/102642.htm 1.基本用法: (1).首先我们new一个Promise,将Promise实例化 (2).然后在实例化的promis ...

  2. 初始化angularJS之ng-app的自动绑定和手动绑定

    在传统的angularJS应用中,都是通过ng-app把angular应用绑定到某个dom上,这样做会把js代码入侵到html上,angular提供了手动启动的API--angular.bootstr ...

  3. python安装第三方库

    在编写爬虫程序时发现unsolved import 一时不解,以为是ide出问题了,其实是没有安装第三方库导致的. 于是到https://pypi.python.org/pypi/requests/去 ...

  4. 【Django】 视图层说明

    [Django视图层] 视图层的主要工作是衔接HTTP请求,Python程序和HTML模板,使他们能够有机互相合作从模型层lou到数据并且反馈.说到视图层的工作就有以下几个方面要说 ■ URL映射 对 ...

  5. 网络通信 --> Linux 五种IO模型

    Linux 五种IO模型 聊聊Linux 五种IO模型

  6. 通过修改然后commit的方式创建自己的镜像

    创建自己的镜像:通过现有的镜像来创建自己的镜像.1.首先拉取一个镜像到本地$ sudo docker imagesREPOSITORY          TAG                 IMA ...

  7. Python类中的self到底是干啥的

    Python类中的self到底是干啥的 Python编写类的时候,每个函数参数第一个参数都是self,一开始我不管它到底是干嘛的,只知道必须要写上.后来对Python渐渐熟悉了一点,再回头看self的 ...

  8. WCF配置问题(配置WCF跨域)

    其它的先放一边.今天先来分享一下前段时间给公司做网站WCF服务接口的心得. 配置文件的配置问题 这里既然讨论WCF配置文件的问题,那么怎么创建的就不一一讲解了.好多博主都有提过的.所以直接分享自己开发 ...

  9. shiro(三),使用第三方jdbcRealm连接数据库操作

    这里采用第三方实现好的JdbcRealm连接数据库:首先来看一下源码: 接着前面的说:就把这个类当做我们自己写的就好了,我们需要实例化它,然后给他注入一个数据源 下面是ini文件配置 [main] # ...

  10. qt中控件的使用函数

    1.Text Edit编辑框 //将编辑框中的内容转化成Utf8编码 ui->textEdit->toPlainText().toUtf8(); 2.Combo Box下拉框的应用 (1) ...