Hive函数:SUM,AVG,MIN,MAX
转自:http://lxw1234.com/archives/2015/04/176.htm,Hive分析窗口函数(一) SUM,AVG,MIN,MAX
之前看到大数据田地有关于max()over(partition by)的用法,今天恰好工作中用到了它,但是使用中遇到了一个问题:在max(rsrp)over(partition by buildingid,height) as max_rsrp返回的结果不是分组中的最大值。最中找到了问题的原因:max_rsrp数据类型为string而不是double类型,导致的一个bug问题。
再处理的过程中也再次把大数据田地的中关于sum,avg,max,min的函数用法做了demo,因此有了该参考后的文章。
数据准备:
echo ''>data_file.txt
vim data_file.txt
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
hadoop fs -rm -r /user/jrf/test_data
hadoop fs -mkdir /user/jrf/test_data
hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
drop table if exists test_data;
create EXTERNAL TABLE test_data (
cookieid string,
createtime string, --day
pv INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jrf/test_data/';
select * from test_data;
+---------------------+-----------------------+---------------+--+
| test_data.cookieid | test_data.createtime | test_data.pv |
+---------------------+-----------------------+---------------+--+
| cookie1 | 2015-04-10 | 1 |
| cookie1 | 2015-04-11 | 5 |
| cookie1 | 2015-04-12 | 7 |
| cookie1 | 2015-04-13 | 3 |
| cookie1 | 2015-04-14 | 2 |
| cookie1 | 2015-04-15 | 4 |
| cookie1 | 2015-04-16 | 4 |
| cookie2 | 2015-04-10 | 6 |
| cookie2 | 2015-04-11 | 5 |
| cookie2 | 2015-04-12 | 7 |
| cookie2 | 2015-04-13 | 4 |
| cookie2 | 2015-04-14 | 3 |
| cookie2 | 2015-04-15 | 5 |
| cookie2 | 2015-04-16 | 5 |
+---------------------+-----------------------+---------------+--+
SUM — 注意,结果和ORDER BY相关,默认为升序
SELECT cookieid,createtime,pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
SUM(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 26 | 1 | 6 | 26 |
| cookie1 | 2015-04-11 | 5 | 6 | 6 | 26 | 6 | 13 | 25 |
| cookie1 | 2015-04-12 | 7 | 13 | 13 | 26 | 13 | 16 | 20 |
| cookie1 | 2015-04-13 | 3 | 16 | 16 | 26 | 16 | 18 | 13 |
| cookie1 | 2015-04-14 | 2 | 18 | 18 | 26 | 17 | 21 | 10 |
| cookie1 | 2015-04-15 | 4 | 22 | 22 | 26 | 16 | 20 | 8 |
| cookie1 | 2015-04-16 | 4 | 26 | 26 | 26 | 13 | 13 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 35 | 6 | 11 | 35 |
| cookie2 | 2015-04-11 | 5 | 11 | 11 | 35 | 11 | 18 | 29 |
| cookie2 | 2015-04-12 | 7 | 18 | 18 | 35 | 18 | 22 | 24 |
| cookie2 | 2015-04-13 | 4 | 22 | 22 | 35 | 22 | 25 | 17 |
| cookie2 | 2015-04-14 | 3 | 25 | 25 | 35 | 19 | 24 | 13 |
| cookie2 | 2015-04-15 | 5 | 30 | 30 | 35 | 19 | 24 | 10 |
| cookie2 | 2015-04-16 | 5 | 35 | 35 | 35 | 17 | 17 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
pv2: 同pv1
pv3: 分组内(cookie1)所有的pv累加
pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10
如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点
–其他AVG,MIN,MAX,和SUM用法一样。
--AVG
SELECT cookieid,createtime,pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookie1 | 2015-04-10 | 1 | 1.0 | 1.0 | 3.7142857142857144 | 1.0 | 3.0 | 3.7142857142857144 |
| cookie1 | 2015-04-11 | 5 | 3.0 | 3.0 | 3.7142857142857144 | 3.0 | 4.333333333333333 | 4.166666666666667 |
| cookie1 | 2015-04-12 | 7 | 4.333333333333333 | 4.333333333333333 | 3.7142857142857144 | 4.333333333333333 | 4.0 | 4.0 |
| cookie1 | 2015-04-13 | 3 | 4.0 | 4.0 | 3.7142857142857144 | 4.0 | 3.6 | 3.25 |
| cookie1 | 2015-04-14 | 2 | 3.6 | 3.6 | 3.7142857142857144 | 4.25 | 4.2 | 3.3333333333333335 |
| cookie1 | 2015-04-15 | 4 | 3.6666666666666665 | 3.6666666666666665 | 3.7142857142857144 | 4.0 | 4.0 | 4.0 |
| cookie1 | 2015-04-16 | 4 | 3.7142857142857144 | 3.7142857142857144 | 3.7142857142857144 | 3.25 | 3.25 | 4.0 |
| cookie2 | 2015-04-10 | 6 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 5.0 |
| cookie2 | 2015-04-11 | 5 | 5.5 | 5.5 | 5.0 | 5.5 | 6.0 | 4.833333333333333 |
| cookie2 | 2015-04-12 | 7 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 4.8 |
| cookie2 | 2015-04-13 | 4 | 5.5 | 5.5 | 5.0 | 5.5 | 5.0 | 4.25 |
| cookie2 | 2015-04-14 | 3 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 4.333333333333333 |
| cookie2 | 2015-04-15 | 5 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 5.0 |
| cookie2 | 2015-04-16 | 5 | 5.0 | 5.0 | 5.0 | 4.25 | 4.25 | 5.0 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
--MIN
SELECT cookieid,createtime,pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,--从起点到当前行,结果同pv1
MIN(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| cookie1 | 2015-04-11 | 5 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-12 | 7 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-13 | 3 | 1 | 1 | 1 | 1 | 1 | 2 |
| cookie1 | 2015-04-14 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
| cookie1 | 2015-04-15 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie1 | 2015-04-16 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 3 | 6 | 5 | 3 |
| cookie2 | 2015-04-11 | 5 | 5 | 5 | 3 | 5 | 5 | 3 |
| cookie2 | 2015-04-12 | 7 | 5 | 5 | 3 | 5 | 4 | 3 |
| cookie2 | 2015-04-13 | 4 | 4 | 4 | 3 | 4 | 3 | 3 |
| cookie2 | 2015-04-14 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| cookie2 | 2015-04-15 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
| cookie2 | 2015-04-16 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
--MAX
SELECT cookieid,createtime,pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 1 | 7 | 1 | 5 | 7 |
| cookie1 | 2015-04-11 | 5 | 5 | 5 | 7 | 5 | 7 | 7 |
| cookie1 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie1 | 2015-04-13 | 3 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-14 | 2 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-15 | 4 | 7 | 7 | 7 | 7 | 7 | 4 |
| cookie1 | 2015-04-16 | 4 | 7 | 7 | 7 | 4 | 4 | 4 |
| cookie2 | 2015-04-10 | 6 | 6 | 6 | 7 | 6 | 6 | 7 |
| cookie2 | 2015-04-11 | 5 | 6 | 6 | 7 | 6 | 7 | 7 |
| cookie2 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| cookie2 | 2015-04-13 | 4 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-14 | 3 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-15 | 5 | 7 | 7 | 7 | 7 | 7 | 5 |
| cookie2 | 2015-04-16 | 5 | 7 | 7 | 7 | 5 | 5 | 5 |
+-----------+-------------+-----+------+------+------+------+------+------+--+ SELECT cookieid,
createtime,
pv,
min(pv) OVER(PARTITION BY cookieid) AS min_pv,
max(pv) OVER(PARTITION BY cookieid) AS max_pv
FROM test_data;
+-----------+-------------+-----+---------+---------+--+
| cookieid | createtime | pv | min_pv | max_pv |
+-----------+-------------+-----+---------+---------+--+
| cookie1 | 2015-04-10 | 1 | 1 | 7 |
| cookie1 | 2015-04-16 | 4 | 1 | 7 |
| cookie1 | 2015-04-15 | 4 | 1 | 7 |
| cookie1 | 2015-04-14 | 2 | 1 | 7 |
| cookie1 | 2015-04-13 | 3 | 1 | 7 |
| cookie1 | 2015-04-12 | 7 | 1 | 7 |
| cookie1 | 2015-04-11 | 5 | 1 | 7 |
| cookie2 | 2015-04-16 | 5 | 3 | 7 |
| cookie2 | 2015-04-15 | 5 | 3 | 7 |
| cookie2 | 2015-04-14 | 3 | 3 | 7 |
| cookie2 | 2015-04-13 | 4 | 3 | 7 |
| cookie2 | 2015-04-12 | 7 | 3 | 7 |
| cookie2 | 2015-04-11 | 5 | 3 | 7 |
| cookie2 | 2015-04-10 | 6 | 3 | 7 |
+-----------+-------------+-----+---------+---------+--+
Hive函数:SUM,AVG,MIN,MAX的更多相关文章
- Hive分析窗口函数(一) SUM,AVG,MIN,MAX
Hive分析窗口函数(一) SUM,AVG,MIN,MAX Hive分析窗口函数(一) SUM,AVG,MIN,MAX Hive中提供了越来越多的分析函数,用于完成负责的统计分析.抽时间将所有的分析窗 ...
- Hive学习之路 (十三)Hive分析窗口函数(一) SUM,AVG,MIN,MAX
数据准备 数据格式 cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, 创建数据库及表 create datab ...
- MybatisPlus Lambda表达式 聚合查询 分组查询 COUNT SUM AVG MIN MAX GroupBy
一.序言 众所周知,MybatisPlus在处理单表DAO操作时非常的方便.在处理多表连接连接查询也有优雅的解决方案.今天分享MybatisPlus基于Lambda表达式优雅实现聚合分组查询. 由于视 ...
- C# 中奇妙的函数–6. 五个序列聚合运算(Sum, Average, Min, Max,Aggregate)
今天,我们将着眼于五个用于序列的聚合运算.很多时候当我们在对序列进行操作时,我们想要做基于这些序列执行某种汇总然后,计算结果. Enumerable 静态类的LINQ扩展方法可以做到这一点 .就像之前 ...
- SQL模糊查询,sum,AVG,MAX,min函数
cmd mysql -hlocalhost -uroot -p select * from emp where ename like '___' -- 三个横线, - 代表字符,可以查询 三个enam ...
- 三、函数 (SUM、MIN、MAX、COUNT、AVG)
第八章 使用数据处理函数 8.1 函数 SQL支持利用函数来处理数据.函数一般是在数据上执行的,给数据的转换和处理提供了方便. 每一个DBMS都有特定的函数.只有少数几个函数被所有主要的DBMS等同的 ...
- LINQ to SQL Count/Sum/Min/Max/Avg Join
public class Linq { MXSICEDataContext Db = new MXSICEDataContext(); // LINQ to SQL // Count/Sum/Min/ ...
- LINQ to SQL 语句(3) 之 Count/Sum/Min/Max/Avg
LINQ to SQL 语句(3) 之 Count/Sum/Min/Max/Avg [1] Count/Sum 讲解 [2] Min 讲解 [3] Max 讲解 [4] Average 和 Agg ...
- [转]LINQ语句之Select/Distinct和Count/Sum/Min/Max/Avg
在讲述了LINQ,顺便说了一下Where操作,这篇开始我们继续说LINQ语句,目的让大家从语句的角度了解LINQ,LINQ包括LINQ to Objects.LINQ to DataSets.LINQ ...
随机推荐
- javascript中的Promise使用
参考自: http://m.jb51.net/article/102642.htm 1.基本用法: (1).首先我们new一个Promise,将Promise实例化 (2).然后在实例化的promis ...
- 初始化angularJS之ng-app的自动绑定和手动绑定
在传统的angularJS应用中,都是通过ng-app把angular应用绑定到某个dom上,这样做会把js代码入侵到html上,angular提供了手动启动的API--angular.bootstr ...
- python安装第三方库
在编写爬虫程序时发现unsolved import 一时不解,以为是ide出问题了,其实是没有安装第三方库导致的. 于是到https://pypi.python.org/pypi/requests/去 ...
- 【Django】 视图层说明
[Django视图层] 视图层的主要工作是衔接HTTP请求,Python程序和HTML模板,使他们能够有机互相合作从模型层lou到数据并且反馈.说到视图层的工作就有以下几个方面要说 ■ URL映射 对 ...
- 网络通信 --> Linux 五种IO模型
Linux 五种IO模型 聊聊Linux 五种IO模型
- 通过修改然后commit的方式创建自己的镜像
创建自己的镜像:通过现有的镜像来创建自己的镜像.1.首先拉取一个镜像到本地$ sudo docker imagesREPOSITORY TAG IMA ...
- Python类中的self到底是干啥的
Python类中的self到底是干啥的 Python编写类的时候,每个函数参数第一个参数都是self,一开始我不管它到底是干嘛的,只知道必须要写上.后来对Python渐渐熟悉了一点,再回头看self的 ...
- WCF配置问题(配置WCF跨域)
其它的先放一边.今天先来分享一下前段时间给公司做网站WCF服务接口的心得. 配置文件的配置问题 这里既然讨论WCF配置文件的问题,那么怎么创建的就不一一讲解了.好多博主都有提过的.所以直接分享自己开发 ...
- shiro(三),使用第三方jdbcRealm连接数据库操作
这里采用第三方实现好的JdbcRealm连接数据库:首先来看一下源码: 接着前面的说:就把这个类当做我们自己写的就好了,我们需要实例化它,然后给他注入一个数据源 下面是ini文件配置 [main] # ...
- qt中控件的使用函数
1.Text Edit编辑框 //将编辑框中的内容转化成Utf8编码 ui->textEdit->toPlainText().toUtf8(); 2.Combo Box下拉框的应用 (1) ...