Hive Functions: SUM, AVG, MIN, MAX
Reposted from: http://lxw1234.com/archives/2015/04/176.htm, Hive分析窗口函数(一) SUM,AVG,MIN,MAX (Hive Analytic Window Functions, Part 1: SUM, AVG, MIN, MAX)
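I had previously read the 大数据田地 blog's write-up on max() over (partition by ...), and today I happened to need it at work. I ran into a problem, though: max(rsrp)over(partition by buildingid,height) as max_rsrp did not return the maximum value within each group. I eventually tracked down the cause: the data type was string rather than double, so the values were compared as strings rather than as numbers.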
While sorting this out, I also reran the 大数据田地 examples for sum, avg, max, and min as a demo, which is how this follow-up post came about.
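The string-versus-double pitfall described above can be illustrated outside Hive. The following is a minimal Python sketch, not the original query: the RSRP readings are hypothetical values, and Python's string comparison stands in for how a string-typed column is compared character by character.

```python
# Hypothetical RSRP readings (dBm), as they might look in a string-typed column.
rsrp_as_strings = ["-95.5", "-110.2", "-87.0", "-101.3"]

# Strings are compared lexicographically, character by character,
# so "-9..." sorts above "-8..." even though -95.5 < -87.0 numerically.
max_as_string = max(rsrp_as_strings)

# Converting to float first compares the numeric values instead.
max_as_number = max(float(v) for v in rsrp_as_strings)

print(max_as_string)  # -95.5
print(max_as_number)  # -87.0
```

In Hive, the analogous fix would presumably be to cast before aggregating, for example MAX(CAST(rsrp AS DOUBLE)) OVER (PARTITION BY buildingid, height), or to declare the column as double in the table definition.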
Data preparation:
- echo ''>data_file.txt
- vim data_file.txt
- cookie1,2015-04-10,1
- cookie1,2015-04-11,5
- cookie1,2015-04-12,7
- cookie1,2015-04-13,3
- cookie1,2015-04-14,2
- cookie1,2015-04-15,4
- cookie1,2015-04-16,4
- cookie2,2015-04-10,6
- cookie2,2015-04-11,5
- cookie2,2015-04-12,7
- cookie2,2015-04-13,4
- cookie2,2015-04-14,3
- cookie2,2015-04-15,5
- cookie2,2015-04-16,5
- hadoop fs -rm -r /user/jrf/test_data
- hadoop fs -mkdir /user/jrf/test_data
- hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
- drop table if exists test_data;
- create EXTERNAL TABLE test_data (
- cookieid string,
- createtime string, --day
- pv INT
- ) ROW FORMAT DELIMITED
- FIELDS TERMINATED BY ','
- stored as textfile location '/user/jrf/test_data/';
- select * from test_data;
- +---------------------+-----------------------+---------------+--+
- | test_data.cookieid | test_data.createtime | test_data.pv |
- +---------------------+-----------------------+---------------+--+
- | cookie1 | 2015-04-10 | 1 |
- | cookie1 | 2015-04-11 | 5 |
- | cookie1 | 2015-04-12 | 7 |
- | cookie1 | 2015-04-13 | 3 |
- | cookie1 | 2015-04-14 | 2 |
- | cookie1 | 2015-04-15 | 4 |
- | cookie1 | 2015-04-16 | 4 |
- | cookie2 | 2015-04-10 | 6 |
- | cookie2 | 2015-04-11 | 5 |
- | cookie2 | 2015-04-12 | 7 |
- | cookie2 | 2015-04-13 | 4 |
- | cookie2 | 2015-04-14 | 3 |
- | cookie2 | 2015-04-15 | 5 |
- | cookie2 | 2015-04-16 | 5 |
- +---------------------+-----------------------+---------------+--+
SUM: note that the result depends on ORDER BY, which sorts in ascending order by default.
- SELECT cookieid,createtime,pv,
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default: from the start of the partition to the current row
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- from the start to the current row; same result as pv1
- SUM(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
- FROM test_data order by cookieid,createtime;
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- | cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- | cookie1 | 2015-04-10 | 1 | 1 | 1 | 26 | 1 | 6 | 26 |
- | cookie1 | 2015-04-11 | 5 | 6 | 6 | 26 | 6 | 13 | 25 |
- | cookie1 | 2015-04-12 | 7 | 13 | 13 | 26 | 13 | 16 | 20 |
- | cookie1 | 2015-04-13 | 3 | 16 | 16 | 26 | 16 | 18 | 13 |
- | cookie1 | 2015-04-14 | 2 | 18 | 18 | 26 | 17 | 21 | 10 |
- | cookie1 | 2015-04-15 | 4 | 22 | 22 | 26 | 16 | 20 | 8 |
- | cookie1 | 2015-04-16 | 4 | 26 | 26 | 26 | 13 | 13 | 4 |
- | cookie2 | 2015-04-10 | 6 | 6 | 6 | 35 | 6 | 11 | 35 |
- | cookie2 | 2015-04-11 | 5 | 11 | 11 | 35 | 11 | 18 | 29 |
- | cookie2 | 2015-04-12 | 7 | 18 | 18 | 35 | 18 | 22 | 24 |
- | cookie2 | 2015-04-13 | 4 | 22 | 22 | 35 | 22 | 25 | 17 |
- | cookie2 | 2015-04-14 | 3 | 25 | 25 | 35 | 19 | 24 | 13 |
- | cookie2 | 2015-04-15 | 5 | 30 | 30 | 35 | 19 | 24 | 10 |
- | cookie2 | 2015-04-16 | 5 | 35 | 35 | 35 | 17 | 17 | 5 |
- +-----------+-------------+-----+------+------+------+------+------+------+--+
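- pv1: cumulative pv within the group from the start to the current row; e.g. pv1 for the 11th = pv of the 10th + pv of the 11th, and the 12th = 10th + 11th + 12th
- pv2: same as pv1
- pv3: sum of all pv values within the group (e.g. cookie1)
- pv4: current row plus the 3 preceding rows within the group; e.g. 11th = 10th + 11th, 12th = 10th + 11th + 12th, 13th = 10th + 11th + 12th + 13th, 14th = 11th + 12th + 13th + 14th
- pv5: current row plus the 3 preceding rows plus the 1 following row within the group; e.g. 14th = 11th + 12th + 13th + 14th + 15th = 5+7+3+2+4 = 21
- pv6: current row plus all following rows within the group; e.g. 13th = 13th + 14th + 15th + 16th = 3+2+4+4 = 13, and 14th = 14th + 15th + 16th = 2+4+4 = 10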
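The arithmetic in the list above can be checked with a few list slices. This is a small Python sketch, not Hive code: it takes the cookie1 pv values from the table, treats the row for 2015-04-14 as the current row, and computes each frame by hand. The variable names are chosen here purely for illustration.

```python
# pv values for cookie1, already ordered by createtime (2015-04-10 .. 2015-04-16).
days = ["04-10", "04-11", "04-12", "04-13", "04-14", "04-15", "04-16"]
pv = [1, 5, 7, 3, 2, 4, 4]

i = days.index("04-14")  # position of the current row within the partition

pv1 = sum(pv[: i + 1])               # start of partition .. current row
pv3 = sum(pv)                        # every row in the partition
pv4 = sum(pv[max(0, i - 3): i + 1])  # 3 PRECEDING .. CURRENT ROW
pv5 = sum(pv[max(0, i - 3): i + 2])  # 3 PRECEDING .. 1 FOLLOWING
pv6 = sum(pv[i:])                    # CURRENT ROW .. UNBOUNDED FOLLOWING

print(pv1, pv3, pv4, pv5, pv6)  # 18 26 17 21 10
```

These values match the cookie1 row for 2015-04-14 in the result table.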
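If ORDER BY is given but ROWS BETWEEN is not, the frame defaults to running from the start of the partition to the current row;
If ORDER BY is not specified, all values in the group are summed;
The key is to understand what ROWS BETWEEN means; it is also called the WINDOW clause:
PRECEDING: rows before the current row
FOLLOWING: rows after the current row
CURRENT ROW: the current row
UNBOUNDED: the partition boundary; UNBOUNDED PRECEDING means starting from the first row of the partition, and UNBOUNDED FOLLOWING means running to the last row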
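One subtlety about the default frame is worth noting. As far as the Hive windowing documentation describes it, when ORDER BY is present without a window clause the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW rather than the ROWS form. The two give the same result on this data because createtime is unique within each cookieid, but they can differ when several rows share the same sort key. The Python sketch below simulates both behaviours; the rows, including the duplicated date, are hypothetical and the function names are made up for illustration.

```python
def running_sum_rows(rows):
    """ROWS ... CURRENT ROW: accumulate one physical row at a time."""
    total, out = 0, []
    for _, value in rows:
        total += value
        out.append(total)
    return out


def running_sum_range(rows):
    """RANGE ... CURRENT ROW: include every row whose sort key is <= the
    current row's key, so rows with equal keys (peers) share one result."""
    return [sum(v for k, v in rows if k <= key) for key, _ in rows]


# Hypothetical partition, sorted by date, with two rows on 2015-04-11.
rows = [("2015-04-10", 1), ("2015-04-11", 5), ("2015-04-11", 7), ("2015-04-12", 3)]

rows_result = running_sum_rows(rows)
range_result = running_sum_range(rows)

print(rows_result)   # [1, 6, 13, 16]
print(range_result)  # [1, 13, 13, 16]
```

Writing ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly, as pv2 does, avoids depending on the default when the sort key may contain duplicates.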
AVG, MIN, and MAX are used in the same way as SUM.
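To make the "same usage, different aggregate" point concrete, here is a rough Python sketch that applies an arbitrary aggregate over a ROWS frame. The helper `over_rows` and its parameters are hypothetical names introduced for this illustration; `None` stands for UNBOUNDED and `0` for CURRENT ROW. The input is the cookie2 pv column from the sample data.

```python
def over_rows(values, agg, preceding=None, following=0):
    """Apply agg over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    None means UNBOUNDED; 0 means the frame edge is the current row.
    values must already be in ORDER BY order for a single partition.
    """
    n = len(values)
    out = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        stop = n if following is None else min(n, i + following + 1)
        out.append(agg(values[start:stop]))
    return out


def average(xs):
    return sum(xs) / len(xs)


cookie2_pv = [6, 5, 7, 4, 3, 5, 5]  # ordered by createtime

sum_pv4 = over_rows(cookie2_pv, sum, preceding=3, following=0)
avg_pv4 = over_rows(cookie2_pv, average, preceding=3, following=0)
min_pv5 = over_rows(cookie2_pv, min, preceding=3, following=1)
max_pv6 = over_rows(cookie2_pv, max, preceding=0, following=None)

print(sum_pv4)  # [6, 11, 18, 22, 19, 19, 17]
print(avg_pv4)  # [6.0, 5.5, 6.0, 5.5, 4.75, 4.75, 4.25]
print(min_pv5)  # [5, 5, 4, 3, 3, 3, 3]
print(max_pv6)  # [7, 7, 7, 5, 5, 5, 5]
```

Only the `agg` argument changes between the four calls; the frame logic is identical, which mirrors how the Hive queries differ only in the function name.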
- --AVG
- SELECT cookieid,createtime,pv,
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default: from the start of the partition to the current row
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- from the start to the current row; same result as pv1
- AVG(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
- FROM test_data order by cookieid,createtime;
- +-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
- | cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
- +-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
- | cookie1 | 2015-04-10 | 1 | 1.0 | 1.0 | 3.7142857142857144 | 1.0 | 3.0 | 3.7142857142857144 |
- | cookie1 | 2015-04-11 | 5 | 3.0 | 3.0 | 3.7142857142857144 | 3.0 | 4.333333333333333 | 4.166666666666667 |
- | cookie1 | 2015-04-12 | 7 | 4.333333333333333 | 4.333333333333333 | 3.7142857142857144 | 4.333333333333333 | 4.0 | 4.0 |
- | cookie1 | 2015-04-13 | 3 | 4.0 | 4.0 | 3.7142857142857144 | 4.0 | 3.6 | 3.25 |
- | cookie1 | 2015-04-14 | 2 | 3.6 | 3.6 | 3.7142857142857144 | 4.25 | 4.2 | 3.3333333333333335 |
- | cookie1 | 2015-04-15 | 4 | 3.6666666666666665 | 3.6666666666666665 | 3.7142857142857144 | 4.0 | 4.0 | 4.0 |
- | cookie1 | 2015-04-16 | 4 | 3.7142857142857144 | 3.7142857142857144 | 3.7142857142857144 | 3.25 | 3.25 | 4.0 |
- | cookie2 | 2015-04-10 | 6 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 5.0 |
- | cookie2 | 2015-04-11 | 5 | 5.5 | 5.5 | 5.0 | 5.5 | 6.0 | 4.833333333333333 |
- | cookie2 | 2015-04-12 | 7 | 6.0 | 6.0 | 5.0 | 6.0 | 5.5 | 4.8 |
- | cookie2 | 2015-04-13 | 4 | 5.5 | 5.5 | 5.0 | 5.5 | 5.0 | 4.25 |
- | cookie2 | 2015-04-14 | 3 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 4.333333333333333 |
- | cookie2 | 2015-04-15 | 5 | 5.0 | 5.0 | 5.0 | 4.75 | 4.8 | 5.0 |
- | cookie2 | 2015-04-16 | 5 | 5.0 | 5.0 | 5.0 | 4.25 | 4.25 | 5.0 |
- +-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
- --MIN
- SELECT cookieid,createtime,pv,
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default: from the start of the partition to the current row
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- from the start to the current row; same result as pv1
- MIN(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
- FROM test_data order by cookieid,createtime;
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- | cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- | cookie1 | 2015-04-10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
- | cookie1 | 2015-04-11 | 5 | 1 | 1 | 1 | 1 | 1 | 2 |
- | cookie1 | 2015-04-12 | 7 | 1 | 1 | 1 | 1 | 1 | 2 |
- | cookie1 | 2015-04-13 | 3 | 1 | 1 | 1 | 1 | 1 | 2 |
- | cookie1 | 2015-04-14 | 2 | 1 | 1 | 1 | 2 | 2 | 2 |
- | cookie1 | 2015-04-15 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
- | cookie1 | 2015-04-16 | 4 | 1 | 1 | 1 | 2 | 2 | 4 |
- | cookie2 | 2015-04-10 | 6 | 6 | 6 | 3 | 6 | 5 | 3 |
- | cookie2 | 2015-04-11 | 5 | 5 | 5 | 3 | 5 | 5 | 3 |
- | cookie2 | 2015-04-12 | 7 | 5 | 5 | 3 | 5 | 4 | 3 |
- | cookie2 | 2015-04-13 | 4 | 4 | 4 | 3 | 4 | 3 | 3 |
- | cookie2 | 2015-04-14 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
- | cookie2 | 2015-04-15 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
- | cookie2 | 2015-04-16 | 5 | 3 | 3 | 3 | 3 | 3 | 5 |
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- --MAX
- SELECT cookieid,createtime,pv,
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default: from the start of the partition to the current row
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- from the start to the current row; same result as pv1
- MAX(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
- FROM test_data order by cookieid,createtime;
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- | cookieid | createtime | pv | pv1 | pv2 | pv3 | pv4 | pv5 | pv6 |
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- | cookie1 | 2015-04-10 | 1 | 1 | 1 | 7 | 1 | 5 | 7 |
- | cookie1 | 2015-04-11 | 5 | 5 | 5 | 7 | 5 | 7 | 7 |
- | cookie1 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
- | cookie1 | 2015-04-13 | 3 | 7 | 7 | 7 | 7 | 7 | 4 |
- | cookie1 | 2015-04-14 | 2 | 7 | 7 | 7 | 7 | 7 | 4 |
- | cookie1 | 2015-04-15 | 4 | 7 | 7 | 7 | 7 | 7 | 4 |
- | cookie1 | 2015-04-16 | 4 | 7 | 7 | 7 | 4 | 4 | 4 |
- | cookie2 | 2015-04-10 | 6 | 6 | 6 | 7 | 6 | 6 | 7 |
- | cookie2 | 2015-04-11 | 5 | 6 | 6 | 7 | 6 | 7 | 7 |
- | cookie2 | 2015-04-12 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
- | cookie2 | 2015-04-13 | 4 | 7 | 7 | 7 | 7 | 7 | 5 |
- | cookie2 | 2015-04-14 | 3 | 7 | 7 | 7 | 7 | 7 | 5 |
- | cookie2 | 2015-04-15 | 5 | 7 | 7 | 7 | 7 | 7 | 5 |
- | cookie2 | 2015-04-16 | 5 | 7 | 7 | 7 | 5 | 5 | 5 |
- +-----------+-------------+-----+------+------+------+------+------+------+--+
- SELECT cookieid,
- createtime,
- pv,
- min(pv) OVER(PARTITION BY cookieid) AS min_pv,
- max(pv) OVER(PARTITION BY cookieid) AS max_pv
- FROM test_data;
- +-----------+-------------+-----+---------+---------+--+
- | cookieid | createtime | pv | min_pv | max_pv |
- +-----------+-------------+-----+---------+---------+--+
- | cookie1 | 2015-04-10 | 1 | 1 | 7 |
- | cookie1 | 2015-04-16 | 4 | 1 | 7 |
- | cookie1 | 2015-04-15 | 4 | 1 | 7 |
- | cookie1 | 2015-04-14 | 2 | 1 | 7 |
- | cookie1 | 2015-04-13 | 3 | 1 | 7 |
- | cookie1 | 2015-04-12 | 7 | 1 | 7 |
- | cookie1 | 2015-04-11 | 5 | 1 | 7 |
- | cookie2 | 2015-04-16 | 5 | 3 | 7 |
- | cookie2 | 2015-04-15 | 5 | 3 | 7 |
- | cookie2 | 2015-04-14 | 3 | 3 | 7 |
- | cookie2 | 2015-04-13 | 4 | 3 | 7 |
- | cookie2 | 2015-04-12 | 7 | 3 | 7 |
- | cookie2 | 2015-04-11 | 5 | 3 | 7 |
- | cookie2 | 2015-04-10 | 6 | 3 | 7 |
- +-----------+-------------+-----+---------+---------+--+
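The last query uses PARTITION BY with no ORDER BY, so every row simply carries its group's overall minimum and maximum; its output is not in date order because the query has no ORDER BY to sort the final result. As a rough illustration of what that computes, the Python sketch below groups the same sample rows by cookieid and attaches the per-group min and max to each row. It is a simulation under those assumptions, not how Hive executes the query.

```python
from collections import defaultdict

# Sample rows from test_data: (cookieid, createtime, pv).
rows = [
    ("cookie1", "2015-04-10", 1), ("cookie1", "2015-04-11", 5),
    ("cookie1", "2015-04-12", 7), ("cookie1", "2015-04-13", 3),
    ("cookie1", "2015-04-14", 2), ("cookie1", "2015-04-15", 4),
    ("cookie1", "2015-04-16", 4),
    ("cookie2", "2015-04-10", 6), ("cookie2", "2015-04-11", 5),
    ("cookie2", "2015-04-12", 7), ("cookie2", "2015-04-13", 4),
    ("cookie2", "2015-04-14", 3), ("cookie2", "2015-04-15", 5),
    ("cookie2", "2015-04-16", 5),
]

# PARTITION BY cookieid: collect the pv values of each group.
pv_by_cookie = defaultdict(list)
for cookieid, _, pv in rows:
    pv_by_cookie[cookieid].append(pv)

# With no ORDER BY in the window, the frame is the whole partition,
# so each row gets the same min/max as every other row in its group.
result = [
    (cookieid, createtime, pv, min(pv_by_cookie[cookieid]), max(pv_by_cookie[cookieid]))
    for cookieid, createtime, pv in rows
]

for row in result:
    print(row)
```

Each cookie1 row ends with (1, 7) and each cookie2 row with (3, 7), matching the min_pv and max_pv columns above.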