Hive分析窗口函数(一) SUM,AVG,MIN,MAX
Hive分析窗口函数(一) SUM,AVG,MIN,MAX
Hive分析窗口函数(一) SUM,AVG,MIN,MAX
Hive中提供了越来越多的分析函数,用于完成负责的统计分析。抽时间将所有的分析窗口函数理一遍,将陆续发布。
今天先看几个基础的,SUM、AVG、MIN、MAX。
用于实现分组内所有和连续累积的统计。
数据准备
- CREATE EXTERNAL TABLE lxw1234 (
- cookieid string,
- createtime string, --day
- pv INT
- ) ROW FORMAT DELIMITED
- FIELDS TERMINATED BY ','
- stored as textfile location '/tmp/lxw11/';
- DESC lxw1234;
- cookieid STRING
- createtime STRING
- pv INT
- hive> select * from lxw1234;
- OK
- cookie1 2015-04-10 1
- cookie1 2015-04-11 5
- cookie1 2015-04-12 7
- cookie1 2015-04-13 3
- cookie1 2015-04-14 2
- cookie1 2015-04-15 4
- cookie1 2015-04-16 4
SUM — 注意,结果和ORDER BY相关,默认为升序
- SELECT cookieid,
- createtime,
- pv,
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
- SUM(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
- SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
- FROM lxw1234;
- cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
- -----------------------------------------------------------------------------
- cookie1 2015-04-10 1 1 1 26 1 6 26
- cookie1 2015-04-11 5 6 6 26 6 13 25
- cookie1 2015-04-12 7 13 13 26 13 16 20
- cookie1 2015-04-13 3 16 16 26 16 18 13
- cookie1 2015-04-14 2 18 18 26 17 21 10
- cookie1 2015-04-15 4 22 22 26 16 20 8
- cookie1 2015-04-16 4 26 26 26 13 13 4
pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
pv2: 同pv1
pv3: 分组内(cookie1)所有的pv累加
pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10
如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点
–其他AVG,MIN,MAX,和SUM用法一样。
- --AVG
- SELECT cookieid,
- createtime,
- pv,
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
- AVG(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
- AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
- FROM lxw1234;
- cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
- -----------------------------------------------------------------------------
- cookie1 2015-04-10 1 1.0 1.0 3.7142857142857144 1.0 3.0 3.7142857142857144
- cookie1 2015-04-11 5 3.0 3.0 3.7142857142857144 3.0 4.333333333333333 4.166666666666667
- cookie1 2015-04-12 7 4.333333333333333 4.333333333333333 3.7142857142857144 4.333333333333333 4.0 4.0
- cookie1 2015-04-13 3 4.0 4.0 3.7142857142857144 4.0 3.6 3.25
- cookie1 2015-04-14 2 3.6 3.6 3.7142857142857144 4.25 4.2 3.3333333333333335
- cookie1 2015-04-15 4 3.6666666666666665 3.6666666666666665 3.7142857142857144 4.0 4.0 4.0
- cookie1 2015-04-16 4 3.7142857142857144 3.7142857142857144 3.7142857142857144 3.25 3.25 4.0
- --MIN
- SELECT cookieid,
- createtime,
- pv,
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
- MIN(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
- MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
- FROM lxw1234;
- cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
- -----------------------------------------------------------------------------
- cookie1 2015-04-10 1 1 1 1 1 1 1
- cookie1 2015-04-11 5 1 1 1 1 1 2
- cookie1 2015-04-12 7 1 1 1 1 1 2
- cookie1 2015-04-13 3 1 1 1 1 1 2
- cookie1 2015-04-14 2 1 1 1 2 2 2
- cookie1 2015-04-15 4 1 1 1 2 2 4
- cookie1 2015-04-16 4 1 1 1 2 2 4
- ----MAX
- SELECT cookieid,
- createtime,
- pv,
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
- MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
- MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 ---当前行+往后所有行
- FROM lxw1234;
- cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6
- -----------------------------------------------------------------------------
- cookie1 2015-04-10 1 1 1 7 1 5 7
- cookie1 2015-04-11 5 5 5 7 5 7 7
- cookie1 2015-04-12 7 7 7 7 7 7 7
- cookie1 2015-04-13 3 7 7 7 7 7 4
- cookie1 2015-04-14 2 7 7 7 7 7 4
- cookie1 2015-04-15 4 7 7 7 7 7 4
- cookie1 2015-04-16 4 7 7 7 4 4 4
其他函数的介绍将陆续整理发布。。
Hive分析窗口函数(一) SUM,AVG,MIN,MAX的更多相关文章
- Hive学习之路 (十三)Hive分析窗口函数(一) SUM,AVG,MIN,MAX
数据准备 数据格式 cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, cookie1,, 创建数据库及表 create datab ...
- Hive函数:SUM,AVG,MIN,MAX
转自:http://lxw1234.com/archives/2015/04/176.htm,Hive分析窗口函数(一) SUM,AVG,MIN,MAX 之前看到大数据田地有关于max()over(p ...
- MybatisPlus Lambda表达式 聚合查询 分组查询 COUNT SUM AVG MIN MAX GroupBy
一.序言 众所周知,MybatisPlus在处理单表DAO操作时非常的方便.在处理多表连接连接查询也有优雅的解决方案.今天分享MybatisPlus基于Lambda表达式优雅实现聚合分组查询. 由于视 ...
- Hive分析窗口函数
数据准备 CREATE EXTERNAL TABLE lxw1234 ( cookieid string, createtime string, --day pv INT ) ROW FORMAT D ...
- Hive(七)Hive分析窗口函数
一数据准备 cookie1,2015-04-10,1 cookie1,2015-04-11,5 cookie1,2015-04-12,7 cookie1,2015-04-13,3 cookie1,20 ...
- C# 中奇妙的函数–6. 五个序列聚合运算(Sum, Average, Min, Max,Aggregate)
今天,我们将着眼于五个用于序列的聚合运算.很多时候当我们在对序列进行操作时,我们想要做基于这些序列执行某种汇总然后,计算结果. Enumerable 静态类的LINQ扩展方法可以做到这一点 .就像之前 ...
- Hive学习之路 (十五)Hive分析窗口函数(三) CUME_DIST和PERCENT_RANK
这两个序列分析函数不是很常用,这里也练习一下. 数据准备 数据格式 cookie3.txt d1,user1, d1,user2, d1,user3, d2,user4, d2,user5, 创建表 ...
- Hive学习之路 (十七)Hive分析窗口函数(五) GROUPING SETS、GROUPING__ID、CUBE和ROLLUP
概述 GROUPING SETS,GROUPING__ID,CUBE,ROLLUP 这几个分析函数通常用于OLAP中,不能累加,而且需要根据不同维度上钻和下钻的指标统计,比如,分小时.天.月的UV数. ...
- Hive学习之路 (十六)Hive分析窗口函数(四) LAG、LEAD、FIRST_VALUE和LAST_VALUE
数据准备 数据格式 cookie4.txt cookie1, ::,url2 cookie1, ::,url1 cookie1, ::,1url3 cookie1, ::,url6 cookie1, ...
随机推荐
- [BZOJ 1535] [Luogu 3426]SZA-Template (KMP+fail树+双向链表)
[BZOJ 1535] [Luogu 3426]SZA-Template (KMP+fail树+双向链表) 题面 Byteasar 想在墙上涂一段很长的字符,他为了做这件事从字符的前面一段中截取了一段 ...
- java CAS和AQS
全面了解Java中的CAS机制 https://www.jb51.net/article/125232.htm https://www.cnblogs.com/javalyy/p/8882172.ht ...
- volatile关键字?MESI协议?指令重排?内存屏障?这都是啥玩意
一.摘要 三级缓存,MESI缓存一致性协议,指令重排,内存屏障,JMM,volatile.单拿一个出来,想必大家对这些概念应该有一定了解.但是这些东西有什么必然的联系,或者他们之间究竟有什么前世今生想 ...
- node.js中使用imagemagick进行图片裁剪压缩
node.js中使用imagemagick进行图片裁剪压缩 安装imagemagick sudo apt-get install imagemagick or wget http://www.imag ...
- 关于React中props与state的一知半解
props props英文翻译是道具的意思,我个人理解为参数,如果我们将react组件看作是一个函数,那么props便是函数接收外部数据所使用的参数.props具有以下特性: 1.不可变(只读性) p ...
- GitHub源码攻击事件
黑客擦除了微软多达392个代码存储库,并提出勒索要求.此前,黑客攻击了包含微软在内的大批受害者的Git存储库,删除了所有源代码和最近提交的内容,并留下了支持比特币支付的赎金票据. 勒索信息如下: “要 ...
- MATLAB仿真 让波形动起来
dt=1e-6;T=2*1e-3;for N=0:500; t=N*T+(0:dt:T); input=2*cos(2*pi*1005*t); carrier=5*cos(2*pi*(1e4)*t+0 ...
- SQL Server设置启动存储过程
--设置开关 启动程序自动运行存储过程必须启动该命令 sp_configure "show advanced options",1; go reconfigure; go --设置 ...
- robotframework ride报错 Keyword 'BuiltIn.Log' expected 1 to 5 arguments, got 12.
错误原因,else和else if使用了小写,必须使用大写才能识别到.
- Permute Digits
You are given two positive integer numbers a and b. Permute (change order) of the digits of a to con ...