Reposted from: http://lxw1234.com/archives/2015/04/176.htm, Hive Analytic Window Functions (Part 1): SUM, AVG, MIN, MAX

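I had earlier read the article on the 大数据田地 blog (lxw1234.com) about using max() over(partition by ...), and today I happened to need it at work. I ran into a problem, though: max(rsrp) over(partition by buildingid,height) as max_rsrp did not return the largest value in each group. I eventually tracked down the cause: the rsrp data (and therefore max_rsrp) was of type string rather than double. MAX over strings compares values lexicographically, character by character, instead of numerically, so for example '9' counts as larger than '10'.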
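A minimal Python sketch of the pitfall. The values below are hypothetical, not the author's rsrp data. Taking the maximum of numbers stored as strings compares them character by character, so the result differs from a numeric maximum. In Hive, a likely remedy is to store the column as double or cast it before aggregating, e.g. `max(cast(rsrp as double)) over(partition by buildingid, height)`. Treat that query as a suggested fix rather than the author's exact one.

```python
# Hypothetical readings for one (buildingid, height) group, kept as strings
# the way a Hive `string` column would hold them.
readings = ["9.5", "10.2", "85.0"]

# String comparison is lexicographic: "9.5" > "85.0" > "10.2" because '9' > '8' > '1'.
string_max = max(readings)

# Converting to float first gives the numeric comparison that MAX is expected to do.
numeric_max = max(float(r) for r in readings)

print(string_max)   # 9.5
print(numeric_max)  # 85.0
```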

While working through this, I also reran the sum, avg, max, and min demos from the 大数据田地 article. This write-up, based on that article, is the result.

Data preparation:

```shell
# Write the test data (cookieid,createtime,pv) to data_file.txt, one record per line
cat > data_file.txt <<'EOF'
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
EOF

# Recreate the HDFS directory the external table points to, then upload the file
# (-f: no error if the directory does not exist yet; -p: no error if it already exists)
hadoop fs -rm -r -f /user/jrf/test_data
hadoop fs -mkdir -p /user/jrf/test_data
hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
```
```sql
drop table if exists test_data;
-- External table over the comma-delimited text file in /user/jrf/test_data/
create EXTERNAL TABLE test_data (
  cookieid string,
  createtime string, -- day (yyyy-MM-dd)
  pv INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jrf/test_data/';

select * from test_data;
```
```text
+---------------------+-----------------------+---------------+
| test_data.cookieid  | test_data.createtime  | test_data.pv  |
+---------------------+-----------------------+---------------+
| cookie1             | 2015-04-10            | 1             |
| cookie1             | 2015-04-11            | 5             |
| cookie1             | 2015-04-12            | 7             |
| cookie1             | 2015-04-13            | 3             |
| cookie1             | 2015-04-14            | 2             |
| cookie1             | 2015-04-15            | 4             |
| cookie1             | 2015-04-16            | 4             |
| cookie2             | 2015-04-10            | 6             |
| cookie2             | 2015-04-11            | 5             |
| cookie2             | 2015-04-12            | 7             |
| cookie2             | 2015-04-13            | 4             |
| cookie2             | 2015-04-14            | 3             |
| cookie2             | 2015-04-15            | 5             |
| cookie2             | 2015-04-16            | 5             |
+---------------------+-----------------------+---------------+
```
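To check the window results below by hand or in a script, the same 14 records can be loaded into plain Python structures. This is a hypothetical helper, not part of Hive. It splits each line on ',' the way FIELDS TERMINATED BY ',' does, and casts pv to int like the pv INT column. It then groups the rows by cookieid and sorts each group by createtime, roughly mirroring PARTITION BY cookieid ORDER BY createtime.

```python
from collections import defaultdict

# Contents of data_file.txt (cookieid,createtime,pv).
DATA = """\
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
"""

def load_rows(text):
    """Parse comma-delimited lines into (cookieid, createtime, pv) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        cookieid, createtime, pv = line.split(",")
        rows.append((cookieid, createtime, int(pv)))
    return rows

def partition_by_cookie(rows):
    """Group pv values by cookieid, ordered by createtime (ISO dates sort correctly as strings)."""
    parts = defaultdict(list)
    for cookieid, createtime, pv in rows:
        parts[cookieid].append((createtime, pv))
    return {cookie: [pv for _, pv in sorted(items)] for cookie, items in parts.items()}

rows = load_rows(DATA)
partitions = partition_by_cookie(rows)
print(len(rows))              # 14
print(partitions["cookie1"])  # [1, 5, 7, 3, 2, 4, 4]
print(partitions["cookie2"])  # [6, 5, 7, 4, 3, 5, 5]
```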

SUM: note that the result depends on ORDER BY, which sorts in ascending order by default.

```sql
SELECT cookieid,createtime,pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition to the current row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row to current row; same result as pv1
SUM(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
```
```text
+-----------+-------------+-----+------+------+------+------+------+------+
| cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
+-----------+-------------+-----+------+------+------+------+------+------+
| cookie1   | 2015-04-10  | 1   | 1    | 1    | 26   | 1    | 6    | 26   |
| cookie1   | 2015-04-11  | 5   | 6    | 6    | 26   | 6    | 13   | 25   |
| cookie1   | 2015-04-12  | 7   | 13   | 13   | 26   | 13   | 16   | 20   |
| cookie1   | 2015-04-13  | 3   | 16   | 16   | 26   | 16   | 18   | 13   |
| cookie1   | 2015-04-14  | 2   | 18   | 18   | 26   | 17   | 21   | 10   |
| cookie1   | 2015-04-15  | 4   | 22   | 22   | 26   | 16   | 20   | 8    |
| cookie1   | 2015-04-16  | 4   | 26   | 26   | 26   | 13   | 13   | 4    |
| cookie2   | 2015-04-10  | 6   | 6    | 6    | 35   | 6    | 11   | 35   |
| cookie2   | 2015-04-11  | 5   | 11   | 11   | 35   | 11   | 18   | 29   |
| cookie2   | 2015-04-12  | 7   | 18   | 18   | 35   | 18   | 22   | 24   |
| cookie2   | 2015-04-13  | 4   | 22   | 22   | 35   | 22   | 25   | 17   |
| cookie2   | 2015-04-14  | 3   | 25   | 25   | 35   | 19   | 24   | 13   |
| cookie2   | 2015-04-15  | 5   | 30   | 30   | 35   | 19   | 24   | 10   |
| cookie2   | 2015-04-16  | 5   | 35   | 35   | 35   | 17   | 17   | 5    |
+-----------+-------------+-----+------+------+------+------+------+------+
```
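pv1: running total of pv within the partition, from the first row to the current row. For example, pv1 for 04-11 = pv(04-10) + pv(04-11), and for 04-12 = 04-10 + 04-11 + 04-12.
pv2: same as pv1.
pv3: sum of all pv values in the partition (e.g. all of cookie1).
pv4: current row plus the 3 preceding rows in the partition. For example, 04-11 = 04-10 + 04-11; 04-12 = 04-10 + 04-11 + 04-12; 04-13 = 04-10 + 04-11 + 04-12 + 04-13; 04-14 = 04-11 + 04-12 + 04-13 + 04-14.
pv5: current row plus the 3 preceding rows and 1 following row. For example, 04-14 = 04-11 + 04-12 + 04-13 + 04-14 + 04-15 = 5+7+3+2+4 = 21.
pv6: current row plus all following rows. For example, 04-13 = 04-13 + 04-14 + 04-15 + 04-16 = 3+2+4+4 = 13, and 04-14 = 04-14 + 04-15 + 04-16 = 2+4+4 = 10.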
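The frame arithmetic behind pv1 to pv6 can be reproduced with list slicing. The Python sketch below illustrates ROWS frame semantics only: the helper name rows_frame_sum is made up, and Hive's own execution is distributed and works quite differently. It uses None for UNBOUNDED and 0 for CURRENT ROW, clips the frame at the partition edges, and recomputes the cookie1 columns of the SUM result.

```python
# cookie1's pv values, already ordered by createtime (2015-04-10 .. 2015-04-16).
pv = [1, 5, 7, 3, 2, 4, 4]

def rows_frame_sum(values, preceding, following):
    """SUM over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    None stands for UNBOUNDED and 0 for CURRENT ROW; the frame is clipped
    to the first and last rows of the partition.
    """
    n = len(values)
    result = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        end = n if following is None else min(n, i + following + 1)  # exclusive bound
        result.append(sum(values[start:end]))
    return result

pv2 = rows_frame_sum(pv, None, 0)     # UNBOUNDED PRECEDING AND CURRENT ROW
pv3 = rows_frame_sum(pv, None, None)  # whole partition
pv4 = rows_frame_sum(pv, 3, 0)        # 3 PRECEDING AND CURRENT ROW
pv5 = rows_frame_sum(pv, 3, 1)        # 3 PRECEDING AND 1 FOLLOWING
pv6 = rows_frame_sum(pv, 0, None)     # CURRENT ROW AND UNBOUNDED FOLLOWING

print(pv4)  # [1, 6, 13, 16, 17, 16, 13]
print(pv5)  # [6, 13, 16, 18, 21, 20, 13]
print(pv6)  # [26, 25, 20, 13, 10, 8, 4]
```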

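If ROWS BETWEEN is omitted but ORDER BY is present, the frame runs from the first row of the partition to the current row. Strictly, Hive's default here is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which matches the ROWS version as long as the ORDER BY key has no ties, as with createtime in this data.
If ORDER BY is omitted, all values in the partition are aggregated.
The key is understanding ROWS BETWEEN, also called the WINDOW clause:
PRECEDING: rows before the current row
FOLLOWING: rows after the current row
CURRENT ROW: the current row
UNBOUNDED: no bound. UNBOUNDED PRECEDING means starting from the first row of the partition; UNBOUNDED FOLLOWING means continuing to the last row of the partition.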
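When ORDER BY is present but the window clause is omitted, the Hive windowing documentation gives the default frame as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. A RANGE frame treats rows with equal ORDER BY values as peers and includes all of them. A ROWS frame counts physical rows instead, so the two agree only when the ORDER BY key is unique within the partition.

The Python sketch below uses a made-up partition with a duplicated date to show the difference. Under ROWS, which of the tied rows comes first is not determined by the query, so the 6 and 13 could land on either tied row.

```python
# Hypothetical partition, sorted by its ORDER BY key; two rows share 2015-04-11.
rows = [("2015-04-10", 1), ("2015-04-11", 5), ("2015-04-11", 7), ("2015-04-12", 3)]

def running_sum_rows(rows):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: add one physical row at a time."""
    result, total = [], 0
    for _, pv in rows:
        total += pv
        result.append(total)
    return result

def running_sum_range(rows):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: include every row whose
    key is <= the current key, so tied rows (peers) get the same total."""
    return [sum(pv for k, pv in rows if k <= key) for key, _ in rows]

print(running_sum_rows(rows))   # [1, 6, 13, 16]
print(running_sum_range(rows))  # [1, 13, 13, 16]
```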

AVG, MIN, and MAX are used in exactly the same way as SUM.
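Because only the aggregate changes, a frame helper can take the aggregate as a parameter. In this hedged Python sketch, window_agg and hive_avg are hypothetical names. hive_avg divides by the frame size and always returns a float, in line with Hive's AVG returning a DOUBLE for integer input. The loop recomputes the pv5 column (3 PRECEDING AND 1 FOLLOWING) of the AVG, MIN, and MAX results for cookie2.

```python
# cookie2's pv values, ordered by createtime (2015-04-10 .. 2015-04-16).
pv = [6, 5, 7, 4, 3, 5, 5]

def hive_avg(values):
    """Average as a float, like Hive's AVG on an INT column."""
    return sum(values) / len(values)

def window_agg(values, agg, preceding, following):
    """Apply agg over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    None stands for UNBOUNDED and 0 for CURRENT ROW.
    """
    n = len(values)
    result = []
    for i in range(n):
        start = 0 if preceding is None else max(0, i - preceding)
        end = n if following is None else min(n, i + following + 1)  # exclusive bound
        result.append(agg(values[start:end]))
    return result

for name, agg in (("AVG", hive_avg), ("MIN", min), ("MAX", max)):
    print(name, window_agg(pv, agg, 3, 1))
# AVG [5.5, 6.0, 5.5, 5.0, 4.8, 4.8, 4.25]
# MIN [5, 5, 4, 3, 3, 3, 3]
# MAX [6, 7, 7, 7, 7, 7, 5]
```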

```sql
--AVG
SELECT cookieid,createtime,pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition to the current row
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row to current row; same result as pv1
AVG(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
```
```text
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+
| cookieid  | createtime  | pv  | pv1                 | pv2                 | pv3                 | pv4                | pv5                | pv6                 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+
| cookie1   | 2015-04-10  | 1   | 1.0                 | 1.0                 | 3.7142857142857144  | 1.0                | 3.0                | 3.7142857142857144  |
| cookie1   | 2015-04-11  | 5   | 3.0                 | 3.0                 | 3.7142857142857144  | 3.0                | 4.333333333333333  | 4.166666666666667   |
| cookie1   | 2015-04-12  | 7   | 4.333333333333333   | 4.333333333333333   | 3.7142857142857144  | 4.333333333333333  | 4.0                | 4.0                 |
| cookie1   | 2015-04-13  | 3   | 4.0                 | 4.0                 | 3.7142857142857144  | 4.0                | 3.6                | 3.25                |
| cookie1   | 2015-04-14  | 2   | 3.6                 | 3.6                 | 3.7142857142857144  | 4.25               | 4.2                | 3.3333333333333335  |
| cookie1   | 2015-04-15  | 4   | 3.6666666666666665  | 3.6666666666666665  | 3.7142857142857144  | 4.0                | 4.0                | 4.0                 |
| cookie1   | 2015-04-16  | 4   | 3.7142857142857144  | 3.7142857142857144  | 3.7142857142857144  | 3.25               | 3.25               | 4.0                 |
| cookie2   | 2015-04-10  | 6   | 6.0                 | 6.0                 | 5.0                 | 6.0                | 5.5                | 5.0                 |
| cookie2   | 2015-04-11  | 5   | 5.5                 | 5.5                 | 5.0                 | 5.5                | 6.0                | 4.833333333333333   |
| cookie2   | 2015-04-12  | 7   | 6.0                 | 6.0                 | 5.0                 | 6.0                | 5.5                | 4.8                 |
| cookie2   | 2015-04-13  | 4   | 5.5                 | 5.5                 | 5.0                 | 5.5                | 5.0                | 4.25                |
| cookie2   | 2015-04-14  | 3   | 5.0                 | 5.0                 | 5.0                 | 4.75               | 4.8                | 4.333333333333333   |
| cookie2   | 2015-04-15  | 5   | 5.0                 | 5.0                 | 5.0                 | 4.75               | 4.8                | 5.0                 |
| cookie2   | 2015-04-16  | 5   | 5.0                 | 5.0                 | 5.0                 | 4.25               | 4.25               | 5.0                 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+
```
```sql
--MIN
SELECT cookieid,createtime,pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition to the current row
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row to current row; same result as pv1
MIN(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
```
```text
+-----------+-------------+-----+------+------+------+------+------+------+
| cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
+-----------+-------------+-----+------+------+------+------+------+------+
| cookie1   | 2015-04-10  | 1   | 1    | 1    | 1    | 1    | 1    | 1    |
| cookie1   | 2015-04-11  | 5   | 1    | 1    | 1    | 1    | 1    | 2    |
| cookie1   | 2015-04-12  | 7   | 1    | 1    | 1    | 1    | 1    | 2    |
| cookie1   | 2015-04-13  | 3   | 1    | 1    | 1    | 1    | 1    | 2    |
| cookie1   | 2015-04-14  | 2   | 1    | 1    | 1    | 2    | 2    | 2    |
| cookie1   | 2015-04-15  | 4   | 1    | 1    | 1    | 2    | 2    | 4    |
| cookie1   | 2015-04-16  | 4   | 1    | 1    | 1    | 2    | 2    | 4    |
| cookie2   | 2015-04-10  | 6   | 6    | 6    | 3    | 6    | 5    | 3    |
| cookie2   | 2015-04-11  | 5   | 5    | 5    | 3    | 5    | 5    | 3    |
| cookie2   | 2015-04-12  | 7   | 5    | 5    | 3    | 5    | 4    | 3    |
| cookie2   | 2015-04-13  | 4   | 4    | 4    | 3    | 4    | 3    | 3    |
| cookie2   | 2015-04-14  | 3   | 3    | 3    | 3    | 3    | 3    | 3    |
| cookie2   | 2015-04-15  | 5   | 3    | 3    | 3    | 3    | 3    | 5    |
| cookie2   | 2015-04-16  | 5   | 3    | 3    | 3    | 3    | 3    | 5    |
+-----------+-------------+-----+------+------+------+------+------+------+
```
```sql
--MAX
SELECT cookieid,createtime,pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- default frame: first row of the partition to the current row
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, -- first row to current row; same result as pv1
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, -- all rows in the partition
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, -- current row + 3 preceding rows
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, -- current row + 3 preceding rows + 1 following row
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 -- current row + all following rows
FROM test_data order by cookieid,createtime;
```
```text
+-----------+-------------+-----+------+------+------+------+------+------+
| cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
+-----------+-------------+-----+------+------+------+------+------+------+
| cookie1   | 2015-04-10  | 1   | 1    | 1    | 7    | 1    | 5    | 7    |
| cookie1   | 2015-04-11  | 5   | 5    | 5    | 7    | 5    | 7    | 7    |
| cookie1   | 2015-04-12  | 7   | 7    | 7    | 7    | 7    | 7    | 7    |
| cookie1   | 2015-04-13  | 3   | 7    | 7    | 7    | 7    | 7    | 4    |
| cookie1   | 2015-04-14  | 2   | 7    | 7    | 7    | 7    | 7    | 4    |
| cookie1   | 2015-04-15  | 4   | 7    | 7    | 7    | 7    | 7    | 4    |
| cookie1   | 2015-04-16  | 4   | 7    | 7    | 7    | 4    | 4    | 4    |
| cookie2   | 2015-04-10  | 6   | 6    | 6    | 7    | 6    | 6    | 7    |
| cookie2   | 2015-04-11  | 5   | 6    | 6    | 7    | 6    | 7    | 7    |
| cookie2   | 2015-04-12  | 7   | 7    | 7    | 7    | 7    | 7    | 7    |
| cookie2   | 2015-04-13  | 4   | 7    | 7    | 7    | 7    | 7    | 5    |
| cookie2   | 2015-04-14  | 3   | 7    | 7    | 7    | 7    | 7    | 5    |
| cookie2   | 2015-04-15  | 5   | 7    | 7    | 7    | 7    | 7    | 5    |
| cookie2   | 2015-04-16  | 5   | 7    | 7    | 7    | 5    | 5    | 5    |
+-----------+-------------+-----+------+------+------+------+------+------+
```
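```sql
-- No ORDER BY inside OVER(): the frame is the whole partition, so every row
-- gets its cookieid's minimum and maximum pv. There is no outer ORDER BY
-- either, so the order of the output rows is not guaranteed.
SELECT cookieid,
createtime,
pv,
min(pv) OVER(PARTITION BY cookieid) AS min_pv,
max(pv) OVER(PARTITION BY cookieid) AS max_pv
FROM test_data;
```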
```text
+-----------+-------------+-----+---------+---------+
| cookieid  | createtime  | pv  | min_pv  | max_pv  |
+-----------+-------------+-----+---------+---------+
| cookie1   | 2015-04-10  | 1   | 1       | 7       |
| cookie1   | 2015-04-16  | 4   | 1       | 7       |
| cookie1   | 2015-04-15  | 4   | 1       | 7       |
| cookie1   | 2015-04-14  | 2   | 1       | 7       |
| cookie1   | 2015-04-13  | 3   | 1       | 7       |
| cookie1   | 2015-04-12  | 7   | 1       | 7       |
| cookie1   | 2015-04-11  | 5   | 1       | 7       |
| cookie2   | 2015-04-16  | 5   | 3       | 7       |
| cookie2   | 2015-04-15  | 5   | 3       | 7       |
| cookie2   | 2015-04-14  | 3   | 3       | 7       |
| cookie2   | 2015-04-13  | 4   | 3       | 7       |
| cookie2   | 2015-04-12  | 7   | 3       | 7       |
| cookie2   | 2015-04-11  | 5   | 3       | 7       |
| cookie2   | 2015-04-10  | 6   | 3       | 7       |
+-----------+-------------+-----+---------+---------+
```
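With no ORDER BY, each row's frame is its whole partition. So min_pv and max_pv equal what a GROUP BY cookieid would return, except that every input row is kept instead of being collapsed. Here is a minimal Python sketch of that equivalence, using a hypothetical helper, partition_min_max. One pass computes the per-partition aggregates, and a second pass attaches them to each row.

```python
# The 14 test_data rows: (cookieid, createtime, pv).
ROWS = [
    ("cookie1", "2015-04-10", 1), ("cookie1", "2015-04-11", 5),
    ("cookie1", "2015-04-12", 7), ("cookie1", "2015-04-13", 3),
    ("cookie1", "2015-04-14", 2), ("cookie1", "2015-04-15", 4),
    ("cookie1", "2015-04-16", 4), ("cookie2", "2015-04-10", 6),
    ("cookie2", "2015-04-11", 5), ("cookie2", "2015-04-12", 7),
    ("cookie2", "2015-04-13", 4), ("cookie2", "2015-04-14", 3),
    ("cookie2", "2015-04-15", 5), ("cookie2", "2015-04-16", 5),
]

def partition_min_max(rows):
    """Return (cookieid, createtime, pv, min_pv, max_pv) for every input row."""
    stats = {}
    for cookieid, _, pv in rows:  # pass 1: one (min, max) per partition, like GROUP BY
        lo, hi = stats.get(cookieid, (pv, pv))
        stats[cookieid] = (min(lo, pv), max(hi, pv))
    # pass 2: keep all rows and attach their partition's aggregates
    return [(c, t, pv, *stats[c]) for c, t, pv in rows]

result = partition_min_max(ROWS)
print(result[0])   # ('cookie1', '2015-04-10', 1, 1, 7)
print(result[-1])  # ('cookie2', '2015-04-16', 5, 3, 7)
```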
