Hive functions
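collect_set(x): aggregate function that gathers a column's values from multiple rows into a single array, with duplicates removed
collect_list(x): aggregate function that gathers a column's values from multiple rows into a single array, keeping duplicates
concat_ws: string-joining function; once multiple values have been gathered into one row, it joins them using the given separator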
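UDF (User-Defined Function): an ordinary user-defined function that operates on the values of a single row;
UDAF (User-Defined Aggregation Function): a user-defined aggregate function that operates on multiple rows; it is the same kind of function as the commonly used SUM() and AVG() in SQL, which are also aggregate functions;
UDTF (User-Defined Table-Generating Function): covers the case where one input row produces multiple output rows (one-to-many mapping).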
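lateral view is used together with UDTFs such as explode (often combined with split, which turns a delimited string into an array). It splits one row of data into multiple rows, and the split-out data can then be aggregated. lateral view first calls the UDTF for each row of the source table; the UDTF expands that row into one or more rows (or none, if the array or map is empty); lateral view then combines the results with the source row, producing a virtual table that can be given an alias. In the example below, lateral view explode(subdinates) adTable as aa; the virtual table's alias is adTable and the alias of its generated column is aa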
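explode(ARRAY): generates one row for each element of the array
explode(MAP): generates one row for each key-value pair in the map, with the key in one column and the value in another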
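The difference between the three function kinds is mostly one of shape: how many rows go in and how many come out. The following is a minimal plain-Python sketch of those shapes, not Hive's actual Java UDF interfaces; the function names upper_udf, sum_udaf and explode_udtf are hypothetical and exist only for this illustration.

```python
from typing import Iterable, Iterator, List


def upper_udf(value: str) -> str:
    """UDF shape: one input row -> one output value."""
    return value.upper()


def sum_udaf(values: Iterable[float]) -> float:
    """UDAF shape: many input rows -> one output value."""
    total = 0.0
    for v in values:
        total += v
    return total


def explode_udtf(items: List[str]) -> Iterator[str]:
    """UDTF shape: one input row -> zero or more output rows."""
    for item in items:
        yield item


names = ["wang", "ZHANG", "LIU"]
udf_out = [upper_udf(n) for n in names]   # one result per input row
udaf_out = sum_udaf([10.0, 5.0, 8.0])     # one result for the whole group
udtf_out = list(explode_udtf(names))      # one output row per array element
```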
| CREATE TABLE `employees`( |
| `name` string, |
| `salary` float, |
| `subdinates` array<string>, |
| `deducation` map<string,float>, |
| `address` struct<street:string,city:string,state:string,zip:int>) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://localhost:9000/user/hive/warehouse/gamedw.db/employees' |
| TBLPROPERTIES ( |
| 'creator'='tianyongtao', |
| 'last_modified_by'='root', |
| 'last_modified_time'='1521447397', |
| 'numFiles'='0', |
| 'numRows'='0', |
| 'rawDataSize'='0', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1521447397') |
+----------------------------------------------------------------------+--+
Handling Array-type fields
0: jdbc:hive2://192.168.53.122:10000/default> select name,subdinates from employees;
+---------------+-------------------------+--+
| name | subdinates |
+---------------+-------------------------+--+
| tianyongtao | ["wang","ZHANG","LIU"] |
| wangyangming | ["ma","zhong"] |
+---------------+-------------------------+--+
2 rows selected (0.301 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa from employees lateral view explode(subdinates) adTable as aa;
+---------------+--------+--+
| name | aa |
+---------------+--------+--+
| tianyongtao | wang |
| tianyongtao | ZHANG |
| tianyongtao | LIU |
| wangyangming | ma |
| wangyangming | zhong |
+---------------+--------+--+
5 rows selected (0.312 seconds)
Handling Map-type fields
0: jdbc:hive2://192.168.53.122:10000/default> select deducation from employees;
+---------------------------------+--+
| deducation |
+---------------------------------+--+
| {"aaa":10.0,"bb":5.0,"CC":8.0} |
| {"aaa":6.0,"bb":12.0} |
+---------------------------------+--+
2 rows selected (0.315 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select explode(deducation) as (aa,bb) from employees;
+------+-------+--+
| aa | bb |
+------+-------+--+
| aaa | 10.0 |
| bb | 5.0 |
| CC | 8.0 |
| aaa | 6.0 |
| bb | 12.0 |
+------+-------+--+
5 rows selected (0.314 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb from employees lateral view explode(deducation) mtable as aa,bb;
+---------------+------+-------+--+
| name | aa | bb |
+---------------+------+-------+--+
| tianyongtao | aaa | 10.0 |
| tianyongtao | bb | 5.0 |
| tianyongtao | CC | 8.0 |
| wangyangming | aaa | 6.0 |
| wangyangming | bb | 12.0 |
+---------------+------+-------+--+
5 rows selected (0.347 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb,cc from employees lateral view explode(deducation) mtable as aa,bb lateral view explode(subdinates) adTable as cc;
+---------------+------+-------+--------+--+
| name | aa | bb | cc |
+---------------+------+-------+--------+--+
| tianyongtao | aaa | 10.0 | wang |
| tianyongtao | aaa | 10.0 | ZHANG |
| tianyongtao | aaa | 10.0 | LIU |
| tianyongtao | bb | 5.0 | wang |
| tianyongtao | bb | 5.0 | ZHANG |
| tianyongtao | bb | 5.0 | LIU |
| tianyongtao | CC | 8.0 | wang |
| tianyongtao | CC | 8.0 | ZHANG |
| tianyongtao | CC | 8.0 | LIU |
| wangyangming | aaa | 6.0 | ma |
| wangyangming | aaa | 6.0 | zhong |
| wangyangming | bb | 12.0 | ma |
| wangyangming | bb | 12.0 | zhong |
+---------------+------+-------+--------+--+
13 rows selected (0.305 seconds)
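The 13 rows above come from chaining two lateral views: each source row is paired with every map entry, and each of those pairs with every array element (3 x 3 + 2 x 2 = 13). A minimal plain-Python sketch of that behaviour, using the same data as the queries above, could look like this; it models the semantics with nested loops and is not how Hive executes the query. The function name lateral_view_rows is hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

# Same two rows as the employees table shown above.
employees: List[Dict] = [
    {"name": "tianyongtao",
     "subdinates": ["wang", "ZHANG", "LIU"],
     "deducation": {"aaa": 10.0, "bb": 5.0, "CC": 8.0}},
    {"name": "wangyangming",
     "subdinates": ["ma", "zhong"],
     "deducation": {"aaa": 6.0, "bb": 12.0}},
]


def lateral_view_rows(rows: List[Dict]) -> Iterator[Tuple[str, str, float, str]]:
    """Model of two chained lateral views (hypothetical helper)."""
    for row in rows:
        # lateral view explode(deducation) mtable as aa,bb
        for key, value in row["deducation"].items():
            # lateral view explode(subdinates) adTable as cc
            for sub in row["subdinates"]:
                yield (row["name"], key, value, sub)


result = list(lateral_view_rows(employees))
print(len(result))  # 13
```

If either collection in a row were empty, the inner loops would produce nothing for that row, which matches how a plain lateral view drops such rows.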
Struct-type fields:
0: jdbc:hive2://192.168.53.122:10000/default> select name,address.street,address.city,address.state from employees;
+---------------+---------+-----------+----------+--+
| name | street | city | state |
+---------------+---------+-----------+----------+--+
| tianyongtao | HENAN | LUOHE | LINYING |
| wangyangming | hunan | changsha | NULL |
+---------------+---------+-----------+----------+--+
2 rows selected (0.309 seconds)
collect_set(): this function collects the values of a field with duplicates removed and produces an Array-type field
0: jdbc:hive2://192.168.53.122:10000/default> select * from cust;
+------------------+-----------+----------------+--+
| cust.custname | cust.sex | cust.nianling |
+------------------+-----------+----------------+--+
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| nihao | 1 | 5 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| nihao | 1 | 5 |
+------------------+-----------+----------------+--+
scala> hcon.sql("select sex,collect_set(nianling) from gamedw.cust group by sex").show
+---+---------------------+
|sex|collect_set(nianling)|
+---+---------------------+
| 1| [85, 5, 20, 50, 3...|
| 0| [100, 56, 47]|
+---+---------------------+
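To make the difference between collect_set, collect_list and concat_ws concrete, here is a small plain-Python model run on the (sex, nianling) pairs from the cust table above. It is a sketch of the semantics only: the helper functions are hypothetical stand-ins for the Hive functions of the same name, and the element order Hive returns is not guaranteed to match the first-seen order used here. Hive's concat_ws works on strings, so the sketch converts numbers to strings before joining.

```python
from collections import defaultdict
from typing import Dict, List

# (sex, nianling) pairs taken from the cust rows shown above.
cust = [(1, 50), (1, 85), (1, 20), (0, 56), (0, 47), (1, 32), (0, 100),
        (1, 1001), (1, 50), (1, 85), (1, 20), (0, 56), (0, 47), (1, 5),
        (1, 32), (0, 100), (1, 1001), (1, 5)]


def collect_list(values: List[int]) -> List[int]:
    """Keep every value, duplicates included."""
    return list(values)


def collect_set(values: List[int]) -> List[int]:
    """Keep each distinct value once (first-seen order in this model)."""
    return list(dict.fromkeys(values))


def concat_ws(sep: str, values: List[int]) -> str:
    """Join values into one string with a separator."""
    return sep.join(str(v) for v in values)


# group by sex
groups: Dict[int, List[int]] = defaultdict(list)
for sex, nianling in cust:
    groups[sex].append(nianling)

set_by_sex = {sex: collect_set(vals) for sex, vals in groups.items()}
list_by_sex = {sex: collect_list(vals) for sex, vals in groups.items()}
joined = concat_ws(",", set_by_sex[0])
print(joined)  # 56,47,100
```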
0: jdbc:hive2://192.168.53.122:10000/default> select * from cityinfo;
+----------------+---------------------------------------------------------------+--+
| cityinfo.city | cityinfo.districts |
+----------------+---------------------------------------------------------------+--+
| shenzhen | longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu |
| qingdao | shinan,lichang,jimo,jiaozhou,huangdao,laoshan |
+----------------+---------------------------------------------------------------+--+
0: jdbc:hive2://192.168.53.122:10000/default> select city,area from cityinfo lateral view explode(split(districts,",")) areatable as area;
+-----------+------------+--+
| city | area |
+-----------+------------+--+
| shenzhen | longhua |
| shenzhen | futian |
| shenzhen | baoan |
| shenzhen | longgang |
| shenzhen | dapeng |
| shenzhen | guangming |
| shenzhen | nanshan |
| shenzhen | luohu |
| qingdao | shinan |
| qingdao | lichang |
| qingdao | jimo |
| qingdao | jiaozhou |
| qingdao | huangdao |
| qingdao | laoshan |
+-----------+------------+--+
14 rows selected (0.479 seconds)
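Splitting a delimited string and exploding it is the inverse of collecting values and joining them with a separator. The plain-Python sketch below reproduces the 14 rows above and then rebuilds the original strings; it is a model of the semantics, not Hive code. Two assumptions to keep in mind: Hive's split takes a regular expression while Python's str.split takes a literal (for a plain comma they behave the same), and Hive does not guarantee the element order of collect_list, whereas this model keeps input order.

```python
from typing import Dict, List, Tuple

# Same two rows as the cityinfo table shown above.
cityinfo: List[Tuple[str, str]] = [
    ("shenzhen", "longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu"),
    ("qingdao", "shinan,lichang,jimo,jiaozhou,huangdao,laoshan"),
]

# lateral view explode(split(districts, ",")) areatable as area
exploded: List[Tuple[str, str]] = [
    (city, area)
    for city, districts in cityinfo
    for area in districts.split(",")
]

# Reverse direction: group by city, collect the areas, join with ","
areas_by_city: Dict[str, List[str]] = {}
for city, area in exploded:
    areas_by_city.setdefault(city, []).append(area)
rebuilt = {city: ",".join(areas) for city, areas in areas_by_city.items()}

print(len(exploded))  # 14
```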
Given the data below, compute the maximum and the sum up to and including each month:
scala> hcon.sql("select * from gamedw.visists order by custid,monthid").show
+------+-------+-----+
|custid|monthid|times|
+------+-------+-----+
| 1| 201801| 25|
| 1| 201801| 10|
| 1| 201802| 35|
| 1| 201802| 7|
| 1| 201803| 52|
| 1| 201805| 6|
| 2| 201801| 32|
| 2| 201801| 1|
| 2| 201802| 10|
| 2| 201802| 18|
| 2| 201803| 91|
| 2| 201804| 6|
| 2| 201804| 4|
| 2| 201805| 31|
+------+-------+-----+
scala> hcon.sql("select custid,b.monthid,sum(times),max(times) from gamedw.visists a inner join (select distinct monthid from gamedw.visists) b on a.monthid<=b.monthid group by custid,b.monthid order by custid,b.monthid").show
+------+-------+----------+----------+
|custid|monthid|sum(times)|max(times)|
+------+-------+----------+----------+
| 1| 201801| 35| 25|
| 1| 201802| 77| 35|
| 1| 201803| 129| 52|
| 1| 201804| 129| 52|
| 1| 201805| 135| 52|
| 2| 201801| 33| 32|
| 2| 201802| 61| 32|
| 2| 201803| 152| 91|
| 2| 201804| 162| 91|
| 2| 201805| 193| 91|
+------+-------+----------+----------+
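The query works by pairing every visit row with every distinct month that is greater than or equal to the visit's own month, then grouping by customer and that later month. A plain-Python sketch of the same logic on the same data is shown here as a way to check the reasoning; it models the join and group-by with loops and a dictionary and is not meant as a replacement for the query.

```python
from typing import Dict, List, Tuple

# (custid, monthid, times) rows from gamedw.visists as shown above.
visits: List[Tuple[int, int, int]] = [
    (1, 201801, 25), (1, 201801, 10), (1, 201802, 35), (1, 201802, 7),
    (1, 201803, 52), (1, 201805, 6),
    (2, 201801, 32), (2, 201801, 1), (2, 201802, 10), (2, 201802, 18),
    (2, 201803, 91), (2, 201804, 6), (2, 201804, 4), (2, 201805, 31),
]

# select distinct monthid from gamedw.visists
months = sorted({monthid for _, monthid, _ in visits})

# join on a.monthid <= b.monthid, then group by custid, b.monthid
agg: Dict[Tuple[int, int], Tuple[int, int]] = {}
for custid, monthid, times in visits:
    for upto in months:
        if monthid <= upto:
            key = (custid, upto)
            if key in agg:
                total, peak = agg[key]
                agg[key] = (total + times, max(peak, times))
            else:
                agg[key] = (times, times)

# order by custid, b.monthid
result = [(c, m, s, p) for (c, m), (s, p) in sorted(agg.items())]
for row in result:
    print(row)
```

Because the month list comes from the whole table, customer 1 still gets a row for 201804 even though that customer has no visits in that month; the running totals simply carry over from 201803.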
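When joining, put the smaller table on the left: Hive buffers the earlier tables of a join and streams the last one.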