hive 函数
collect_set(x) 列转行函数---没有重复, 组装多列的数据的结构体
collect_list(x) 列转行函数---可以有重复,组装多列的数据的结构体
concat_ws 拼接函数, 用于多列转成同一行字段后,间隔符
UDF(User-Defined-Function) 用户定义(普通)函数,只对单行数值产生作用;
UDAF(User- Defined Aggregation Funcation)用户定义聚合函数,可对多行数据产生作用;等同与SQL中常用的SUM(),AVG(),也是聚合函数;
UDTF(User-Defined Table-Generating Functions) 用来解决 输入一行输出多行(On-to-many maping) 的需求。
lateral view用于和split、explode等UDTF一起使用的,能将一行数据拆分成多行数据,在此基础上可以对拆分的数据进行聚合,lateral view首先为原始表的每行调用UDTF,UDTF会把一行拆分成一行或者多行,lateral view把结果组合,产生一个支持别名表的虚拟表。下例中的 lateral view explode(subdinates) adTable as aa; 虚拟表adTable的别名为aa
explode(ARRAY) 列表中的每个元素生成一行
explode(MAP) map中每个key-value对,生成一行,key为一列,value为一列
| CREATE TABLE `employees`( |
| `name` string, |
| `salary` float, |
| `subdinates` array<string>, |
| `deducation` map<string,float>, |
| `address` struct<street:string,city:string,state:string,zip:int>) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://localhost:9000/user/hive/warehouse/gamedw.db/employees' |
| TBLPROPERTIES ( |
| 'creator'='tianyongtao', |
| 'last_modified_by'='root', |
| 'last_modified_time'='1521447397', |
| 'numFiles'='0', |
| 'numRows'='0', |
| 'rawDataSize'='0', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1521447397') |
+----------------------------------------------------------------------+--+
Array类型字段的处理
0: jdbc:hive2://192.168.53.122:10000/default> select name,subdinates from employees;
+---------------+-------------------------+--+
| name | subdinates |
+---------------+-------------------------+--+
| tianyongtao | ["wang","ZHANG","LIU"] |
| wangyangming | ["ma","zhong"] |
+---------------+-------------------------+--+
2 rows selected (0.301 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa from employees lateral view explode(subdinates) adTable as aa;
+---------------+--------+--+
| name | aa |
+---------------+--------+--+
| tianyongtao | wang |
| tianyongtao | ZHANG |
| tianyongtao | LIU |
| wangyangming | ma |
| wangyangming | zhong |
+---------------+--------+--+
5 rows selected (0.312 seconds)
Map类型字段的处理
0: jdbc:hive2://192.168.53.122:10000/default> select deducation from employees;
+---------------------------------+--+
| deducation |
+---------------------------------+--+
| {"aaa":10.0,"bb":5.0,"CC":8.0} |
| {"aaa":6.0,"bb":12.0} |
+---------------------------------+--+
2 rows selected (0.315 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select explode(deducation) as (aa,bb) from employees;
+------+-------+--+
| aa | bb |
+------+-------+--+
| aaa | 10.0 |
| bb | 5.0 |
| CC | 8.0 |
| aaa | 6.0 |
| bb | 12.0 |
+------+-------+--+
5 rows selected (0.314 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb from employees lateral view explode(deducation) mtable as aa,bb;
+---------------+------+-------+--+
| name | aa | bb |
+---------------+------+-------+--+
| tianyongtao | aaa | 10.0 |
| tianyongtao | bb | 5.0 |
| tianyongtao | CC | 8.0 |
| wangyangming | aaa | 6.0 |
| wangyangming | bb | 12.0 |
+---------------+------+-------+--+
5 rows selected (0.347 seconds)
0: jdbc:hive2://192.168.53.122:10000/default> select name,aa,bb,cc from employees lateral view explode(deducation) mtable as aa,bb lateral view explode(subdinates) adTable as cc;
+---------------+------+-------+--------+--+
| name | aa | bb | cc |
+---------------+------+-------+--------+--+
| tianyongtao | aaa | 10.0 | wang |
| tianyongtao | aaa | 10.0 | ZHANG |
| tianyongtao | aaa | 10.0 | LIU |
| tianyongtao | bb | 5.0 | wang |
| tianyongtao | bb | 5.0 | ZHANG |
| tianyongtao | bb | 5.0 | LIU |
| tianyongtao | CC | 8.0 | wang |
| tianyongtao | CC | 8.0 | ZHANG |
| tianyongtao | CC | 8.0 | LIU |
| wangyangming | aaa | 6.0 | ma |
| wangyangming | aaa | 6.0 | zhong |
| wangyangming | bb | 12.0 | ma |
| wangyangming | bb | 12.0 | zhong |
+---------------+------+-------+--------+--+
13 rows selected (0.305 seconds)
结构体类型字段:
0: jdbc:hive2://192.168.53.122:10000/default> select name,address.street,address.city,address.state from employees;
+---------------+---------+-----------+----------+--+
| name | street | city | state |
+---------------+---------+-----------+----------+--+
| tianyongtao | HENAN | LUOHE | LINYING |
| wangyangming | hunan | changsha | NULL |
+---------------+---------+-----------+----------+--+
2 rows selected (0.309 seconds)
collect_set():该函数的作用是将某字段的值进行去重汇总,产生Array类型字段
0: jdbc:hive2://192.168.53.122:10000/default> select * from cust;
+------------------+-----------+----------------+--+
| cust.custname | cust.sex | cust.nianling |
+------------------+-----------+----------------+--+
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| tianyt_touch100 | 1 | 50 |
| wangwu | 1 | 85 |
| zhangsan | 1 | 20 |
| liuqin | 0 | 56 |
| wangwu | 0 | 47 |
| nihao | 1 | 5 |
| liuyang | 1 | 32 |
| hello | 0 | 100 |
| mahuateng | 1 | 1001 |
| nihao | 1 | 5 |
+------------------+-----------+----------------+--+
scala> hcon.sql("select sex,collect_set(nianling) from gamedw.cust group by sex").show
+---+---------------------+
|sex|collect_set(nianling)|
+---+---------------------+
| 1| [85, 5, 20, 50, 3...|
| 0| [100, 56, 47]|
+---+---------------------+
0: jdbc:hive2://192.168.53.122:10000/default> select * from cityinfo;
+----------------+---------------------------------------------------------------+--+
| cityinfo.city | cityinfo.districts |
+----------------+---------------------------------------------------------------+--+
| shenzhen | longhua,futian,baoan,longgang,dapeng,guangming,nanshan,luohu |
| qingdao | shinan,lichang,jimo,jiaozhou,huangdao,laoshan |
+----------------+---------------------------------------------------------------+--+
0: jdbc:hive2://192.168.53.122:10000/default> select city,area from cityinfo lateral view explode(split(districts,",")) areatable as area;
+-----------+------------+--+
| city | area |
+-----------+------------+--+
| shenzhen | longhua |
| shenzhen | futian |
| shenzhen | baoan |
| shenzhen | longgang |
| shenzhen | dapeng |
| shenzhen | guangming |
| shenzhen | nanshan |
| shenzhen | luohu |
| qingdao | shinan |
| qingdao | lichang |
| qingdao | jimo |
| qingdao | jiaozhou |
| qingdao | huangdao |
| qingdao | laoshan |
+-----------+------------+--+
14 rows selected (0.479 seconds)
已知数据求截止当前月的最大值与截止当前月份的和:
scala> hcon.sql("select * from gamedw.visists order by custid,monthid").show
+------+-------+-----+
|custid|monthid|times|
+------+-------+-----+
| 1| 201801| 25|
| 1| 201801| 10|
| 1| 201802| 35|
| 1| 201802| 7|
| 1| 201803| 52|
| 1| 201805| 6|
| 2| 201801| 32|
| 2| 201801| 1|
| 2| 201802| 10|
| 2| 201802| 18|
| 2| 201803| 91|
| 2| 201804| 6|
| 2| 201804| 4|
| 2| 201805| 31|
+------+-------+-----+
scala> hcon.sql("select custid,b.monthid,sum(times),max(times) from gamedw.visists a inner join (select distinct monthid from gamedw.visists) b on a.monthid<=b.monthid group by custid,b.monthid order by custid,b.monthid").show
+------+-------+----------+----------+
|custid|monthid|sum(times)|max(times)|
+------+-------+----------+----------+
| 1| 201801| 35| 25|
| 1| 201802| 77| 35|
| 1| 201803| 129| 52|
| 1| 201804| 129| 52|
| 1| 201805| 135| 52|
| 2| 201801| 33| 32|
| 2| 201802| 61| 32|
| 2| 201803| 152| 91|
| 2| 201804| 162| 91|
| 2| 201805| 193| 91|
+------+-------+----------+----------+
关联的时候小表写在左边
hive 函数的更多相关文章
- hive函数参考手册
hive函数参考手册 原文见:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF 1.内置运算符1.1关系运算符 运 ...
- Hive函数以及自定义函数讲解(UDF)
Hive函数介绍HQL内嵌函数只有195个函数(包括操作符,使用命令show functions查看),基本能够胜任基本的hive开发,但是当有较为复杂的需求的时候,可能需要进行定制的HQL函数开发. ...
- 大数据入门第十一天——hive详解(三)hive函数
一.hive函数 1.内置运算符与内置函数 函数分类: 查看函数信息: DESC FUNCTION concat; 常用的分析函数之rank() row_number(),参考:https://www ...
- Hadoop生态圈-Hive函数
Hadoop生态圈-Hive函数 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任.
- Hive(四)hive函数与hive shell
一.hive函数 1.hive内置函数 (1)内容较多,见< Hive 官方文档> https://cwiki.apache.org/confluence/displ ...
- Hive入门笔记---2.hive函数大全
Hive函数大全–完整版 现在虽然有很多SQL ON Hadoop的解决方案,像Spark SQL.Impala.Presto等等,但就目前来看,在基于Hadoop的大数据分析平台.数据仓库中,Hiv ...
- 【Hive五】Hive函数UDF
Hive函数 系统自带的函数 查看系统自带的函数 查看系统自带的函数 show functions; 显示自带的函数的用法 desc function upper; 详细显示自带的函数的用法 desc ...
- Hive函数大全-完整版
现在虽然有很多SQL ON Hadoop的解决方案,像Spark SQL.Impala.Presto等等,但就目前来看,在基于Hadoop的大数据分析平台.数据仓库中,Hive仍然是不可替代的角色.尽 ...
- 【翻译】Flink Table Api & SQL — Hive —— Hive 函数
本文翻译自官网:Hive Functions https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/hive/h ...
- hive函数之数学函数
hive函数之数学函数 round(double d)--返回double型d的近似值(四舍五入),返回bigint型: round(double d,int n)--返回保留double型d的n ...
随机推荐
- idea下的调试配置
react和ts的整合 https://github.com/Microsoft/TypeScript-React-Starter vue的 https://github.com/ducksoupde ...
- C++11--Tuple类<tuple>
#include "stdafx.h" #include <iomanip> #include <condition_variable> #include ...
- 【Linux】使用fsck对磁盘进行修复
在后台执行 磁盘修复 nohup fsck.ext3 -y /dev/sdb1 > /root/fsck.log 2>&1 & 使用nohup和& 让进程在后台执行 ...
- hive中的表
一.内部表与外部表的比较 Hive表概念和关系型数据库表概念差不多.在Hive里表会和HDFS的一个目录相对应,这个目录会存放表的数据.目录默认是/usr/hive/warehouse/. 比如你在h ...
- Spring Boot下Druid连接池+mybatis
目前Spring Boot中默认支持的连接池有dbcp,dbcp2, hikari三种连接池. 引言: 在Spring Boot下默认提供了若干种可用的连接池,Druid来自于阿里系的一个开源连 ...
- PHP7.1扩展开发入门
第1步: 首先从官网下载了PHP源码http://am1.php.net/distributions/php-7.1.3.tar.bz2 第2步: 解压后可以看到根目录下面的ext文件夹里有ext_s ...
- 搭建DUBBO项目解决DUBBO.XML标签报错的问题(转载)
https://www.cnblogs.com/ajax-li/p/7856393.html 报错内容: Multiple annotations found at this line: - cvc- ...
- optparse模块解析命令行参数的说明及优化
一.关于解析命令行参数的方法 关于“解析命令行参数”的方法我们一般都会用到sys.argv跟optparse模块.关于sys.argv,网上有一篇非常优秀的博客已经介绍的很详细了,大家可以去这里参考: ...
- phpmyadmin在nginx环境下配置错误
location ~ \.css { add_header Content-Type text/css; } location ~ \.js { ...
- vue 定义方法执行方法 获取数据 改变数据 执行方法传值 以及事件对象
<template> <div id="app"> <!-- <img v-bind:src='url' /> <img :src= ...