1. Basic operations in Hive

DDL, DML

2. Functions in Hive

User-Defined Functions come in three flavors:

UDF (User-Defined Function): one row in, one row out, e.g. upper, lower, substring (one record goes in, one record comes out).
UDAF (User-Defined Aggregation Function): many rows in, one row out, e.g. count, max, min, sum ...
UDTF (User-Defined Table-Generating Function): one row in, many rows out, e.g. explode.
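The three categories correspond to the familiar map / reduce / flat-map shapes. A minimal plain-Python sketch of the row-count semantics (an illustration only, not actual Hive UDF code):

```python
# Plain-Python analogues of the three Hive function categories.
# These illustrate the row-in/row-out shapes only, not Hive's UDF API.

rows = ["hello", "world,welcome"]

# UDF: one row in, one row out (like upper()).
udf_out = [r.upper() for r in rows]                  # 2 rows in -> 2 rows out

# UDAF: many rows in, one row out (like count()).
udaf_out = len(rows)                                 # 2 rows in -> 1 value out

# UDTF: one row in, many rows out (like explode(split(...))).
udtf_out = [w for r in rows for w in r.split(",")]   # 2 rows in -> 3 rows out

print(udf_out, udaf_out, udtf_out)
```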

3. Examples

show functions lists the functions the system supports.

Function examples: split(), explode()

Exercise: use Hive to count how many times each word occurs (wordcount).

explode turns an array into multiple rows of data.

[hadoop@hadoop000 data]$ vi hive-wc.txt
hello,world,welcome
hello,welcome
hive> create table hive_wc(sentence string);
OK
Time taken: 1.083 seconds
hive> load data local inpath '/home/hadoop/data/hive-wc.txt' into table hive_wc;
Loading data to table default.hive_wc
Table default.hive_wc stats: [numFiles=, totalSize=]
OK
Time taken: 1.539 seconds
hive> select * from hive_wc;
OK
hello,world,welcome
hello,welcome
Time taken: 0.536 seconds, Fetched: row(s)
hive> select split(sentence,",") from hive_wc;
OK
["hello","world","welcome"]
["hello","welcome"]
[""]
Time taken: 0.161 seconds, Fetched: row(s)
hive> select explode(split(sentence,",")) from hive_wc;
"hello"
"world"
"welcome"
"hello"
"welcome"

Wordcount in a single SQL statement:

hive> select word, count(1) as c
    > from (select explode(split(sentence,",")) as word from hive_wc) t
    > group by word;
Query ID = hadoop_20180613094545_920c2e72--47eb-9a9c-5e5a30ebb1ae
Number of reduce tasks not specified. Estimated from input data size.
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1528851144815_0001, Tracking URL = http://hadoop000:8088/proxy/application_1528851144815_0001/
Kill Command = /home/hadoop/app/hadoop-2.6.-cdh5.7.0/bin/hadoop job -kill job_1528851144815_0001
...
Ended Job = job_1528851144815_0001
OK
hello	2
welcome	2
world	1
Time taken: 26.859 seconds, Fetched: row(s)
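The nested query above is a split → explode → group by pipeline. The same logic can be sketched in plain Python to check the expected counts, assuming the two-line input file shown earlier:

```python
from collections import Counter

# The two lines of hive-wc.txt shown above.
sentences = ["hello,world,welcome", "hello,welcome"]

# split(sentence, ",") -> one array per row
arrays = [s.split(",") for s in sentences]

# explode(...) -> one row per array element
words = [w for arr in arrays for w in arr]

# group by word, count(1)
counts = Counter(words)
print(counts)  # Counter({'hello': 2, 'welcome': 2, 'world': 1})
```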

4. JSON data

File used: rating.json

Create a table rating_json, load the data, and look at the first ten rows:

hive> create table rating_json(json string);
OK
hive> load data local inpath '/home/hadoop/data/rating.json' into table rating_json;
Loading data to table default.rating_json
Table default.rating_json stats: [numFiles=, totalSize=]
OK
hive> select * from rating_json limit 10;
OK
{"movie":"","rate":"","time":"","userid":""}
...
Time taken: 0.195 seconds, Fetched: 10 row(s)

To process the JSON data, use json_tuple, a UDTF introduced in Hive 0.7:

hive> select
    > json_tuple(json,"movie","rate","time","userid") as (movie,rate,time,userid)
    > from rating_json limit 10;
OK
Time taken: 0.189 seconds, Fetched: 10 row(s)
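json_tuple parses each JSON string once and emits the requested fields as columns. A rough Python equivalent of that behavior (the sample records below are made up, since the real rating.json values are not shown here):

```python
import json

# Hypothetical sample records in the same shape as rating.json.
lines = [
    '{"movie":"1193","rate":"5","time":"978300760","userid":"1"}',
    '{"movie":"661","rate":"3","time":"978302109","userid":"1"}',
]

def json_tuple(line, *keys):
    """Parse one JSON string and return the requested fields, like Hive's json_tuple."""
    obj = json.loads(line)
    return tuple(obj.get(k) for k in keys)

rows = [json_tuple(l, "movie", "rate", "time", "userid") for l in lines]
print(rows[0])  # ('1193', '5', '978300760', '1')
```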

5. Top-N with window functions:

[hadoop@hadoop000 data]$ more hive_row_number.txt
,,ruoze,M
,,jepson,M
,,wangwu,F
,,zhaoliu,F
,,tianqi,M
,,wangba,F
[hadoop@hadoop000 data]$
hive> create table hive_rownumber(id int,age int, name string, sex string)
> row format delimited fields terminated by ',';
OK
Time taken: 0.451 seconds
hive> load data local inpath '/home/hadoop/data/hive_row_number.txt' into table hive_rownumber;
Loading data to table hive3.hive_rownumber
Table hive3.hive_rownumber stats: [numFiles=, totalSize=]
OK
Time taken: 1.381 seconds
hive> select * from hive_rownumber ;
OK
ruoze M
jepson M
wangwu F
zhaoliu F
tianqi M
wangba F
Time taken: 0.455 seconds, Fetched: row(s)

Requirement: for each sex, find the two records with the greatest age -- a top-N query:

Analysis: order by is a global sort; it cannot sort within each group. To sort inside a group you need a window (analytic) function.

select id,age,name,sex
from
(select id,age,name,sex,
        row_number() over(partition by sex order by age desc) as rank
 from hive_rownumber) t
where rank<=2;

hive> select id,age,name,sex
    > from
    > (select id,age,name,sex,
    > row_number() over(partition by sex order by age desc) as rank
    > from hive_rownumber) t
    > where rank<=2;
Query ID = hadoop_20180614202525_9829dc42-3c37--8b12-89c416589ebc
Number of reduce tasks not specified. Estimated from input data size.
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1528975858636_0001, Tracking URL = http://hadoop000:/proxy/application_1528975858636_0001/
Kill Command = /home/hadoop/app/hadoop-2.6.-cdh5.7.0/bin/hadoop job -kill job_1528975858636_0001
...
Ended Job = job_1528975858636_0001
OK
wangba F
wangwu F
tianqi M
jepson M
Time taken: 29.262 seconds, Fetched: row(s)
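row_number() over(partition by sex order by age desc) numbers the rows inside each sex group from the oldest down, and keeping rank <= 2 yields the top two per group. A plain-Python sketch of that logic (the ids and ages below are invented, since the numeric columns of the original file are not shown):

```python
from itertools import groupby

# Hypothetical (id, age, name, sex) rows; ids and ages are made up for illustration.
rows = [
    (1, 20, "ruoze",   "M"),
    (2, 28, "jepson",  "M"),
    (3, 25, "wangwu",  "F"),
    (4, 22, "zhaoliu", "F"),
    (5, 30, "tianqi",  "M"),
    (6, 26, "wangba",  "F"),
]

# partition by sex, order by age desc
rows_sorted = sorted(rows, key=lambda r: (r[3], -r[1]))

top2 = []
for sex, group in groupby(rows_sorted, key=lambda r: r[3]):
    # row_number(): enumerate within the partition; keep rank <= 2
    for rank, row in enumerate(group, start=1):
        if rank <= 2:
            top2.append(row)

print([r[2] for r in top2])  # ['wangba', 'wangwu', 'tianqi', 'jepson']
```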
