1. Simple Hive array practice:

CREATE TABLE `emp`(
`name` string,
`emps` array<string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://node:9000/user/hive/warehouse/daxin.db/emp'

To store data, use insert into ... select:

insert into emp select "daxin",array('zhangsan','lisi','wangwu') from ptab;

hive> select * from emp;
OK
daxin ["zhangsan","lisi","wangwu"]
mali ["jack","lixisan","fala"]
Time taken: 0.045 seconds, Fetched: 2 row(s)
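Individual array elements can also be read by zero-based index. A minimal sketch (not from the original session):

select name, emps[0] from emp;   -- first element of each row's array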
hive>
>
> select * from emp LATERAL VIEW explode(emps) tmp ;
OK
daxin ["zhangsan","lisi","wangwu"] zhangsan
daxin ["zhangsan","lisi","wangwu"] lisi
daxin ["zhangsan","lisi","wangwu"] wangwu
mali ["jack","lixisan","fala"] jack
mali ["jack","lixisan","fala"] lixisan
mali ["jack","lixisan","fala"] fala
Time taken: 0.047 seconds, Fetched: 6 row(s)
hive> select * from emp LATERAL VIEW explode(emps) tmp as empeeName ;
OK
daxin ["zhangsan","lisi","wangwu"] zhangsan
daxin ["zhangsan","lisi","wangwu"] lisi
daxin ["zhangsan","lisi","wangwu"] wangwu
mali ["jack","lixisan","fala"] jack
mali ["jack","lixisan","fala"] lixisan
mali ["jack","lixisan","fala"] fala
Time taken: 0.038 seconds, Fetched: 6 row(s)
hive>
> set hive.cli.print.header=true;
hive> select * from emp LATERAL VIEW explode(emps) tmp as empeeName ;
OK
emp.name emp.emps tmp.empeename
daxin ["zhangsan","lisi","wangwu"] zhangsan
daxin ["zhangsan","lisi","wangwu"] lisi
daxin ["zhangsan","lisi","wangwu"] wangwu
mali ["jack","lixisan","fala"] jack
mali ["jack","lixisan","fala"] lixisan
mali ["jack","lixisan","fala"] fala
Time taken: 0.046 seconds, Fetched: 6 row(s)

In LATERAL VIEW explode(emps) tmp as empeeName, tmp is the alias of the lateral view, and the name after as assigns the column name empeeName to the exploded array elements.
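Because the exploded column now has a real name, it can be referenced downstream like any other column. A minimal sketch (not from the original session) counting occurrences of each exploded name:

select empeeName, count(*) as cnt
from emp LATERAL VIEW explode(emps) tmp as empeeName
group by empeeName;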

2. Hive complex data type: Map

Table creation statement:
CREATE TABLE `userinfo`(
`name` string,
`info` map<string,string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://node:9000/user/hive/warehouse/daxin.db/userinfo'

Insert data:
insert into userinfo select "daxin",map("addr","liaoning") from ptab limit 1;

Note when inserting data: in the map() function, keys and values alternate and are separated by commas, not by colons!!!

hive> select * from userinfo;
OK
userinfo.name userinfo.info
daxin {"addr":"liaoning"}
Time taken: 0.04 seconds, Fetched: 1 row(s)

Query with a WHERE condition:

hive>  select * from userinfo where info['addr']="liaoning";
OK
userinfo.name userinfo.info
daxin {"addr":"liaoning"}
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive> insert into userinfo select "zhansan",map("addr","beijing","sex","boy","word","coder") from ptab limit 1;
Query ID = liuguangxin_20181102201144_b74fcc0e-1c2d-49e6-9268-bdc97e79ba86
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1541155477807_0005, Tracking URL = http://10.12.141.138:8099/proxy/application_1541155477807_0005/
Kill Command = /Users/liuguangxin/bigdata/hadoop/bin/hadoop job -kill job_1541155477807_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-11-02 20:11:50,234 Stage-1 map = 0%, reduce = 0%
2018-11-02 20:11:55,370 Stage-1 map = 100%, reduce = 0%
2018-11-02 20:11:59,478 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1541155477807_0005
Loading data to table daxin.userinfo
Table daxin.userinfo stats: [numFiles=2, numRows=2, totalSize=60, rawDataSize=58]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 9552 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
_c0 _c1
Time taken: 15.827 seconds
hive> select * from userinfo where info['addr1']="liaoning"; -- when the key does not exist in the map, no error is raised; the query simply returns no rows
OK
userinfo.name userinfo.info
Time taken: 0.04 seconds
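The lookup info['addr1'] itself evaluates to NULL for a missing key, which is why the WHERE condition above filters out every row. A minimal sketch (not from the original session):

select name, info['addr1'] from userinfo;   -- prints NULL for the missing key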

Check the number of entries in the map:

hive> select size(info) as infoCount,* from userinfo;
OK
infocount userinfo.name userinfo.info
1 daxin {"addr":"liaoning"}
3 zhansan {"addr":"beijing","sex":"boy","word":"coder"}
Time taken: 0.045 seconds, Fetched: 2 row(s)
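Beyond size(), the map_keys() and map_values() built-ins return a map's keys and values as arrays. A minimal sketch (not from the original session):

select name, map_keys(info), map_values(info) from userinfo;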


3. Hive complex data type: Struct

CREATE TABLE `fixuserinfo`(
`name` string,
`info` struct<addr:string,mail:string,sex:string>)
COMMENT 'the count of info is fixed'
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://node:9000/user/hive/warehouse/daxin.db/fixuserinfo'  

Insert data:

See also: https://blog.csdn.net/xiaolang85/article/details/51330634
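As one alternative to loading from a file, a struct value can be built inline with the named_struct() built-in. A minimal sketch for the fixuserinfo table above, assuming the same ptab helper table used in the earlier sections (the mail value is made up for illustration):

insert into fixuserinfo
select "daxin", named_struct("addr","liaoning","mail","daxin@mail.com","sex","boy")
from ptab limit 1;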

Create the table:
CREATE TABLE test(id int,course struct<course:string,score:int>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
Data (test.txt):
1 english,80
2 math,89
3 chinese,95
Load the data:
LOAD DATA LOCAL INPATH '/home/hadoop/test.txt' OVERWRITE INTO TABLE test;
Query:
hive> select * from test;
OK
1 {"course":"english","score":80}
2 {"course":"math","score":89}
3 {"course":"chinese","score":95}
Time taken: 0.275 seconds
hive> select course from test;
{"course":"english","score":80}
{"course":"math","score":89}
{"course":"chinese","score":95}
Time taken: 44.968 seconds
select t.course.course from test t;
english
math
chinese
Time taken: 15.827 seconds
hive> select t.course.score from test t;
80
89
95
Time taken: 13.235 seconds
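Struct fields can also be used in predicates, not just in the select list. A minimal sketch (not from the original session):

select t.id from test t where t.course.score > 85;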


4. Using LATERAL VIEW explode(emps) tmp as empeeName to query array data:

This expands a column and assigns the expanded elements a name. For a table with multiple array-typed columns, the number of records each row produces after expansion is the product of the sizes of that row's exploded arrays. For example:

CREATE TABLE `empinfo`(
`name` string,
`emps` array<string>,
`sal` array<string>);

  

表中的数据:

empinfo.name empinfo.emps empinfo.sal
daxin ["zhangsan","lisi","wangwu"] ["99999","88888","999999"]
mali ["11","22","33"] ["6666","7777","8888"]

Query: explode by both emps and sal. For the first row, each array has 3 elements, so the row expands into 3 × 3 = 9 records! The same goes for the second row, so there are 18 records in total!
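The statement itself is not shown above; a minimal sketch that produces this expansion, chaining one LATERAL VIEW per array column (the aliases e/s and column names empName/salary are illustrative):

select * from empinfo
LATERAL VIEW explode(emps) e as empName
LATERAL VIEW explode(sal) s as salary;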

5. Browsing Hive function documentation online

Official wiki: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

See also: https://blog.csdn.net/wangtao6791842/article/details/37966035
