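Type conversion

Use the CAST operation to convert data types explicitly.

For example, CAST('1' AS INT) converts the string '1' to the integer 1. If the cast fails, as in CAST('X' AS INT), the expression returns NULL.

0: jdbc:hive2://hadoop101:10000> select '1'+2, cast('1' as int) + 2;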

+------+------+--+

| _c0  | _c1  |

+------+------+--+

| 3.0  | 3    |
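
The two cases above can be mimicked in a few lines of Python. This is only a toy model, not how Hive implements CAST: hive_cast_int is a hypothetical helper, None stands in for NULL, and only plain integer strings and unparseable strings are covered.

```python
from typing import Optional


def hive_cast_int(value: str) -> Optional[int]:
    """Toy model of CAST(value AS INT); None plays the role of NULL."""
    try:
        return int(value)
    except ValueError:
        # A failed cast does not raise an error in Hive; the expression is NULL.
        return None


ok = hive_cast_int("1")
failed = hive_cast_int("X")
print(ok + 2)   # the cast('1' as int) + 2 case from the query
print(failed)   # the CAST('X' AS INT) case
```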

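Hive's STRING type is comparable to the VARCHAR type in a relational database: it is a variable-length string, but you cannot declare the maximum number of characters it may hold. In theory it can store up to 2 GB of characters.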

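Collection data types

Hive has three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP resemble Java's arrays and Map, while STRUCT resembles a C struct in that it wraps a set of named fields. Complex types can be nested to any depth.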

[kris@hadoop101 datas]$ vim test.txt

songsong,bingbing_lili,xiao song:18_xiaoxiao song:,hui long guan_beijing

yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:, chao yang_beijing

hive (default)> create table test(

              > name string,

              > friends array<string>,

              > children map<string, int>,

              > address struct<street:string, city:string>)

              > row format delimited fields terminated by ','

              > collection items terminated by '_'

              > map keys terminated by ':'

              > lines terminated by '\n';

OK

Time taken: 0.249 seconds

hive (default)> load data local inpath '/opt/module/datas/test.txt/' into table test;

Loading data to table default.test

Table default.test stats: [numFiles=1, totalSize=145]

OK

Time taken: 1.365 seconds

0: jdbc:hive2://hadoop101:10000> select * from test;

0: jdbc:hive2://hadoop101:10000> select friends[1], children['xiao song'], address.city from test where name="songsong";

+-------+------+----------+--+

|  _c0  | _c1  |   city   |

+-------+------+----------+--+

| lili  | 18   | beijing  |

+-------+------+----------+--+

1 row selected (0.321 seconds)
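
To see how the three delimiters in the table definition carve up one line of test.txt, here is a rough Python sketch. parse_row is a hypothetical helper written for this illustration, not part of Hive; it assumes the four-column layout of the test table and treats an empty map value as NULL (None).

```python
def parse_row(line, field_sep=",", item_sep="_", key_sep=":"):
    """Split one text line the way the test table's delimiters describe it."""
    name, friends, children, address = line.split(field_sep)

    # array<string>: items separated by the collection delimiter
    friends_list = friends.split(item_sep)

    # map<string, int>: entries separated by '_', key and value by ':'
    children_map = {}
    for entry in children.split(item_sep):
        key, _, value = entry.partition(key_sep)
        children_map[key] = int(value) if value.strip().isdigit() else None

    # struct<street:string, city:string>: fields separated by '_' as well
    street, city = address.split(item_sep)
    return {
        "name": name,
        "friends": friends_list,
        "children": children_map,
        "address": {"street": street, "city": city},
    }


row = parse_row("songsong,bingbing_lili,xiao song:18_xiaoxiao song:,hui long guan_beijing")
print(row["friends"][1], row["children"]["xiao song"], row["address"]["city"])
```

The final lookup mirrors the array index, map key, and struct field access used in the select statement.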

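DDL: data definition

Creating a database

When a database is created, its default storage path on HDFS is /user/hive/warehouse/*.db.

Modifying a database

Use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES that describe the database. No other database metadata can be changed, including the database name and the directory where the database is located.

① Creating a database
0: jdbc:hive2://hadoop101:10000> create database if not exists db_hive;  -- the standard form: if not exists avoids an error when the database already exists

No rows affected (0.032 seconds)

0: jdbc:hive2://hadoop101:10000> create database if not exists db_hive2 location '/db_hive2.db';  -- specify where the database is stored on HDFS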

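② Modifying a database

hive (db_hive)> alter database db_hive set dbproperties('createtime'='20190215');

OK

Time taken: 0.031 seconds
③ Viewing a database | switching databases with use xx;

hive (db_hive)> desc database extended db_hive;  -- show detailed database information; omit extended to show only the basic information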

OK

db_name comment location        owner_name      owner_type      parameters

db_hive         hdfs://hadoop101:9000/user/hive/warehouse/db_hive.db    kris    USER    {createtime=20190215}

Time taken: 0.016 seconds, Fetched: 1 row(s)

④ Dropping a database

hive (db_hive)> drop database db_hive2;

hive (db_hive)> drop database if exists db_hive2;

hive (db_hive)> drop database db_hive cascade;  -- cascade forces the drop when the database is not empty

Creating a table

hive (default)> create table if not exists student2(

              > id int, name string)

              > row format delimited fields terminated by '\t'

              > stored as textfile

              > location '/user/hive/warehouse/student2';

OK

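Managed tables | internal tables

Managed tables are sometimes called internal tables, because Hive (more or less) controls the lifecycle of their data. By default, Hive stores the data for these tables in a subdirectory of the directory defined by the configuration property hive.metastore.warehouse.dir (for example, /user/hive/warehouse). When a managed table is dropped, Hive also deletes the table's data. Managed tables are not well suited to sharing data with other tools.

With an external table, Hive does not assume that it fully owns the data. Dropping the table does not delete the data, although the metadata describing the table is removed.

Use case: website logs are collected every day and flow into text files on HDFS at regular intervals. Heavy statistical analysis runs on top of the external table (the raw log table); the intermediate and result tables are stored as managed tables, and data enters them through SELECT + INSERT.

Managed table: Hive manages both the metadata and the data on HDFS.
External table: Hive manages only the metadata; dropping the table does not delete the data on HDFS.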
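
The difference between the two table kinds on drop can be pictured with a small Python model. ToyWarehouse is purely hypothetical: a dict stands in for the metastore and a set of paths stands in for HDFS, so it shows the ownership rule and nothing about how Hive actually stores either.

```python
class ToyWarehouse:
    """Toy model: table metadata in a dict, data directories in a set."""

    def __init__(self):
        self.metastore = {}  # table name -> {"location": str, "external": bool}
        self.hdfs = set()    # paths that currently hold data

    def create_table(self, name, location, external=False):
        self.metastore[name] = {"location": location, "external": external}
        self.hdfs.add(location)

    def drop_table(self, name):
        meta = self.metastore.pop(name)  # the metadata is always removed
        if not meta["external"]:
            # only a managed table takes its data down with it
            self.hdfs.discard(meta["location"])


wh = ToyWarehouse()
wh.create_table("student2", "/user/hive/warehouse/student2")
wh.create_table("stu_external", "/student", external=True)
wh.drop_table("student2")
wh.drop_table("stu_external")
print(sorted(wh.hdfs))  # only the external table's directory is left
print(wh.metastore)     # both metadata entries are gone
```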

① Creating a regular table
hive (default)> create table if not exists student3 as select id, name from student;

hive (default)> create table if not exists student4 like student;  -- create a table from the structure of an existing table

hive (default)> desc formatted student2;  -- show the table type and other formatted metadata

OK

col_name        data_type       comment

② External tables

hive (default)> dfs -mkdir /student;

hive (default)> dfs -put /opt/module/datas/student.txt /student;

Create the external table:

hive (default)> create external table stu_external(

id int,

name string)

row format delimited fields terminated by '\t'

location '/student';

0: jdbc:hive2://hadoop101:10000> select * from stu_external;

0: jdbc:hive2://hadoop101:10000> desc formatted stu_external;

Table Type:                   | EXTERNAL_TABLE

0: jdbc:hive2://hadoop101:10000> drop table stu_external;

After the external table is dropped, its data is still in HDFS, but the stu_external metadata has been removed from the metastore.

③ Converting between managed and external tables

0: jdbc:hive2://hadoop101:10000> desc formatted student2;

Table Type:                   | MANAGED_TABLE

0: jdbc:hive2://hadoop101:10000> alter table student2 set tblproperties('EXTERNAL'='TRUE');

Table Type:                   | EXTERNAL_TABLE

0: jdbc:hive2://hadoop101:10000> alter table student2 set tblproperties('EXTERNAL'='FALSE');

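Partitioned tables

Each partition of a partitioned table corresponds to a separate directory on HDFS that holds all of that partition's data files. Partitioning in Hive is simply splitting into directories: a large data set is divided into smaller ones according to business needs. When the WHERE clause of a query selects only the partitions it needs, Hive reads just those directories, which makes the query much more efficient.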
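
The directory-per-partition idea can be sketched in Python. Both helpers are hypothetical and only model the path layout and the effect of a partition filter; the month and day values are made up for illustration.

```python
def partition_path(warehouse, table, **partition):
    """Build the directory of one partition: <warehouse>/<table>/<key>=<value>/..."""
    parts = [f"{key}={value}" for key, value in partition.items()]
    return "/".join([warehouse.rstrip("/"), table, *parts])


def prune(paths, key, value):
    """Keep only the directories a filter such as WHERE key = value has to read."""
    wanted = f"{key}={value}"
    return [path for path in paths if wanted in path.split("/")]


# Hypothetical month values, chosen only for this illustration.
months = ["2020-01", "2020-02", "2020-03"]
paths = [partition_path("/user/hive/warehouse", "dept_partition", month=m) for m in months]
selected = prune(paths, "month", "2020-02")
print(selected)

# A two-level partition simply nests one more directory.
nested = partition_path("/user/hive/warehouse/", "dept_partition2", month="2020-01", day="05")
print(nested)
```

Only one of the three directories survives the filter, which is why a query restricted to a partition touches far less data than a full scan.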

① Creating a partitioned table
hive (default)> create table dept_partition(

              > deptno int, dname string, loc string)

              > partitioned by (month string)

              > row format delimited fields terminated by '\t';

OK
  Loading data

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='');

Loading data to table default.dept_partition partition (month=)

Partition default.dept_partition{month=} stats: [numFiles=1, numRows=0, totalSize=71, rawDataSize=0]

OK

load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='');

load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='');

② Querying a single partition

0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month='';

+------------------------+-----------------------+---------------------+-----------------------+--+

| dept_partition.deptno  | dept_partition.dname  | dept_partition.loc  | dept_partition.month  |

+------------------------+-----------------------+---------------------+-----------------------+--+

|                      | ACCOUNTING            |                 |                 |

|                      | RESEARCH              |                 |                 |

|                      | SALES                 |                 |                 |

|                      | OPERATIONS            |                 |                 |

+------------------------+-----------------------+---------------------+-----------------------+--

  Querying several partitions with union

0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month=''

0: jdbc:hive2://hadoop101:10000> union

0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month=''

0: jdbc:hive2://hadoop101:10000> union

0: jdbc:hive2://hadoop101:10000> select * from dept_partition where month='';

③ Adding partitions: one, or several separated by spaces

0: jdbc:hive2://hadoop101:10000> alter table dept_partition add partition(month='') partition(month='');

④ Dropping partitions: one, or several separated by commas

0: jdbc:hive2://hadoop101:10000> alter table dept_partition drop partition(month=''), partition(month='');

⑤ Listing the partitions of a table

0: jdbc:hive2://hadoop101:10000> show partitions dept_partition;

+---------------+--+

|   partition   |

+---------------+--+

| month=  |

| month=  |

| month=  |

+---------------+--+

⑥ Viewing the structure of a partitioned table

0: jdbc:hive2://hadoop101:10000> desc formatted dept_partition;

⑦ Creating a two-level partitioned table

hive (default)> create table dept_partition2(
              > deptno int, dname string, loc string)
              > partitioned by (month string, day string)
              > row format delimited fields terminated by '\t';

  Loading data into a two-level partition

0: jdbc:hive2://hadoop101:10000> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition2 partition(month='', day='');

0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='' and day='';  -- view the partition's data

  Three ways to associate data uploaded directly into a partition directory with the partitioned table

Method 1: upload the data, then repair the table
0: jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=/day=;

0: jdbc:hive2://hadoop101:10000> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=/day=;

0: jdbc:hive2://hadoop101:10000> msck repair table dept_partition2;  -- the data cannot be queried until the table is repaired

No rows affected (0.15 seconds)

0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='' and day='';

0: jdbc:hive2://hadoop101:10000> alter table dept_partition2 drop partition(month='', day='');  -- drop the partition

Method 2: upload the data, then add the partition

0: jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=/day=;  -- do not quote the values in the path

0: jdbc:hive2://hadoop101:10000> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=/day=;

0: jdbc:hive2://hadoop101:10000> alter table dept_partition2 add partition(month='', day='');

0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='' and day='';

Method 3: create the directory, then load data into the partition

0: jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=''/day='';

0: jdbc:hive2://hadoop101:10000> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='',day='');

0: jdbc:hive2://hadoop101:10000> select * from dept_partition2 where month='' and day='';
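
Why uploaded files stay invisible until the partition is registered can be illustrated with a toy Python model. It assumes, as the methods above suggest, that a query only reads partitions known to the metastore; msck_repair and visible_rows are hypothetical helpers, and the partition values and row are invented.

```python
def msck_repair(metastore_partitions, directories):
    """Toy model of a repair: register directories the metastore does not know yet."""
    missing = sorted(set(directories) - set(metastore_partitions))
    return list(metastore_partitions) + missing


def visible_rows(metastore_partitions, data_by_directory):
    """A query only reads partitions that are registered in the metastore."""
    return [
        row
        for partition in metastore_partitions
        for row in data_by_directory.get(partition, [])
    ]


# Hypothetical partition directory and row content.
data = {"month=2020-01/day=05": ["10\tACCOUNTING"]}

registered = []                                  # files were uploaded with dfs -put only
before = visible_rows(registered, data)
registered = msck_repair(registered, data.keys())
after = visible_rows(registered, data)
print(before)  # nothing is visible before the repair
print(after)   # the uploaded row shows up afterwards
```

Adding the partition by hand or loading through Hive reaches the same end state: the partition ends up registered in the metastore.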

Altering a table

Renaming a table

0: jdbc:hive2://hadoop101:10000> alter table teacher rename to new_teacher;
Adding columns

0: jdbc:hive2://hadoop101:10000> alter table dept_partition add columns(deptdesc string);
Changing a column

0: jdbc:hive2://hadoop101:10000> alter table dept_partition change column deptdesc desc int;

No rows affected (0.112 seconds)

0: jdbc:hive2://hadoop101:10000> desc dept_partition;
Replacing columns

0: jdbc:hive2://hadoop101:10000> alter table dept_partition replace columns(deptid int, name string, loc string);
Dropping a table

0: jdbc:hive2://hadoop101:10000> drop table new_teacher;

DML: data manipulation

Importing data

Loading data into a table (Load)

① Loading data into a table:
  From the local file system into Hive
0: jdbc:hive2://hadoop101:10000> create table student(id int, name string) row format delimited fields terminated by '\t';

0: jdbc:hive2://hadoop101:10000> load data local inpath '/opt/module/datas/student.txt' into table default.student;  -- load a local file into Hive

  From HDFS into Hive

0: jdbc:hive2://hadoop101:10000> dfs -mkdir -p /user/kris/hive;

0: jdbc:hive2://hadoop101:10000> dfs -put /opt/module/datas/student.txt /user/kris/hive;

0: jdbc:hive2://hadoop101:10000> load data inpath '/user/kris/hive/student.txt' into table default.student;  -- load data from HDFS; the file on HDFS is moved, not copied

0: jdbc:hive2://hadoop101:10000> load data inpath '/user/kris/hive/student.txt' overwrite into table default.student;  -- load the data, overwriting what the table already holds

② Inserting data into a table with a query (Insert)

create table student(id int, name string) partitioned by (month string) row format delimited fields terminated by '\t';  -- create a partitioned table

0: jdbc:hive2://hadoop101:10000> insert into table student partition(month='') values (, "kris"), (, "egon");  -- insert rows

-rwxrwxr-x    kris    supergroup     B    // 7:: PM         MB    000000_

  Inserting from a query on a single table: insert into appends to the table or partition and leaves the existing data in place;
  insert overwrite replaces the data already in the table or partition.
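
The contrast between the two insert modes can be shown with a toy Python model in which a dict maps each partition to its rows. The helpers are hypothetical, and the partition key and ids are invented for the illustration.

```python
def insert_into(partitions, key, rows):
    """Append rows to a partition, keeping what is already there."""
    partitions.setdefault(key, []).extend(rows)


def insert_overwrite(partitions, key, rows):
    """Replace the partition's contents with the new rows."""
    partitions[key] = list(rows)


# Hypothetical partition key and ids.
table = {}
insert_into(table, "month=2020-01", [(1, "kris"), (2, "egon")])
insert_into(table, "month=2020-01", [(3, "kris")])
appended = len(table["month=2020-01"])  # earlier rows are still there

insert_overwrite(table, "month=2020-01", [(4, "egon")])
print(appended, table["month=2020-01"])
```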


0: jdbc:hive2://hadoop101:10000> insert overwrite table student partition(month="") select id,name from student where month='';  -- overwrites the target partition; rows in the table's other partitions are kept

0: jdbc:hive2://hadoop101:10000> select * from student;

+-------------+---------------+----------------+--+

| student.id  | student.name  | student.month  |

+-------------+---------------+----------------+--+

|            | kris          |          |

|            | egon          |          |

|            | kris          |          |

|            | egon          |          |

+-------------+---------------+----------------+--+

  Multi-insert: writing the results of one scan to several partitions

hive (default)> from student insert overwrite table student partition(month="")

              > select id, name where month=""

              > insert overwrite table student partition(month="")

              > select id, name where month="";

0: jdbc:hive2://hadoop101:10000> select * from student;

+-------------+---------------+----------------+--+

| student.id  | student.name  | student.month  |

+-------------+---------------+----------------+--+

|            | kris          |          |

|            | egon          |          |

|            | kris          |          |

|            | egon          |          |

|            | kris          |          |

|            | egon          |          |

|            | kris          |          |

|            | egon          |          |

+-------------+---------------+----------------+-
③ Creating a table and loading data with a query (As Select)

create table if not exists student3 as select id, name from student;

create table if not exists student4 like student;

④ Specifying the data path with Location when creating a table

0: jdbc:hive2://hadoop101:10000> create external table if not exists stu(id int, name string) row format delimited fields terminated by '\t' location '/student';

⑤ Importing data into a Hive table (Import); the data must first be exported with export before it can be imported

  export table student to '/hive_data/student';
  import table student from '/hive_data/student';

create table student22(

id int, name string)

partitioned by (month string)

row format delimited fields terminated by '\t';

import table student22 partition(month='') from '/user/hive/warehouse/export/student';  -- student22 must be a partitioned table for the import to succeed

Exporting data (none of these methods is supported by Impala)

① Exporting with Insert
  Export the query output to the local directory /opt/module/datas/export/student:

0: jdbc:hive2://hadoop101:10000> insert overwrite local directory '/opt/module/datas/export/student' select * from student;

  Exporting formatted results to the local file system
hive (default)> insert overwrite local directory '/opt/module/datas/export/student1'

              > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student;

  Exporting results to HDFS; only overwrite can be used here, not into

hive (default)> insert overwrite directory '/user/kris/student2'

              > row format delimited fields terminated by '\t'

              > select * from student;

② Exporting to the local file system with a Hadoop command

hive (default)> dfs -get /user/hive/warehouse/student/month=/000000_ /opt/module/datas/export/student3.txt;

[kris@hadoop101 export]$ cat student3.txt

       kris

       egon

[kris@hadoop101 export]$ pwd

/opt/module/datas/export

③ Exporting to the local file system with a shell command

[kris@hadoop101 hive]$ bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt

④ Exporting to HDFS with Export

hive (default)> export table default.student to '/user/hive/warehouse/export/student';
⑤ Exporting (and importing) with Sqoop
  https://www.cnblogs.com/shengyang17/p/10512510.html

Exporting a Hive table to a CSV file

hive -e "

set hive.cli.print.header=true;

select * from student where sex = 'male';

" | sed 's/[\t]/,/g'  > /opt/module/student.csv

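The sed substitution turns every tab into a comma, which breaks as soon as a field itself contains a comma. As a hedged alternative, the same conversion can be sketched with Python's csv module, which quotes such fields. tsv_to_csv is a hypothetical helper and the sample rows are invented; the sketch assumes the Hive output is plain tab-separated text.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to CSV, quoting fields that contain commas."""
    # QUOTE_NONE: treat quote characters in the Hive output as ordinary data.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Invented sample that stands in for the tab-separated output of hive -e.
sample = "id\tname\n1\tkris\n2\tSmith, John\n"
converted = tsv_to_csv(sample)
print(converted)
```

The name that contains a comma comes out wrapped in double quotes, so the row still parses as two columns.
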
Clearing the data in a table (Truncate)

Note: Truncate can only clear the data in managed tables; it cannot delete the data in external tables.

  hive (default)> truncate table student;

Hive| DDL| DML