Hive- Hive 的基本操作

创建数据库

create database db_hive;

use db_hive;

create database if not exists db_hive_02;

create database if not exists db_hive_01 location '/user/rz_lee/warehouse/db_hive_01.db';   //指定数据库存储的位置

查看数据库

hive (db_hive)> show databases;

#模糊查询

hive (db_hive)> show databases like 'db_hive*';

OK

database_name

db_hive

db_hive_01

Time taken: 0.075 seconds, Fetched:  row(s)

hive (db_hive)>

删除数据库

drop database myhive_db ; //这个database下面没有表可以用该命令

drop database if exists myhive_db cascade; //cedilla删除database

创建表

create table student(id int,name string) row format delimited fields terminated by '\t';

查看表结构

hive> desc student;

OK

id                      int

name                    string

Time taken: 0.114 seconds, Fetched:  row(s)

查看表详细信息

hive> desc extended student;

OK

id                      int

name                    string                                      

Detailed Table Information      Table(tableName:student, dbName:db_hive, owner:root, createTime:, lastAccessTime:, retention:,
sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs:
//ns1/user/hive/warehouse/db_hive.db/student, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=       , field.delim=

Time taken: 0.356 seconds, Fetched:  row(s)

hive>

将表的详细信息格式化一下，开发中常用

hive> desc formatted student;

OK

# col_name              data_type               comment             

id                      int

name                    string                                      

# Detailed Table Information

Database:               db_hive

Owner:                  root

CreateTime:             Tue Oct  :: CST

LastAccessTime:         UNKNOWN

Retention:

Location:               hdfs://ns1/user/hive/warehouse/db_hive.db/student

Table Type:             MANAGED_TABLE

Table Parameters:

        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}

        numFiles

        numRows

        rawDataSize

        totalSize

        transient_lastDdlTime             

# Storage Information

SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

InputFormat:            org.apache.hadoop.mapred.TextInputFormat

OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Compressed:             No

Num Buckets:            -

Bucket Columns:         []

Sort Columns:           []

Storage Desc Params:

        field.delim             \t

        serialization.format    \t

Time taken: 0.065 seconds, Fetched:  row(s)

hive>

加载本地数据到表中

hive> load data local inpath '/root/student.txt' into table db_hive.student;    //防止当使用时忘记use databse的时候，出现报错，在表的前面加上数据库名

Loading data to table db_hive.student

OK

Time taken: 1.632 seconds

hive> select * from student;

OK

    gailun

    jeckes

    anny

    yi

Time taken: 1.376 seconds, Fetched:  row(s)

hive>

修改hive表字段信息（字段名/字段类型）

ALTER TABLE table_name CHANGE COLUMN col_old_name col_new_name column_type     # 需要同时将三个参数写上，hive检查后面两个参数，有更新的就去alter

将hive内部表转为外部表

ALTER TABLE table_name SET TBLPROPERTIES ('EXTERNAL'='TRUE');

改变外部表的location路径

alter table table_namel set location 'hdfs://nameservice1/user/new_path'

查看hive表的创建信息

show create table

查看hive中的函数

hive> show functions;

OK

!

!=

$sum0

%

&

*

+

-

/

<

<=

<=>

<>

=

==

>

>=

^

abs

acos

add_months

aes_decrypt

aes_encrypt

and

array

array_contains

ascii

asin

assert_true

atan

avg

base64

between

bin

bround

case

cbrt

ceil

ceiling

chr

coalesce

collect_list

collect_set

compute_stats

concat

concat_ws

context_ngrams

conv

corr

cos

count

covar_pop

covar_samp

crc32

create_union

cume_dist

current_database

current_date

current_timestamp

current_user

date_add

date_format

date_sub

datediff

day

dayofmonth

decode

degrees

dense_rank

div

e

elt

encode

ewah_bitmap

ewah_bitmap_and

ewah_bitmap_empty

ewah_bitmap_or

exp

explode

factorial

field

find_in_set

first_value

floor

format_number

from_unixtime

from_utc_timestamp

get_json_object

get_splits

greatest

hash

hex

histogram_numeric

hour

if

in

in_file

index

initcap

inline

instr

isnotnull

isnull

java_method

json_tuple

lag

last_day

last_value

lcase

lead

least

length

levenshtein

like

ln

locate

log

log10

log2

lower

lpad

ltrim

map

map_keys

map_values

mask

mask_first_n

mask_hash

mask_last_n

mask_show_first_n

mask_show_last_n

matchpath

max

md5

min

minute

month

months_between

named_struct

negative

next_day

ngrams

noop

noopstreaming

noopwithmap

noopwithmapstreaming

not

ntile

nvl

or

parse_url

parse_url_tuple

percent_rank

percentile

percentile_approx

pi

pmod

posexplode

positive

pow

power

printf

quarter

radians

rand

rank

reflect

reflect2

regexp

regexp_extract

regexp_replace

repeat

replace

reverse

rlike

round

row_number

rpad

rtrim

second

sentences

sha

sha1

sha2

shiftleft

shiftright

shiftrightunsigned

sign

sin

size

sort_array

soundex

space

split

sqrt

stack

std

stddev

stddev_pop

stddev_samp

str_to_map

struct

substr

substring

substring_index

sum

tan

to_date

to_unix_timestamp

to_utc_timestamp

translate

trim

trunc

ucase

unbase64

unhex

unix_timestamp

upper

var_pop

var_samp

variance

version

weekofyear

when

windowingtablefunction

xpath

xpath_boolean

xpath_double

xpath_float

xpath_int

xpath_long

xpath_number

xpath_short

xpath_string

year

|

~

Time taken: 0.012 seconds, Fetched:  row(s)

hive>

查看函数怎么使用

hive> desc function upper;

OK

upper(str) - Returns str with all characters changed to uppercase

Time taken: 0.017 seconds, Fetched:  row(s)

hive>

查看带案例函数使用

hive> desc function extended upper ;

OK

upper(str) - Returns str with all characters changed to uppercase

Synonyms: ucase

Example:

  > SELECT upper('Facebook') FROM src LIMIT ;

  'FACEBOOK'

Time taken: 0.011 seconds, Fetched:  row(s)

hive>

往配置文件添加属性以在hive中显示数据库名和位置

<property>

    <name>hive.cli.print.header</name>

    <value>true</value>

    <description>Whether to print the names of the columns in query output.</description>
</property>

<property>
　　　　<name>hive.cli.print.current.db</name>
　　　　<value>true</value>
　　　　<description>Whether to include the current database in the Hive prompt.</description>
</property>

结果

hive (default)> use db_hive;

OK

Time taken: 0.897 seconds

hive (db_hive)> select * from student;

OK

student.id      student.name

    gailun

    jeckes

    anny

    yi

Time taken: 1.744 seconds, Fetched:  row(s)

hive (db_hive)>

Hive 常见属性配置

Hive 数据仓库位置配置

hive 就是在建立在 Hadoop 基础之上的数据仓库,数据就是存储在 HDFS 之上,默认(default)数据仓库位置为: /user/hive/warehouse 下,每当我们创建一个数据库,就会在该目录下创建相应的文件夹如 db_hive

注意事项
* 在仓库目录下，没有对默认的数据库default创建文件夹
* 如果某张表属于default数据库，直接在数据仓库目录下创建一个文件夹

可以在下面属性中修改数据仓库的位置:

 <property>

  <name>hive.metastore.warehouse.dir</name>

  <value>/user/hive/warehouse</value>

 </property>

同组用户的执行权限,看官网

Hive uses Hadoop, so:

you must have Hadoop in your path OR
export HADOOP_HOME=<hadoop-install-dir>

In addition, you must use below HDFS commands to create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w before you can create a table in Hive.

  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /tmp

  $ $HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse

  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp

  $ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse

Hive运行日志信息位置

$HIVE_HOME/conf/hive-log4j.properties
hive.log.dir=/opt/modules/hive-0.13.1/logs
hive.log.file=hive.log

指定hive运行时显示的log日志的级别

$HIVE_HOME/conf/hive-log4j.properties
hive.root.logger=INFO,DRFA

在cli命令行上显示当前数据库，以及查询表的行头信息

$HIVE_HOME/conf/hive-site.xml
  <property>
   <name>hive.cli.print.header</name>
   <value>true</value>
   <description>Whether to print the names of the columns in query output.</description>
  </property>

<property>
   <name>hive.cli.print.current.db</name>
   <value>true</value>
   <description>Whether to include the current database in the Hive prompt.</description>
  </property>

在启动hive时设置配置属性信息

$ bin/hive --hiveconf <property=value>

查看当前所有的配置信息

hive > set ;

hive (db_hive)> set system:user.name ;
system:user.name=beifeng
hive (db_hive)> set system:user.name=beifeng ;

此种方式，设置属性的值，仅仅在当前会话session生效

查看历史操作命令

$home/.hivehistory

Hive中"一次使用"命令

用户可能有时期望执行一个或者多个查询(使用分号隔开),执行结束后hive CLI立即退出.hive提供了这样的功能,因为CLI可以接受 -e 命令这样的形式.如果表 student 具有一个字符串字段和一个整形字段 ,我们可以看到如下输出:

[root@srv01 ~]#hive -e "select * from db_hive.student limit 3";

OK

student.id      student.name

    gailun

    jeckes

    anny

Time taken: 7.964 seconds, Fetched:  row(s)

[root@srv01 ~]#

从文件中执行Hive查询

Hive中可以使用 -f 文件名方式执行指定文件中的一个或者多个查询语句按照惯例,一般把这些Hive 查询文件保存为具有 .q 或者 .hql 后缀名的文件.工作中常用的sql脚本

[root@srv01 ~]# vim hivef.sql

select * from db_hive.student limit

[root@srv01 ~]# hive -f hivef.sql > /root/hivef_result.txt

[root@srv01 ~]# cat hivef_result.txt

student.id      student.name

    gailun

    jeckes

    anny

[root@srv01 ~]#

[root@srv01 ~]# hive -f hivef.sql

Logging initialized using configuration in file:/usr/hive/conf/hive-log4j2.properties Async: true

OK

student.id      student.name

    gailun

    jeckes

    anny

Time taken: 2.972 seconds, Fetched:  row(s)

执行shell命令

用户不需要退出 hive CLI 就可以执行简单的 bash shell 命令.只要在命令的前面加上! 并且以分号(;) 结尾就可以:

hive (db_hive)> !cat /root/student.txt

              > ;

    gailun

    jeckes

    anny

    yi

hive (db_hive)>

hive CLI中不能使用需要用户进行输入的交互式命令,而且不支持 shell 的"管道" 功能的文件的自动补全功能.例如, !ls *.sql ; 这个命令表示的是查找文件名为 *.sql;的文件,而不是表示显示以*.sql结尾的所有文件.

在Hive内使用Hadoop的dfs命令

用户可以在Hive CLI 中执行Hadoop的dfs...命令,只需要将hadoop 命令中的关键字hadoop 去掉,然后以分号结尾就可以了:

hive (db_hive)> dfs -ls /user;

Found  items

drwxr-xr-x   - root supergroup           -- : /user/hive

hive (db_hive)>

这种使用hadoop命令的方式实际上比与其等价的在bash shell 中执行的hadoop dfs ...命令更加高效.因为后者每次都会启动一个新的JVM实例,而 Hive 会在同一个进程中挂靠这些命令.

用户可以通过如下命令查看 dfs 所提供的所有功能选项列表 :

hive (db_hive)> dfs -help;

Hive日期数据类型

类型	支持版本
TIMESTAMP	Note: Only available starting with Hive 0.8.0
DATE	Note: Only available starting with Hive 0.12.0
INTERVAL	Note: Only available starting with Hive 1.2.0

Dates
DATE值描述特定的年/月/日，格式为YYYY-MM-DD。例如，DATE’2013-01-01’。日期类型没有时间组件。 Date类型支持的值范围是0000-01-01到9999-12-31，这取决于Java Date类型的原始支持。
Date types只能在Date, Timestamp, or String types之间转换。

转换类型	结果
cast(date as date)	Same date value
cast(date as string)	The year/month/day represented by the Date is formatted as a string in the form ‘YYYY-MM-DD’.
cast(date as timestamp)	A timestamp value is generated corresponding to midnight of the year/month/day of the date value, based on the local timezone.
cast(string as date)	If the string is in the form ‘YYYY-MM-DD’, then a date value corresponding to that year/month/day is returned. If the string value does not match this formate, then NULL is returned.
cast(timestamp as date)	The year/month/day of the timestamp is determined, based on the local timezone, and returned as a date value.

hive查看版本号

hive> hive.hwi.war.file

Beeline

hive的使用界面非常不友好，在hive2我们可以使用beeline来操作hive。

连接到hive

bin/beeline   -u jdbc:hive2://srv01:1000 -n root

删除表：

drop table tableName;

删除表的效果是：

　　hive会从元数据库中清除关于这个表的信息；

　　hive会从hdfs中删除这个表的表目录