Hive Data Definition Language

Overview

HiveQL DDL statements are documented here, including:

  • CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
  • DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
  • TRUNCATE TABLE
  • ALTER DATABASE/SCHEMA, TABLE, VIEW
  • MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
  • SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
  • DESCRIBE DATABASE/SCHEMA, table_name, view_name

PARTITION statements are usually options of TABLE statements, except for SHOW PARTITIONS.

Create/Drop/Alter Database

Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. CREATE DATABASE was added in Hive 0.6 (HIVE-675).  The WITH DBPROPERTIES clause was added in Hive 0.7 (HIVE-1836).
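
For example, a minimal sketch (the database name, location, and property values below are placeholders, not part of the manual):

CREATE DATABASE IF NOT EXISTS financials   -- illustrative name
  COMMENT 'Holds all financial tables'
  LOCATION '/user/hive/warehouse/financials.db'
  WITH DBPROPERTIES ('creator'='analyst', 'created'='2015-01-01');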

Drop Database

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. DROP DATABASE was added in Hive 0.6 (HIVE-675).
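
A hedged example (the database name is a placeholder): RESTRICT is the default and refuses to drop a database that still contains tables, while CASCADE drops the contained tables as well.

DROP DATABASE IF EXISTS financials CASCADE;   -- use RESTRICT (or omit) to fail on a non-empty database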

Alter Database

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)

ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)

The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. ALTER SCHEMA was added in Hive 0.14 (HIVE-6601).

No other metadata about a database can be changed.
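
A brief sketch (database name, property, and owner are placeholders):

ALTER DATABASE financials SET DBPROPERTIES ('edited-by'='analyst');
ALTER DATABASE financials SET OWNER USER analyst;   -- Hive 0.13.0 and later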

Use Database

USE database_name;

USE DEFAULT;

USE sets the current database for all subsequent HiveQL statements. To revert to the default database, use the keyword "default" instead of a database name.

USE database_name was added in Hive 0.6 (HIVE-675).

Create/Drop/Truncate Table

Create Table

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name   -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) ON ([(col_value, col_value, ...), ...|col_value, col_value, ...])
     [STORED AS DIRECTORIES]]   -- (Note: Available in Hive 0.10.0 and later)
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]   -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type   -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)   -- (Note: Available in Hive 0.13.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ... >

union_type
  : UNIONTYPE < data_type, data_type, ... >   -- (Note: Available in Hive 0.7.0 and later)

row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
      [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
      [NULL DEFINED AS char]   -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

file_format
  : SEQUENCEFILE
  | TEXTFILE    -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE      -- (Note: Available in Hive 0.6.0 and later)
  | ORC         -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET     -- (Note: Available in Hive 0.13.0 and later)
  | AVRO        -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

CREATE TABLE creates a table with the given name. An error is thrown if a table or view with the same name already exists. You can use IF NOT EXISTS to skip the error.

  • Table names and column names are case insensitive but SerDe and property names are case sensitive.
  • In Hive 0.12 and earlier, only alphanumeric and underscore characters are allowed in table and column names.
  • In Hive 0.13 and later, column names can contain any Unicode character (see HIVE-6013). Any column name that is specified within backticks (`) is treated literally. Within a backtick string, use double backticks (``) to represent a backtick character.
  • To revert to pre-0.13.0 behavior and restrict column names to alphanumeric and underscore characters, set the configuration property hive.support.quoted.identifiers to none. In this configuration, backticked names are interpreted as regular expressions. For details, see Supporting Quoted Identifiers in Column Names.
  • Table and column comments are string literals (single-quoted).
  • The TBLPROPERTIES clause allows you to tag the table definition with your own metadata key/value pairs. Some predefined table properties also exist, such as last_modified_user and last_modified_time, which are automatically added and managed by Hive. Other predefined table properties include:
      • TBLPROPERTIES ("comment"="table_comment")
      • TBLPROPERTIES ("hbase.table.name"="table_name") – see HBase Integration.
      • TBLPROPERTIES ("immutable"="true") or ("immutable"="false") in release 0.13.0+ (HIVE-6406) – see Inserting Data into Hive Tables from Queries.
      • TBLPROPERTIES ("orc.compress"="ZLIB") or ("orc.compress"="SNAPPY") or ("orc.compress"="NONE") and other ORC properties – see ORC Files.
      • TBLPROPERTIES ("transactional"="true") or ("transactional"="false") in release 0.14.0+; the default is "false" – see Hive Transactions.
      • TBLPROPERTIES ("NO_AUTO_COMPACTION"="true") or ("NO_AUTO_COMPACTION"="false"); the default is "false" – see Hive Transactions.
  • To specify a database for the table, either issue the USE database_name statement prior to the CREATE TABLE statement (in Hive 0.6 and later) or qualify the table name with a database name ("database_name.table_name" in Hive 0.7 and later). The keyword "default" can be used for the default database. A combined example follows this list.
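
A short sketch combining these options (database, table, and property names are illustrative):

CREATE TABLE IF NOT EXISTS mydb.page_views (userid BIGINT, page_url STRING)
  COMMENT 'page views by user'
  TBLPROPERTIES ('creator'='analyst', 'immutable'='false');   -- user-defined plus predefined properties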

See Alter Table below for more information about table comments, table properties, and SerDe properties.

See Type System and Hive Data Types for details about the primitive and complex data types.

Row Format, Storage Format, and SerDe

You can create tables with a custom SerDe or using a native SerDe. A native SerDe is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified. Use the DELIMITED clause to read delimited files, and enable escaping for the delimiter characters with the ESCAPED BY clause (such as ESCAPED BY '\') – escaping is needed if you want to work with data that can contain these delimiter characters. A custom NULL format can also be specified using the NULL DEFINED AS clause (the default is '\N'). Use the SERDE clause to create a table with a custom SerDe. For more information on SerDes, see the Hive SerDe documentation.
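
For instance, a hedged sketch of a delimited text table (table and column names are placeholders):

CREATE TABLE csv_events (id INT, payload STRING)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    ESCAPED BY '\\'
    NULL DEFINED AS 'NULL'   -- NULL DEFINED AS requires Hive 0.13 or later
  STORED AS TEXTFILE;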

You must specify a list of columns for tables that use a native SerDe. Refer to the Types part of the User Guide for the allowable column types. A list of columns for tables that use a custom SerDe may be specified but Hive will query the SerDe to determine the actual list of columns for this table.

Use STORED AS TEXTFILE if the data needs to be stored as plain text files. TEXTFILE is the default file format, unless the configuration parameter hive.default.fileformat has a different setting.

Use STORED AS SEQUENCEFILE if the data needs to be compressed. Please read more about CompressedStorage if you are planning to keep data compressed in your Hive tables.

Use STORED AS ORC if the data needs to be stored in ORC file format.

Use ROW FORMAT SERDE for the RegEx SerDe, as shown in the example Apache Weblog Data in Getting Started.

Use INPUTFORMAT and OUTPUTFORMAT in the file_format to specify the name of a corresponding InputFormat and OutputFormat class as a string literal, for example, 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For LZO compression, the values to use are 'INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"' (see LZO Compression).
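
As a sketch using the LZO classes quoted above (the table and column names are illustrative):

CREATE TABLE lzo_logs (line STRING)
  STORED AS
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';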

Use STORED AS PARQUET (without ROW FORMAT SERDE) for the Parquet columnar storage format in Hive 0.13.0 and later; or use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT ... in Hive 0.10, 0.11, or 0.12.

Use STORED AS AVRO for Avro files in Hive 0.14.0 and later (see Avro SerDe).

Use STORED BY to create a non-native table, for example in HBase. See StorageHandlers for more information on this option.
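
For example, a hedged sketch of an HBase-backed table (the table name and column mapping are placeholders; see HBase Integration for the exact options):

CREATE TABLE hbase_table_1 (key INT, value STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:val')
  TBLPROPERTIES ('hbase.table.name' = 'xyz');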

To change a table's SerDe or SERDEPROPERTIES, use the ALTER TABLE statement as described below in Add SerDe Properties.

Partitioned Tables

Partitioned tables can be created using the PARTITIONED BY clause. A table can have one or more partition columns and a separate data directory is created for each distinct value combination in the partition columns. Further, tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This can improve performance on certain kinds of queries.

If, when creating a partitioned table, you get this error: "FAILED: Error in semantic analysis: Column repeated in partitioning columns," it means you are trying to include the partition column among the table's data columns. You probably really do have the column defined. However, the partition you create makes a pseudocolumn on which you can query, so you must rename your table column to something else (that users should not query on!). In short, give the data column a name different from the partition column.

Here is an example. Suppose your original table was this:

id     int,
date   date,
name   varchar

Now you want to partition on date. Your Hive definition would be this:

create table table_name (
  id                int,
  dtDontQuery       string,
  name              string
)
partitioned by (date string)

Now your users will still query on "where date = '...'" but the second column (dtDontQuery) will hold the original values.
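
A hedged illustration of how the partition pseudocolumn is used (the date value and load path are placeholders):

-- the partition column can be queried like an ordinary column
SELECT id, name FROM table_name WHERE date = '2014-04-25';
-- when loading, the partition value is supplied explicitly
LOAD DATA INPATH '/tmp/table_name_data' INTO TABLE table_name PARTITION (date = '2014-04-25');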

Here's an example statement to create a table:

CREATE TABLE page_view(viewTime INT, userid BIGINT,
                page_url STRING, referrer_url STRING,
                ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE;

The statement above creates the page_view table with viewTime, userid, page_url, referrer_url, and ip columns (including comments). The table is also partitioned and data is stored in sequence files. The data format in the files is assumed to be field-delimited by ctrl-A and row-delimited by newline.

CREATE TABLE page_view(viewTime INT, userid BIGINT,
                page_url STRING, referrer_url STRING,
                ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;

The above statement lets you create the same table as the previous table.

In the previous examples the data is stored in <hive.metastore.warehouse.dir>/page_view; the warehouse directory is set by the key hive.metastore.warehouse.dir in the Hive config file hive-site.xml.

External Tables

The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated. When dropping an EXTERNAL table, data in the table is NOT deleted from the file system.

An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property hive.metastore.warehouse.dir.

CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
                page_url STRING, referrer_url STRING,
                ip STRING COMMENT 'IP Address of the User',
                country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS TEXTFILE
LOCATION '<hdfs_location>';

You can use the above statement to create a page_view table which points to any hdfs location for its storage. But you still have to make sure that the data is delimited as specified in the CREATE statement above.

Create Table As Select (CTAS)

Tables can also be created and populated by the results of a query in one create-table-as-select (CTAS) statement. The table created by CTAS is atomic, meaning that the table is not seen by other users until all the query results are populated. So other users will either see the table with the complete results of the query or will not see the table at all.

There are two parts to CTAS: the SELECT part can be any SELECT statement supported by HiveQL, and the CREATE part takes the resulting schema from the SELECT part and creates the target table with other table properties such as the SerDe and storage format.

CTAS has these three restrictions:

  • The target table cannot be a partitioned table.
  • The target table cannot be an external table.
  • The target table cannot be a list bucketing table.

Example:

CREATE TABLE new_key_value_store
  ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
  STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair;

The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement. If the SELECT statement does not specify column aliases, the column names are automatically assigned to _col0, _col1, _col2, and so on. In addition, the new target table is created using a specific SerDe and a storage format independent of the source tables in the SELECT statement.

Starting with Hive 0.13.0, the SELECT statement can include one or more common table expressions (CTEs), as shown in the SELECT syntax. For an example, see Common Table Expression.
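
A minimal sketch of a CTAS statement with a CTE (table names are illustrative):

CREATE TABLE s2 AS
WITH q1 AS (SELECT key FROM src WHERE key = '4')
SELECT * FROM q1;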

Being able to select data from one table to another is one of the most powerful features of Hive. Hive handles the conversion of the data from the source format to the destination format as the query is being executed.

Create Table Like

The LIKE form of CREATE TABLE allows you to copy an existing table definition exactly (without copying its data). In contrast to CTAS, the statement below creates a new empty_key_value_store table whose definition exactly matches the existing key_value_store in all particulars other than table name. The new table contains no rows.

CREATE TABLE empty_key_value_store
LIKE key_value_store;

Before Hive 0.8.0, CREATE TABLE LIKE view_name would make a copy of the view. In Hive 0.8.0 and later releases, CREATE TABLE LIKE view_name creates a table by adopting the schema of view_name (fields and partition columns) using defaults for SerDe and file formats.
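
A hedged sketch (the names are placeholders):

-- In Hive 0.8.0 and later this adopts the view's schema, with default SerDe and file format
CREATE TABLE table_from_view LIKE view_name;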

Bucketed Sorted Tables

CREATE TABLE page_view(viewTime INT, userid BIGINT,
                page_url STRING, referrer_url STRING,
                ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
  COLLECTION ITEMS TERMINATED BY '\002'
  MAP KEYS TERMINATED BY '\003'
STORED AS SEQUENCEFILE;

In the example above, the page_view table is bucketed (clustered by) userid and within each bucket the data is sorted in increasing order of viewTime. Such an organization allows the user to do efficient sampling on the clustered column - in this case userid. The sorting property allows internal operators to take advantage of the better-known data structure while evaluating queries, also increasing efficiency. MAP KEYS and COLLECTION ITEMS keywords can be used if any of the columns are lists or maps.

The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.

There is also an example of creating and populating bucketed tables.
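
As a rough sketch of populating the bucketed table above along those lines (the staging table page_view_stg is assumed to exist; alternatively, setting hive.enforce.bucketing=true lets Hive choose the reducer count in releases that support it):

set mapred.reduce.tasks = 32;   -- match the number of buckets
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION (dt='2008-06-08', country='US')
SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, pvs.ip
CLUSTER BY userid;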

Skewed Tables

Version information: As of Hive 0.10.0 (HIVE-3072 and HIVE-3649). See HIVE-3026 for additional JIRA tickets that implemented list bucketing in Hive 0.10.0 and 0.11.0.

Design document: Read the design document for more information.

This feature can be used to improve performance for tables where one or more columns have skewed values. By specifying the values that appear very often (heavy skew), Hive will automatically split those out into separate files (or directories, in the case of list bucketing) and take this fact into account during queries, so that it can skip (or include) the whole file (or directory, in the case of list bucketing) if possible.

This can be specified on a per-table level during table creation.

This is an example where we have one column with three skewed values, optionally with the STORED AS DIRECTORIES clause for list bucketing:

CREATE TABLE list_bucket_single (key STRING, value STRING)
  SKEWED BY (key) ON (1,5,6) [STORED AS DIRECTORIES];

And here is an example of a table with two skewed columns:

CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING)
  SKEWED BY (col1, col2) ON (('s1',1), ('s3',3), ('s13',13), ('s78',78)) [STORED AS DIRECTORIES];

Temporary Tables

Version information: As of Hive 0.14.0 (HIVE-7090).

A table that has been created as a temporary table will only be visible to the current session. Data will be stored in the user's scratch directory, and deleted at the end of the session.

If a temporary table is created with a database/table name of a permanent table which already exists in the database, then within that session any references to that table will resolve to the temporary table, rather than to the permanent table. The user will not be able to access the original table within that session without either dropping the temporary table, or renaming it to a non-conflicting name.
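
A brief sketch (the table name is illustrative):

-- visible only to the current session; data goes to the user's scratch
-- directory and is deleted when the session ends
CREATE TEMPORARY TABLE tmp_page_views (userid BIGINT, page_url STRING);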

Temporary tables have the following limitations:

  • Partition columns are not supported
  • No support for creation of indexes

Drop Table

DROP TABLE [IF EXISTS] table_name;

DROP TABLE removes metadata and data for this table. The data is actually moved to the .Trash/Current directory if Trash is configured. The metadata is completely lost.

When dropping an EXTERNAL table, data in the table will NOT be deleted from the file system.

When dropping a table referenced by views, no warning is given (the views are left dangling as invalid and must be dropped or recreated by the user).

See the next section on ALTER TABLE for how to drop partitions.

Otherwise, the table information is removed from the metastore and the raw data is removed as if by 'hadoop dfs -rm'. In many cases, this results in the table data being moved into the user's .Trash folder in their home directory; users who mistakenly DROP TABLEs may thus be able to recover their lost data by re-creating a table with the same schema, re-creating any necessary partitions, and then moving the data back into place manually using Hadoop. This solution is subject to change over time or across installations as it relies on the underlying implementation; users are strongly encouraged not to drop tables capriciously.

In Hive 0.7.0 or later, DROP returns an error if the table doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.

Truncate Table  

Version information: As of Hive 0.11.0 (HIVE-446).

TRUNCATE TABLE table_name [PARTITION partition_spec];

partition_spec
  : (partition_column = partition_col_value, partition_column = partition_col_value, ...)

Removes all rows from a table or partition(s). Currently the target table must be a native/managed table or an exception is thrown. The user can specify a partial partition_spec to truncate multiple partitions at once; omitting partition_spec truncates all partitions in the table.
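
For example (the partition value is a placeholder):

TRUNCATE TABLE page_view;                               -- removes all rows from every partition
TRUNCATE TABLE page_view PARTITION (dt='2008-06-08');   -- removes rows from the matching partitions only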
