LanguageManual DML

Hive Data Manipulation Language

There are multiple ways to modify data in Hive:

EXPORT and IMPORT commands are also available (as of Hive 0.8).

Loading files into tables

Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move (纯复制,移动) operations that move datafiles into locations corresponding to Hive tables.

Syntax 语法
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Synopsis 简介

Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.

  • filepath can be:

    • a relative path, such as project/data1
    • an absolute path, such as /user/hive/project/data1
    • a full URI with scheme and (optionally) an authority, such as hdfs://namenode:9000/user/hive/project/data1
  • The target being loaded to can be a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.
  • filepath can refer to a file (in which case Hive will move the file into the table) or it can be a directory (in which case Hive will move all the files within that directory into the table). In either case,filepath addresses a set of files.
  • If the keyword LOCAL is specified, then:
    • the load command will look for filepath in the local file system. If a relative path is specified, it will be interpreted relative to the user's current working directory. The user can specify a full URI for local files as well - for example: file:///user/hive/project/data1
    • the load command will try to copy all the files addressed by filepath to the target filesystem. The target file system is inferred by looking at the location attribute of the table. The copied data files will then be moved to the table.
  • If the keyword LOCAL is not specified, then Hive will either use the full URI of filepath, if one is specified, or will apply the following rules:
    • If scheme or authority are not specified, Hive will use the scheme and authority from the hadoop configuration variable fs.default.name that specifies the Namenode URI.
    • If the path is not absolute, then Hive will interpret it relative to /user/<username>
    • Hive will move the files addressed by filepath into the table (or partition)
  • If the OVERWRITE keyword is used then the contents of the target table (or partition) will be deleted and replaced by the files referred to by filepath; otherwise the files referred by filepath will be added to the table.
    • Note that if the target table (or partition) already has a file whose name collides(冲突) with any of the filenames contained in filepath, then the existing file will be replaced with the new file.
Notes
  • filepath cannot contain subdirectories.(filepath可以是目录也可以是文件,但是不能包含子目录)
  • If the keyword LOCAL is not given, filepath must refer to files within the same filesystem as the table's (or partition's) location.
  • Hive does some minimal checks to make sure that the files being loaded match the target table. Currently it checks that if the table is stored in sequencefile format, the files being loaded are also sequencefiles, and vice versa.
  • A bug that prevented loading a file when its name includes the "+" character is fixed in release 0.13.0 (HIVE-6048).
  • Please read CompressedStorage if your datafile is compressed.

Inserting data into Hive Tables from queries

Query Results can be inserted into tables by using the insert clause.

Syntax 语法
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;
 
Hive extension (multiple inserts):
 
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
 
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;
 
Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
Synopsis
  • INSERT OVERWRITE will overwrite any existing data in the table or partition

    • unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0).
  • INSERT INTO will append to the table or partition, keeping the existing data intact(完整无缺的). (Note: INSERT INTO syntax is only available starting in version 0.8.)
    • As of Hive 0.13.0, a table can be made immutable(不可变的) by creating it with TBLPROPERTIES ("immutable"="true"). The default is "immutable"="false".
      INSERT INTO behavior into an immutable table is disallowed if any data is already present, although INSERT INTO still works if the immutable table is empty. The behavior of INSERT OVERWRITE is not affected by the "immutable" table property. (INSERT OVERWRITE不受immutable属性的限制)
      An immutable table is protected against accidental updates due to a script loading data into it being run multiple times by mistake. (避免多次插入和修改) The first insert into an immutable table succeeds and successive inserts fail, resulting in only one set of data in the table, instead of silently succeeding with multiple copies of the data in the table.
  • Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns.
  • Multiple insert clauses (also known as Multi Table Insert) can be specified in the same query.
  • The output of each of the select statements is written to the chosen table (or partition). Currently the OVERWRITE keyword is mandatory(强制的) and implies(暗示,说明) that the contents of the chosen table or partition are replaced with the output of corresponding(适当的) select statement.
  • The output format and serialization class is determined by the table's metadata (as specified via DDL commands on the table).
  • As of Hive 0.14, if a table has an OutputFormat that implements AcidOutputFormat and the system is configured to use a transaction manager that implements ACID, then INSERT OVERWRITE will be disabled for that table.  This is to avoid users unintentionally overwriting transaction history.  The same functionality can be achieved by using TRUNCATE TABLE (for non-partitioned tables) or DROP PARTITION followed by INSERT INTO.
Notes
  • Multi Table Inserts minimize the number of data scans required. Hive can insert data into multiple tables by scanning the input data just once (and applying different query operators) to the input data.
  • Starting with Hive 0.13.0, the select statement can include one or more common table expressions (CTEs) as shown in the SELECT syntax. For an example, see Common Table Expression.
Dynamic Partition Inserts 动态分区插入

Version information

Icon

This information reflects the situation in Hive 0.12; dynamic partition inserts were added in Hive 0.6.

In the dynamic partition inserts, users can give partial partition specifications, which means just specifying the list of partition column names in the PARTITION clause. The column values are optional. If a partition column value is given, we call this a static partition, otherwise it is a dynamic partition. Each dynamic partition column has a corresponding input column from the select statement. This means that the dynamic partition creation is determined by the value of the input column. The dynamic partition columns must be specified last among the columns in the SELECT statement and in the same order in which they appear in the PARTITION() clause.

Dynamic Partition inserts are disabled by default. These are the relevant(相关的) configuration properties for dynamic partition inserts:

Configuration property

Default

Note

hive.exec.dynamic.partition

false

Needs to be set to true to enable dynamic partition inserts

hive.exec.dynamic.partition.mode

strict

In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrictmode all partitions are allowed to be dynamic

hive.exec.max.dynamic.partitions.pernode

100

Maximum number of dynamic partitions allowed to be created in each mapper/reducer node

hive.exec.max.dynamic.partitions

1000

Maximum number of dynamic partitions allowed to be created in total

hive.exec.max.created.files

100000

Maximum number of HDFS files created by all mappers/reducers in a MapReduce job

hive.error.on.empty.partition

false

Whether to throw an exception if dynamic partition insert generates empty results

Example
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
       SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, nullnull, pvs.ip, pvs.cnt

Here the country partition will be dynamically created by the last column from the SELECT clause (i.e. pvs.cnt). Note that the name is not used. In nonstrict mode the dt partition could also be dynamically created.

Additional Documentation

Writing data into the filesystem from queries

Query results can be inserted into filesystem directories by using a slight variation (细微的变化)of the syntax above:

Syntax
Standard syntax:
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available starting with Hive 0.11.0)
  SELECT ... FROM ...
 
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
 
 
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char] (Note: Only available starting with Hive 0.13)
Synopsis
  • Directory can be a full URI. If scheme or authority are not specified, Hive will use the scheme and authority from the hadoop configuration variable fs.default.name that specifies the Namenode URI.
  • If LOCAL keyword is used, Hive will write data to the directory on the local file system.
  • Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.
Notes
  • INSERT OVERWRITE statements to directories, local directories, and tables (or partitions) can all be used together within the same query.
  • INSERT OVERWRITE statements to HDFS filesystem directories are the best way to extract large amounts of data from Hive. Hive can write to HDFS directories in parallel from within a map-reduce job.
  • The directory is, as you would expect, OVERWRITten; in other words, if the specified path exists, it is clobbered and replaced with the output.
  • As of Hive 0.11.0 the separator used can be specified, in earlier versions it was always the ^A character (\001)
  • In Hive 0.14, inserts into ACID compliant tables will deactivate vectorization for the duration of the select and insert.  This will be done automatically.  ACID tables that have data inserted into them can still be queried using vectorization.

Inserting values into tables from SQL

The INSERT...VALUES statement can be used to insert data into tables directly from SQL.

Version Information

Icon

INSERT...VALUES is available starting in Hive 0.14.

Inserting values from SQL statements can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Standard Syntax:
INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] ...)] VALUES values_row [, values_row ...]
 
Where values_row is:
( value [, value ...] )
where a value is either null or any valid SQL literal
Synopsis
  • Each row listed in the VALUES clause is inserted into table tablename.
  • Values must be provided for every column in the table.  The standard SQL syntax that allows the user to insert values into only some columns is not yet supported.  To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
  • Dynamic partitioning is supported in the same way as for INSERT...SELECT.
  • If the table being inserted into supports ACID and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion.
  • Insert, update, delete operations are not supported on tables that are sorted (tables created with the SORTED BY clause).
  • Hive does not support literals for complex types, so it is not possible to use them in INSERT INTO...VALUES clauses. 
  • Means user cannot insert data into complex datatype [array, map, struct, union] columns using INSERT INTO...VALUES clause.
Examples
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2))
  CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;
 
INSERT INTO TABLE students
  VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);
 
 
 
CREATE TABLE pageviews (userid VARCHAR(64), link STRING, from STRING)
  PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
 
 
INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')
  VALUES ('jsmith''mail.com''sports.com'), ('jdoe''mail.com'null);
 
 
INSERT INTO TABLE pageviews PARTITION (datestamp)
  VALUES ('tjohnson''sports.com''finance.com''2014-09-23'), ('tlee''finance.com'null'2014-09-21');

Update

Version Information

Icon

UPDATE is available starting in Hive 0.14.

Updates can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Standard Syntax:
UPDATE tablename SET column = value [, column = value ...] [WHERE expression]
Synopsis
  • The referenced column must be a column of the table being updated.
  • The value assigned must be an expression that Hive supports in the select clause.  Thus arithmetic operators, UDFs, casts, literals, etc. are supported.  Subqueries are not supported.
  • Only rows that match the WHERE clause will be updated.
  • Partitioning columns cannot be updated.
  • Bucketing columns cannot be updated.
  • In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.
Notes
  • Vectorization will be turned off for update operations.  This is automatic and requires no action on the part of the user.  Non-update operations are not affected.  Updated tables can still be queried using vectorization.
  • In version 0.14 it is recommended that you set hive.optimize.sort.dynamic.partition=false when doing updates, as this produces more efficient execution plans.

Delete

Version Information

Icon

DELETE is available starting in Hive 0.14.

Deletes can only be performed on tables that support ACID. See Hive Transactions for details.

Syntax
Standard Syntax:
DELETE FROM tablename [WHERE expression]
Synopsis
  • Only rows that match the WHERE clause will be deleted.
  • In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.
Notes
    • Vectorization will be turned off for delete operations.  This is automatic and requires no action on the part of the user.  Non-delete operations are not affected.  Tables with deleted data can still be queried using vectorization.
    • In version 0.14 it is recommended that you set hive.optimize.sort.dynamic.partition=false when doing deletes, as this produces more efficient execution plans.

[Hive - LanguageManual] DML: Load, Insert, Update, Delete的更多相关文章

  1. MySQL之DML语句(insert update delete)

    DML主要针对数据库表对象的数据而言的,一般DML完成: 插入新数据 修改已添加的数据 删除不需要的数据 1.insert into插入语句 //主键自增可以不插入,所以用null代替 ); //指定 ...

  2. JDBC基础篇(MYSQL)——使用statement执行DML语句(insert/update/delete)

    注意:其中的JdbcUtil是我自定义的连接工具类:代码例子链接: package day02_statement; import java.sql.Connection; import java.s ...

  3. mysql数据恢复 insert\update\delete 工具MyFlash

    一.简介MyFlash是由美团点评公司技术工程部开发维护的一个回滚DML操作的工具.该工具通过解析v4版本的binlog,完成回滚操作.相对已有的回滚工具,其增加了更多的过滤选项,让回滚更加容易. 该 ...

  4. 关于MyBatis mapper的insert, update, delete返回值

    这里做了比较清晰的解释: http://mybatis.github.io/mybatis-3/java-api.html SqlSession As mentioned above, the Sql ...

  5. PHP5: mysqli 插入, 查询, 更新和删除 Insert Update Delete Using mysqli (CRUD)

    原文: PHP5: mysqli 插入, 查询, 更新和删除  Insert Update Delete Using mysqli (CRUD) PHP 5 及以上版本建议使用以下方式连接 MySQL ...

  6. mysql 事务是专门用来管理insert,update,delete语句的,和select语句一点不相干

    1.mysql 事务是专门用来管理insert,update,delete语句的,和select语句一点不相干 2.一般来说,事务是必须满足4个条件(ACID): Atomicity(原子性).Con ...

  7. insert update delete 语法 以及用法

    insert update delete 被称为 数据定义语句语句 也就是数据的增加 修改 删除 其中不包括查询 譬如: create database -创建数据库 alter database - ...

  8. sql中同一个Trigger里同时包含Insert,Update,Delete

    sql中同一个Trigger里同时包含Insert,Update,Delete SQLServer是靠Inserted表和Deleted表来处理的,判断一下就可以了,只不过比ORACLE麻烦一点 cr ...

  9. mybatis select/insert/update/delete

    这里做了比较清晰的解释: http://mybatis.github.io/mybatis-3/java-api.html SqlSession As mentioned above, the Sql ...

随机推荐

  1. Android:简单联网获取网页代码

    设置权限,在AndroidManifest.xml加入 <uses-permission android:name="android.permission.INTERNET" ...

  2. [iOS]修改开发者中心Bundle Identifier的一些配置

    登录开发者中心https://developer.apple.com 然后找到你的Bundle Identifier. 这里暂时只讲开启推送的功能,如果需要别的直接勾选前面的选择框 然后拉到最下面点击 ...

  3. C++:友元(非成员友元函数、成员友元函数、友元类)

    3.8  友元:友元函数和友元类 友元函数 :既可以是不属于任何类的非成员函数,也可以是另一个类的成员函数,统称为友元函数.友元函数不是当前类的成员函数,而是独立于类的外部函数,但它可以访问该类所有的 ...

  4. YASKAWA电机控制(2)---调试

    2015 5 23 基础调试—点动 上次接线由于没有接地,导致外壳带电,非常危险. 由于上次接线端子被弄坏,这次自己重做.由于没有压线钳,只用尖嘴钳把线压近端子,有可能会松动. 接线的时候Lc1.Lc ...

  5. Android zxing连续扫描

    initCamera(); if (mHandler != null) mHandler.restartPreviewAndDecode(); 在扫描完毕后执行这3句即可. 说明: 1.扫描处理方法为 ...

  6. Android Handler 避免内存泄漏的用法总结

    Android开发经常会用到handler,但是我们发现每次使用Handler都会出现:This Handler class should be static or leaks might occur ...

  7. 注意:C++中double的表示是有误差的

    注意:C++中double的表示是有误差的,直接通过下面的例子看一下 #include<iostream> using namespace std; int main() { double ...

  8. Android UI学习 - FrameLayou和布局优化(viewstub)

    原创作品,允许转载,转载时请务必以超链接形式标明文章 原始出处 .作者信息和本声明.否则将追究法律责任.http://android.blog.51cto.com/268543/308090 Fram ...

  9. ASP.NET中POST提交数据并跳转页面

    需求:先Post提交数据,然后跳转到目标页面 找了好久才发现这个神奇的类HttpHelper.原理很简单,利用html的from表单拼接,然后执行 使用方法: NameValueCollection ...

  10. HDU 4638 Group ★(树状数组)

    题意 询问一段区间里的数能组成多少段连续的数. 思路 先考虑从左往右一个数一个数添加,考虑当前添加了i - 1个数的答案是x,那么可以看出添加完i个数后的答案是根据a[i]-1和a[i]+1是否已经添 ...