Hadoop Ecosystem - Hive Quick Start: Basic HQL Syntax

                                     Author: 尹正杰

Copyright notice: this is an original work; reproduction is not permitted, and violators will be held legally responsible.

  This post focuses on Hive's common data types, DDL (data definition), DML (data manipulation) and frequently used queries. If you do not yet have a working Hive installation, you can refer to my earlier notes on setting one up: https://www.cnblogs.com/yinzhengjie/p/9154324.html

一.Common Hive property configuration

1>.Configuring the Hive data warehouse location

  1>.The default warehouse location is the HDFS path "/user/hive/warehouse/".
  2>.No dedicated folder is created under the warehouse directory for the default database; a table that belongs to the default database gets its folder directly under the warehouse root.
  3>.To change the default warehouse location, copy the following property from the template file "hive-default.xml.template" into hive-site.xml and adjust the value:
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
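
  As a quick check of that layout (a minimal sketch; the database and table names below are hypothetical, not from this post), a table in the default database lands directly under the warehouse root, while a table in another database lands under that database's <dbname>.db directory:

  use default;
  create database if not exists mydb;
  create table if not exists t_default_db(id int);      -- stored under /user/hive/warehouse/t_default_db
  create table if not exists mydb.t_in_mydb(id int);    -- stored under /user/hive/warehouse/mydb.db/t_in_mydb
  desc formatted t_default_db;                          -- the Location: field shows the path above
  desc formatted mydb.t_in_mydb;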

2>.Showing the current database and column headers in query output

  Add the following properties to hive-site.xml to display the current database in the prompt and to print column headers in query output. Restart the Hive client after changing them.
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
  </property>

  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
  </property>

  After applying the settings above and restarting the Hive client, you will notice two new behaviors: query results include a header row, and the prompt shows the database you are currently in.
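
  For example, with both options turned on, a session looks roughly like this (the table here is only an illustration; the prompt carries the current database and the result starts with a header row):

  hive (default)> use yinzhengjie;
  hive (yinzhengjie)> select * from teacher;
  OK
  teacher.id    teacher.name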

3>.Configuring Hive's run-time log location

  1>.By default Hive writes its log to "/tmp/atguigu/hive.log" (i.e., under a /tmp directory named after the current user).

  2>.To move the Hive log to "/home/yinzhengjie/hive/logs", edit hive-log4j2.properties as follows:
  [yinzhengjie@s101 ~]$ cd /soft/hive/conf/
  [yinzhengjie@s101 conf]$
  [yinzhengjie@s101 conf]$ cp hive-log4j2.properties.template hive-log4j2.properties    #Copy the template to create the config file
  [yinzhengjie@s101 conf]$ grep property.hive.log.dir hive-log4j2.properties | grep -v ^#
  property.hive.log.dir = /home/yinzhengjie/hive/logs    #Specify where the log is stored
  [yinzhengjie@s101 conf]$
  [yinzhengjie@s101 conf]$ ll /home/yinzhengjie/hive/logs/hive.log
  -rw-rw-r-- yinzhengjie yinzhengjie Aug : /home/yinzhengjie/hive/logs/hive.log    #Restart Hive, then check the log file's contents
  [yinzhengjie@s101 conf]$

4>.Ways to view and set configuration parameters

  1. >.View all current configuration settings (hive (yinzhengjie)> set;)
  2. Configuration-file approach:
  3. Default configuration file: hive-default.xml
  4. User-defined configuration file: hive-site.xml
  5. Note: user-defined settings override the defaults. Hive also reads Hadoop's configuration, because Hive is started as a Hadoop client, and Hive's settings override Hadoop's. Settings made in configuration files apply to every Hive process started on this machine.
  6.  
  7. >.The three ways to set parameters, and their precedence
  8. Declaring a parameter on the command line when starting Hive:
  9. [yinzhengjie@s101 ~]$ hive -hiveconf mapred.reduce.tasks=
  10. SLF4J: Class path contains multiple SLF4J bindings.
  11. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  12. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  13. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  14. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  15.  
  16. Logging initialized using configuration in file:/soft/apache-hive-2.1.-bin/conf/hive-log4j2.properties Async: true
  17. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  18. hive (default)> set mapred.reduce.tasks;
  19. mapred.reduce.tasks=
  20. hive (default)> quit;
  21. [yinzhengjie@s101 ~]$
  22. [yinzhengjie@s101 ~]$ hive
  23. SLF4J: Class path contains multiple SLF4J bindings.
  24. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  25. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  26. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  27. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  28.  
  29. Logging initialized using configuration in file:/soft/apache-hive-2.1.-bin/conf/hive-log4j2.properties Async: true
  30. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  31. hive (default)> set mapred.reduce.tasks;
  32. mapred.reduce.tasks=-
  33. hive (default)> exit;
  34. [yinzhengjie@s101 ~]$
  35.  
  36. Declaring a parameter after the CLI has started (with set):
  37. [yinzhengjie@s101 ~]$ hive
  38. SLF4J: Class path contains multiple SLF4J bindings.
  39. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  40. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  41. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  42. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  43.  
  44. Logging initialized using configuration in file:/soft/apache-hive-2.1.-bin/conf/hive-log4j2.properties Async: true
  45. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  46. hive (default)> set mapred.reduce.tasks;
  47. mapred.reduce.tasks=-
  48. hive (default)> set mapred.reduce.tasks=;
  49. hive (default)> set mapred.reduce.tasks;
  50. mapred.reduce.tasks=
  51. hive (default)> quit;
  52. [yinzhengjie@s101 ~]$
  53.  
  54. A note on precedence:
  55. The three methods above are listed in order of increasing priority: "configuration file" < "-hiveconf at startup" < "set after startup". Certain system-level parameters, such as the log4j settings, can only be set by the first two methods, because they are read before the session is established.
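
  For instance, the logger itself is such a system-level setting; it only takes effect when passed at startup (a sketch; the property value here is just an illustration, not from this post):

  [yinzhengjie@s101 ~]$ hive -hiveconf hive.root.logger=DEBUG,console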

二.Hive data types

1>.Primitive data types

  Hive's STRING type corresponds to a database VARCHAR: it is a variable-length string, except that you cannot declare a maximum length. In theory it can hold up to 2GB of characters.

| Hive type | Java type | Description | Example |
|-----------|-----------|-------------|---------|
| TINYINT   | byte      | 1-byte signed integer | 20 |
| SMALLINT  | short     | 2-byte signed integer | 20 |
| INT       | int       | 4-byte signed integer | 20 |
| BIGINT    | long      | 8-byte signed integer | 20 |
| BOOLEAN   | boolean   | Boolean, true or false | TRUE  FALSE |
| FLOAT     | float     | Single-precision floating point | 3.14159 |
| DOUBLE    | double    | Double-precision floating point | 3.14159 |
| STRING    | string    | Character sequence; a character set may be specified; single or double quotes may be used | ‘now is the time’ “for all good men” |
| TIMESTAMP |           | Timestamp | |
| BINARY    |           | Byte array | |

2>.Collection data types

  Hive has three complex data types: ARRAY, MAP and STRUCT. ARRAY and MAP are similar to Java's Array and Map, while STRUCT is similar to a C struct: it wraps a named set of fields. Complex types can be nested to any depth.

| Type   | Description | Syntax example |
|--------|-------------|----------------|
| STRUCT | Like a C struct; elements are accessed with dot notation. For example, if a column has type STRUCT{first STRING, last STRING}, its first element is referenced as column.first. | struct() |
| MAP    | A collection of key-value pairs, accessed with array notation. For example, if a column is a MAP whose key->value pairs are ’first’->’John’ and ’last’->’Doe’, the last element is retrieved with column[‘last’]. | map() |
| ARRAY  | An ordered collection of elements of the same type. Each element has an index starting at zero. For example, if an array holds [‘John’, ‘Doe’], the second element is referenced as column[1]. | array() |

3>.Type conversion

  Hive's primitive types can be converted implicitly, much like Java's type promotion. For example, if an expression expects INT, a TINYINT is automatically widened to INT. Hive will not convert in the other direction: if an expression expects TINYINT, an INT is not narrowed automatically and an error is returned, unless you use CAST. The implicit conversion rules are as follows.

    First: any integer type can be implicitly converted to a wider type, e.g. TINYINT to INT, or INT to BIGINT.

    Second: all integer types, FLOAT and STRING can be implicitly converted to DOUBLE.

    Third: TINYINT, SMALLINT and INT can all be converted to FLOAT.

    Fourth: BOOLEAN cannot be converted to any other type.

  Tip: use CAST for explicit type conversion. For example, CAST('1' AS INT) converts the string '1' to the integer 1; if the cast fails, as with CAST('X' AS INT), the expression returns NULL.
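
  A minimal sketch of both cases, plus one implicit widening (these run directly in a Hive session; the literal values are only illustrations):

  -- Explicit cast succeeds: the string '1' becomes the integer 1, so the sum is 2.
  select 1 + cast('1' as int);

  -- Explicit cast fails: 'X' is not a number, so the expression yields NULL.
  select cast('X' as int);

  -- Implicit widening: the TINYINT value is promoted to INT for the comparison.
  select cast(2 as tinyint) = 2;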

4>.A quick hands-on example

  Suppose a table contains rows whose structure, written as JSON, has a name, an array of friends, a map of children (name to age), and an address struct (street and city). The sections below show how such a row is accessed in Hive.

  Based on that structure, we create the corresponding table in Hive and load data into it. First create a local test file test.txt with the content below. (Note that the separators between elements of a MAP, STRUCT or ARRAY can all be the same character; here we use "_".)

  1. [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/test.txt
  2. 漩涡鸣人,我爱罗_佐助,漩涡博人:18_漩涡向日葵:,一乐拉面附近_木业忍者村
  3. 宇智波富岳,宇智波美琴_志村团藏,宇智波鼬:28_宇智波佐助:,木叶警务部_木业忍者村
  4. [yinzhengjie@s101 download]$

  Create the test table in Hive:

  1. create table test(
  2. name string,
  3. friends array<string>,
  4. children map<string, int>,
  5. address struct<street:string, city:string>
  6. )
  7. row format delimited fields terminated by ','
  8. collection items terminated by '_'
  9. map keys terminated by ':'
  10. lines terminated by '\n';

  Load the text file into the test table:

  1. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/test.txt' into table test;
  2. Loading data to table yinzhengjie.test
  3. OK
  4. Time taken: 0.335 seconds
  5. hive (yinzhengjie)> select * from test;
  6. OK
  7. test.name test.friends test.children test.address
  8. 漩涡鸣人 ["我爱罗","佐助"] {"漩涡博人":,"漩涡向日葵":} {"street":"一乐拉面附近","city":"木业忍者村"}
  9. 宇智波富岳 ["宇智波美琴","志村团藏"] {"宇智波鼬":,"宇智波佐助":} {"street":"木叶警务部","city":"木业忍者村"}
  10. Time taken: 0.099 seconds, Fetched: row(s)
  11. hive (yinzhengjie)>

  Access the three collection columns; the queries below show how to read an ARRAY, a MAP and a STRUCT respectively:

  1. hive (yinzhengjie)> select * from test;
  2. OK
  3. test.name test.friends test.children test.address
  4. 漩涡鸣人 ["我爱罗","佐助"] {"漩涡博人":,"漩涡向日葵":} {"street":"一乐拉面附近","city":"木业忍者村"}
  5. 宇智波富岳 ["宇智波美琴","志村团藏"] {"宇智波鼬":,"宇智波佐助":} {"street":"木叶警务部","city":"木业忍者村"}
  6. Time taken: 0.085 seconds, Fetched: row(s)
  7. hive (yinzhengjie)> select friends[],children['漩涡博人'],address.city from test where name="漩涡鸣人";
  8. OK
  9. _c0 _c1 city
  10. 我爱罗 木业忍者村
  11. Time taken: 0.096 seconds, Fetched: row(s)
  12. hive (yinzhengjie)> select friends[],children['漩涡向日葵'],address.city from test where name="漩涡鸣人";
  13. OK
  14. _c0 _c1 city
  15. 佐助 木业忍者村
  16. Time taken: 0.1 seconds, Fetched: row(s)
  17. hive (yinzhengjie)>

三.Common Hive commands (HQL) in action

  Reminder: whether you use Hive's interactive commands or run an HQL statement, Hive has to start, and Hive relies on Hadoop (HDFS for storage, MapReduce for computation), so the Hadoop cluster must be running before you start Hive.

  1. [yinzhengjie@s101 ~]$ more `which xcall.sh`
  2. #!/bin/bash
  3. #@author :yinzhengjie
  4. #blog:http://www.cnblogs.com/yinzhengjie
  5. #EMAIL:y1053419035@qq.com
  6.  
  7. #判断用户是否传参
  8. if [ $# -lt ];then
  9. echo "请输入参数"
  10. exit
  11. fi
  12.  
  13. #获取用户输入的命令
  14. cmd=$@
  15.  
  16. for (( i=;i<=;i++ ))
  17. do
  18. #使终端变绿色
  19. tput setaf
  20. echo ============= s$i $cmd ============
  21. #使终端变回原来的颜色,即白灰色
  22. tput setaf
  23. #远程执行命令
  24. ssh s$i $cmd
  25. #判断命令是否执行成功
  26. if [ $? == ];then
  27. echo "命令执行成功"
  28. fi
  29. done
  30. [yinzhengjie@s101 ~]$

Script for running a command on every node of the cluster ([yinzhengjie@s101 ~]$ more `which xcall.sh`)

  1. [yinzhengjie@s101 ~]$ more `which start-dfs.sh` | grep -v ^# | grep -v ^$
  2. usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"
  3. bin=`dirname "${BASH_SOURCE-$0}"`
  4. bin=`cd "$bin"; pwd`
  5. DEFAULT_LIBEXEC_DIR="$bin"/../libexec
  6. HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
  7. . $HADOOP_LIBEXEC_DIR/hdfs-config.sh
  8. if [[ $# -ge ]]; then
  9. startOpt="$1"
  10. shift
  11. case "$startOpt" in
  12. -upgrade)
  13. nameStartOpt="$startOpt"
  14. ;;
  15. -rollback)
  16. dataStartOpt="$startOpt"
  17. ;;
  18. *)
  19. echo $usage
  20. exit
  21. ;;
  22. esac
  23. fi
  24. nameStartOpt="$nameStartOpt $@"
  25. NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
  26. echo "Starting namenodes on [$NAMENODES]"
  27. "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  28. --config "$HADOOP_CONF_DIR" \
  29. --hostnames "$NAMENODES" \
  30. --script "$bin/hdfs" start namenode $nameStartOpt
  31. if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  32. echo \
  33. "Attempting to start secure cluster, skipping datanodes. " \
  34. "Run start-secure-dns.sh as root to complete startup."
  35. else
  36. "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  37. --config "$HADOOP_CONF_DIR" \
  38. --script "$bin/hdfs" start datanode $dataStartOpt
  39. fi
  40. SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes >/dev/null)
  41. if [ -n "$SECONDARY_NAMENODES" ]; then
  42. echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"
  43. "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  44. --config "$HADOOP_CONF_DIR" \
  45. --hostnames "$SECONDARY_NAMENODES" \
  46. --script "$bin/hdfs" start secondarynamenode
  47. fi
  48. SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir >&-)
  49. case "$SHARED_EDITS_DIR" in
  50. qjournal://*)
  51. JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')
  52. echo "Starting journal nodes [$JOURNAL_NODES]"
  53. "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  54. --config "$HADOOP_CONF_DIR" \
  55. --hostnames "$JOURNAL_NODES" \
  56. --script "$bin/hdfs" start journalnode ;;
  57. esac
  58. AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)
  59. if [ "$(echo "$AUTOHA_ENABLED" | tr A-Z a-z)" = "true" ]; then
  60. echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"
  61. "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  62. --config "$HADOOP_CONF_DIR" \
  63. --hostnames "$NAMENODES" \
  64. --script "$bin/hdfs" start zkfc
  65. fi
  66. [yinzhengjie@s101 ~]$

HDFS startup script ([yinzhengjie@s101 ~]$ more `which start-dfs.sh` | grep -v ^# | grep -v ^$)

  1. [yinzhengjie@s101 ~]$ cat /soft/hadoop/sbin/start-yarn.sh | grep -v ^# | grep -v ^$
  2. echo "starting yarn daemons"
  3. bin=`dirname "${BASH_SOURCE-$0}"`
  4. bin=`cd "$bin"; pwd`
  5. DEFAULT_LIBEXEC_DIR="$bin"/../libexec
  6. HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
  7. . $HADOOP_LIBEXEC_DIR/yarn-config.sh
  8. "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
  9. "$bin"/yarn-daemons.sh --config $YARN_CONF_DIR start nodemanager
  10. [yinzhengjie@s101 ~]$

YARN startup script ([yinzhengjie@s101 ~]$ cat /soft/hadoop/sbin/start-yarn.sh | grep -v ^# | grep -v ^$)

  1. [yinzhengjie@s101 ~]$ more `which xzk.sh`
  2. #!/bin/bash
  3. #@author :yinzhengjie
  4. #blog:http://www.cnblogs.com/yinzhengjie
  5. #EMAIL:y1053419035@qq.com
  6.  
  7. #判断用户是否传参
  8. if [ $# -ne ];then
  9. echo "无效参数,用法为: $0 {start|stop|restart|status}"
  10. exit
  11. fi
  12.  
  13. #获取用户输入的命令
  14. cmd=$
  15.  
  16. #定义函数功能
  17. function zookeeperManger(){
  18. case $cmd in
  19. start)
  20. echo "启动服务"
  21. remoteExecution start
  22. ;;
  23. stop)
  24. echo "停止服务"
  25. remoteExecution stop
  26. ;;
  27. restart)
  28. echo "重启服务"
  29. remoteExecution restart
  30. ;;
  31. status)
  32. echo "查看状态"
  33. remoteExecution status
  34. ;;
  35. *)
  36. echo "无效参数,用法为: $0 {start|stop|restart|status}"
  37. ;;
  38. esac
  39. }
  40.  
  41. #定义执行的命令
  42. function remoteExecution(){
  43. for (( i= ; i<= ; i++ )) ; do
  44. tput setaf
  45. echo ========== s$i zkServer.sh $ ================
  46. tput setaf
  47. ssh s$i "source /etc/profile ; zkServer.sh $1"
  48. done
  49. }
  50.  
  51. #调用函数
  52. zookeeperManger
  53. [yinzhengjie@s101 ~]$

ZooKeeper startup script ([yinzhengjie@s101 ~]$ more `which xzk.sh`)

  1. [yinzhengjie@s101 ~]$ xzk.sh start
  2. 启动服务
  3. ========== s102 zkServer.sh start ================
  4. ZooKeeper JMX enabled by default
  5. Using config: /soft/zk/bin/../conf/zoo.cfg
  6. Starting zookeeper ... STARTED
  7. ========== s103 zkServer.sh start ================
  8. ZooKeeper JMX enabled by default
  9. Starting zookeeper ... Using config: /soft/zk/bin/../conf/zoo.cfg
  10. STARTED
  11. ========== s104 zkServer.sh start ================
  12. ZooKeeper JMX enabled by default
  13. Using config: /soft/zk/bin/../conf/zoo.cfg
  14. Starting zookeeper ... STARTED
  15. [yinzhengjie@s101 ~]$
  16. [yinzhengjie@s101 ~]$ xcall.sh jps
  17. ============= s101 jps ============
  18. Jps
  19. 命令执行成功
  20. ============= s102 jps ============
  21. QuorumPeerMain
  22. Jps
  23. 命令执行成功
  24. ============= s103 jps ============
  25. QuorumPeerMain
  26. Jps
  27. 命令执行成功
  28. ============= s104 jps ============
  29. Jps
  30. QuorumPeerMain
  31. 命令执行成功
  32. ============= s105 jps ============
  33. Jps
  34. 命令执行成功
  35. [yinzhengjie@s101 ~]$

Start ZooKeeper ([yinzhengjie@s101 ~]$ xzk.sh start)

  1. [yinzhengjie@s101 ~]$ start-dfs.sh
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  6. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  7. Starting namenodes on [s101 s105]
  8. s101: starting namenode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-namenode-s101.out
  9. s105: starting namenode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-namenode-s105.out
  10. s103: starting datanode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-datanode-s103.out
  11. s102: starting datanode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-datanode-s102.out
  12. s104: starting datanode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-datanode-s104.out
  13. Starting journal nodes [s102 s103 s104]
  14. s102: starting journalnode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-journalnode-s102.out
  15. s103: starting journalnode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-journalnode-s103.out
  16. s104: starting journalnode, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-journalnode-s104.out
  17. SLF4J: Class path contains multiple SLF4J bindings.
  18. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  19. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  20. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  21. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
  22. Starting ZK Failover Controllers on NN hosts [s101 s105]
  23. s101: starting zkfc, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-zkfc-s101.out
  24. s105: starting zkfc, logging to /soft/hadoop-2.7./logs/hadoop-yinzhengjie-zkfc-s105.out
  25. [yinzhengjie@s101 ~]$
  26. [yinzhengjie@s101 ~]$
  27. [yinzhengjie@s101 ~]$ xcall.sh jps
  28. ============= s101 jps ============
  29. Jps
  30. NameNode
  31. DFSZKFailoverController
  32. 命令执行成功
  33. ============= s102 jps ============
  34. JournalNode
  35. QuorumPeerMain
  36. DataNode
  37. Jps
  38. 命令执行成功
  39. ============= s103 jps ============
  40. Jps
  41. DataNode
  42. JournalNode
  43. QuorumPeerMain
  44. 命令执行成功
  45. ============= s104 jps ============
  46. Jps
  47. DataNode
  48. QuorumPeerMain
  49. JournalNode
  50. 命令执行成功
  51. ============= s105 jps ============
  52. DFSZKFailoverController
  53. NameNode
  54. Jps
  55. 命令执行成功
  56. [yinzhengjie@s101 ~]$

Start HDFS ([yinzhengjie@s101 ~]$ start-dfs.sh )

  1. [yinzhengjie@s101 ~]$ start-yarn.sh
  2. starting yarn daemons
  3. s101: starting resourcemanager, logging to /soft/hadoop-2.7./logs/yarn-yinzhengjie-resourcemanager-s101.out
  4. s105: starting resourcemanager, logging to /soft/hadoop-2.7./logs/yarn-yinzhengjie-resourcemanager-s105.out
  5. s103: starting nodemanager, logging to /soft/hadoop-2.7./logs/yarn-yinzhengjie-nodemanager-s103.out
  6. s102: starting nodemanager, logging to /soft/hadoop-2.7./logs/yarn-yinzhengjie-nodemanager-s102.out
  7. s104: starting nodemanager, logging to /soft/hadoop-2.7./logs/yarn-yinzhengjie-nodemanager-s104.out
  8. [yinzhengjie@s101 ~]$
  9. [yinzhengjie@s101 ~]$
  10. [yinzhengjie@s101 ~]$ xcall.sh jps
  11. ============= s101 jps ============
  12. ResourceManager
  13. Jps
  14. NameNode
  15. DFSZKFailoverController
  16. 命令执行成功
  17. ============= s102 jps ============
  18. JournalNode
  19. QuorumPeerMain
  20. NodeManager
  21. Jps
  22. DataNode
  23. 命令执行成功
  24. ============= s103 jps ============
  25. DataNode
  26. JournalNode
  27. NodeManager
  28. Jps
  29. QuorumPeerMain
  30. 命令执行成功
  31. ============= s104 jps ============
  32. NodeManager
  33. Jps
  34. DataNode
  35. QuorumPeerMain
  36. JournalNode
  37. 命令执行成功
  38. ============= s105 jps ============
  39. DFSZKFailoverController
  40. NameNode
  41. Jps
  42. 命令执行成功
  43. [yinzhengjie@s101 ~]$

Start YARN resource scheduling ([yinzhengjie@s101 ~]$ start-yarn.sh )

1>.Hive interactive commands

  1. [yinzhengjie@s101 download]$ cat teachers.txt
  2. Dennis MacAlistair Ritchie
  3. Linus Benedict Torvalds
  4. Bjarne Stroustrup
  5. Guido van Rossum
  6. James Gosling
  7. Martin Odersky
  8. Rob Pike
  9. Rasmus Lerdorf
  10. Brendan Eich
  11. [yinzhengjie@s101 download]$

[yinzhengjie@s101 download]$ cat teachers.txt

  1. [yinzhengjie@s101 ~]$ hive -help
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7. usage: hive
  8. -d,--define <key=value> Variable subsitution to apply to hive
  9. commands. e.g. -d A=B or --define A=B
  10. --database <databasename> Specify the database to use
  11. -e <quoted-query-string> SQL from command line
  12. -f <filename> SQL from files
  13. -H,--help Print help information
  14. --hiveconf <property=value> Use value for given property
  15. --hivevar <key=value> Variable subsitution to apply to hive
  16. commands. e.g. --hivevar A=B
  17. -i <filename> Initialization SQL file
  18. -S,--silent Silent mode in interactive shell
  19. -v,--verbose Verbose mode (echo executed SQL to the
  20. console)
  21. [yinzhengjie@s101 ~]$

Show the help message ([yinzhengjie@s101 ~]$ hive -help)

  1. [yinzhengjie@s101 ~]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. default
  10. yinzhengjie
  11. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  12. hive>

Enter the interactive Hive shell ([yinzhengjie@s101 ~]$ hive)

  1. [yinzhengjie@s101 ~]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. default
  10. yinzhengjie
  11. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  12. hive> show databases;
  13. OK
  14. default
  15. yinzhengjie
  16. Time taken: 0.01 seconds, Fetched: row(s)
  17. hive>

List the existing databases (hive> show databases;)

  1. [yinzhengjie@s101 ~]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. default
  10. yinzhengjie
  11. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  12. hive> show databases;
  13. OK
  14. default
  15. yinzhengjie
  16. Time taken: 0.008 seconds, Fetched: row(s)
  17. hive> use yinzhengjie;
  18. OK
  19. Time taken: 0.018 seconds
  20. hive>

Switch to an existing database (hive> use yinzhengjie;)

  1. hive> show databases;
  2. OK
  3. default
  4. yinzhengjie
  5. Time taken: 0.008 seconds, Fetched: row(s)
  6. hive> use yinzhengjie;
  7. OK
  8. Time taken: 0.018 seconds
  9. hive> show tables;
  10. OK
  11. az_top3
  12. az_wc
  13. test1
  14. test2
  15. test3
  16. test4
  17. yzj
  18. Time taken: 0.025 seconds, Fetched: row(s)
  19. hive>

List the tables in the current database (hive> show tables;)

  1. hive> show databases;
  2. OK
  3. default
  4. yinzhengjie
  5. Time taken: 0.008 seconds, Fetched: row(s)
  6. hive> use yinzhengjie;
  7. OK
  8. Time taken: 0.018 seconds
  9. hive> show tables;
  10. OK
  11. az_top3
  12. az_wc
  13. test1
  14. test2
  15. test3
  16. test4
  17. yzj
  18. Time taken: 0.025 seconds, Fetched: row(s)
  19. hive> create table Teacher(id int,name string)row format delimited fields terminated by '\t';
  20. OK
  21. Time taken: 0.626 seconds
  22. hive> show tables;
  23. OK
  24. az_top3
  25. az_wc
  26. teacher
  27. test1
  28. test2
  29. test3
  30. test4
  31. yzj
  32. Time taken: 0.028 seconds, Fetched: row(s)
  33. hive>

Create a teacher table (hive> create table Teacher(id int,name string)row format delimited fields terminated by '\t';)

  1. hive> show tables;
  2. OK
  3. teacher
  4. yzj
  5. Time taken: 0.022 seconds, Fetched: row(s)
  6. hive> select * from teacher;
  7. OK
  8. Time taken: 0.105 seconds
  9. hive> load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;
  10. Loading data to table yinzhengjie.teacher
  11. OK
  12. Time taken: 0.256 seconds
  13. hive> select * from teacher;
  14. OK
  15. Dennis MacAlistair Ritchie
  16. Linus Benedict Torvalds
  17. Bjarne Stroustrup
  18. Guido van Rossum
  19. James Gosling
  20. Martin Odersky
  21. Rob Pike
  22. Rasmus Lerdorf
  23. Brendan Eich
  24. Time taken: 0.104 seconds, Fetched: row(s)
  25. hive>

Load data from the local filesystem into an existing Hive table (hive> load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;)

  1. hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/logs/umeng/raw-log/201808/06/2346' into table raw_logs partition(ym=201808 , day=06 ,hm=2346);
  2. Loading data to table yinzhengjie.raw_logs partition (ym=201808, day=6, hm=2346)
  3. OK
  4. Time taken: 1.846 seconds
  5. hive (yinzhengjie)>

Load data from HDFS into an existing Hive table (hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/logs/umeng/raw-log/201808/06/2346' into table raw_logs partition(ym=201808 , day=06 ,hm=2346);)

  1. [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/umeng_create_logs_ddl.sql
  2. use yinzhengjie ;
  3.  
  4. --startuplogs
  5. create table if not exists startuplogs
  6. (
  7. appChannel string ,
  8. appId string ,
  9. appPlatform string ,
  10. appVersion string ,
  11. brand string ,
  12. carrier string ,
  13. country string ,
  14. createdAtMs bigint ,
  15. deviceId string ,
  16. deviceStyle string ,
  17. ipAddress string ,
  18. network string ,
  19. osType string ,
  20. province string ,
  21. screenSize string ,
  22. tenantId string
  23. )
  24. partitioned by (ym int ,day int , hm int)
  25. stored as parquet ;
  26.  
  27. --eventlogs
  28. create table if not exists eventlogs
  29. (
  30. appChannel string ,
  31. appId string ,
  32. appPlatform string ,
  33. appVersion string ,
  34. createdAtMs bigint ,
  35. deviceId string ,
  36. deviceStyle string ,
  37. eventDurationSecs bigint ,
  38. eventId string ,
  39. osType string ,
  40. tenantId string
  41. )
  42. partitioned by (ym int ,day int , hm int)
  43. stored as parquet ;
  44.  
  45. --errorlogs
  46. create table if not exists errorlogs
  47. (
  48. appChannel string ,
  49. appId string ,
  50. appPlatform string ,
  51. appVersion string ,
  52. createdAtMs bigint ,
  53. deviceId string ,
  54. deviceStyle string ,
  55. errorBrief string ,
  56. errorDetail string ,
  57. osType string ,
  58. tenantId string
  59. )
  60. partitioned by (ym int ,day int , hm int)
  61. stored as parquet ;
  62.  
  63. --usagelogs
  64. create table if not exists usagelogs
  65. (
  66. appChannel string ,
  67. appId string ,
  68. appPlatform string ,
  69. appVersion string ,
  70. createdAtMs bigint ,
  71. deviceId string ,
  72. deviceStyle string ,
  73. osType string ,
  74. singleDownloadTraffic bigint ,
  75. singleUploadTraffic bigint ,
  76. singleUseDurationSecs bigint ,
  77. tenantId string
  78. )
  79. partitioned by (ym int ,day int , hm int)
  80. stored as parquet ;
  81.  
  82. --pagelogs
  83. create table if not exists pagelogs
  84. (
  85. appChannel string ,
  86. appId string ,
  87. appPlatform string ,
  88. appVersion string ,
  89. createdAtMs bigint ,
  90. deviceId string ,
  91. deviceStyle string ,
  92. nextPage string ,
  93. osType string ,
  94. pageId string ,
  95. pageViewCntInSession int ,
  96. stayDurationSecs bigint ,
  97. tenantId string ,
  98. visitIndex int
  99. )
  100. partitioned by (ym int ,day int , hm int)
  101. stored as parquet ;
  102. [yinzhengjie@s101 download]$

Sample HQL script ([yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/umeng_create_logs_ddl.sql)

  1. hive (yinzhengjie)> show tables;
  2. OK
  3. tab_name
  4. myusers
  5. raw_logs
  6. student
  7. teacher
  8. teacherbak
  9. teachercopy
  10. Time taken: 0.044 seconds, Fetched: 6 row(s)
  11. hive (yinzhengjie)>
  12. hive (yinzhengjie)> source /home/yinzhengjie/download/umeng_create_logs_ddl.sql;
  13. OK
  14. Time taken: 0.008 seconds
  15. OK
  16. Time taken: 0.257 seconds
  17. OK
  18. Time taken: 0.058 seconds
  19. OK
  20. Time taken: 0.073 seconds
  21. OK
  22. Time taken: 0.065 seconds
  23. OK
  24. Time taken: 0.053 seconds
  25. hive (yinzhengjie)> show tables;
  26. OK
  27. tab_name
  28. errorlogs
  29. eventlogs
  30. myusers
  31. pagelogs
  32. raw_logs
  33. startuplogs
  34. student
  35. teacher
  36. teacherbak
  37. teachercopy
  38. usagelogs
  39. Time taken: 0.014 seconds, Fetched: 11 row(s)
  40. hive (yinzhengjie)>

Execute a file of HQL statements from inside Hive (hive (yinzhengjie)> source /home/yinzhengjie/download/umeng_create_logs_ddl.sql;)

  1. [yinzhengjie@s101 ~]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  10. hive> dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt;
  11. Dennis MacAlistair Ritchie
  12. Linus Benedict Torvalds
  13. Bjarne Stroustrup
  14. Guido van Rossum
  15. James Gosling
  16. Martin Odersky
  17. Rob Pike
  18. Rasmus Lerdorf
  19. Brendan Eich
  20. hive>

View an HDFS file from inside the Hive CLI (hive> dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt;)

  1. [yinzhengjie@s101 ~]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  10. hive> ! ls /home/yinzhengjie/download;
  11.  
  12. derby.log
  13. hivef.sql
  14. metastore_db
  15. MySpark.jar
  16. spark-2.1.-bin-hadoop2..tgz
  17. teachers.txt
  18. temp
  19. hive>

View the local Linux filesystem from inside the Hive CLI (hive> ! ls /home/yinzhengjie/download;)

  1. [yinzhengjie@s101 download]$ cat ~/.hivehistory
  2. show databases;
  3. quit;
  4. show databases;
  5. quit
  6. ;
  7. create table(id int,name string) row format delimited
  8. fields terminated by '\t'
  9. lines terminated by '\n'
  10. stored as textfile;
  11. create table users(id int , name string) row format delimited
  12. fields terminated by '\t'
  13. lines terminated by '\n'
  14. stored as textfile;
  15. load data local inpath 'user.txt' into table users;
  16. !pwd
  17. ;
  18. !cd /home/yinzhengjie
  19. ;
  20. !pwd
  21. ;
  22. quit;
  23. load data local inpath 'user.txt' into table users;
  24. load data inpath 'user.txt' into table users;
  25. hdfs dfs -put 'user.txt';
  26. hdfs dfs put 'user.txt';
  27. dfs put 'user.txt';
  28. dfs -put 'user.txt';
  29. dfs -put 'user.txt' /;
  30. dfs -put user.txt ;
  31. dfs -put user.txt /;
  32. load data inpath 'user.txt' into table users;
  33. load data inpath '/user.txt' into table users;
  34. ;;
  35. ;
  36. ;;
  37. ipconfig
  38. ;
  39. quit
  40. quit;
  41. exit
  42. exit;
  43. show databases;
  44. use yinzhengjie
  45. ;
  46. show tables;
  47. SET hive.support.concurrency = true;
  48. show tables;
  49. use yinzhengjie;
  50. show tables;
  51. select * from yzj;
  52. SET hive.support.concurrency = true;
  53. SET hive.enforce.bucketing = true;
  54. SET hive.exec.dynamic.partition.mode = nonstrict;
  55. SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
  56. SET hive.compactor.initiator.on = true;
  57. SET hive.compactor.worker.threads = ;
  58. select * from yzj;
  59. use yinzhengjie;
  60. SET hive.support.concurrency = true;
  61. SET hive.enforce.bucketing = true;
  62. SET hive.exec.dynamic.partition.mode = nonstrict;
  63. SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
  64. SET hive.compactor.initiator.on = true;
  65. SET hive.compactor.worker.threads = ;
  66. show tables;
  67. select * from yzj;
  68. show databases;
  69. use yinzhengjie;
  70. show tables;
  71. hive
  72. show databases;
  73. use yinzhengjie;
  74. show tables;
  75. select * from az_top3;
  76. quit;
  77. show databases;
  78. use yinzhengjie
  79. ;
  80. show tables;
  81. use yinzhengjie;
  82. show databases;
  83. use yinzhengjie;
  84. show tables;
  85. create table Teacher(id int,name string)row format delimited fields terminated by '\t';
  86. show tables;
  87. load data local inpath '/home/yinzhengjie/download/teachers.txt'
  88. ;
  89. show tables;
  90. drop table taacher;
  91. show databases;
  92. use yinzhengjie;
  93. show tables;
  94. drop table teacher;
  95. show tables;
  96. ;
  97. show tables;
  98. create table Teacher(id int,name string)row format delimited fields terminated by '\t';
  99. show tables;
  100. drop table test1,test2,test3;
  101. drop table test1;
  102. drop table test2;
  103. drop table test3;
  104. drop table test4;
  105. show tables;
  106. drop table az_top3;
  107. drop table az_wc;
  108. show tbales;
  109. show databasers;
  110. show databases;
  111. drop database yinzhengjie;
  112. ;
  113. use yinzhengjie;
  114. show tables;
  115. drop table teacher;
  116. show tables;
  117. create table Teacher(id int,name string)row format delimited fields terminated by '\t';
  118. show tables;
  119. load data local inpath '/home/yinzhengjie/download/teachers.txt';
  120. load data local inpath `/home/yinzhengjie/download/teachers.txt`;
  121. use yinzhengjie
  122. ;
  123. show tables;
  124. load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;
  125. select * from teacher;
  126. drop table teacher;
  127. ;
  128. create table Teacher(id int,name string)row format delimited fields terminated by '\t';
  129. show tables;
  130. select * from teacher;
  131. load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;
  132. select * from teacher;
  133. quit;
  134. exit;
  135. exit
  136. ;
  137. dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt;
  138. dfs -lsr /;
  139. ;
  140. ! ls /home/yinzhengjie;
  141. ! ls /home/yinzhengjie/download;
  142. [yinzhengjie@s101 download]$

View the full history of commands entered in Hive ([yinzhengjie@s101 download]$ cat ~/.hivehistory )

  1. [yinzhengjie@s101 download]$ hive -e "select * from yinzhengjie.teacher;"
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. default
  10. yinzhengjie
  11. OK
  12. Dennis MacAlistair Ritchie
  13. Linus Benedict Torvalds
  14. Bjarne Stroustrup
  15. Guido van Rossum
  16. James Gosling
  17. Martin Odersky
  18. Rob Pike
  19. Rasmus Lerdorf
  20. Brendan Eich
  21. Time taken: 3.414 seconds, Fetched: row(s)
  22. [yinzhengjie@s101 download]$

Run an HQL statement directly from the shell ([yinzhengjie@s101 download]$ hive -e "select * from yinzhengjie.teacher;")

  1. [yinzhengjie@s101 download]$ hive -f /home/yinzhengjie/download/hivef.sql
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. default
  10. yinzhengjie
  11. OK
  12. Time taken: 0.023 seconds
  13. OK
  14. teacher
  15. yzj
  16. Time taken: 0.085 seconds, Fetched: row(s)
  17. OK
  18. Dennis MacAlistair Ritchie
  19. Linus Benedict Torvalds
  20. Bjarne Stroustrup
  21. Guido van Rossum
  22. James Gosling
  23. Martin Odersky
  24. Rob Pike
  25. Rasmus Lerdorf
  26. Brendan Eich
  27. Time taken: 2.044 seconds, Fetched: row(s)
  28. [yinzhengjie@s101 download]$

Run a script file of HQL statements from the shell ([yinzhengjie@s101 download]$ hive -f /home/yinzhengjie/download/hivef.sql )

  1. [yinzhengjie@s101 ~]$ hive
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  9. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  10. hive> quit;
  11. [yinzhengjie@s101 ~]$
  12. [yinzhengjie@s101 ~]$ hive
  13. SLF4J: Class path contains multiple SLF4J bindings.
  14. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  15. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  16. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  17. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  18.  
  19. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.-bin/lib/hive-common-2.1..jar!/hive-log4j2.properties Async: true
  20. Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  21. hive> exit;
  22. [yinzhengjie@s101 ~]$

Leave the Hive shell (hive> exit; or hive> quit;)

2>.DDL: data definition

  1. hive (yinzhengjie)> show databases;
  2. OK
  3. database_name
  4. default
  5. yinzhengjie
  6. Time taken: 0.007 seconds, Fetched: row(s)
  7. hive (yinzhengjie)> create database if not exists db_hive;
  8. OK
  9. Time taken: 0.034 seconds
  10. hive (yinzhengjie)> show databases;
  11. OK
  12. database_name
  13. db_hive
  14. default
  15. yinzhengjie
  16. Time taken: 0.009 seconds, Fetched: row(s)
  17. hive (yinzhengjie)>

Standard way to create a database (hive (yinzhengjie)> create database if not exists db_hive;). By default the new database is stored on HDFS under "/user/hive/warehouse" (as <dbname>.db).

  1. hive (yinzhengjie)> show databases;
  2. OK
  3. database_name
  4. db_hive
  5. default
  6. yinzhengjie
  7. Time taken: 0.008 seconds, Fetched: row(s)
  8. hive (yinzhengjie)> create database if not exists db_hive2 location "/db_hive2";
  9. OK
  10. Time taken: 0.04 seconds
  11. hive (yinzhengjie)> show databases;
  12. OK
  13. database_name
  14. db_hive
  15. db_hive2
  16. default
  17. yinzhengjie
  18. Time taken: 0.006 seconds, Fetched: row(s)
  19. hive (yinzhengjie)>

Create a database and use the location keyword to choose where it lives on HDFS (hive (yinzhengjie)> create database if not exists db_hive2 location "/db_hive2";). I do not recommend this form, because the resulting layout looks very much like the default database's storage and is easy to confuse with it.

  1. Use ALTER DATABASE to set key-value pairs in a database's DBPROPERTIES, describing attributes of that database.
  2. No other database metadata can be changed, including the database name and the directory the database lives in.
  3.  
  4. hive (yinzhengjie)> show databases;
  5. OK
  6. database_name
  7. db_hive
  8. db_hive2
  9. default
  10. yinzhengjie
  11. Time taken: 0.007 seconds, Fetched: row(s)
  12. hive (yinzhengjie)> ALTER DATABASE db_hive set dbproperties('Owner'='yinzhengjie'); #Add an extra property to the database; note that this does not change the database's other metadata!
  13. OK
  14. Time taken: 0.03 seconds
  15. hive (yinzhengjie)> desc database db_hive; #This command alone does not show the property we just defined!
  16. OK
  17. db_name comment location owner_name owner_type parameters
  18. db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER
  19. Time taken: 0.017 seconds, Fetched: row(s)
  20. hive (yinzhengjie)> desc database extended db_hive; #Add the extended keyword to see the user-defined database properties.
  21. OK
  22. db_name comment location owner_name owner_type parameters
  23. db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER {Owner=yinzhengjie}
  24. Time taken: 0.011 seconds, Fetched: row(s)
  25. hive (yinzhengjie)>

Modify database properties ( hive (yinzhengjie)> ALTER DATABASE db_hive set dbproperties('Owner'='yinzhengjie'); )

  1. hive (yinzhengjie)> show databases; #List all databases
  2. OK
  3. database_name
  4. db_hive
  5. db_hive2
  6. default
  7. yinzhengjie
  8. Time taken: 0.008 seconds, Fetched: row(s)
  9. hive (yinzhengjie)>
  10. hive (yinzhengjie)> show databases like 'yin*'; #Filter which databases are shown
  11. OK
  12. database_name
  13. yinzhengjie
  14. Time taken: 0.009 seconds, Fetched: row(s)
  15. hive (yinzhengjie)>
  16. hive (yinzhengjie)> desc database db_hive; #Show database information
  17. OK
  18. db_name comment location owner_name owner_type parameters
  19. db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER
  20. Time taken: 0.012 seconds, Fetched: row(s)
  21. hive (yinzhengjie)> desc database extended db_hive; #Show detailed database information, using the extended keyword
  22. OK
  23. db_name comment location owner_name owner_type parameters
  24. db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER {Owner=yinzhengjie}
  25. Time taken: 0.013 seconds, Fetched: row(s)
  26. hive (yinzhengjie)>
  27. hive (yinzhengjie)> show databases;
  28. OK
  29. database_name
  30. db_hive
  31. db_hive2
  32. default
  33. yinzhengjie
  34. Time taken: 0.006 seconds, Fetched: row(s)
  35. hive (yinzhengjie)> use default; #Switch to a database
  36. OK
  37. Time taken: 0.012 seconds
  38. hive (default)>

Common ways to query databases (hive (yinzhengjie)> show databases like 'yin*';)

  1. hive (yinzhengjie)> show databases;
  2. OK
  3. database_name
  4. db_hive
  5. db_hive2
  6. default
  7. yinzhengjie
  8. Time taken: 0.006 seconds, Fetched: row(s)
  9. hive (yinzhengjie)> use db_hive2; #Switch to the db_hive2 database
  10. OK
  11. Time taken: 0.014 seconds
  12. hive (db_hive2)> show tables; #db_hive2 contains no tables
  13. OK
  14. tab_name
  15. Time taken: 0.015 seconds
  16. hive (db_hive2)> drop database if exists db_hive2; #Drop the empty database db_hive2
  17. OK
  18. Time taken: 0.05 seconds
  19. hive (db_hive2)> show databases;
  20. OK
  21. database_name
  22. db_hive
  23. default
  24. yinzhengjie
  25. Time taken: 0.006 seconds, Fetched: row(s)
  26. hive (db_hive2)> use db_hive; #Switch to the db_hive database
  27. OK
  28. Time taken: 0.012 seconds
  29. hive (db_hive)> show tables; #db_hive does contain tables
  30. OK
  31. tab_name
  32. classlist
  33. student
  34. teacher
  35. Time taken: 0.016 seconds, Fetched: row(s)
  36. hive (db_hive)> drop database db_hive cascade; #Use the cascade keyword to force-drop db_hive even though it contains tables
  37. OK
  38. Time taken: 0.304 seconds
  39. hive (db_hive)> use yinzhengjie;
  40. OK
  41. Time taken: 0.016 seconds
  42. hive (yinzhengjie)> show databases;
  43. OK
  44. database_name
  45. default
  46. yinzhengjie
  47. Time taken: 0.007 seconds, Fetched: row(s)
  48. hive (yinzhengjie)>

Common ways to drop a database (hive (db_hive)> drop database db_hive cascade;)

  一.CREATE TABLE syntax and clause-by-clause explanation
  1>.The create statement:
  CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]
  2>.What each clause means:
  a>.CREATE TABLE creates a table with the given name. If a table with that name already exists, an error is thrown; use IF NOT EXISTS to ignore it.
  b>.EXTERNAL creates an external table and lets you point it at the actual data with LOCATION. When Hive creates a managed (internal) table it moves the data into the warehouse path; for an external table it only records where the data lives and never moves it. When a table is dropped, a managed table's metadata and data are both deleted, whereas dropping an external table deletes only the metadata and leaves the data in place.
  c>.COMMENT: adds a comment to the table or a column.
  d>.PARTITIONED BY creates a partitioned table.
  e>.CLUSTERED BY creates a bucketed table.
  f>.SORTED BY is rarely used.
  g>.ROW FORMAT
  DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
  [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
  You can define your own SerDe or use the built-in one. If neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. When creating a table you also define its columns, and along with the columns you effectively choose a SerDe; Hive uses the SerDe to work out the concrete column data of the table.
  h>.STORED AS specifies the file format.
  Common formats: SEQUENCEFILE (binary sequence file), TEXTFILE (plain text), RCFILE (columnar storage).
  If the data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
  i>.LOCATION: specifies where the table is stored on HDFS.
  j>.LIKE lets you copy an existing table's structure without copying its data.

  二.Managed (internal) tables: theory
  Tables created by default are so-called managed tables, also known as internal tables, because for these tables Hive controls (more or less) the lifecycle of the data. Hive stores their data in a sub-directory of the directory defined by hive.metastore.warehouse.dir (for example /user/hive/warehouse). When a managed table is dropped, Hive deletes its data as well, so managed tables are not well suited to sharing data with other tools.

  三.External tables
  1>.Theory
  Because the table is external, Hive does not consider itself the sole owner of the data. Dropping the table does not delete the data; only the table's metadata is removed.
  2>.When to use managed vs. external tables:
  Web logs collected every day flow periodically into HDFS text files. The external table (the raw log table) is the basis for heavy statistical analysis; the intermediate and result tables used along the way are stored as managed tables, populated via SELECT + INSERT.

  四.Partitioned tables
  A partition is simply a separate directory on the HDFS filesystem holding all the data files of that partition; partitioning in Hive means splitting one big data set into smaller ones by directory, according to business needs. Selecting the required partitions through expressions in the WHERE clause makes queries much more efficient. A minimal sketch is given after the caption below.

A scan of the CREATE TABLE syntax and the theory behind managed (internal) tables, external tables and partitions. If you are new to Hive, I strongly recommend reading this part three times!!!
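
  To make the partition idea concrete, here is a minimal sketch (the table name, file path and partition value are hypothetical, not from this post): each partition value becomes a sub-directory under the table's directory, and a WHERE clause on the partition column prunes the scan to just that directory.

  create table if not exists logs_by_day(line string)
  partitioned by (day string)
  row format delimited fields terminated by '\t';

  -- Loading into a partition creates .../logs_by_day/day=20180806/ on HDFS.
  load data local inpath '/tmp/20180806.log' into table logs_by_day partition(day='20180806');

  -- Filtering on the partition column only reads that one directory.
  select count(*) from logs_by_day where day='20180806';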

Managed table - the standard form of a plain CREATE TABLE, specifying the storage format and the location under the target database (hive (yinzhengjie)> create table if not exists Student(id int,name string)row format delimited fields terminated by '\t' stored as textfile location '/user/hive/warehouse/yinzhengjie.db';)
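
  Written out over several lines, the statement from the caption above is:

  create table if not exists Student(
      id int,
      name string
  )
  row format delimited fields terminated by '\t'
  stored as textfile
  location '/user/hive/warehouse/yinzhengjie.db';
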
  1. hive (yinzhengjie)> show tables;
  2. OK
  3. tab_name
  4. student
  5. teacher
  6. Time taken: 0.015 seconds, Fetched: row(s)
  7. hive (yinzhengjie)> create table if not exists teacherbak as select id, name from teacher; #Create a table from a query result: the query's output is written into the new table, and a MapReduce job is launched automatically
  8. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  9. Query ID = yinzhengjie_20180806000505_71d796a2----b5d39abd58c9
  10. Total jobs =
  11. Launching Job out of
  12. Number of reduce tasks is set to since there's no reduce operator
  13. Starting Job = job_1533518652134_0001, Tracking URL = http://s101:8088/proxy/application_1533518652134_0001/
  14. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533518652134_0001
  15. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  16. -- ::, Stage- map = %, reduce = %
  17. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.02 sec
  18. MapReduce Total cumulative CPU time: seconds msec
  19. Ended Job = job_1533518652134_0001
  20. Stage- is selected by condition resolver.
  21. Stage- is filtered out by condition resolver.
  22. Stage- is filtered out by condition resolver.
  23. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/.hive-staging_hive_2018-08-06_00-05-05_947_8165112419833752968-1/-ext-10002
  24. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/teacherbak
  25. MapReduce Jobs Launched:
  26. Stage-Stage-: Map: Cumulative CPU: 2.02 sec HDFS Read: HDFS Write: SUCCESS
  27. Total MapReduce CPU Time Spent: seconds msec
  28. OK
  29. id name
  30. Time taken: 33.117 seconds
  31. hive (yinzhengjie)> show tables;
  32. OK
  33. tab_name
  34. student
  35. teacher
  36. teacherbak
  37. Time taken: 0.014 seconds, Fetched: row(s)
  38. hive (yinzhengjie)> select id, name from teacher; #Look at the data in the teacher table
  39. OK
  40. id name
  41. Dennis MacAlistair Ritchie
  42. Linus Benedict Torvalds
  43. Bjarne Stroustrup
  44. Guido van Rossum
  45. James Gosling
  46. Martin Odersky
  47. Rob Pike
  48. Rasmus Lerdorf
  49. Brendan Eich
  50. Time taken: 0.093 seconds, Fetched: row(s)
  51. hive (yinzhengjie)>
  52. hive (yinzhengjie)> select id, name from teacherbak; #Look at the data in teacherbak; it matches teacher exactly
  53. OK
  54. id name
  55. Dennis MacAlistair Ritchie
  56. Linus Benedict Torvalds
  57. Bjarne Stroustrup
  58. Guido van Rossum
  59. James Gosling
  60. Martin Odersky
  61. Rob Pike
  62. Rasmus Lerdorf
  63. Brendan Eich
  64. Time taken: 0.083 seconds, Fetched: row(s)
  65. hive (yinzhengjie)>

Managed (internal) table - create a table from a query result, so that the query output is loaded into the new table (hive (yinzhengjie)> create table if not exists teacherbak as select id, name from teacher;)

  1. hive (yinzhengjie)> show tables;
  2. OK
  3. tab_name
  4. student
  5. teacher
  6. teacherbak
  7. Time taken: 0.013 seconds, Fetched: row(s)
  8. hive (yinzhengjie)> desc teacher;
  9. OK
  10. col_name data_type comment
  11. id int
  12. name string
  13. Time taken: 0.029 seconds, Fetched: row(s)
  14. hive (yinzhengjie)> select * from teacher;
  15. OK
  16. teacher.id teacher.name
  17. Dennis MacAlistair Ritchie
  18. Linus Benedict Torvalds
  19. Bjarne Stroustrup
  20. Guido van Rossum
  21. James Gosling
  22. Martin Odersky
  23. Rob Pike
  24. Rasmus Lerdorf
  25. Brendan Eich
  26. Time taken: 0.1 seconds, Fetched: row(s)
  27. hive (yinzhengjie)> create table if not exists teacherCopy like teacher; #Create a table from an existing table's structure: only the schema is copied, not the data
  28. OK
  29. Time taken: 0.181 seconds
  30. hive (yinzhengjie)> show tables;
  31. OK
  32. tab_name
  33. student
  34. teacher
  35. teacherbak
  36. teachercopy
  37. Time taken: 0.014 seconds, Fetched: row(s)
  38. hive (yinzhengjie)> select * from teachercopy;
  39. OK
  40. teachercopy.id teachercopy.name
  41. Time taken: 0.103 seconds
  42. hive (yinzhengjie)> desc teachercopy;
  43. OK
  44. col_name data_type comment
  45. id int
  46. name string
  47. Time taken: 0.03 seconds, Fetched: row(s)
  48. hive (yinzhengjie)>

Managed (internal) table - create a table from an existing table's structure; only the schema is copied, not the data (hive (yinzhengjie)> create table if not exists teacherCopy like teacher;)

  1. hive (yinzhengjie)> show tables;
  2. OK
  3. tab_name
  4. student
  5. teacher
  6. teacherbak
  7. teachercopy
  8. Time taken: 0.012 seconds, Fetched: row(s)
  9. hive (yinzhengjie)> desc formatted teacher; #Check the table's type
  10. OK
  11. col_name data_type comment
  12. # col_name data_type comment
  13.  
  14. id int
  15. name string
  16.  
  17. # Detailed Table Information
  18. Database: yinzhengjie
  19. Owner: yinzhengjie
  20. CreateTime: Sun Aug :: PDT
  21. LastAccessTime: UNKNOWN
  22. Retention:
  23. Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/teacher
  24. Table Type: MANAGED_TABLE #Look here: this field shows the table's type; this one is clearly a managed table, also called an internal table.
  25. Table Parameters:
  26. numFiles
  27. numRows
  28. rawDataSize
  29. totalSize
  30. transient_lastDdlTime
  31.  
  32. # Storage Information
  33. SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  34. InputFormat: org.apache.hadoop.mapred.TextInputFormat
  35. OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  36. Compressed: No
  37. Num Buckets: -
  38. Bucket Columns: []
  39. Sort Columns: []
  40. Storage Desc Params:
  41. field.delim \t
  42. serialization.format \t
  43. Time taken: 0.036 seconds, Fetched: row(s)
  44. hive (yinzhengjie)>

Managed (internal) table - checking a table's type (hive (yinzhengjie)> desc formatted teacher;)

  1. 一.Inspect the raw data
  2. [yinzhengjie@s101 download]$ pwd
  3. /home/yinzhengjie/download
  4. [yinzhengjie@s101 download]$
  5. [yinzhengjie@s101 download]$ cat dept.dat
  6. ACCOUNTING
  7. RESEARCH
  8. SALES
  9. OPERATIONS
  10. [yinzhengjie@s101 download]$
  11. [yinzhengjie@s101 download]$ more emp.dat
  12. SMITH CLERK -- 800.00
  13. ALLEN SALESMAN -- 1600.00 300.00
  14. WARD SALESMAN -- 1250.00 500.00
  15. JONES MANAGER -- 2975.00
  16. MARTIN SALESMAN -- 1250.00 1400.00
  17. BLAKE MANAGER -- 2850.00
  18. CLARK MANAGER -- 2450.00
  19. SCOTT ANALYST -- 3000.00
  20. KING PRESIDENT -- 5000.00
  21. TURNER SALESMAN -- 1500.00 0.00
  22. ADAMS CLERK -- 1100.00
  23. JAMES CLERK -- 950.00
  24. FORD ANALYST -- 3000.00
  25. MILLER CLERK -- 1300.00
  26. [yinzhengjie@s101 download]$
  27.  
  28. 二.Creating external tables with the external keyword
  29. 1>.Create the department table
  30. hive (yinzhengjie)> create external table if not exists yinzhengjie.dept(
  31. > deptno int,
  32. > dname string,
  33. > loc int
  34. > )
  35. > row format delimited fields terminated by '\t';
  36. OK
  37. Time taken: 0.096 seconds
  38. hive (yinzhengjie)>
  39.  
  40. 2>.Create the employee table
  41. hive (yinzhengjie)> create external table if not exists yinzhengjie.emp(
  42. > empno int,
  43. > ename string,
  44. > job string,
  45. > mgr int,
  46. > hiredate string,
  47. > sal double,
  48. > comm double,
  49. > deptno int
  50. > )
  51. > row format delimited fields terminated by '\t';
  52. OK
  53. Time taken: 0.064 seconds
  54. hive (yinzhengjie)>
  55.  
  56. >.向外部表中导入数据
  57. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.dat' into table yinzhengjie.dept;
  58. Loading data to table yinzhengjie.dept
  59. OK
  60. Time taken: 0.222 seconds
  61. hive (yinzhengjie)>
  62. hive (yinzhengjie)> select * from dept; #导入成功后需要查看dept表中是否有数据
  63. OK
  64. dept.deptno dept.dname dept.loc
  65. ACCOUNTING
  66. RESEARCH
  67. SALES
  68. OPERATIONS
  69. Time taken: 0.088 seconds, Fetched: row(s)
  70. hive (yinzhengjie)>
  71. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/emp.dat' into table yinzhengjie.emp;
  72. Loading data to table yinzhengjie.emp
  73. OK
  74. Time taken: 0.21 seconds
  75. hive (yinzhengjie)>
  76. hive (yinzhengjie)> select * from emp; #导入成功后需要查看emp表中是否有数据
  77. OK
  78. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  79. SMITH CLERK -- 800.0 NULL
  80. ALLEN SALESMAN -- 1600.0 300.0
  81. WARD SALESMAN -- 1250.0 500.0
  82. JONES MANAGER -- 2975.0 NULL
  83. MARTIN SALESMAN -- 1250.0 1400.0
  84. BLAKE MANAGER -- 2850.0 NULL
  85. CLARK MANAGER -- 2450.0 NULL
  86. SCOTT ANALYST -- 3000.0 NULL
  87. KING PRESIDENT NULL -- 5000.0 NULL
  88. TURNER SALESMAN -- 1500.0 0.0
  89. ADAMS CLERK -- 1100.0 NULL
  90. JAMES CLERK -- 950.0 NULL
  91. FORD ANALYST -- 3000.0 NULL
  92. MILLER CLERK -- 1300.0 NULL
  93. Time taken: 0.079 seconds, Fetched: row(s)
  94. hive (yinzhengjie)>
  95.  
  96. >.查看表类型
  97. hive (yinzhengjie)> desc formatted dept; #查看dept表格式化数据
  98. OK
  99. col_name data_type comment
  100. # col_name data_type comment
  101.  
  102. deptno int
  103. dname string
  104. loc int
  105.  
  106. # Detailed Table Information
  107. Database: yinzhengjie
  108. Owner: yinzhengjie
  109. CreateTime: Mon Aug :: PDT
  110. LastAccessTime: UNKNOWN
  111. Retention:
  112. Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/dept
  113. Table Type: EXTERNAL_TABLE #Duang~显示器面前的小哥哥小姐姐往这看,这里可以看到dept表的类型是外部表哟!
  114. Table Parameters:
  115. EXTERNAL TRUE
  116. numFiles
  117. numRows
  118. rawDataSize
  119. totalSize
  120. transient_lastDdlTime
  121.  
  122. # Storage Information
  123. SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  124. InputFormat: org.apache.hadoop.mapred.TextInputFormat
  125. OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  126. Compressed: No
  127. Num Buckets: -
  128. Bucket Columns: []
  129. Sort Columns: []
  130. Storage Desc Params:
  131. field.delim \t
  132. serialization.format \t
  133. Time taken: 0.036 seconds, Fetched: row(s)
  134. hive (yinzhengjie)> desc formatted emp; #查看emp表格式化数据
  135. OK
  136. col_name data_type comment
  137. # col_name data_type comment
  138.  
  139. empno int
  140. ename string
  141. job string
  142. mgr int
  143. hiredate string
  144. sal double
  145. comm double
  146. deptno int
  147.  
  148. # Detailed Table Information
  149. Database: yinzhengjie
  150. Owner: yinzhengjie
  151. CreateTime: Mon Aug :: PDT
  152. LastAccessTime: UNKNOWN
  153. Retention:
  154. Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/emp
  155. Table Type: EXTERNAL_TABLE #Duang~显示器面前的小哥哥小姐姐往这看,这里可以看到emp表的类型是外部表哟!
  156. Table Parameters:
  157. EXTERNAL TRUE
  158. numFiles
  159. numRows
  160. rawDataSize
  161. totalSize
  162. transient_lastDdlTime
  163.  
  164. # Storage Information
  165. SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  166. InputFormat: org.apache.hadoop.mapred.TextInputFormat
  167. OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  168. Compressed: No
  169. Num Buckets: -
  170. Bucket Columns: []
  171. Sort Columns: []
  172. Storage Desc Params:
  173. field.delim \t
  174. serialization.format \t
  175. Time taken: 0.036 seconds, Fetched: row(s)
  176. hive (yinzhengjie)>
  177.  
  178. >.在hive中删除外部表并不会删除hdfs的真实数据
  179. hive (yinzhengjie)> show tables;
  180. OK
  181. tab_name
  182. dept
  183. emp
  184. student
  185. teacher
  186. teacherbak
  187. teachercopy
  188. Time taken: 0.014 seconds, Fetched: row(s)
  189. hive (yinzhengjie)> drop table dept;
  190. OK
  191. Time taken: 0.122 seconds
  192. hive (yinzhengjie)> drop table emp;
  193. OK
  194. Time taken: 0.079 seconds
  195. hive (yinzhengjie)> show tables; #你会发现只是删除了表的元数据,并没有删除HDFS上的真实数据,我们可以在hive中通过dfs命令查看真实数据
  196. OK
  197. tab_name
  198. student
  199. teacher
  200. teacherbak
  201. teachercopy
  202. Time taken: 0.013 seconds, Fetched: row(s)
  203. hive (yinzhengjie)>
  204. hive (yinzhengjie)> dfs -cat /user/hive/warehouse/yinzhengjie.db/dept/dept.dat; #怎么样?hdfs中的文件内容依旧存在,并没有删除,hive只是删除了元数据而已。
  205. ACCOUNTING
  206. RESEARCH
  207. SALES
  208. OPERATIONS
  209. hive (yinzhengjie)>
  210. > dfs -cat /user/hive/warehouse/yinzhengjie.db/emp/emp.dat; #怎么样?hdfs中的文件内容依旧存在,并没有删除,hive只是删除了元数据而已。
  211. SMITH CLERK -- 800.00
  212. ALLEN SALESMAN -- 1600.00 300.00
  213. WARD SALESMAN -- 1250.00 500.00
  214. JONES MANAGER -- 2975.00
  215. MARTIN SALESMAN -- 1250.00 1400.00
  216. BLAKE MANAGER -- 2850.00
  217. CLARK MANAGER -- 2450.00
  218. SCOTT ANALYST -- 3000.00
  219. KING PRESIDENT -- 5000.00
  220. TURNER SALESMAN -- 1500.00 0.00
  221. ADAMS CLERK -- 1100.00
  222. JAMES CLERK -- 950.00
  223. FORD ANALYST -- 3000.00
  224. MILLER CLERK -- 1300.00
  225. hive (yinzhengjie)>

外部表案例实操-分别创建部门和员工外部表,并向表中导入数据。
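
补充:正因为删除外部表只会删除元数据,如果想"找回"刚才被drop掉的dept表,只需要重新建一张外部表并用location指向原来的数据目录即可,原有数据会被直接识别。下面是一个示意性写法(location路径取自上面的案例),仅供参考:

create external table if not exists yinzhengjie.dept(
  deptno int,
  dname string,
  loc int
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/yinzhengjie.db/dept';  -- 指向原有的数据目录,建表后即可直接查到旧数据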

  1. 分区表的特点总结如下:
  2. >.分区表实际上就是对应一个HDFS文件系统上的独立的文件夹,该文件夹下是该分区所有的数据文件。
  3. >.Hive中的分区就是对应HDFS文件系统上的一个独立目录,把一个大的数据集根据业务的需要分割成小的数据集。
  4. >.在查询时通过where子句中的表达式选择查询所需要的指定分区,这样的查询效率会提高很多。
  5.  
  6. [yinzhengjie@s101 download]$ cat users.txt
  7. yinzhengjie
  8. Guido van Rossum
  9. Martin Odersky
  10. Rasmus Lerdorf
  11. [yinzhengjie@s101 download]$
  12. [yinzhengjie@s101 download]$ cat dept.txt
  13. 开发部门
  14. 运维部门
  15. 测试部门
  16. 产品部门
  17. 销售部门
  18. 财务部门
  19. 人事部门
  20. [yinzhengjie@s101 download]$

分区表的特点总结以及测试数据“dept.txt”和"users.txt"文本内容

  1. hive (yinzhengjie)> show tables;
  2. OK
  3. tab_name
  4. Time taken: 0.038 seconds
  5. hive (yinzhengjie)> create table dept_partition(
  6. > deptno int,
  7. > dname string,
  8. > loc string
  9. > )
  10. > partitioned by (month string)
  11. > row format delimited fields terminated by '\t';
  12. OK
  13. Time taken: 0.262 seconds
  14. hive (yinzhengjie)>
  15. hive (yinzhengjie)> show tables;
  16. OK
  17. tab_name
  18. dept_partition
  19. Time taken: 0.035 seconds, Fetched: row(s)
  20. hive (yinzhengjie)>

分区表-创建一个分区表语法(hive (yinzhengjie)> create table dept_partition(deptno int,dname string,loc string) partitioned by (month string)row format delimited fields terminated by '\t';)

  1. hive (yinzhengjie)> show tables;
  2. OK
  3. tab_name
  4. dept_partition
  5. raw_logs
  6. student
  7. teacher
  8. teacherbak
  9. teachercopy
  10. Time taken: 0.016 seconds, Fetched: 6 row(s)
  11. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201803'); #加载数据指定分区
  12. Loading data to table yinzhengjie.dept_partition partition (month=201803)
  13. OK
  14. Time taken: 0.609 seconds
  15. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201804');
  16. Loading data to table yinzhengjie.dept_partition partition (month=201804)
  17. OK
  18. Time taken: 0.868 seconds
  19. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201805');
  20. Loading data to table yinzhengjie.dept_partition partition (month=201805)
  21. OK
  22. Time taken: 0.462 seconds
  23. hive (yinzhengjie)> select * from dept_partition;
  24. OK
  25. dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
  26. 10 开发部门 20000 201803
  27. 20 运维部门 13000 201803
  28. 30 测试部门 8000 201803
  29. 40 产品部门 6000 201803
  30. 50 销售部门 15000 201803
  31. 60 财务部门 17000 201803
  32. 70 人事部门 16000 201803
  33. 10 开发部门 20000 201804
  34. 20 运维部门 13000 201804
  35. 30 测试部门 8000 201804
  36. 40 产品部门 6000 201804
  37. 50 销售部门 15000 201804
  38. 60 财务部门 17000 201804
  39. 70 人事部门 16000 201804
  40. 10 开发部门 20000 201805
  41. 20 运维部门 13000 201805
  42. 30 测试部门 8000 201805
  43. 40 产品部门 6000 201805
  44. 50 销售部门 15000 201805
  45. 60 财务部门 17000 201805
  46. 70 人事部门 16000 201805
  47. Time taken: 0.129 seconds, Fetched: 21 row(s)
  48. hive (yinzhengjie)> select * from dept_partition where month='201805';
  49. OK
  50. dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
  51. 10 开发部门 20000 201805
  52. 20 运维部门 13000 201805
  53. 30 测试部门 8000 201805
  54. 40 产品部门 6000 201805
  55. 50 销售部门 15000 201805
  56. 60 财务部门 17000 201805
  57. 70 人事部门 16000 201805
  58. Time taken: 1.017 seconds, Fetched: 7 row(s)
  59. hive (yinzhengjie)>

分区表-加载数据指定一个分区表(hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201805');)

  1. hive (yinzhengjie)> show partitions dept_partition;
  2. OK
  3. partition
  4. month=201803
  5. month=201804
  6. month=201805
  7. Time taken: 0.563 seconds, Fetched: 3 row(s)
  8. hive (yinzhengjie)>

分区表-查看分区表现有的分区个数(hive (yinzhengjie)> show partitions dept_partition;)

  1. hive (yinzhengjie)> select * from dept_partition where month='201805'; #单分区查询
  2. OK
  3. dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
  4. 10 开发部门 20000 201805
  5. 20 运维部门 13000 201805
  6. 30 测试部门 8000 201805
  7. 40 产品部门 6000 201805
  8. 50 销售部门 15000 201805
  9. 60 财务部门 17000 201805
  10. 70 人事部门 16000 201805
  11. Time taken: 1.017 seconds, Fetched: 7 row(s)
  12. hive (yinzhengjie)>
  13. hive (yinzhengjie)> select * from dept_partition where month='201803'
  14. > union
  15. > select * from dept_partition where month='201804'
  16. > union
  17. > select * from dept_partition where month='201805'; #多分区联合查询,你会发现它的速度还不如select * from dept_partition;
  18. WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  19. Query ID = yinzhengjie_20180808214447_1a70bd61-3355-4f99-ba74-de7503593798
  20. Total jobs = 2
  21. Launching Job 1 out of 2
  22. Number of reduce tasks not specified. Estimated from input data size: 1
  23. In order to change the average load for a reducer (in bytes):
  24. set hive.exec.reducers.bytes.per.reducer=<number>
  25. In order to limit the maximum number of reducers:
  26. set hive.exec.reducers.max=<number>
  27. In order to set a constant number of reducers:
  28. set mapreduce.job.reduces=<number>
  29. Starting Job = job_1533789743141_0001, Tracking URL = http://s101:8088/proxy/application_1533789743141_0001/
  30. Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0001
  31. Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
  32. 2018-08-08 21:45:46,855 Stage-1 map = 0%, reduce = 0%
  33. 2018-08-08 21:46:32,103 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.11 sec
  34. 2018-08-08 21:47:09,769 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.95 sec
  35. MapReduce Total cumulative CPU time: 8 seconds 950 msec
  36. Ended Job = job_1533789743141_0001
  37. Launching Job 2 out of 2
  38. Number of reduce tasks not specified. Estimated from input data size: 1
  39. In order to change the average load for a reducer (in bytes):
  40. set hive.exec.reducers.bytes.per.reducer=<number>
  41. In order to limit the maximum number of reducers:
  42. set hive.exec.reducers.max=<number>
  43. In order to set a constant number of reducers:
  44. set mapreduce.job.reduces=<number>
  45. Starting Job = job_1533789743141_0002, Tracking URL = http://s101:8088/proxy/application_1533789743141_0002/
  46. Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0002
  47. Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
  48. 2018-08-08 21:47:41,300 Stage-2 map = 0%, reduce = 0%
  49. 2018-08-08 21:48:41,349 Stage-2 map = 0%, reduce = 0%, Cumulative CPU 5.88 sec
  50. 2018-08-08 21:48:42,776 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 7.33 sec
  51. 2018-08-08 21:49:23,133 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 10.41 sec
  52. MapReduce Total cumulative CPU time: 10 seconds 410 msec
  53. Ended Job = job_1533789743141_0002
  54. MapReduce Jobs Launched:
  55. Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 8.95 sec HDFS Read: 17348 HDFS Write: 708 SUCCESS
  56. Stage-Stage-2: Map: 2 Reduce: 1 Cumulative CPU: 10.41 sec HDFS Read: 17496 HDFS Write: 1194 SUCCESS
  57. Total MapReduce CPU Time Spent: 19 seconds 360 msec
  58. OK
  59. u3.deptno u3.dname u3.loc u3.month
  60. 10 开发部门 20000 201803
  61. 10 开发部门 20000 201804
  62. 10 开发部门 20000 201805
  63. 20 运维部门 13000 201803
  64. 20 运维部门 13000 201804
  65. 20 运维部门 13000 201805
  66. 30 测试部门 8000 201803
  67. 30 测试部门 8000 201804
  68. 30 测试部门 8000 201805
  69. 40 产品部门 6000 201803
  70. 40 产品部门 6000 201804
  71. 40 产品部门 6000 201805
  72. 50 销售部门 15000 201803
  73. 50 销售部门 15000 201804
  74. 50 销售部门 15000 201805
  75. 60 财务部门 17000 201803
  76. 60 财务部门 17000 201804
  77. 60 财务部门 17000 201805
  78. 70 人事部门 16000 201803
  79. 70 人事部门 16000 201804
  80. 70 人事部门 16000 201805
  81. Time taken: 278.849 seconds, Fetched: 21 row(s)
  82. hive (yinzhengjie)>

分区表-查询分区表数据之单分区查询和多分区联合查询(hive (yinzhengjie)> select * from dept_partition where month='201803' union select * from dept_partition where month='201804' union select * from dept_partition where month='201805'; )
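
补充:上面的union写法会触发多个MapReduce作业,速度反而比全表扫描还慢。如果只是想一次查询多个分区,通常直接在where子句里过滤分区字段即可,同样可以利用分区裁剪。下面是一个示意性写法,仅供参考:

-- 一次查询201803、201804、201805三个分区,无需union
select * from dept_partition where month in ('201803','201804','201805');

-- 也可以用范围条件达到同样效果
select * from dept_partition where month >= '201803' and month <= '201805';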

  1. hive (yinzhengjie)> show partitions dept_partition; #查看分区表中已经有的分区数
  2. OK
  3. partition
  4. month=201803
  5. month=201804
  6. month=201805
  7. Time taken: 0.563 seconds, Fetched: 3 row(s)
  8. hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201806'); #添加单个分区
  9. OK
  10. Time taken: 0.562 seconds
  11. hive (yinzhengjie)> show partitions dept_partition;
  12. OK
  13. partition
  14. month=201803
  15. month=201804
  16. month=201805
  17. month=201806
  18. Time taken: 0.096 seconds, Fetched: 4 row(s)
  19. hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201807') PARTITION(month='201808') PARTITION(month='201809'); #添加多个分区
  20. OK
  21. Time taken: 0.22 seconds
  22. hive (yinzhengjie)> show partitions dept_partition;
  23. OK
  24. partition
  25. month=201803
  26. month=201804
  27. month=201805
  28. month=201806
  29. month=201807
  30. month=201808
  31. month=201809
  32. Time taken: 0.097 seconds, Fetched: 7 row(s)
  33. hive (yinzhengjie)>

分区表-增加分区之创建单个分区和同时创建多个分区案例展示(hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201807') PARTITION(month='201808') PARTITION(month='201809');)
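
补充:添加分区时还可以用location为该分区单独指定HDFS目录(不使用表目录下默认的month=xxx子目录)。下面是一个示意性写法(其中的目录路径为假设),仅供参考:

-- 为新分区指定一个自定义的数据目录(/user/yinzhengjie/dept_201810为假设路径)
ALTER TABLE dept_partition ADD PARTITION(month='201810') LOCATION '/user/yinzhengjie/dept_201810';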

  1. hive (yinzhengjie)>
  2. hive (yinzhengjie)> show partitions dept_partition; #查看当前已经有的分区数
  3. OK
  4. partition
  5. month=201803
  6. month=201804
  7. month=201805
  8. month=201806
  9. month=201807
  10. month=201808
  11. month=201809
  12. Time taken: 0.114 seconds, Fetched: 7 row(s)
  13. hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201807'); #删除单个分区
  14. Dropped the partition month=201807
  15. OK
  16. Time taken: 0.893 seconds
  17. hive (yinzhengjie)> show partitions dept_partition;
  18. OK
  19. partition
  20. month=201803
  21. month=201804
  22. month=201805
  23. month=201806
  24. month=201808
  25. month=201809
  26. Time taken: 0.083 seconds, Fetched: 6 row(s)
  27. hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201808'),PARTITION(month='201809'); #同时删除多个分区
  28. Dropped the partition month=201808
  29. Dropped the partition month=201809
  30. OK
  31. Time taken: 0.364 seconds
  32. hive (yinzhengjie)> show partitions dept_partition;
  33. OK
  34. partition
  35. month=201803
  36. month=201804
  37. month=201805
  38. month=201806
  39. Time taken: 0.104 seconds, Fetched: 4 row(s)
  40. hive (yinzhengjie)>

分区表-删除分区之删除单个分区和同时删除多个分区案例展示(hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201808'),PARTITION(month='201809');)

  1. hive (yinzhengjie)> DESC FORMATTED dept_partition;
  2. OK
  3. col_name data_type comment
  4. # col_name data_type comment
  5.  
  6. deptno int
  7. dname string
  8. loc string
  9.  
  10. # Partition Information #这里是分区的详细信息
  11. # col_name data_type comment
  12.  
  13. month string
  14.  
  15. # Detailed Table Information
  16. Database: yinzhengjie
  17. Owner: yinzhengjie
  18. CreateTime: Wed Aug 08 21:08:14 PDT 2018
  19. LastAccessTime: UNKNOWN
  20. Retention: 0
  21. Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/dept_partition
  22. Table Type: MANAGED_TABLE
  23. Table Parameters:
  24. transient_lastDdlTime 1533787694
  25.  
  26. # Storage Information
  27. SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  28. InputFormat: org.apache.hadoop.mapred.TextInputFormat
  29. OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  30. Compressed: No
  31. Num Buckets: -1
  32. Bucket Columns: []
  33. Sort Columns: []
  34. Storage Desc Params:
  35. field.delim \t
  36. serialization.format \t
  37. Time taken: 1.813 seconds, Fetched: 33 row(s)
  38. hive (yinzhengjie)>

分区表-查看分区表的结构(hive (yinzhengjie)> DESC FORMATTED dept_partition;)

  1. hive (yinzhengjie)> create table users (
  2. > id int,
  3. > name string,
  4. > age int
  5. > )
  6. > partitioned by (province string, city string)
  7. > row format delimited fields terminated by '\t';
  8. OK
  9. Time taken: 1.046 seconds
  10. hive (yinzhengjie)> show tables;
  11. OK
  12. tab_name
  13. dept_partition
  14. raw_logs
  15. student
  16. teacher
  17. teacherbak
  18. teachercopy
  19. users
  20. Time taken: 0.26 seconds, Fetched: 7 row(s)
  21. hive (yinzhengjie)>

分区表-创建二级分区表语法(hive (yinzhengjie)> create table users (id int,name string, age int) partitioned by (province string, city string) row format delimited fields terminated by '\t';)

  1. hive (yinzhengjie)> create table users (id int,name string, age int) partitioned by (province string, city string) row format delimited fields terminated by '\t'; #创建二级分区
  2. OK
  3. Time taken: 0.071 seconds
  4. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='hebei',city='shijiazhuang'); #加载数据到刚创建的二级分区中
  5. Loading data to table yinzhengjie.users partition (province=hebei, city=shijiazhuang)
  6. OK
  7. Time taken: 0.482 seconds
  8. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='xian');
  9. Loading data to table yinzhengjie.users partition (province=shanxi, city=xian)
  10. OK
  11. Time taken: 0.414 seconds
  12. hive (yinzhengjie)> select * from users;
  13. OK
  14. users.id users.name users.age users.province users.city
  15. 1 yinzhengjie 26 hebei shijiazhuang
  16. 2 Guido van Rossum 62 hebei shijiazhuang
  17. 3 Martin Odersky 60 hebei shijiazhuang
  18. 4 Rasmus Lerdorf 50 hebei shijiazhuang
  19. 1 yinzhengjie 26 shanxi xian
  20. 2 Guido van Rossum 62 shanxi xian
  21. 3 Martin Odersky 60 shanxi xian
  22. 4 Rasmus Lerdorf 50 shanxi xian
  23. Time taken: 0.101 seconds, Fetched: 8 row(s)
  24. hive (yinzhengjie)> select * from users where province='hebei'; #查询分区表中仅含有province='hebei'的数据
  25. OK
  26. users.id users.name users.age users.province users.city
  27. 1 yinzhengjie 26 hebei shijiazhuang
  28. 2 Guido van Rossum 62 hebei shijiazhuang
  29. 3 Martin Odersky 60 hebei shijiazhuang
  30. 4 Rasmus Lerdorf 50 hebei shijiazhuang
  31. Time taken: 1.775 seconds, Fetched: 4 row(s)
  32. hive (yinzhengjie)>

分区表-加载数据到二级分区表中(hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='hebei',city='shijiazhuang');)

  1. hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=hebei/city=handan; #在hdfs上创建目录
  2. hive (yinzhengjie)>
  3. hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/users.txt /user/hive/warehouse/yinzhengjie.db/users/province=hebei/city=handan; #将本地文件的数据上传到hdfs上
  4. hive (yinzhengjie)>
  5. hive (yinzhengjie)> select * from users where province='hebei' and city='handan'; #很显然,查看数据是没有的
  6. OK
  7. users.id users.name users.age users.province users.city
  8. Time taken: 0.304 seconds
  9. hive (yinzhengjie)>
  10. hive (yinzhengjie)> msck repair table users; #手动执行修复命令
  11. OK
  12. Partitions not in metastore: users:province=hebei/city=handan
  13. Repair: Added partition to metastore users:province=hebei/city=handan
  14. Time taken: 0.487 seconds, Fetched: 2 row(s)
  15. hive (yinzhengjie)> select * from users where province='hebei' and city='handan'; #再次查看数据,发现已经是有数据的
  16. OK
  17. users.id users.name users.age users.province users.city
  18. 1 yinzhengjie 26 hebei handan
  19. 2 Guido van Rossum 62 hebei handan
  20. 3 Martin Odersky 60 hebei handan
  21. 4 Rasmus Lerdorf 50 hebei handan
  22. Time taken: 0.156 seconds, Fetched: 4 row(s)
  23. hive (yinzhengjie)>

分区表-把数据直接上传到分区目录上,让分区表和数据产生关联的方式一:上传数据后修复(hive (yinzhengjie)> msck repair table users;)

  1. hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=ankang;
  2. hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/users.txt /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=ankang;
  3. hive (yinzhengjie)> select * from users where province='shanxi' and city='ankang'; #查询数据,此时数据是没有查到的
  4. OK
  5. users.id users.name users.age users.province users.city
  6. Time taken: 0.112 seconds
  7. hive (yinzhengjie)>
  8. hive (yinzhengjie)> ALTER TABLE users add partition(province='shanxi',city='ankang'); #上传数据后添加分区
  9. OK
  10. Time taken: 0.14 seconds
  11. hive (yinzhengjie)> select * from users where province='shanxi' and city='ankang'; #再次查询数据,你会发现数据又有了
  12. OK
  13. users.id users.name users.age users.province users.city
  14. 1 yinzhengjie 26 shanxi ankang
  15. 2 Guido van Rossum 62 shanxi ankang
  16. 3 Martin Odersky 60 shanxi ankang
  17. 4 Rasmus Lerdorf 50 shanxi ankang
  18. Time taken: 0.156 seconds, Fetched: 4 row(s)
  19. hive (yinzhengjie)>

分区表-把数据直接上传到分区目录上,让分区表和数据产生关联的方式二:上传数据后添加分区(hive (yinzhengjie)> ALTER TABLE users add partition(province='shanxi',city='ankang'); )

  1. hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=hanzhong; #在hdfs上创建目录
  2. hive (yinzhengjie)> select * from users where province='shanxi' and city='hanzhong'; #很显然,查看数据是没有的
  3. OK
  4. users.id users.name users.age users.province users.city
  5. Time taken: 0.148 seconds
  6. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='hanzhong'); #上传数据后load数据到分区
  7. Loading data to table yinzhengjie.users partition (province=shanxi, city=hanzhong)
  8. OK
  9. Time taken: 0.593 seconds
  10. hive (yinzhengjie)> select * from users where province='shanxi' and city='hanzhong'; #再次查看数据,发现已经是有数据的
  11. OK
  12. users.id users.name users.age users.province users.city
  13. 1 yinzhengjie 26 shanxi hanzhong
  14. 2 Guido van Rossum 62 shanxi hanzhong
  15. 3 Martin Odersky 60 shanxi hanzhong
  16. 4 Rasmus Lerdorf 50 shanxi hanzhong
  17. Time taken: 0.104 seconds, Fetched: 4 row(s)
  18. hive (yinzhengjie)>

分区表-把数据直接上传到分区目录上,让分区表和数据产生关联的方式三:上传数据后load数据到分区(hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='hanzhong');)

  1. 分桶表-创建分桶表(hive (yinzhengjie)> create table stu_buck(id int,name string) clustered by(id) into 4 buckets row format delimited fields terminated by '\t';)
  2. >.分区针对的是数据的存储路径;分桶针对的是数据文件。
  3. >.分区提供一个隔离数据和优化查询的便利方式。不过,并非所有的数据集都可形成合理的分区,特别是之前所提到过的要确定合适的划分大小这个疑虑。分桶是将数据集分解成更容易管理的若干部分的另一个技术。
  4.  
  5. hive (yinzhengjie)> create table stu_buck(
  6. > id int,
  7. > name string
  8. > )
  9. > clustered by(id)
  10. > into 4 buckets
  11. > row format delimited fields terminated by '\t'; #创建分桶表
  12. OK
  13. Time taken: 0.246 seconds
  14. hive (yinzhengjie)>
  15. hive (yinzhengjie)> desc formatted stu_buck; #查看表结构
  16. OK
  17. col_name data_type comment
  18. # col_name data_type comment
  19.  
  20. id int
  21. name string
  22.  
  23. # Detailed Table Information
  24. Database: yinzhengjie
  25. Owner: yinzhengjie
  26. CreateTime: Fri Aug :: PDT
  27. LastAccessTime: UNKNOWN
  28. Retention:
  29. Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/stu_buck
  30. Table Type: MANAGED_TABLE
  31. Table Parameters:
  32. COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
  33. numFiles
  34. numRows
  35. rawDataSize
  36. totalSize
  37. transient_lastDdlTime
  38.  
  39. # Storage Information
  40. SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  41. InputFormat: org.apache.hadoop.mapred.TextInputFormat
  42. OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  43. Compressed: No
  44. Num Buckets: 4 #小哥哥小姐姐们,快看这里,这张表被分成了4个桶。
  45. Bucket Columns: [id]
  46. Sort Columns: []
  47. Storage Desc Params:
  48. field.delim \t
  49. serialization.format \t
  50. Time taken: 0.128 seconds, Fetched: row(s)
  51. hive (yinzhengjie)>

分桶表-创建分桶表(hive (yinzhengjie)> create table stu_buck(id int,name string) clustered by(id) into 4 buckets row format delimited fields terminated by '\t';)

  1. 分桶表-导入数据到分桶表中(hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu_buck;)
  2. >.分区针对的是数据的存储路径;分桶针对的是数据文件。
  3. >.分区提供一个隔离数据和优化查询的便利方式。不过,并非所有的数据集都可形成合理的分区,特别是之前所提到过的要确定合适的划分大小这个疑虑。分桶是将数据集分解成更容易管理的若干部分的另一个技术。
  4.  
  5. hive (yinzhengjie)> ! cat /home/yinzhengjie/download/stu_buck.txt; #查看本地文件内容
  6. ss1
  7. ss2
  8. ss3
  9. ss4
  10. ss5
  11. ss6
  12. ss7
  13. ss8
  14. ss9
  15. ss10
  16. ss11
  17. ss12
  18. ss13
  19. ss14
  20. ss15
  21. ss16
  22. hive (yinzhengjie)>
  23. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu_buck; #将本地文件内容导入到hive表中
  24. Loading data to table yinzhengjie.stu_buck
  25. OK
  26. Time taken: 0.306 seconds
  27. hive (yinzhengjie)>
  28. hive (yinzhengjie)> select * from stu_buck; #查询桶表的内容
  29. OK
  30. stu_buck.id stu_buck.name
  31. ss1
  32. ss2
  33. ss3
  34. ss4
  35. ss5
  36. ss6
  37. ss7
  38. ss8
  39. ss9
  40. ss10
  41. ss11
  42. ss12
  43. ss13
  44. ss14
  45. ss15
  46. ss16
  47. Time taken: 0.088 seconds, Fetched: row(s)
  48. hive (yinzhengjie)>

分桶表-导入数据到分桶表中(hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu_buck;)
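
补充:用load data直接向分桶表导入数据时,Hive只是把文件原样搬到表目录下,并不会按id的哈希值把数据拆分成4个桶文件,所以下面才需要先truncate,再通过insert ... select让MapReduce完成真正的分桶。可以用dfs命令观察表目录下的文件个数来验证,下面是一个示意性写法,仅供参考:

-- load之后查看表目录,通常只能看到原始的stu_buck.txt这一个文件
dfs -ls /user/hive/warehouse/yinzhengjie.db/stu_buck;

-- 经过insert ... select重新导入后,表目录下一般会出现000000_0到000003_0共4个桶文件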

  1. 分桶表-创建分桶表时,数据通过子查询的方式导入(hive (yinzhengjie)> insert into table stu_buck select id, name from stu;)
  2.  
  3. hive (yinzhengjie)> create table stu(
  4. > id int,
  5. > name string
  6. > )
  7. > row format delimited fields terminated by '\t'; #先建一个普通的stu表
  8. OK
  9. Time taken: 0.148 seconds
  10. hive (yinzhengjie)>
  11. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu; #向普通的stu表中导入数据
  12. Loading data to table yinzhengjie.stu
  13. OK
  14. Time taken: 0.186 seconds
  15. hive (yinzhengjie)>
  16. hive (yinzhengjie)> truncate table stu_buck; #清空stu_buck表中数据
  17. OK
  18. Time taken: 0.098 seconds
  19. hive (yinzhengjie)> select * from stu_buck; #确认stu_buck表中的数据已经被清空
  20. OK
  21. stu_buck.id stu_buck.name
  22. Time taken: 0.103 seconds
  23. hive (yinzhengjie)>
  24. hive (yinzhengjie)> insert into table stu_buck select id, name from stu; #导入数据到分桶表,通过子查询的方式
  25. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  26. Query ID = yinzhengjie_20180810010832_901bd21c-690c-48b5--c3900c960245
  27. Total jobs =
  28. Launching Job out of
  29. Number of reduce tasks determined at compile time:
  30. In order to change the average load for a reducer (in bytes):
  31. set hive.exec.reducers.bytes.per.reducer=<number>
  32. In order to limit the maximum number of reducers:
  33. set hive.exec.reducers.max=<number>
  34. In order to set a constant number of reducers:
  35. set mapreduce.job.reduces=<number>
  36. Starting Job = job_1533789743141_0049, Tracking URL = http://s101:8088/proxy/application_1533789743141_0049/
  37. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0049
  38. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  39. -- ::, Stage- map = %, reduce = %
  40. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.52 sec
  41. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.3 sec
  42. -- ::, Stage- map = %, reduce = %, Cumulative CPU 8.01 sec
  43. MapReduce Total cumulative CPU time: seconds msec
  44. Ended Job = job_1533789743141_0049
  45. Loading data to table yinzhengjie.stu_buck
  46. MapReduce Jobs Launched:
  47. Stage-Stage-: Map: Reduce: Cumulative CPU: 8.01 sec HDFS Read: HDFS Write: SUCCESS
  48. Total MapReduce CPU Time Spent: seconds msec
  49. OK
  50. id name
  51. Time taken: 95.111 seconds
  52. hive (yinzhengjie)>
  53. hive (yinzhengjie)> select * from stu_buck; #查询分桶的数据
  54. OK
  55. stu_buck.id stu_buck.name
  56. ss16
  57. ss12
  58. ss8
  59. ss4
  60. ss1
  61. ss13
  62. ss5
  63. ss9
  64. ss14
  65. ss10
  66. ss6
  67. ss2
  68. ss15
  69. ss7
  70. ss3
  71. ss11
  72. Time taken: 0.073 seconds, Fetched: row(s)
  73. hive (yinzhengjie)>

分桶表-创建分桶表时,数据通过子查询的方式导入(hive (yinzhengjie)> insert into table stu_buck select id, name from stu;)
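
补充:分桶表的一个常见用途是抽样查询(tablesample)。下面是一个示意性写法(基于上面按id分成4个桶的stu_buck表,从中抽取第1个桶的数据,大约是全表的四分之一),仅供参考:

-- 按分桶列id抽样,取4个桶中的第1个桶
select * from stu_buck tablesample(bucket 1 out of 4 on id);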

  1. hive (yinzhengjie)> show tables; #查看当前数据库已经存在的表
  2. OK
  3. tab_name
  4. dept_partition
  5. raw_logs
  6. student
  7. teacher
  8. teacherbak
  9. teachercopy
  10. users
  11. Time taken: 0.071 seconds, Fetched: 7 row(s)
  12. hive (yinzhengjie)> ALTER TABLE users RENAME TO myusers; #重命名表,将users表名改为myusers
  13. OK
  14. Time taken: 0.341 seconds
  15. hive (yinzhengjie)> show tables; #再次查看当前数据库已经存在的表,发现表名称已经修改了
  16. OK
  17. tab_name
  18. dept_partition
  19. myusers
  20. raw_logs
  21. student
  22. teacher
  23. teacherbak
  24. teachercopy
  25. Time taken: 0.011 seconds, Fetched: 7 row(s)
  26. hive (yinzhengjie)>

修改表-重命名表实操案例(hive (yinzhengjie)> ALTER TABLE users RENAME TO myusers;)

  1. hive (yinzhengjie)> desc dept_partition; #查看表结构
  2. OK
  3. col_name data_type comment
  4. deptno int
  5. dname string
  6. loc string
  7. month string
  8.  
  9. # Partition Information
  10. # col_name data_type comment
  11.  
  12. month string
  13. Time taken: 0.054 seconds, Fetched: 9 row(s)
  14. hive (yinzhengjie)> ALTER TABLE dept_partition ADD COLUMNS(desc string); #添加新字段(列),温馨提示:ADD是代表新增一字段,字段位置在所有列后面(partition列前),REPLACE则是表示替换表中所有字段。
  15. OK
  16. Time taken: 0.176 seconds
  17. hive (yinzhengjie)> desc dept_partition; #再次查看表结构
  18. OK
  19. col_name data_type comment
  20. deptno int
  21. dname string
  22. loc string
  23. desc string
  24. month string
  25.  
  26. # Partition Information
  27. # col_name data_type comment
  28.  
  29. month string
  30. Time taken: 0.059 seconds, Fetched: 10 row(s)
  31. hive (yinzhengjie)>

修改表-添加列实操案例(hive (yinzhengjie)> ALTER TABLE dept_partition ADD COLUMNS(desc string);)
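
补充:新增字段时还可以顺便加上注释,方便以后desc查看字段含义。下面是一个示意性写法(字段名remark为假设的示例),仅供参考:

-- 新增一个带注释的字段
ALTER TABLE dept_partition ADD COLUMNS(remark string COMMENT '备注信息');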

  1. hive (yinzhengjie)> desc dept_partition; #查看表结构
  2. OK
  3. col_name data_type comment
  4. deptno int
  5. dname string
  6. loc string
  7. month string
  8.  
  9. # Partition Information
  10. # col_name data_type comment
  11.  
  12. month string
  13. Time taken: 0.054 seconds, Fetched: 9 row(s)
  14. hive (yinzhengjie)>
  15. hive (yinzhengjie)> alter table dept_partition change column desc deptdesc string; #修改列名实操案例
  16. OK
  17. Time taken: 0.153 seconds
  18. hive (yinzhengjie)> desc dept_partition;
  19. OK
  20. col_name data_type comment
  21. deptno int
  22. dname string
  23. loc string
  24. deptdesc string
  25. month string
  26.  
  27. # Partition Information
  28. # col_name data_type comment
  29.  
  30. month string
  31. Time taken: 0.027 seconds, Fetched: 10 row(s)
  32. hive (yinzhengjie)>

修改表-修改列名实操案例(hive (yinzhengjie)> alter table dept_partition change column desc deptdesc string;)

  1. hive (yinzhengjie)> desc dept_partition;
  2. OK
  3. col_name data_type comment
  4. deptno int
  5. dname string
  6. loc string
  7. deptdesc string
  8. month string
  9.  
  10. # Partition Information
  11. # col_name data_type comment
  12.  
  13. month string
  14. Time taken: 0.031 seconds, Fetched: 10 row(s)
  15. hive (yinzhengjie)> alter table dept_partition replace columns(deptno string, dname string, loc string); #替换列名,温馨提示:ADD是代表新增一字段,字段位置在所有列后面(partition列前),REPLACE则是表示替换表中所有字段。
  16. OK
  17. Time taken: 0.152 seconds
  18. hive (yinzhengjie)> desc dept_partition;
  19. OK
  20. col_name data_type comment
  21. deptno string
  22. dname string
  23. loc string
  24. month string
  25.  
  26. # Partition Information
  27. # col_name data_type comment
  28.  
  29. month string
  30. Time taken: 0.027 seconds, Fetched: 9 row(s)
  31. hive (yinzhengjie)>

修改表-替换列名实操案例(hive (yinzhengjie)> alter table dept_partition replace columns(deptno string, dname string, loc string);)

  1. hive (yinzhengjie)> show tables;
  2. OK
  3. tab_name
  4. dept_partition
  5. myusers
  6. raw_logs
  7. student
  8. teacher
  9. teacherbak
  10. teachercopy
  11. Time taken: 0.015 seconds, Fetched: 7 row(s)
  12. hive (yinzhengjie)> DROP TABLE dept_partition; #删除指定的表
  13. OK
  14. Time taken: 0.214 seconds
  15. hive (yinzhengjie)> show tables;
  16. OK
  17. tab_name
  18. myusers
  19. raw_logs
  20. student
  21. teacher
  22. teacherbak
  23. teachercopy
  24. Time taken: 0.015 seconds, Fetched: 6 row(s)
  25. hive (yinzhengjie)>

修改表-删除指定的表(hive (yinzhengjie)> DROP TABLE dept_partition; )

3>.DML数据操作

  1. 数据导入-向表中装载数据(Load)语法
  2. hive> load data [local] inpath '/home/yinzhengjie/download/user.txt' [overwrite] into table student [partition (partcol1=val1,…)];
  3.  
  4. 以上参数说明:
  5. >.load data:表示加载数据
  6. >.local:表示从本地加载数据到hive表;否则从HDFS加载数据到hive
  7. >.inpath:表示加载数据的路径
  8. >.overwrite:表示覆盖表中已有数据,否则表示追加
  9. >.into table:表示加载到哪张表
  10. >.student:表示具体的表
  11. >.partition:表示上传到指定分区

数据导入-向表中装载数据(Load)语法(hive>load data [local] inpath '/home/yinzhengjie/download/user.txt' [overwrite] into table student [partition (partcol1=val1,…)];)
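
补充:上面语法中的各个可选项是可以组合使用的,例如从本地加载数据并覆盖某个分区中已有的数据。下面是一个示意性写法(表和分区沿用前文的dept_partition),仅供参考:

-- 从本地加载dept.txt,覆盖dept_partition表中month='201803'分区的已有数据
load data local inpath '/home/yinzhengjie/download/dept.txt' overwrite into table dept_partition partition(month='201803');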

  1. [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/students.txt
  2. sunwukong
  3. zhubajie
  4. shaheshang
  5. bailongma
  6. tangsanzang
  7. [yinzhengjie@s101 download]$
  8.  
  9. 登录hive创建表并将数据导入进去:
  10.  
  11. hive (yinzhengjie)> create table xiyouji(
  12. > id string,
  13. > name string
  14. > )
  15. > row format delimited fields terminated by '\t';
  16. OK
  17. Time taken: 0.635 seconds
  18. hive (yinzhengjie)>
  19. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/students.txt' into table yinzhengjie.xiyouji;
  20. Loading data to table yinzhengjie.xiyouji
  21. OK
  22. Time taken: 10.337 seconds
  23. hive (yinzhengjie)>
  24. hive (yinzhengjie)> select * from xiyouji;
  25. OK
  26. xiyouji.id xiyouji.name
  27. sunwukong
  28. zhubajie
  29. shaheshang
  30. bailongma
  31. tangsanzang
  32. Time taken: 0.131 seconds, Fetched: row(s)
  33. hive (yinzhengjie)>

数据导入-向表中装载数据(Load)实操案例之从本地导入数据(hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/students.txt' into table yinzhengjie.xiyouji;)

  1. hive (yinzhengjie)> select * from xiyouji;
  2. OK
  3. xiyouji.id xiyouji.name
  4. sunwukong
  5. zhubajie
  6. shaheshang
  7. bailongma
  8. tangsanzang
  9. Time taken: 0.207 seconds, Fetched: row(s)
  10. hive (yinzhengjie)> truncate table xiyouji; #温馨提示:Truncate只能删除管理表,不能删除外部表中数据
  11. OK
  12. Time taken: 0.169 seconds
  13. hive (yinzhengjie)> select * from xiyouji;
  14. OK
  15. xiyouji.id xiyouji.name
  16. Time taken: 0.086 seconds
  17. hive (yinzhengjie)>

清除表中数据(hive (yinzhengjie)> truncate table xiyouji;)

  1. hive (yinzhengjie)> select * from xiyouji; #查看表中数据是空的
  2. OK
  3. xiyouji.id xiyouji.name
  4. Time taken: 0.077 seconds
  5. hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data; #上传文件到HDFS
  6. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt;
  7. sunwukong
  8. zhubajie
  9. shaheshang
  10. bailongma
  11. tangsanzang
  12. hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' into table yinzhengjie.xiyouji; #加载HDFS上数据,注意数据会被剪切走哟
  13. Loading data to table yinzhengjie.xiyouji
  14. OK
  15. Time taken: 0.228 seconds
  16. hive (yinzhengjie)> select * from xiyouji; #再次查看表中数据
  17. OK
  18. xiyouji.id xiyouji.name
  19. sunwukong
  20. zhubajie
  21. shaheshang
  22. bailongma
  23. tangsanzang
  24. Time taken: 0.073 seconds, Fetched: row(s)
  25. hive (yinzhengjie)>

数据导入-向表中装载数据(Load)实操案例之从HDFS导入数据(hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' into table yinzhengjie.xiyouji;)

  1. hive (yinzhengjie)> select * from xiyouji; #查看上传之前表中数据
  2. OK
  3. xiyouji.id xiyouji.name
  4. sunwukong
  5. zhubajie
  6. shaheshang
  7. bailongma
  8. tangsanzang
  9. sunwukong
  10. zhubajie
  11. shaheshang
  12. bailongma
  13. tangsanzang
  14. sunwukong
  15. zhubajie
  16. shaheshang
  17. bailongma
  18. tangsanzang
  19. Time taken: 0.077 seconds, Fetched: row(s)
  20. hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data; #上传文件到HDFS
  21. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt; #查看上传到HDFS的文件内容
  22. sunwukong
  23. zhubajie
  24. shaheshang
  25. bailongma
  26. tangsanzang
  27. hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' overwrite into table yinzhengjie.xiyouji; #加载HDFS上数据覆盖表中已有的数据,注意数据会被剪切走哟
  28. Loading data to table yinzhengjie.xiyouji
  29. OK
  30. Time taken: 0.346 seconds
  31. hive (yinzhengjie)> select * from xiyouji; #再次查看表中数据。发现之前的数据已经被覆盖了
  32. OK
  33. xiyouji.id xiyouji.name
  34. sunwukong
  35. zhubajie
  36. shaheshang
  37. bailongma
  38. tangsanzang
  39. Time taken: 0.086 seconds, Fetched: row(s)
  40. hive (yinzhengjie)>

数据导入-向表中装载数据(Load)实操案例之加载数据覆盖表中已有的数据(hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' overwrite into table yinzhengjie.xiyouji;)

  1. hive (yinzhengjie)> drop table xiyouji; #删除之前的测试表
  2. OK
  3. Time taken: 1.645 seconds
  4. hive (yinzhengjie)>
  5. hive (yinzhengjie)> create table xiyouji(
  6. > id int,
  7. > name string
  8. > )
  9. > partitioned by (position string)
  10. > row format delimited fields terminated by '\t'; #创建一张分区表
  11. OK
  12. Time taken: 0.137 seconds
  13. hive (yinzhengjie)>
  14. hive (yinzhengjie)> insert into table xiyouji partition(position='wuzhishan') values(1,'孙悟空'); #基本插入数据
  15. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  16. Query ID = yinzhengjie_20180809181325_1275bf7f--4d56-afaf-ecd310467701
  17. Total jobs =
  18. Launching Job out of
  19. Number of reduce tasks is set to since there's no reduce operator
  20. Starting Job = job_1533789743141_0004, Tracking URL = http://s101:8088/proxy/application_1533789743141_0004/
  21. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0004
  22. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  23. -- ::, Stage- map = %, reduce = %
  24. -- ::, Stage- map = %, reduce = %
  25. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.12 sec
  26. MapReduce Total cumulative CPU time: seconds msec
  27. Ended Job = job_1533789743141_0004
  28. Stage- is selected by condition resolver.
  29. Stage- is filtered out by condition resolver.
  30. Stage- is filtered out by condition resolver.
  31. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan/.hive-staging_hive_2018-08-09_18-13-25_269_2859222729747025112-1/-ext-10000
  32. Loading data to table yinzhengjie.xiyouji partition (position=wuzhishan)
  33. MapReduce Jobs Launched:
  34. Stage-Stage-: Map: Cumulative CPU: 2.62 sec HDFS Read: HDFS Write: SUCCESS
  35. Total MapReduce CPU Time Spent: seconds msec
  36. OK
  37. _col0 _col1
  38. Time taken: 136.695 seconds
  39. hive (yinzhengjie)> select * from xiyouji;
  40. OK
  41. xiyouji.id xiyouji.name xiyouji.position
  42. 孙悟空 wuzhishan
  43. Time taken: 0.169 seconds, Fetched: row(s)
  44. hive (yinzhengjie)>

数据导入-基本插入数据(hive (yinzhengjie)> insert into table xiyouji partition(position='wuzhishan') values(1,'孙悟空');)温馨提示:position的值最好不要设置成中文!!!

  1. hive (yinzhengjie)> select * from xiyouji; #查看表中的数据
  2. OK
  3. xiyouji.id xiyouji.name xiyouji.position
  4. 孙悟空 wuzhishan
  5. Time taken: 0.117 seconds, Fetched: row(s)
  6. hive (yinzhengjie)> insert overwrite table xiyouji partition(position='sandabaigujing') select id, name from xiyouji where position='wuzhishan'; #根据单张表查询结果向表中插入数据
  7. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  8. Query ID = yinzhengjie_20180809182335_4f9c3b89-bc30-4afb-95f7-bd294520afe9
  9. Total jobs =
  10. Launching Job out of
  11. Number of reduce tasks is set to since there's no reduce operator
  12. Starting Job = job_1533789743141_0005, Tracking URL = http://s101:8088/proxy/application_1533789743141_0005/
  13. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0005
  14. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  15. -- ::, Stage- map = %, reduce = %
  16. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.61 sec
  17. MapReduce Total cumulative CPU time: seconds msec
  18. Ended Job = job_1533789743141_0005
  19. Stage- is selected by condition resolver.
  20. Stage- is filtered out by condition resolver.
  21. Stage- is filtered out by condition resolver.
  22. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing/.hive-staging_hive_2018-08-09_18-23-35_915_1607485649232911242-1/-ext-10000
  23. Loading data to table yinzhengjie.xiyouji partition (position=sandabaigujing)
  24. MapReduce Jobs Launched:
  25. Stage-Stage-: Map: Cumulative CPU: 2.61 sec HDFS Read: HDFS Write: SUCCESS
  26. Total MapReduce CPU Time Spent: seconds msec
  27. OK
  28. id name
  29. Time taken: 50.478 seconds
  30. hive (yinzhengjie)> select * from xiyouji; #再次查看表中的数据,你会发现多了一条数据,只不过position的值发生了变化
  31. OK
  32. xiyouji.id xiyouji.name xiyouji.position
  33. 孙悟空 sandabaigujing
  34. 孙悟空 wuzhishan
  35. Time taken: 0.105 seconds, Fetched: row(s)
  36. hive (yinzhengjie)>

数据导入-根据单张表查询结果向表中插入数据(hive (yinzhengjie)> insert overwrite table xiyouji partition(position='sandabaigujing') select id, name from xiyouji where position='wuzhishan';)

  1. hive (yinzhengjie)> select * from xiyouji; #查看数据表当前的数据
  2. OK
  3. xiyouji.id xiyouji.name xiyouji.position
  4. 孙悟空 sandabaigujing
  5. 孙悟空 wuzhishan
  6. Time taken: 0.14 seconds, Fetched: row(s)
  7. hive (yinzhengjie)> from xiyouji
  8. > insert overwrite table xiyouji partition(position='nverguo')
  9. > select id, name where position='wuzhishan'
  10. > insert overwrite table xiyouji partition(position='zhenjiameihouwang')
  11. > select id, name where position='wuzhishan'; #多插入模式:从同一张表查询并同时向多个分区插入,我测试时只插入了2条数据
  12. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  13. Query ID = yinzhengjie_20180809183740_ef71ba4e-acec-4ef7--0f01c57bd49d
  14. Total jobs =
  15. Launching Job out of
  16. Number of reduce tasks is set to since there's no reduce operator
  17. Starting Job = job_1533789743141_0009, Tracking URL = http://s101:8088/proxy/application_1533789743141_0009/
  18. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0009
  19. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  20. -- ::, Stage- map = %, reduce = %
  21. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.08 sec
  22. MapReduce Total cumulative CPU time: seconds msec
  23. Ended Job = job_1533789743141_0009
  24. Stage- is selected by condition resolver.
  25. Stage- is filtered out by condition resolver.
  26. Stage- is filtered out by condition resolver.
  27. Stage- is selected by condition resolver.
  28. Stage- is filtered out by condition resolver.
  29. Stage- is filtered out by condition resolver.
  30. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo/.hive-staging_hive_2018-08-09_18-37-40_573_1576742180177937358-1/-ext-10000
  31. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang/.hive-staging_hive_2018-08-09_18-37-40_573_1576742180177937358-1/-ext-10002
  32. Loading data to table yinzhengjie.xiyouji partition (position=nverguo)
  33. Loading data to table yinzhengjie.xiyouji partition (position=zhenjiameihouwang)
  34. MapReduce Jobs Launched:
  35. Stage-Stage-: Map: Cumulative CPU: 2.08 sec HDFS Read: HDFS Write: SUCCESS
  36. Total MapReduce CPU Time Spent: seconds msec
  37. OK
  38. id name
  39. Time taken: 63.367 seconds
  40. hive (yinzhengjie)> select * from xiyouji; #再次查看数据表当前的数据,你会发现又多了2条数据
  41. OK
  42. xiyouji.id xiyouji.name xiyouji.position
  43. 孙悟空 nverguo
  44. 孙悟空 sandabaigujing
  45. 孙悟空 wuzhishan
  46. 孙悟空 zhenjiameihouwang
  47. Time taken: 0.141 seconds, Fetched: row(s)
  48. hive (yinzhengjie)>

数据导入-多插入模式(从同一张表查询,同时向多个分区插入)案例展示

  1. hive (yinzhengjie)> select * from xiyouji; #查看表中的数据
  2. OK
  3. xiyouji.id xiyouji.name xiyouji.position
  4. 孙悟空 nverguo
  5. 孙悟空 sandabaigujing
  6. 孙悟空 wuzhishan
  7. 孙悟空 zhenjiameihouwang
  8. Time taken: 0.087 seconds, Fetched: row(s)
  9. hive (yinzhengjie)> create table if not exists xiyouji2 as select id, name from xiyouji; #根据查询结果创建表(查询的结果会添加到新创建的表中)
  10. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  11. Query ID = yinzhengjie_20180809184435_d18b1d0b--4fbe-bffa-ec501fa5fd09
  12. Total jobs =
  13. Launching Job out of
  14. Number of reduce tasks is set to since there's no reduce operator
  15. Starting Job = job_1533789743141_0010, Tracking URL = http://s101:8088/proxy/application_1533789743141_0010/
  16. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0010
  17. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  18. -- ::, Stage- map = %, reduce = %
  19. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.39 sec
  20. MapReduce Total cumulative CPU time: seconds msec
  21. Ended Job = job_1533789743141_0010
  22. Stage- is selected by condition resolver.
  23. Stage- is filtered out by condition resolver.
  24. Stage- is filtered out by condition resolver.
  25. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/.hive-staging_hive_2018-08-09_18-44-35_127_6564594081639052485-1/-ext-10002
  26. Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji2
  27. MapReduce Jobs Launched:
  28. Stage-Stage-: Map: Cumulative CPU: 2.39 sec HDFS Read: HDFS Write: SUCCESS
  29. Total MapReduce CPU Time Spent: seconds msec
  30. OK
  31. id name
  32. Time taken: 54.907 seconds
  33. hive (yinzhengjie)> select * from xiyouji2; #查看新生成表的数据
  34. OK
  35. xiyouji2.id xiyouji2.name
  36. 孙悟空
  37. 孙悟空
  38. 孙悟空
  39. 孙悟空
  40. Time taken: 0.065 seconds, Fetched: row(s)
  41. hive (yinzhengjie)>

数据导入-查询语句中创建表并加载数据(hive (yinzhengjie)> create table if not exists xiyouji2 as select id, name from xiyouji;)

  1. hive (yinzhengjie)> create table if not exists Student(
  2. > id int,
  3. > name string
  4. > )
  5. > row format delimited fields terminated by '\t'
  6. > location '/home/yinzhengjie/data/students.txt'; #创建表,并指定在hdfs上的加载数据路径
  7. OK
  8. Time taken: 0.017 seconds
  9. hive (yinzhengjie)>
  10. hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data/students.txt; #上传数据到hdfs上
  11. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt; #查看上传到hdfs上的数据,这个数据会被Student表自动加载。
  12. sunwukong
  13. zhubajie
  14. shaheshang
  15. bailongma
  16. tangsanzang
  17. hive (yinzhengjie)>
  18. hive (yinzhengjie)> select * from Student; #我们会发现Student表会自动加载数据,神奇不?
  19. OK
  20. student.id student.name
  21. sunwukong
  22. zhubajie
  23. shaheshang
  24. bailongma
  25. tangsanzang
  26. Time taken: 0.054 seconds, Fetched: row(s)
  27. hive (yinzhengjie)>

数据导入-创建表时通过Location指定加载数据路径案例展示
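
补充:location指定的其实是一个HDFS目录,Hive会把该目录下的所有文件都当作这张表的数据。上面例子中的路径虽然叫students.txt,但建表时它是被当作目录创建的,后面put进去的文件落在了这个目录里面。更直观的做法是把location指到一个语义清晰的目录上,下面是一个示意性写法(表名student5和目录路径均为假设),仅供参考:

-- /home/yinzhengjie/data/student_dir 为假设的HDFS目录
create table if not exists student5(
  id int,
  name string
)
row format delimited fields terminated by '\t'
location '/home/yinzhengjie/data/student_dir';

-- 之后把数据文件放进该目录,表即可直接查询到数据,例如:
-- dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data/student_dir;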

  1. hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='zhenjiameihouwang') from '/home/yinzhengjie/data/xiyouji2'; #从hdfs中导入指定的分区到指定的表中
  2. Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang
  3. Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang/000000_0
  4. Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=zhenjiameihouwang)
  5. OK
  6. Time taken: 3.966 seconds
  7. hive (yinzhengjie)> select * from xiyoujihouzhuan; #查看是否导入成功
  8. OK
  9. xiyoujihouzhuan.id xiyoujihouzhuan.name xiyoujihouzhuan.position
  10. 孙悟空 zhenjiameihouwang
  11. Time taken: 0.293 seconds, Fetched: row(s)
  12. hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='nverguo') from '/home/yinzhengjie/data/xiyouji2';
  13. Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=nverguo
  14. Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=nverguo/000000_0
  15. Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=nverguo)
  16. OK
  17. Time taken: 0.751 seconds
  18. hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='wuzhishan') from '/home/yinzhengjie/data/xiyouji2';
  19. Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=wuzhishan
  20. Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=wuzhishan/000000_0
  21. Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=wuzhishan)
  22. OK
  23. Time taken: 1.363 seconds
  24. hive (yinzhengjie)> select * from xiyoujihouzhuan;
  25. OK
  26. xiyoujihouzhuan.id xiyoujihouzhuan.name xiyoujihouzhuan.position
  27. 孙悟空 nverguo
  28. 孙悟空 wuzhishan
  29. 孙悟空 zhenjiameihouwang
  30. Time taken: 0.488 seconds, Fetched: row(s)
  31. hive (yinzhengjie)>

数据导入-Import数据到指定Hive表中,温馨提示:先用export导出后,再将数据导入。(hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='wuzhishan') from '/home/yinzhengjie/data/xiyouji2';)
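
补充:import的前提是目标路径中的数据是之前用export导出的(export会同时导出元数据_metadata和各分区的数据文件)。上面用到的'/home/yinzhengjie/data/xiyouji2'目录,就是由类似下面这条export命令生成的(示意写法,具体过程见后文的export章节),仅供参考:

-- 先把xiyouji表整体导出到HDFS,之后才能用import导入
export table yinzhengjie.xiyouji to '/home/yinzhengjie/data/xiyouji2';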

  1. hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji' select * from xiyouji; #将查询的结果导出到本地路径,注意这里导出的是一个目录哟
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809190854_cc079ee4-1d8b-43a0-b360-89ff65fb39fb
  4. Total jobs =
  5. Launching Job out of
  6. Number of reduce tasks is set to since there's no reduce operator
  7. Starting Job = job_1533789743141_0011, Tracking URL = http://s101:8088/proxy/application_1533789743141_0011/
  8. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0011
  9. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  10. -- ::, Stage- map = %, reduce = %
  11. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.96 sec
  12. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.36 sec
  13. MapReduce Total cumulative CPU time: seconds msec
  14. Ended Job = job_1533789743141_0011
  15. Moving data to local directory /home/yinzhengjie/download/xiyouji
  16. MapReduce Jobs Launched:
  17. Stage-Stage-: Map: Cumulative CPU: 2.36 sec HDFS Read: HDFS Write: SUCCESS
  18. Total MapReduce CPU Time Spent: seconds msec
  19. OK
  20. xiyouji.id xiyouji.name xiyouji.position
  21. Time taken: 77.687 seconds
  22. hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji/000000_0; #查看导出到本地的文本信息
  23. 1孙悟空nverguo
  24. 1孙悟空sandabaigujing
  25. 1孙悟空wuzhishan
  26. 1孙悟空zhenjiameihouwang
  27. hive (yinzhengjie)>

数据导出-将查询的结果导出到本地(hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji' select * from xiyouji;)

  1. hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji2'
  2. > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  3. > select * from xiyouji; #我们指定以"\t"进行字段分割
  4. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  5. Query ID = yinzhengjie_20180809191439_7461de80--4e07-82ac-fd54b85a0891
  6. Total jobs =
  7. Launching Job out of
  8. Number of reduce tasks is set to since there's no reduce operator
  9. Starting Job = job_1533789743141_0012, Tracking URL = http://s101:8088/proxy/application_1533789743141_0012/
  10. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0012
  11. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  12. -- ::, Stage- map = %, reduce = %
  13. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.31 sec
  14. MapReduce Total cumulative CPU time: seconds msec
  15. Ended Job = job_1533789743141_0012
  16. Moving data to local directory /home/yinzhengjie/download/xiyouji2
  17. MapReduce Jobs Launched:
  18. Stage-Stage-: Map: Cumulative CPU: 2.31 sec HDFS Read: HDFS Write: SUCCESS
  19. Total MapReduce CPU Time Spent: seconds msec
  20. OK
  21. xiyouji.id xiyouji.name xiyouji.position
  22. Time taken: 100.57 seconds
  23. hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji2/000000_0; #查看导出的数据内容
  24. 孙悟空 nverguo
  25. 孙悟空 sandabaigujing
  26. 孙悟空 wuzhishan
  27. 孙悟空 zhenjiameihouwang
  28. hive (yinzhengjie)>

数据导出-将查询的结果格式化导出到本地(hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji2' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from xiyouji;)

  1. hive (yinzhengjie)> insert overwrite directory '/home/yinzhengjie/data/xiyouji'
  2. > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  3. > select * from xiyouji; #将查询的结果导出到HDFS上
  4. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  5. Query ID = yinzhengjie_20180809192105_183285e8-bf4e--93c5-4312a8a31716
  6. Total jobs =
  7. Launching Job out of
  8. Number of reduce tasks is set to since there's no reduce operator
  9. Starting Job = job_1533789743141_0013, Tracking URL = http://s101:8088/proxy/application_1533789743141_0013/
  10. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0013
  11. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  12. -- ::, Stage- map = %, reduce = %
  13. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.38 sec
  14. MapReduce Total cumulative CPU time: seconds msec
  15. Ended Job = job_1533789743141_0013
  16. Stage- is selected by condition resolver.
  17. Stage- is filtered out by condition resolver.
  18. Stage- is filtered out by condition resolver.
  19. Moving data to directory hdfs://mycluster/home/yinzhengjie/data/xiyouji/.hive-staging_hive_2018-08-09_19-21-05_012_3955068750863516339-1/-ext-10000
  20. Moving data to directory /home/yinzhengjie/data/xiyouji
  21. MapReduce Jobs Launched:
  22. Stage-Stage-: Map: Cumulative CPU: 2.38 sec HDFS Read: HDFS Write: SUCCESS
  23. Total MapReduce CPU Time Spent: seconds msec
  24. OK
  25. xiyouji.id xiyouji.name xiyouji.position
  26. Time taken: 88.306 seconds
  27. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji/000000_0; #查询导出在hdfs上的数据
  28. 孙悟空 nverguo
  29. 孙悟空 sandabaigujing
  30. 孙悟空 wuzhishan
  31. 孙悟空 zhenjiameihouwang
  32. hive (yinzhengjie)>

数据导出-将查询的结果导出到HDFS上(hive (yinzhengjie)> insert overwrite directory '/home/yinzhengjie/data/xiyouji' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from xiyouji;)

  1. hive (yinzhengjie)> dfs -get /home/yinzhengjie/data/xiyouji/000000_0 /home/yinzhengjie/download/xiyouji3; #export the data to the local filesystem with a Hadoop command
  2. hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji3; #view the text file exported to Linux
  3. 孙悟空 nverguo
  4. 孙悟空 sandabaigujing
  5. 孙悟空 wuzhishan
  6. 孙悟空 zhenjiameihouwang
  7. hive (yinzhengjie)>

数据导出 - Data export: export to the local filesystem with a Hadoop command (hive (yinzhengjie)> dfs -get /home/yinzhengjie/data/xiyouji/000000_0 /home/yinzhengjie/download/xiyouji3;)
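The dfs command above is issued from inside the Hive CLI; the same copy can also be done from the Linux shell with the hdfs client. A minimal sketch, assuming the HDFS path from the example above and a hypothetical local target name xiyouji4:

  [yinzhengjie@s101 ~]$ hdfs dfs -get /home/yinzhengjie/data/xiyouji/000000_0 /home/yinzhengjie/download/xiyouji4   #same copy, issued outside of Hive
  [yinzhengjie@s101 ~]$ cat /home/yinzhengjie/download/xiyouji4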

  1. hive (yinzhengjie)>
  2. hive (yinzhengjie)> export table yinzhengjie.xiyouji to '/home/yinzhengjie/data/xiyouji2'; #export the data to HDFS with EXPORT
  3. Copying data from file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_19--58_906_1594217512913959561-/-local-/_metadata
  4. Copying file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_19--58_906_1594217512913959561-/-local-/_metadata
  5. Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=???
  6. Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=五指山/000000_0
  7. Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo
  8. Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo/000000_0
  9. Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing
  10. Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing/000000_0
  11. Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan
  12. Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan/000000_0
  13. Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang
  14. Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang/000000_0
  15. OK
  16. Time taken: 0.978 seconds
  17. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=wuzhishan/000000_0;
  18. 孙悟空
  19. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=nverguo/000000_0;
  20. 孙悟空
  21. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=sandabaigujing/000000_0;
  22. 孙悟空
  23. hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang/000000_0;
  24. 孙悟空
  25. hive (yinzhengjie)>

数据导出 - Data export: export data to HDFS with EXPORT (hive (yinzhengjie)> export table yinzhengjie.xiyouji to '/home/yinzhengjie/data/xiyouji2';)
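EXPORT writes both the table data and a _metadata file, so the exported directory can be loaded back with the matching IMPORT statement. A minimal sketch, assuming the export path above and a hypothetical new table name xiyouji_copy:

  hive (yinzhengjie)> import table xiyouji_copy from '/home/yinzhengjie/data/xiyouji2';   #rebuilds the table (including its partitions) from the exported directory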

  1. [yinzhengjie@s101 ~]$ hive -e 'select * from yinzhengjie.xiyouji;' > /home/yinzhengjie/download/xiyouji6 #run hive from the command line and redirect the query output to a local file
  2. SLF4J: Class path contains multiple SLF4J bindings.
  3. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  4. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  5. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  6. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  7.  
  8. Logging initialized using configuration in file:/soft/apache-hive-2.1.-bin/conf/hive-log4j2.properties Async: true
  9. OK
  10. Time taken: 20.367 seconds, Fetched: row(s)
  11. [yinzhengjie@s101 ~]$
  12. [yinzhengjie@s101 ~]$ cat /home/yinzhengjie/download/xiyouji6 #view the query results
  13. xiyouji.id xiyouji.name xiyouji.position
  14. 孙悟空 nverguo
  15. 孙悟空 sandabaigujing
  16. 孙悟空 wuzhishan
  17. 孙悟空 zhenjiameihouwang
  18. [yinzhengjie@s101 ~]$

数据导出 - Data export: export with the Hive shell command ([yinzhengjie@s101 ~]$ hive -e 'select * from yinzhengjie.xiyouji;' > /home/yinzhengjie/download/xiyouji6)
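Besides -e, the Hive CLI also accepts -f to run a SQL script file, and the output can be redirected the same way. A minimal sketch, assuming a hypothetical script file query.sql and output file xiyouji7:

  [yinzhengjie@s101 ~]$ echo "select * from yinzhengjie.xiyouji;" > /home/yinzhengjie/download/query.sql
  [yinzhengjie@s101 ~]$ hive -f /home/yinzhengjie/download/query.sql > /home/yinzhengjie/download/xiyouji7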

4>.Queries

  The HQL SELECT syntax is documented in detail on the official wiki, so I will not copy it here; for the full reference see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select.

  1. hive (yinzhengjie)> select * from teacher; #full-table query
  2. OK
  3. teacher.id teacher.name
  4. Dennis MacAlistair Ritchie
  5. Linus Benedict Torvalds
  6. Bjarne Stroustrup
  7. Guido van Rossum
  8. James Gosling
  9. Martin Odersky
  10. Rob Pike
  11. Rasmus Lerdorf
  12. Brendan Eich
  13. Time taken: 0.108 seconds, Fetched: row(s)
  14. hive (yinzhengjie)>
  15. hive (yinzhengjie)> select name from teacher; #query specific columns
  16. OK
  17. name
  18. Dennis MacAlistair Ritchie
  19. Linus Benedict Torvalds
  20. Bjarne Stroustrup
  21. Guido van Rossum
  22. James Gosling
  23. Martin Odersky
  24. Rob Pike
  25. Rasmus Lerdorf
  26. Brendan Eich
  27. Time taken: 0.1 seconds, Fetched: row(s)
  28. hive (yinzhengjie)>
  29.  
  30. Tips:
  31. >.SQL is not case-sensitive.
  32. >.A SQL statement may be written on one line or across several lines.
  33. >.Keywords cannot be abbreviated or split across lines.
  34. >.Each clause is usually written on its own line.
  35. >.Use indentation to make statements easier to read.

基本查询 - Basic queries: full-table and specific-column queries (hive (yinzhengjie)> select name from teacher;)

  1. hive (yinzhengjie)> select id AS tid, name AS Tname from teacher;
  2. OK
  3. tid tname
  4. Dennis MacAlistair Ritchie
  5. Linus Benedict Torvalds
  6. Bjarne Stroustrup
  7. Guido van Rossum
  8. James Gosling
  9. Martin Odersky
  10. Rob Pike
  11. Rasmus Lerdorf
  12. Brendan Eich
  13. Time taken: 0.088 seconds, Fetched: row(s)
  14. hive (yinzhengjie)>
  15.  
  16. Tips:
  17. >.A column alias renames the column.
  18. >.Aliases make derived columns easier to work with.
  19. >.The alias immediately follows the column name; the keyword 'AS' may optionally be placed between the column name and the alias.

基本查询 - Basic queries: column alias example (hive (yinzhengjie)> select id AS tid, name AS Tname from teacher;)

  1. hive (yinzhengjie)> select id AS age, name AS Tname from teacher;
  2. OK
  3. age tname
  4. Dennis MacAlistair Ritchie
  5. Linus Benedict Torvalds
  6. Bjarne Stroustrup
  7. Guido van Rossum
  8. James Gosling
  9. Martin Odersky
  10. Rob Pike
  11. Rasmus Lerdorf
  12. Brendan Eich
  13. Time taken: 0.157 seconds, Fetched: row(s)
  14. hive (yinzhengjie)> select id+20 AS age, name AS Tname from teacher;
  15. OK
  16. age tname
  17. Dennis MacAlistair Ritchie
  18. Linus Benedict Torvalds
  19. Bjarne Stroustrup
  20. Guido van Rossum
  21. James Gosling
  22. Martin Odersky
  23. Rob Pike
  24. Rasmus Lerdorf
  25. Brendan Eich
  26. Time taken: 0.091 seconds, Fetched: row(s)
  27. hive (yinzhengjie)>
  28.  
  29. Arithmetic operator Description
  30. A+B A plus B
  31. A-B A minus B
  32. A*B A multiplied by B
  33. A/B A divided by B
  34. A%B Remainder of A divided by B
  35. A&B Bitwise AND of A and B
  36. A|B Bitwise OR of A and B
  37. A^B Bitwise XOR of A and B
  38. ~A Bitwise NOT of A

基本查询 - Basic queries: use an arithmetic operator to add 20 to the query result before displaying it (hive (yinzhengjie)> select id+20 AS age, name AS Tname from teacher;)
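Several of the operators from the table above can be combined in a single SELECT. A minimal sketch against the same teacher table (the literal values are arbitrary):

  hive (yinzhengjie)> select id+20 AS plus_20, id-20 AS minus_20, id*2 AS doubled, id%10 AS last_digit from teacher;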

  1. hive (yinzhengjie)> select count(*)cnt from teacher;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809202019_6a4b05d8--410b-af4e-3c1839e0bdc6
  4. Total jobs =
  5. Launching Job out of
  6. Number of reduce tasks determined at compile time:
  7. In order to change the average load for a reducer (in bytes):
  8. set hive.exec.reducers.bytes.per.reducer=<number>
  9. In order to limit the maximum number of reducers:
  10. set hive.exec.reducers.max=<number>
  11. In order to set a constant number of reducers:
  12. set mapreduce.job.reduces=<number>
  13. Starting Job = job_1533789743141_0014, Tracking URL = http://s101:8088/proxy/application_1533789743141_0014/
  14. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0014
  15. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  16. -- ::, Stage- map = %, reduce = %
  17. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.61 sec
  18. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.51 sec
  19. MapReduce Total cumulative CPU time: seconds msec
  20. Ended Job = job_1533789743141_0014
  21. MapReduce Jobs Launched:
  22. Stage-Stage-: Map: Reduce: Cumulative CPU: 5.51 sec HDFS Read: HDFS Write: SUCCESS
  23. Total MapReduce CPU Time Spent: seconds msec
  24. OK
  25. cnt
  26.  
  27. Time taken: 123.864 seconds, Fetched: row(s)
  28. hive (yinzhengjie)>

基本查询 - Basic queries, common functions: count the total number of rows (hive (yinzhengjie)> select count(*)cnt from teacher;)

  1. hive (yinzhengjie)> select max(id) max_age from teacher;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809202410_0146f895-4c54-440f-aa1b-bee4fb566b91
  4. Total jobs =
  5. Launching Job out of
  6. Number of reduce tasks determined at compile time:
  7. In order to change the average load for a reducer (in bytes):
  8. set hive.exec.reducers.bytes.per.reducer=<number>
  9. In order to limit the maximum number of reducers:
  10. set hive.exec.reducers.max=<number>
  11. In order to set a constant number of reducers:
  12. set mapreduce.job.reduces=<number>
  13. Starting Job = job_1533789743141_0015, Tracking URL = http://s101:8088/proxy/application_1533789743141_0015/
  14. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0015
  15. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  16. -- ::, Stage- map = %, reduce = %
  17. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.46 sec
  18. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.08 sec
  19. MapReduce Total cumulative CPU time: seconds msec
  20. Ended Job = job_1533789743141_0015
  21. MapReduce Jobs Launched:
  22. Stage-Stage-: Map: Reduce: Cumulative CPU: 5.08 sec HDFS Read: HDFS Write: SUCCESS
  23. Total MapReduce CPU Time Spent: seconds msec
  24. OK
  25. max_age
  26.  
  27. Time taken: 74.014 seconds, Fetched: row(s)
  28. hive (yinzhengjie)>

基本查询 - Basic queries, common functions: maximum age (hive (yinzhengjie)> select max(id) max_age from teacher;)

  1. hive (yinzhengjie)> select min(id) min_age from teacher;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809202623_b1b99783-b7d3--901e-4e901795a128
  4. Total jobs =
  5. Launching Job out of
  6. Number of reduce tasks determined at compile time:
  7. In order to change the average load for a reducer (in bytes):
  8. set hive.exec.reducers.bytes.per.reducer=<number>
  9. In order to limit the maximum number of reducers:
  10. set hive.exec.reducers.max=<number>
  11. In order to set a constant number of reducers:
  12. set mapreduce.job.reduces=<number>
  13. Starting Job = job_1533789743141_0016, Tracking URL = http://s101:8088/proxy/application_1533789743141_0016/
  14. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0016
  15. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  16. -- ::, Stage- map = %, reduce = %
  17. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.34 sec
  18. -- ::, Stage- map = %, reduce = %, Cumulative CPU 3.77 sec
  19. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.42 sec
  20. MapReduce Total cumulative CPU time: seconds msec
  21. Ended Job = job_1533789743141_0016
  22. MapReduce Jobs Launched:
  23. Stage-Stage-: Map: Reduce: Cumulative CPU: 4.42 sec HDFS Read: HDFS Write: SUCCESS
  24. Total MapReduce CPU Time Spent: seconds msec
  25. OK
  26. min_age
  27.  
  28. Time taken: 79.135 seconds, Fetched: row(s)
  29. hive (yinzhengjie)>

基本查询 - Basic queries, common functions: minimum age (hive (yinzhengjie)> select min(id) min_age from teacher;)

  1. hive (yinzhengjie)> select sum(id) sum_age from teacher;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809202800_14580ea4-3e65-461e-a1c6-6607e960c3d7
  4. Total jobs =
  5. Launching Job out of
  6. Number of reduce tasks determined at compile time:
  7. In order to change the average load for a reducer (in bytes):
  8. set hive.exec.reducers.bytes.per.reducer=<number>
  9. In order to limit the maximum number of reducers:
  10. set hive.exec.reducers.max=<number>
  11. In order to set a constant number of reducers:
  12. set mapreduce.job.reduces=<number>
  13. Starting Job = job_1533789743141_0017, Tracking URL = http://s101:8088/proxy/application_1533789743141_0017/
  14. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0017
  15. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  16. -- ::, Stage- map = %, reduce = %
  17. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.27 sec
  18. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.58 sec
  19. MapReduce Total cumulative CPU time: seconds msec
  20. Ended Job = job_1533789743141_0017
  21. MapReduce Jobs Launched:
  22. Stage-Stage-: Map: Reduce: Cumulative CPU: 4.58 sec HDFS Read: HDFS Write: SUCCESS
  23. Total MapReduce CPU Time Spent: seconds msec
  24. OK
  25. sum_age
  26.  
  27. Time taken: 43.081 seconds, Fetched: row(s)
  28. hive (yinzhengjie)>

基本查询 - Basic queries, common functions: sum of ages (hive (yinzhengjie)> select sum(id) sum_age from teacher;)

  1. hive (yinzhengjie)> select avg(id) avg_age from teacher;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809202900_618a9c9f-535a-45ac-94de-16723f47d9b9
  4. Total jobs =
  5. Launching Job out of
  6. Number of reduce tasks determined at compile time:
  7. In order to change the average load for a reducer (in bytes):
  8. set hive.exec.reducers.bytes.per.reducer=<number>
  9. In order to limit the maximum number of reducers:
  10. set hive.exec.reducers.max=<number>
  11. In order to set a constant number of reducers:
  12. set mapreduce.job.reduces=<number>
  13. Starting Job = job_1533789743141_0018, Tracking URL = http://s101:8088/proxy/application_1533789743141_0018/
  14. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0018
  15. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  16. -- ::, Stage- map = %, reduce = %
  17. -- ::, Stage- map = %, reduce = %, Cumulative CPU 3.19 sec
  18. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.25 sec
  19. MapReduce Total cumulative CPU time: seconds msec
  20. Ended Job = job_1533789743141_0018
  21. MapReduce Jobs Launched:
  22. Stage-Stage-: Map: Reduce: Cumulative CPU: 5.25 sec HDFS Read: HDFS Write: SUCCESS
  23. Total MapReduce CPU Time Spent: seconds msec
  24. OK
  25. avg_age
  26. 59.333333333333336
  27. Time taken: 59.897 seconds, Fetched: row(s)
  28. hive (yinzhengjie)>

基本查询 - Basic queries, common functions: average age (hive (yinzhengjie)> select avg(id) avg_age from teacher;)
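The aggregate functions shown above can also be computed together in one query, which needs only a single MapReduce job. A minimal sketch on the same teacher table:

  hive (yinzhengjie)> select count(*) cnt, max(id) max_age, min(id) min_age, sum(id) sum_age, avg(id) avg_age from teacher;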

  1. hive (yinzhengjie)> select id AS age , name from teacher;
  2. OK
  3. age name
  4. Dennis MacAlistair Ritchie
  5. Linus Benedict Torvalds
  6. Bjarne Stroustrup
  7. Guido van Rossum
  8. James Gosling
  9. Martin Odersky
  10. Rob Pike
  11. Rasmus Lerdorf
  12. Brendan Eich
  13. Time taken: 0.068 seconds, Fetched: row(s)
  14. hive (yinzhengjie)> select id AS age , name from teacher limit 3; #A typical query returns many rows; the LIMIT clause restricts how many rows are returned.
  15. OK
  16. age name
  17. Dennis MacAlistair Ritchie
  18. Linus Benedict Torvalds
  19. Bjarne Stroustrup
  20. Time taken: 0.1 seconds, Fetched: row(s)
  21. hive (yinzhengjie)>

基本查询 - Basic queries: the LIMIT clause (hive (yinzhengjie)> select id AS age , name from teacher limit 3;)
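On Hive 2.0 and later (the transcripts here come from Hive 2.1), LIMIT also accepts an optional offset that skips rows before returning the requested count. A minimal sketch:

  hive (yinzhengjie)> select id AS age , name from teacher limit 2,3;   #skip the first 2 rows, then return the next 3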

  1. hive (yinzhengjie)> select id, name from teacher where id> 60; #The WHERE clause filters out rows that do not satisfy the condition; it immediately follows the FROM clause.
  2. OK
  3. id name
  4. Dennis MacAlistair Ritchie
  5. Bjarne Stroustrup
  6. Guido van Rossum
  7. James Gosling
  8. Rob Pike
  9. Time taken: 0.056 seconds, Fetched: row(s)
  10. hive (yinzhengjie)>

Where语句 - The WHERE clause (hive (yinzhengjie)> select id, name from teacher where id> 60;)

  1. hive (yinzhengjie)> select * from teacher where id = 60; #query the teacher whose id equals 60
  2. OK
  3. teacher.id teacher.name
  4. Martin Odersky
  5. Time taken: 0.075 seconds, Fetched: row(s)
  6. hive (yinzhengjie)>
  7. hive (yinzhengjie)> select * from teacher where id between 40 and 60; #query teachers whose id is between 40 and 60
  8. OK
  9. teacher.id teacher.name
  10. Linus Benedict Torvalds
  11. Martin Odersky
  12. Rasmus Lerdorf
  13. Brendan Eich
  14. Time taken: 0.05 seconds, Fetched: row(s)
  15. hive (yinzhengjie)>
  16. hive (yinzhengjie)> select * from teacher where name is null; #query all teachers whose name field is NULL; obviously there is no such data here
  17. OK
  18. teacher.id teacher.name
  19. Time taken: 0.104 seconds
  20. hive (yinzhengjie)>
  21. hive (yinzhengjie)> select * from teacher where id IN(50,60); #query teachers whose id is 50 or 60
  22. OK
  23. teacher.id teacher.name
  24. Martin Odersky
  25. Rasmus Lerdorf
  26. Brendan Eich
  27. Time taken: 0.07 seconds, Fetched: row(s)
  28. hive (yinzhengjie)>
  29.  
  30. The following table describes the comparison operators; they can also be used in JOIN ... ON and HAVING clauses.
  31. Operator Supported types Description
  32. A=B Primitive types TRUE if A equals B, FALSE otherwise
  33. A<=>B Primitive types TRUE if both A and B are NULL; for non-NULL operands the result is the same as the = operator; FALSE if only one of them is NULL
  34. A<>B, A!=B Primitive types NULL if A or B is NULL; TRUE if A is not equal to B, otherwise FALSE
  35. A<B Primitive types NULL if A or B is NULL; TRUE if A is less than B, otherwise FALSE
  36. A<=B Primitive types NULL if A or B is NULL; TRUE if A is less than or equal to B, otherwise FALSE
  37. A>B Primitive types NULL if A or B is NULL; TRUE if A is greater than B, otherwise FALSE
  38. A>=B Primitive types NULL if A or B is NULL; TRUE if A is greater than or equal to B, otherwise FALSE
  39. A [NOT] BETWEEN B AND C Primitive types NULL if any of A, B or C is NULL; TRUE if A is greater than or equal to B and less than or equal to C, otherwise FALSE; NOT reverses the result
  40. A IS NULL All types TRUE if A is NULL, otherwise FALSE
  41. A IS NOT NULL All types TRUE if A is not NULL, otherwise FALSE
  42. IN(value1, value2) All types TRUE for rows whose value appears in the list
  43. A [NOT] LIKE B STRING B is a simple SQL pattern; TRUE if A matches it, otherwise FALSE. In B, 'x%' means A must start with 'x', '%x' means A must end with 'x', and '%x%' means A contains 'x' anywhere (beginning, middle or end). NOT reverses the result.
  44. A RLIKE B, A REGEXP B STRING B is a Java regular expression; TRUE if A matches it, otherwise FALSE. Matching is done by the JDK regex engine, so its rules apply; the expression matches if any substring of A matches B, so it does not have to cover the whole string.

Where语句 - The WHERE clause: comparison operators in detail (hive (yinzhengjie)> select * from teacher where id IN(50,60);)
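A minimal sketch exercising a few operators from the table that the transcript above does not show, against the same teacher table:

  hive (yinzhengjie)> select * from teacher where id <=> null;   #NULL-safe equality: matches rows whose id is NULL
  hive (yinzhengjie)> select * from teacher where id not between 40 and 60;
  hive (yinzhengjie)> select * from teacher where name is not null;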

  1. >.Use LIKE to select values that match a pattern.
  2. >.The pattern may contain characters or digits:
  3. % : matches zero or more characters (any number of characters).
  4. _ : matches exactly one character.
  5. >.RLIKE is a Hive extension of this feature; it lets you express the match condition with the far more powerful Java regular expression syntax.
  6.  
  7. hive (yinzhengjie)> select * from teacher where id LIKE '5%'; #find teachers whose id starts with 5
  8. OK
  9. teacher.id teacher.name
  10. Rasmus Lerdorf
  11. Brendan Eich
  12. Time taken: 0.126 seconds, Fetched: row(s)
  13. hive (yinzhengjie)>
  14. hive (yinzhengjie)> select * from teacher where id LIKE '_2%'; #find teachers whose id has 2 as its second digit
  15. OK
  16. teacher.id teacher.name
  17. Guido van Rossum
  18. Rob Pike
  19. Time taken: 0.065 seconds, Fetched: row(s)
  20. hive (yinzhengjie)>
  21. hive (yinzhengjie)> select * from teacher where name RLIKE '[P]'; #find teachers whose name contains the letter "P"
  22. OK
  23. teacher.id teacher.name
  24. Rob Pike
  25. Time taken: 0.049 seconds, Fetched: row(s)
  26. hive (yinzhengjie)>

Where语句 - The WHERE clause: LIKE and RLIKE (hive (yinzhengjie)> select * from teacher where name RLIKE '[P]';)

  1. hive (yinzhengjie)> select * from teacher where id NOT IN(,,,,);
  2. OK
  3. teacher.id teacher.name
  4. James Gosling
  5. Martin Odersky
  6. Time taken: 0.076 seconds, Fetched: row(s)
  7. hive (yinzhengjie)>

Where语句 - The WHERE clause: logical operators (hive (yinzhengjie)> select * from teacher where id > 65 or id <50;)
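The logical operators are AND, OR and NOT, and they can be combined freely in a WHERE clause. A minimal sketch on the same teacher table, reusing the thresholds from the caption above:

  hive (yinzhengjie)> select * from teacher where id > 65 or id < 50;
  hive (yinzhengjie)> select * from teacher where id >= 50 and id <= 65;
  hive (yinzhengjie)> select * from teacher where not (id between 50 and 65);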

  1. hive (yinzhengjie)> select * from dept_partition;
  2. OK
  3. dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
  4. 开发部门
  5. 运维部门
  6. 测试部门
  7. 产品部门
  8. 销售部门
  9. 财务部门
  10. 人事部门
  11. 开发部门
  12. 开发部门
  13. 运维部门
  14. 测试部门
  15. 产品部门
  16. 销售部门
  17. 财务部门
  18. 人事部门
  19. 运维部门
  20. 测试部门
  21. 产品部门
  22. 销售部门
  23. 财务部门
  24. 人事部门
  25. Time taken: 0.059 seconds, Fetched: row(s)
  26. hive (yinzhengjie)>
  27. hive (yinzhengjie)> select t.deptno, avg(t.loc) avg_sal from dept_partition t group by t.deptno; #compute the average salary of each department in the dept_partition table
  28. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  29. Query ID = yinzhengjie_20180809212224_fcbdaa54-b167-4a43-8a08-c0a984c25a0d
  30. Total jobs =
  31. Launching Job out of
  32. Number of reduce tasks not specified. Estimated from input data size:
  33. In order to change the average load for a reducer (in bytes):
  34. set hive.exec.reducers.bytes.per.reducer=<number>
  35. In order to limit the maximum number of reducers:
  36. set hive.exec.reducers.max=<number>
  37. In order to set a constant number of reducers:
  38. set mapreduce.job.reduces=<number>
  39. Starting Job = job_1533789743141_0021, Tracking URL = http://s101:8088/proxy/application_1533789743141_0021/
  40. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0021
  41. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  42. -- ::, Stage- map = %, reduce = %
  43. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.62 sec
  44. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.14 sec
  45. MapReduce Total cumulative CPU time: seconds msec
  46. Ended Job = job_1533789743141_0021
  47. MapReduce Jobs Launched:
  48. Stage-Stage-: Map: Reduce: Cumulative CPU: 5.14 sec HDFS Read: HDFS Write: SUCCESS
  49. Total MapReduce CPU Time Spent: seconds msec
  50. OK
  51. t.deptno avg_sal
  52. 18333.333333333332
  53. 15666.666666666666
  54. 7666.666666666667
  55. 8266.666666666666
  56. 18666.666666666668
  57. 15000.0
  58. 13566.666666666666
  59. Time taken: 68.573 seconds, Fetched: row(s)
  60. hive (yinzhengjie)>

分组 - Grouping: GROUP BY example 1 (hive (yinzhengjie)> select t.deptno, avg(t.loc) avg_sal from dept_partition t group by t.deptno;)

  1. hive (yinzhengjie)> select * from dept_partition;
  2. OK
  3. dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
  4. 开发部门
  5. 运维部门
  6. 测试部门
  7. 产品部门
  8. 销售部门
  9. 财务部门
  10. 人事部门
  11. 开发部门
  12. 开发部门
  13. 运维部门
  14. 测试部门
  15. 产品部门
  16. 销售部门
  17. 财务部门
  18. 人事部门
  19. 运维部门
  20. 测试部门
  21. 产品部门
  22. 销售部门
  23. 财务部门
  24. 人事部门
  25. Time taken: 0.072 seconds, Fetched: row(s)
  26. hive (yinzhengjie)>
  27. hive (yinzhengjie)> select t.deptno, t.dname,max(t.loc) max_sal from dept_partition t group by t.deptno,t.dname; #compute the highest salary for each position within each department of dept_partition
  28. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  29. Query ID = yinzhengjie_20180809213154_e1ea82c8-897d-40b5-b167-5fe42d0e6476
  30. Total jobs =
  31. Launching Job out of
  32. Number of reduce tasks not specified. Estimated from input data size:
  33. In order to change the average load for a reducer (in bytes):
  34. set hive.exec.reducers.bytes.per.reducer=<number>
  35. In order to limit the maximum number of reducers:
  36. set hive.exec.reducers.max=<number>
  37. In order to set a constant number of reducers:
  38. set mapreduce.job.reduces=<number>
  39. Starting Job = job_1533789743141_0023, Tracking URL = http://s101:8088/proxy/application_1533789743141_0023/
  40. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0023
  41. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  42. -- ::, Stage- map = %, reduce = %
  43. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.85 sec
  44. -- ::, Stage- map = %, reduce = %, Cumulative CPU 3.61 sec
  45. MapReduce Total cumulative CPU time: seconds msec
  46. Ended Job = job_1533789743141_0023
  47. MapReduce Jobs Launched:
  48. Stage-Stage-: Map: Reduce: Cumulative CPU: 3.61 sec HDFS Read: HDFS Write: SUCCESS
  49. Total MapReduce CPU Time Spent: seconds msec
  50. OK
  51. t.deptno t.dname max_sal
  52. 开发部门
  53. 运维部门
  54. 测试部门
  55. 产品部门
  56. 销售部门
  57. 财务部门
  58. 人事部门
  59. Time taken: 37.781 seconds, Fetched: row(s)
  60. hive (yinzhengjie)>

分组 - Grouping: GROUP BY example 2 (hive (yinzhengjie)> select t.deptno, t.dname,max(t.loc) max_sal from dept_partition t group by t.deptno,t.dname;)

  1. hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal from dept_partition group by dname,deptno; #compute the average salary of each department
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809213945_f7a1a9c2-8c19--9c1a-37faa29fee44
  4. Total jobs =
  5. Launching Job out of
  6. Number of reduce tasks not specified. Estimated from input data size:
  7. In order to change the average load for a reducer (in bytes):
  8. set hive.exec.reducers.bytes.per.reducer=<number>
  9. In order to limit the maximum number of reducers:
  10. set hive.exec.reducers.max=<number>
  11. In order to set a constant number of reducers:
  12. set mapreduce.job.reduces=<number>
  13. Starting Job = job_1533789743141_0024, Tracking URL = http://s101:8088/proxy/application_1533789743141_0024/
  14. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0024
  15. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  16. -- ::, Stage- map = %, reduce = %
  17. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.2 sec
  18. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.69 sec
  19. MapReduce Total cumulative CPU time: seconds msec
  20. Ended Job = job_1533789743141_0024
  21. MapReduce Jobs Launched:
  22. Stage-Stage-: Map: Reduce: Cumulative CPU: 4.69 sec HDFS Read: HDFS Write: SUCCESS
  23. Total MapReduce CPU Time Spent: seconds msec
  24. OK
  25. deptno dname avg_sal
  26. 开发部门 18333.333333333332
  27. 运维部门 15666.666666666666
  28. 测试部门 7666.666666666667
  29. 产品部门 8266.666666666666
  30. 销售部门 18666.666666666668
  31. 财务部门 15000.0
  32. 人事部门 13566.666666666666
  33. Time taken: 63.433 seconds, Fetched: row(s)
  34. hive (yinzhengjie)>
  35. hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal from dept_partition group by dname, deptno having avg_sal > 10000; #find the departments whose average salary is greater than 10000
  36. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  37. Query ID = yinzhengjie_20180809214521_d980d9db--4fd4-a062-ec9de0cafca2
  38. Total jobs =
  39. Launching Job out of
  40. Number of reduce tasks not specified. Estimated from input data size:
  41. In order to change the average load for a reducer (in bytes):
  42. set hive.exec.reducers.bytes.per.reducer=<number>
  43. In order to limit the maximum number of reducers:
  44. set hive.exec.reducers.max=<number>
  45. In order to set a constant number of reducers:
  46. set mapreduce.job.reduces=<number>
  47. Starting Job = job_1533789743141_0026, Tracking URL = http://s101:8088/proxy/application_1533789743141_0026/
  48. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0026
  49. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  50. -- ::, Stage- map = %, reduce = %
  51. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.45 sec
  52. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.45 sec
  53. MapReduce Total cumulative CPU time: seconds msec
  54. Ended Job = job_1533789743141_0026
  55. MapReduce Jobs Launched:
  56. Stage-Stage-: Map: Reduce: Cumulative CPU: 4.45 sec HDFS Read: HDFS Write: SUCCESS
  57. Total MapReduce CPU Time Spent: seconds msec
  58. OK
  59. deptno dname avg_sal
  60. 人事部门 13566.666666666666
  61. 开发部门 18333.333333333332
  62. 财务部门 15000.0
  63. 运维部门 15666.666666666666
  64. 销售部门 18666.666666666668
  65. Time taken: 43.701 seconds, Fetched: row(s)
  66. hive (yinzhengjie)>

分组 - Grouping: the HAVING clause (hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal from dept_partition group by dname, deptno having avg_sal > 10000;)
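Note that WHERE filters individual rows before grouping and cannot reference aggregate results, while HAVING filters groups after aggregation. A minimal sketch combining both on dept_partition (the month value here is hypothetical, not taken from the data above):

  hive (yinzhengjie)> select deptno, dname, avg(loc) AS avg_sal from dept_partition where month='201807' group by deptno, dname having avg_sal > 10000;   #WHERE runs before grouping, HAVING after aggregation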

  1. JOIN statements - equi-join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno;)
  2. Hive supports the usual SQL JOIN statements, but only equi-joins; non-equi joins are not supported.
  3.  
  4. The test data is as follows:
  5. [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/dept.txt
  6. ACCOUNTING
  7. RESEARCH
  8. SALES
  9. OPERATIONS
  10. [yinzhengjie@s101 download]$
  11. [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/emp.txt
  12. SMITH CLERK -- 800.00
  13. ALLEN SALESMAN -- 1600.00 300.00
  14. WARD SALESMAN -- 1250.00 500.00
  15. JONES MANAGER -- 2975.00
  16. MARTIN SALESMAN -- 1250.00 1400.00
  17. BLAKE MANAGER -- 2850.00
  18. CLARK MANAGER -- 2450.00
  19. SCOTT ANALYST -- 3000.00
  20. KING PRESIDENT -- 5000.00
  21. TURNER SALESMAN -- 1500.00 0.00
  22. ADAMS CLERK -- 1100.00
  23. JAMES CLERK -- 950.00
  24. FORD ANALYST -- 3000.00
  25. MILLER CLERK -- 1300.00
  26. [yinzhengjie@s101 download]$
  27.  
  28. The Hive operations are as follows:
  29. hive (yinzhengjie)> create table if not exists yinzhengjie.dept(
  30. > deptno int,
  31. > dname string,
  32. > loc int
  33. > )
  34. > row format delimited fields terminated by '\t'; #create the department table dept
  35. OK
  36. Time taken: 0.204 seconds
  37. hive (yinzhengjie)> create table if not exists yinzhengjie.emp(
  38. > empno int,
  39. > ename string,
  40. > job string,
  41. > mgr int,
  42. > hiredate string,
  43. > sal double,
  44. > comm double,
  45. > deptno int
  46. >)
  47. > row format delimited fields terminated by '\t'; #create the employee table emp
  48. OK
  49. Time taken: 0.088 seconds
  50. hive (yinzhengjie)>
  51. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept; #load data into dept
  52. Loading data to table yinzhengjie.dept
  53. OK
  54. Time taken: 0.222 seconds
  55. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/emp.txt' into table yinzhengjie.emp; #load data into emp
  56. Loading data to table yinzhengjie.emp
  57. OK
  58. Time taken: 0.175 seconds
  59. hive (yinzhengjie)>
  60. hive (yinzhengjie)> select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno; #query the employee number, employee name and department number for rows where the department number in the employee table equals the one in the department table
  61. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  62. Query ID = yinzhengjie_20180809233409_a9437af4-b312-4dfb-86af-f29bcf679577
  63. Total jobs =
  64. SLF4J: Class path contains multiple SLF4J bindings.
  65. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  66. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  67. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  68. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  69. -- :: Starting to launch local task to process map join; maximum memory =
  70. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--09_040_8075868526571286750-/-local-/HashTable-Stage-/MapJoin-mapfile11--.hashtable
  71. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--09_040_8075868526571286750-/-local-/HashTable-Stage-/MapJoin-mapfile11--.hashtable ( bytes)
  72. -- :: End of local task; Time Taken: 9.163 sec.
  73. Execution completed successfully
  74. MapredLocal task succeeded
  75. Launching Job out of
  76. Number of reduce tasks is set to since there's no reduce operator
  77. Starting Job = job_1533789743141_0028, Tracking URL = http://s101:8088/proxy/application_1533789743141_0028/
  78. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0028
  79. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  80. -- ::, Stage- map = %, reduce = %
  81. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.71 sec
  82. MapReduce Total cumulative CPU time: seconds msec
  83. Ended Job = job_1533789743141_0028
  84. MapReduce Jobs Launched:
  85. Stage-Stage-: Map: Cumulative CPU: 2.71 sec HDFS Read: HDFS Write: SUCCESS
  86. Total MapReduce CPU Time Spent: seconds msec
  87. OK
  88. e.empno e.ename d.deptno d.dname
  89. SMITH RESEARCH
  90. SMITH RESEARCH
  91. ALLEN SALES
  92. ALLEN SALES
  93. WARD SALES
  94. WARD SALES
  95. JONES RESEARCH
  96. JONES RESEARCH
  97. MARTIN SALES
  98. MARTIN SALES
  99. BLAKE SALES
  100. BLAKE SALES
  101. CLARK ACCOUNTING
  102. CLARK ACCOUNTING
  103. SCOTT RESEARCH
  104. SCOTT RESEARCH
  105. KING ACCOUNTING
  106. KING ACCOUNTING
  107. TURNER SALES
  108. TURNER SALES
  109. ADAMS RESEARCH
  110. ADAMS RESEARCH
  111. JAMES SALES
  112. JAMES SALES
  113. FORD RESEARCH
  114. FORD RESEARCH
  115. MILLER ACCOUNTING
  116. MILLER ACCOUNTING
  117. SMITH RESEARCH
  118. SMITH RESEARCH
  119. ALLEN SALES
  120. ALLEN SALES
  121. WARD SALES
  122. WARD SALES
  123. JONES RESEARCH
  124. JONES RESEARCH
  125. MARTIN SALES
  126. MARTIN SALES
  127. BLAKE SALES
  128. BLAKE SALES
  129. CLARK ACCOUNTING
  130. CLARK ACCOUNTING
  131. SCOTT RESEARCH
  132. SCOTT RESEARCH
  133. KING ACCOUNTING
  134. KING ACCOUNTING
  135. TURNER SALES
  136. TURNER SALES
  137. ADAMS RESEARCH
  138. ADAMS RESEARCH
  139. JAMES SALES
  140. JAMES SALES
  141. FORD RESEARCH
  142. FORD RESEARCH
  143. MILLER ACCOUNTING
  144. MILLER ACCOUNTING
  145. Time taken: 98.923 seconds, Fetched: row(s)
  146. hive (yinzhengjie)>

Join语句 - JOIN statements: equi-join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno;)

  1. JOIN statements - table aliases (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;)
  2. Table aliases have two benefits:
  3. >.Aliases simplify the query.
  4. >.Prefixing columns with the table alias improves execution efficiency.
  5.  
  6. hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno; #join the employee table with the department table
  7. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  8. Query ID = yinzhengjie_20180809233120_cdd0ba5f-33b4-41f6-8f49-4a51e3c104ec
  9. Total jobs =
  10. SLF4J: Class path contains multiple SLF4J bindings.
  11. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  12. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  13. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  14. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  15. -- :: Starting to launch local task to process map join; maximum memory =
  16. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--20_931_5011927912909131499-/-local-/HashTable-Stage-/MapJoin-mapfile01--.hashtable
  17. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--20_931_5011927912909131499-/-local-/HashTable-Stage-/MapJoin-mapfile01--.hashtable ( bytes)
  18. -- :: End of local task; Time Taken: 16.147 sec.
  19. Execution completed successfully
  20. MapredLocal task succeeded
  21. Launching Job out of
  22. Number of reduce tasks is set to since there's no reduce operator
  23. Starting Job = job_1533789743141_0027, Tracking URL = http://s101:8088/proxy/application_1533789743141_0027/
  24. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0027
  25. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  26. -- ::, Stage- map = %, reduce = %
  27. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.82 sec
  28. MapReduce Total cumulative CPU time: seconds msec
  29. Ended Job = job_1533789743141_0027
  30. MapReduce Jobs Launched:
  31. Stage-Stage-: Map: Cumulative CPU: 1.82 sec HDFS Read: HDFS Write: SUCCESS
  32. Total MapReduce CPU Time Spent: seconds msec
  33. OK
  34. e.empno e.ename d.deptno
  35. SMITH
  36. SMITH
  37. ALLEN
  38. ALLEN
  39. WARD
  40. WARD
  41. JONES
  42. JONES
  43. MARTIN
  44. MARTIN
  45. BLAKE
  46. BLAKE
  47. CLARK
  48. CLARK
  49. SCOTT
  50. SCOTT
  51. KING
  52. KING
  53. TURNER
  54. TURNER
  55. ADAMS
  56. ADAMS
  57. JAMES
  58. JAMES
  59. FORD
  60. FORD
  61. MILLER
  62. MILLER
  63. SMITH
  64. SMITH
  65. ALLEN
  66. ALLEN
  67. WARD
  68. WARD
  69. JONES
  70. JONES
  71. MARTIN
  72. MARTIN
  73. BLAKE
  74. BLAKE
  75. CLARK
  76. CLARK
  77. SCOTT
  78. SCOTT
  79. KING
  80. KING
  81. TURNER
  82. TURNER
  83. ADAMS
  84. ADAMS
  85. JAMES
  86. JAMES
  87. FORD
  88. FORD
  89. MILLER
  90. MILLER
  91. Time taken: 113.095 seconds, Fetched: row(s)
  92. hive (yinzhengjie)>

Join语句 - JOIN statements: table aliases (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;)

  1. hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809234054_a83fd2f0-136f--880a-0a928ecb86f0
  4. Total jobs =
  5. SLF4J: Class path contains multiple SLF4J bindings.
  6. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  7. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  8. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  9. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  10. -- :: Starting to launch local task to process map join; maximum memory =
  11. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--54_618_7309603760212569588-/-local-/HashTable-Stage-/MapJoin-mapfile21--.hashtable
  12. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--54_618_7309603760212569588-/-local-/HashTable-Stage-/MapJoin-mapfile21--.hashtable ( bytes)
  13. -- :: End of local task; Time Taken: 5.741 sec.
  14. Execution completed successfully
  15. MapredLocal task succeeded
  16. Launching Job out of
  17. Number of reduce tasks is set to since there's no reduce operator
  18. Starting Job = job_1533789743141_0029, Tracking URL = http://s101:8088/proxy/application_1533789743141_0029/
  19. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0029
  20. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  21. -- ::, Stage- map = %, reduce = %
  22. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.69 sec
  23. MapReduce Total cumulative CPU time: seconds msec
  24. Ended Job = job_1533789743141_0029
  25. MapReduce Jobs Launched:
  26. Stage-Stage-: Map: Cumulative CPU: 2.69 sec HDFS Read: HDFS Write: SUCCESS
  27. Total MapReduce CPU Time Spent: seconds msec
  28. OK
  29. e.empno e.ename d.deptno
  30. SMITH
  31. SMITH
  32. ALLEN
  33. ALLEN
  34. WARD
  35. WARD
  36. JONES
  37. JONES
  38. MARTIN
  39. MARTIN
  40. BLAKE
  41. BLAKE
  42. CLARK
  43. CLARK
  44. SCOTT
  45. SCOTT
  46. KING
  47. KING
  48. TURNER
  49. TURNER
  50. ADAMS
  51. ADAMS
  52. JAMES
  53. JAMES
  54. FORD
  55. FORD
  56. MILLER
  57. MILLER
  58. SMITH
  59. SMITH
  60. ALLEN
  61. ALLEN
  62. WARD
  63. WARD
  64. JONES
  65. JONES
  66. MARTIN
  67. MARTIN
  68. BLAKE
  69. BLAKE
  70. CLARK
  71. CLARK
  72. SCOTT
  73. SCOTT
  74. KING
  75. KING
  76. TURNER
  77. TURNER
  78. ADAMS
  79. ADAMS
  80. JAMES
  81. JAMES
  82. FORD
  83. FORD
  84. MILLER
  85. MILLER
  86. Time taken: 53.142 seconds, Fetched: row(s)
  87. hive (yinzhengjie)>

Join语句 - JOIN statements: inner join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;)

  1. hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e left join dept d on e.deptno = d.deptno;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809234222_5966f5f0-b54a--ae82-fd47e8655582
  4. Total jobs =
  5. SLF4J: Class path contains multiple SLF4J bindings.
  6. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  7. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  8. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  9. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  10. -- :: Starting to launch local task to process map join; maximum memory =
  11. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--22_712_6649379300342030940-/-local-/HashTable-Stage-/MapJoin-mapfile31--.hashtable
  12. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--22_712_6649379300342030940-/-local-/HashTable-Stage-/MapJoin-mapfile31--.hashtable ( bytes)
  13. -- :: End of local task; Time Taken: 4.518 sec.
  14. Execution completed successfully
  15. MapredLocal task succeeded
  16. Launching Job out of
  17. Number of reduce tasks is set to since there's no reduce operator
  18. Starting Job = job_1533789743141_0030, Tracking URL = http://s101:8088/proxy/application_1533789743141_0030/
  19. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0030
  20. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  21. -- ::, Stage- map = %, reduce = %
  22. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.03 sec
  23. MapReduce Total cumulative CPU time: seconds msec
  24. Ended Job = job_1533789743141_0030
  25. MapReduce Jobs Launched:
  26. Stage-Stage-: Map: Cumulative CPU: 2.03 sec HDFS Read: HDFS Write: SUCCESS
  27. Total MapReduce CPU Time Spent: seconds msec
  28. OK
  29. e.empno e.ename d.deptno
  30. SMITH
  31. SMITH
  32. ALLEN
  33. ALLEN
  34. WARD
  35. WARD
  36. JONES
  37. JONES
  38. MARTIN
  39. MARTIN
  40. BLAKE
  41. BLAKE
  42. CLARK
  43. CLARK
  44. SCOTT
  45. SCOTT
  46. KING
  47. KING
  48. TURNER
  49. TURNER
  50. ADAMS
  51. ADAMS
  52. JAMES
  53. JAMES
  54. FORD
  55. FORD
  56. MILLER
  57. MILLER
  58. SMITH
  59. SMITH
  60. ALLEN
  61. ALLEN
  62. WARD
  63. WARD
  64. JONES
  65. JONES
  66. MARTIN
  67. MARTIN
  68. BLAKE
  69. BLAKE
  70. CLARK
  71. CLARK
  72. SCOTT
  73. SCOTT
  74. KING
  75. KING
  76. TURNER
  77. TURNER
  78. ADAMS
  79. ADAMS
  80. JAMES
  81. JAMES
  82. FORD
  83. FORD
  84. MILLER
  85. MILLER
  86. Time taken: 57.477 seconds, Fetched: row(s)
  87. hive (yinzhengjie)>

Join语句 - JOIN statements: left outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e left join dept d on e.deptno = d.deptno;)

  1. hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e right join dept d on e.deptno = d.deptno;
  2. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  3. Query ID = yinzhengjie_20180809234332_c83104d3--4e3d-a2bf-342b5c397b9d
  4. Total jobs =
  5. SLF4J: Class path contains multiple SLF4J bindings.
  6. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  7. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  8. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  9. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  10. -- :: Starting to launch local task to process map join; maximum memory =
  11. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--32_208_373121853797344697-/-local-/HashTable-Stage-/MapJoin-mapfile40--.hashtable
  12. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30--6b0d3d62c227/hive_2018--09_23--32_208_373121853797344697-/-local-/HashTable-Stage-/MapJoin-mapfile40--.hashtable ( bytes)
  13. -- :: End of local task; Time Taken: 4.69 sec.
  14. Execution completed successfully
  15. MapredLocal task succeeded
  16. Launching Job out of
  17. Number of reduce tasks is set to since there's no reduce operator
  18. Starting Job = job_1533789743141_0031, Tracking URL = http://s101:8088/proxy/application_1533789743141_0031/
  19. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0031
  20. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  21. -- ::, Stage- map = %, reduce = %
  22. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.38 sec
  23. MapReduce Total cumulative CPU time: seconds msec
  24. Ended Job = job_1533789743141_0031
  25. MapReduce Jobs Launched:
  26. Stage-Stage-: Map: Cumulative CPU: 2.38 sec HDFS Read: HDFS Write: SUCCESS
  27. Total MapReduce CPU Time Spent: seconds msec
  28. OK
  29. e.empno e.ename d.deptno
  30. CLARK
  31. KING
  32. MILLER
  33. CLARK
  34. KING
  35. MILLER
  36. SMITH
  37. JONES
  38. SCOTT
  39. ADAMS
  40. FORD
  41. SMITH
  42. JONES
  43. SCOTT
  44. ADAMS
  45. FORD
  46. ALLEN
  47. WARD
  48. MARTIN
  49. BLAKE
  50. TURNER
  51. JAMES
  52. ALLEN
  53. WARD
  54. MARTIN
  55. BLAKE
  56. TURNER
  57. JAMES
  58. NULL NULL
  59. CLARK
  60. KING
  61. MILLER
  62. CLARK
  63. KING
  64. MILLER
  65. SMITH
  66. JONES
  67. SCOTT
  68. ADAMS
  69. FORD
  70. SMITH
  71. JONES
  72. SCOTT
  73. ADAMS
  74. FORD
  75. ALLEN
  76. WARD
  77. MARTIN
  78. BLAKE
  79. TURNER
  80. JAMES
  81. ALLEN
  82. WARD
  83. MARTIN
  84. BLAKE
  85. TURNER
  86. JAMES
  87. NULL NULL
  88. Time taken: 87.954 seconds, Fetched: row(s)
  89. hive (yinzhengjie)>

Join语句 - JOIN statements: right outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e right join dept d on e.deptno = d.deptno;)

  1. JOIN statements - full outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e full join dept d on e.deptno = d.deptno;)
  2. Full outer join: returns every record from both tables that satisfies the WHERE condition; where either table has no matching value in the join field, NULL is used instead.
  3.  
  4. hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e full join dept d on e.deptno = d.deptno;
  5. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  6. Query ID = yinzhengjie_20180809235025_e7e97788-2d65-45e0-b567-004f2d7057e0
  7. Total jobs =
  8. Launching Job out of
  9. Number of reduce tasks not specified. Estimated from input data size:
  10. In order to change the average load for a reducer (in bytes):
  11. set hive.exec.reducers.bytes.per.reducer=<number>
  12. In order to limit the maximum number of reducers:
  13. set hive.exec.reducers.max=<number>
  14. In order to set a constant number of reducers:
  15. set mapreduce.job.reduces=<number>
  16. Starting Job = job_1533789743141_0035, Tracking URL = http://s101:8088/proxy/application_1533789743141_0035/
  17. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0035
  18. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  19. -- ::, Stage- map = %, reduce = %
  20. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.58 sec
  21. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.88 sec
  22. -- ::, Stage- map = %, reduce = %, Cumulative CPU 7.56 sec
  23. MapReduce Total cumulative CPU time: seconds msec
  24. Ended Job = job_1533789743141_0035
  25. MapReduce Jobs Launched:
  26. Stage-Stage-: Map: Reduce: Cumulative CPU: 7.56 sec HDFS Read: HDFS Write: SUCCESS
  27. Total MapReduce CPU Time Spent: seconds msec
  28. OK
  29. e.empno e.ename d.deptno
  30. MILLER
  31. MILLER
  32. KING
  33. KING
  34. CLARK
  35. CLARK
  36. MILLER
  37. MILLER
  38. KING
  39. KING
  40. CLARK
  41. CLARK
  42. SCOTT
  43. SCOTT
  44. JONES
  45. JONES
  46. JONES
  47. JONES
  48. SMITH
  49. SMITH
  50. FORD
  51. FORD
  52. ADAMS
  53. ADAMS
  54. SCOTT
  55. SCOTT
  56. SMITH
  57. SMITH
  58. FORD
  59. FORD
  60. ADAMS
  61. ADAMS
  62. JAMES
  63. JAMES
  64. TURNER
  65. TURNER
  66. TURNER
  67. TURNER
  68. ALLEN
  69. ALLEN
  70. BLAKE
  71. BLAKE
  72. MARTIN
  73. MARTIN
  74. JAMES
  75. JAMES
  76. WARD
  77. WARD
  78. ALLEN
  79. ALLEN
  80. MARTIN
  81. MARTIN
  82. WARD
  83. WARD
  84. BLAKE
  85. BLAKE
  86. NULL NULL
  87. NULL NULL
  88. Time taken: 63.838 seconds, Fetched: row(s)
  89. hive (yinzhengjie)>

Join语句 - JOIN statements: full outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e full join dept d on e.deptno = d.deptno;)

  1. JOIN statements - multi-table join query (hive (yinzhengjie)> SELECT e.ename, d.deptno, l. loc_name FROM emp e JOIN dept d ON d.deptno = e.deptno JOIN location l ON d.loc = l.loc;)
  2.  
  3. The test file contents:
  4. [yinzhengjie@s101 ~]$ cat /home/yinzhengjie/download/location.txt
  5. Beijing
  6. London
  7. Tokyo
  8. [yinzhengjie@s101 ~]$
  9.  
  10. In most cases, Hive starts one MapReduce job for each pair of JOIN objects. In the example below, a first MapReduce job joins table e with table d,
  11. and a second MapReduce job then joins the output of the first job with table l.
  12. Tip: why isn't table d joined with table l first? Because Hive always evaluates joins from left to right.
  13.  
  14. hive (yinzhengjie)> create table if not exists yinzhengjie.location(
  15. > loc int,
  16. > loc_name string
  17. > )
  18. > row format delimited fields terminated by '\t'; #create the location table
  19. OK
  20. Time taken: 0.614 seconds
  21. hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/location.txt' into table yinzhengjie.location; #load data into the table
  22. Loading data to table yinzhengjie.location
  23. OK
  24. Time taken: 0.478 seconds
  25. hive (yinzhengjie)>
  26. hive (yinzhengjie)> SELECT e.ename, d.deptno, l. loc_name
  27. > FROM emp e
  28. > JOIN dept d
  29. > ON d.deptno = e.deptno
  30. > JOIN location l
  31. > ON d.loc = l.loc; #multi-table join query
  32. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  33. Query ID = yinzhengjie_20180809235602_7fbd82df--4b76-b5c4-9482d4aa2ccc
  34. Total jobs =
  35. SLF4J: Class path contains multiple SLF4J bindings.
  36. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  37. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  38. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  39. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  40. -- :: Starting to launch local task to process map join; maximum memory =
  41. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018--09_23--02_428_1537442849954313200-/-local-/HashTable-Stage-/MapJoin-mapfile01--.hashtable
  42. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018--09_23--02_428_1537442849954313200-/-local-/HashTable-Stage-/MapJoin-mapfile01--.hashtable ( bytes)
  43. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018--09_23--02_428_1537442849954313200-/-local-/HashTable-Stage-/MapJoin-mapfile11--.hashtable
  44. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018--09_23--02_428_1537442849954313200-/-local-/HashTable-Stage-/MapJoin-mapfile11--.hashtable ( bytes)
  45. -- :: End of local task; Time Taken: 3.928 sec.
  46. Execution completed successfully
  47. MapredLocal task succeeded
  48. Launching Job out of
  49. Number of reduce tasks is set to since there's no reduce operator
  50. Starting Job = job_1533789743141_0036, Tracking URL = http://s101:8088/proxy/application_1533789743141_0036/
  51. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0036
  52. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  53. -- ::, Stage- map = %, reduce = %
  54. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.64 sec
  55. MapReduce Total cumulative CPU time: seconds msec
  56. Ended Job = job_1533789743141_0036
  57. MapReduce Jobs Launched:
  58. Stage-Stage-: Map: Cumulative CPU: 2.64 sec HDFS Read: HDFS Write: SUCCESS
  59. Total MapReduce CPU Time Spent: seconds msec
  60. OK
  61. e.ename d.deptno l.loc_name
  62. SMITH London
  63. ALLEN Tokyo
  64. WARD Tokyo
  65. JONES London
  66. MARTIN Tokyo
  67. BLAKE Tokyo
  68. CLARK Beijing
  69. SCOTT London
  70. KING Beijing
  71. TURNER Tokyo
  72. ADAMS London
  73. JAMES Tokyo
  74. FORD London
  75. MILLER Beijing
  76. SMITH London
  77. ALLEN Tokyo
  78. WARD Tokyo
  79. JONES London
  80. MARTIN Tokyo
  81. BLAKE Tokyo
  82. CLARK Beijing
  83. SCOTT London
  84. KING Beijing
  85. TURNER Tokyo
  86. ADAMS London
  87. JAMES Tokyo
  88. FORD London
  89. MILLER Beijing
  90. Time taken: 56.659 seconds, Fetched: row(s)
  91. hive (yinzhengjie)>

Join语句 - JOIN statements: multi-table join query (hive (yinzhengjie)> SELECT e.ename, d.deptno, l. loc_name FROM emp e JOIN dept d ON d.deptno = e.deptno JOIN location l ON d.loc = l.loc;)

  1. JOIN statements - Cartesian product (hive (yinzhengjie)> select * from emp, dept;)
  2. A Cartesian product is produced when:
  3. >.the join condition is omitted,
  4. >.the join condition is invalid,
  5. >.every row of every table is joined to every row of the others.
  6.  
  7. hive (yinzhengjie)> set hive.mapred.mode=strict;
  8. hive (yinzhengjie)> set hive.mapred.mode;
  9. hive.mapred.mode=strict
  10. hive (yinzhengjie)> select * from emp, dept; #in strict mode a Cartesian product query fails
  11. FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.cartesian.product is set to false and that hive.mapred.mode is not set to 'strict' to enable them.
  12. hive (yinzhengjie)>
  13. hive (yinzhengjie)> set hive.mapred.mode=nonstrict;
  14. hive (yinzhengjie)> set hive.mapred.mode;
  15. hive.mapred.mode=nonstrict
  16. hive (yinzhengjie)> select empno, deptno from emp, dept;
  17. FAILED: SemanticException Column deptno Found in more than One Tables/Subqueries
  18. hive (yinzhengjie)> select * from emp, dept; #a Cartesian product does run in nonstrict mode, but such queries are not recommended; they are rarely meaningful
  19. Warning: Map Join MAPJOIN[][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
  20. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  21. Query ID = yinzhengjie_20180810000249_98e28c13-db4d-4e2b-81c6-28e44bf51f1d
  22. Total jobs =
  23. SLF4J: Class path contains multiple SLF4J bindings.
  24. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.-bin/lib/log4j-slf4j-impl-2.4..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  25. SLF4J: Found binding in [jar:file:/soft/hbase-1.2./lib/phoenix-4.10.-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
  26. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7./share/hadoop/common/lib/slf4j-log4j12-1.7..jar!/org/slf4j/impl/StaticLoggerBinder.class]
  27. SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
  28. -- :: Starting to launch local task to process map join; maximum memory =
  29. -- :: Dump the side-table for tag: with group count: into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018--10_00--49_246_882868568149391185-/-local-/HashTable-Stage-/MapJoin-mapfile21--.hashtable
  30. -- :: Uploaded File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018--10_00--49_246_882868568149391185-/-local-/HashTable-Stage-/MapJoin-mapfile21--.hashtable ( bytes)
  31. -- :: End of local task; Time Taken: 3.916 sec.
  32. Execution completed successfully
  33. MapredLocal task succeeded
  34. Launching Job out of
  35. Number of reduce tasks is set to since there's no reduce operator
  36. Starting Job = job_1533789743141_0037, Tracking URL = http://s101:8088/proxy/application_1533789743141_0037/
  37. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0037
  38. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  39. -- ::, Stage- map = %, reduce = %
  40. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.8 sec
  41. MapReduce Total cumulative CPU time: seconds msec
  42. Ended Job = job_1533789743141_0037
  43. MapReduce Jobs Launched:
  44. Stage-Stage-: Map: Cumulative CPU: 1.8 sec HDFS Read: HDFS Write: SUCCESS
  45. Total MapReduce CPU Time Spent: seconds msec
  46. OK
  47. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno dept.deptno dept.dname dept.loc
  48. SMITH CLERK -- 800.0 NULL ACCOUNTING
  49. SMITH CLERK -- 800.0 NULL RESEARCH
  50. SMITH CLERK -- 800.0 NULL SALES
  51. SMITH CLERK -- 800.0 NULL OPERATIONS
  52. SMITH CLERK -- 800.0 NULL ACCOUNTING
  53. SMITH CLERK -- 800.0 NULL RESEARCH
  54. SMITH CLERK -- 800.0 NULL SALES
  55. SMITH CLERK -- 800.0 NULL OPERATIONS
  56. ALLEN SALESMAN -- 1600.0 300.0 ACCOUNTING
  57. ALLEN SALESMAN -- 1600.0 300.0 RESEARCH
  58. ALLEN SALESMAN -- 1600.0 300.0 SALES
  59. ALLEN SALESMAN -- 1600.0 300.0 OPERATIONS
  60. ALLEN SALESMAN -- 1600.0 300.0 ACCOUNTING
  61. ALLEN SALESMAN -- 1600.0 300.0 RESEARCH
  62. ALLEN SALESMAN -- 1600.0 300.0 SALES
  63. ALLEN SALESMAN -- 1600.0 300.0 OPERATIONS
  64. WARD SALESMAN -- 1250.0 500.0 ACCOUNTING
  65. WARD SALESMAN -- 1250.0 500.0 RESEARCH
  66. WARD SALESMAN -- 1250.0 500.0 SALES
  67. WARD SALESMAN -- 1250.0 500.0 OPERATIONS
  68. WARD SALESMAN -- 1250.0 500.0 ACCOUNTING
  69. WARD SALESMAN -- 1250.0 500.0 RESEARCH
  70. WARD SALESMAN -- 1250.0 500.0 SALES
  71. WARD SALESMAN -- 1250.0 500.0 OPERATIONS
  72. JONES MANAGER -- 2975.0 NULL ACCOUNTING
  73. JONES MANAGER -- 2975.0 NULL RESEARCH
  74. JONES MANAGER -- 2975.0 NULL SALES
  75. JONES MANAGER -- 2975.0 NULL OPERATIONS
  76. JONES MANAGER -- 2975.0 NULL ACCOUNTING
  77. JONES MANAGER -- 2975.0 NULL RESEARCH
  78. JONES MANAGER -- 2975.0 NULL SALES
  79. JONES MANAGER -- 2975.0 NULL OPERATIONS
  80. MARTIN SALESMAN -- 1250.0 1400.0 ACCOUNTING
  81. MARTIN SALESMAN -- 1250.0 1400.0 RESEARCH
  82. MARTIN SALESMAN -- 1250.0 1400.0 SALES
  83. MARTIN SALESMAN -- 1250.0 1400.0 OPERATIONS
  84. MARTIN SALESMAN -- 1250.0 1400.0 ACCOUNTING
  85. MARTIN SALESMAN -- 1250.0 1400.0 RESEARCH
  86. MARTIN SALESMAN -- 1250.0 1400.0 SALES
  87. MARTIN SALESMAN -- 1250.0 1400.0 OPERATIONS
  88. BLAKE MANAGER -- 2850.0 NULL ACCOUNTING
  89. BLAKE MANAGER -- 2850.0 NULL RESEARCH
  90. BLAKE MANAGER -- 2850.0 NULL SALES
  91. BLAKE MANAGER -- 2850.0 NULL OPERATIONS
  92. BLAKE MANAGER -- 2850.0 NULL ACCOUNTING
  93. BLAKE MANAGER -- 2850.0 NULL RESEARCH
  94. BLAKE MANAGER -- 2850.0 NULL SALES
  95. BLAKE MANAGER -- 2850.0 NULL OPERATIONS
  96. CLARK MANAGER -- 2450.0 NULL ACCOUNTING
  97. CLARK MANAGER -- 2450.0 NULL RESEARCH
  98. CLARK MANAGER -- 2450.0 NULL SALES
  99. CLARK MANAGER -- 2450.0 NULL OPERATIONS
  100. CLARK MANAGER -- 2450.0 NULL ACCOUNTING
  101. CLARK MANAGER -- 2450.0 NULL RESEARCH
  102. CLARK MANAGER -- 2450.0 NULL SALES
  103. CLARK MANAGER -- 2450.0 NULL OPERATIONS
  104. SCOTT ANALYST -- 3000.0 NULL ACCOUNTING
  105. SCOTT ANALYST -- 3000.0 NULL RESEARCH
  106. SCOTT ANALYST -- 3000.0 NULL SALES
  107. SCOTT ANALYST -- 3000.0 NULL OPERATIONS
  108. SCOTT ANALYST -- 3000.0 NULL ACCOUNTING
  109. SCOTT ANALYST -- 3000.0 NULL RESEARCH
  110. SCOTT ANALYST -- 3000.0 NULL SALES
  111. SCOTT ANALYST -- 3000.0 NULL OPERATIONS
  112. KING PRESIDENT NULL -- 5000.0 NULL ACCOUNTING
  113. KING PRESIDENT NULL -- 5000.0 NULL RESEARCH
  114. KING PRESIDENT NULL -- 5000.0 NULL SALES
  115. KING PRESIDENT NULL -- 5000.0 NULL OPERATIONS
  116. KING PRESIDENT NULL -- 5000.0 NULL ACCOUNTING
  117. KING PRESIDENT NULL -- 5000.0 NULL RESEARCH
  118. KING PRESIDENT NULL -- 5000.0 NULL SALES
  119. KING PRESIDENT NULL -- 5000.0 NULL OPERATIONS
  120. TURNER SALESMAN -- 1500.0 0.0 ACCOUNTING
  121. TURNER SALESMAN -- 1500.0 0.0 RESEARCH
  122. TURNER SALESMAN -- 1500.0 0.0 SALES
  123. TURNER SALESMAN -- 1500.0 0.0 OPERATIONS
  124. TURNER SALESMAN -- 1500.0 0.0 ACCOUNTING
  125. TURNER SALESMAN -- 1500.0 0.0 RESEARCH
  126. TURNER SALESMAN -- 1500.0 0.0 SALES
  127. TURNER SALESMAN -- 1500.0 0.0 OPERATIONS
  128. ADAMS CLERK -- 1100.0 NULL ACCOUNTING
  129. ADAMS CLERK -- 1100.0 NULL RESEARCH
  130. ADAMS CLERK -- 1100.0 NULL SALES
  131. ADAMS CLERK -- 1100.0 NULL OPERATIONS
  132. ADAMS CLERK -- 1100.0 NULL ACCOUNTING
  133. ADAMS CLERK -- 1100.0 NULL RESEARCH
  134. ADAMS CLERK -- 1100.0 NULL SALES
  135. ADAMS CLERK -- 1100.0 NULL OPERATIONS
  136. JAMES CLERK -- 950.0 NULL ACCOUNTING
  137. JAMES CLERK -- 950.0 NULL RESEARCH
  138. JAMES CLERK -- 950.0 NULL SALES
  139. JAMES CLERK -- 950.0 NULL OPERATIONS
  140. JAMES CLERK -- 950.0 NULL ACCOUNTING
  141. JAMES CLERK -- 950.0 NULL RESEARCH
  142. JAMES CLERK -- 950.0 NULL SALES
  143. JAMES CLERK -- 950.0 NULL OPERATIONS
  144. FORD ANALYST -- 3000.0 NULL ACCOUNTING
  145. FORD ANALYST -- 3000.0 NULL RESEARCH
  146. FORD ANALYST -- 3000.0 NULL SALES
  147. FORD ANALYST -- 3000.0 NULL OPERATIONS
  148. FORD ANALYST -- 3000.0 NULL ACCOUNTING
  149. FORD ANALYST -- 3000.0 NULL RESEARCH
  150. FORD ANALYST -- 3000.0 NULL SALES
  151. FORD ANALYST -- 3000.0 NULL OPERATIONS
  152. MILLER CLERK -- 1300.0 NULL ACCOUNTING
  153. MILLER CLERK -- 1300.0 NULL RESEARCH
  154. MILLER CLERK -- 1300.0 NULL SALES
  155. MILLER CLERK -- 1300.0 NULL OPERATIONS
  156. MILLER CLERK -- 1300.0 NULL ACCOUNTING
  157. MILLER CLERK -- 1300.0 NULL RESEARCH
  158. MILLER CLERK -- 1300.0 NULL SALES
  159. MILLER CLERK -- 1300.0 NULL OPERATIONS
  160. SMITH CLERK -- 800.0 NULL ACCOUNTING
  161. SMITH CLERK -- 800.0 NULL RESEARCH
  162. SMITH CLERK -- 800.0 NULL SALES
  163. SMITH CLERK -- 800.0 NULL OPERATIONS
  164. SMITH CLERK -- 800.0 NULL ACCOUNTING
  165. SMITH CLERK -- 800.0 NULL RESEARCH
  166. SMITH CLERK -- 800.0 NULL SALES
  167. SMITH CLERK -- 800.0 NULL OPERATIONS
  168. ALLEN SALESMAN -- 1600.0 300.0 ACCOUNTING
  169. ALLEN SALESMAN -- 1600.0 300.0 RESEARCH
  170. ALLEN SALESMAN -- 1600.0 300.0 SALES
  171. ALLEN SALESMAN -- 1600.0 300.0 OPERATIONS
  172. ALLEN SALESMAN -- 1600.0 300.0 ACCOUNTING
  173. ALLEN SALESMAN -- 1600.0 300.0 RESEARCH
  174. ALLEN SALESMAN -- 1600.0 300.0 SALES
  175. ALLEN SALESMAN -- 1600.0 300.0 OPERATIONS
  176. WARD SALESMAN -- 1250.0 500.0 ACCOUNTING
  177. WARD SALESMAN -- 1250.0 500.0 RESEARCH
  178. WARD SALESMAN -- 1250.0 500.0 SALES
  179. WARD SALESMAN -- 1250.0 500.0 OPERATIONS
  180. WARD SALESMAN -- 1250.0 500.0 ACCOUNTING
  181. WARD SALESMAN -- 1250.0 500.0 RESEARCH
  182. WARD SALESMAN -- 1250.0 500.0 SALES
  183. WARD SALESMAN -- 1250.0 500.0 OPERATIONS
  184. JONES MANAGER -- 2975.0 NULL ACCOUNTING
  185. JONES MANAGER -- 2975.0 NULL RESEARCH
  186. JONES MANAGER -- 2975.0 NULL SALES
  187. JONES MANAGER -- 2975.0 NULL OPERATIONS
  188. JONES MANAGER -- 2975.0 NULL ACCOUNTING
  189. JONES MANAGER -- 2975.0 NULL RESEARCH
  190. JONES MANAGER -- 2975.0 NULL SALES
  191. JONES MANAGER -- 2975.0 NULL OPERATIONS
  192. MARTIN SALESMAN -- 1250.0 1400.0 ACCOUNTING
  193. MARTIN SALESMAN -- 1250.0 1400.0 RESEARCH
  194. MARTIN SALESMAN -- 1250.0 1400.0 SALES
  195. MARTIN SALESMAN -- 1250.0 1400.0 OPERATIONS
  196. MARTIN SALESMAN -- 1250.0 1400.0 ACCOUNTING
  197. MARTIN SALESMAN -- 1250.0 1400.0 RESEARCH
  198. MARTIN SALESMAN -- 1250.0 1400.0 SALES
  199. MARTIN SALESMAN -- 1250.0 1400.0 OPERATIONS
  200. BLAKE MANAGER -- 2850.0 NULL ACCOUNTING
  201. BLAKE MANAGER -- 2850.0 NULL RESEARCH
  202. BLAKE MANAGER -- 2850.0 NULL SALES
  203. BLAKE MANAGER -- 2850.0 NULL OPERATIONS
  204. BLAKE MANAGER -- 2850.0 NULL ACCOUNTING
  205. BLAKE MANAGER -- 2850.0 NULL RESEARCH
  206. BLAKE MANAGER -- 2850.0 NULL SALES
  207. BLAKE MANAGER -- 2850.0 NULL OPERATIONS
  208. CLARK MANAGER -- 2450.0 NULL ACCOUNTING
  209. CLARK MANAGER -- 2450.0 NULL RESEARCH
  210. CLARK MANAGER -- 2450.0 NULL SALES
  211. CLARK MANAGER -- 2450.0 NULL OPERATIONS
  212. CLARK MANAGER -- 2450.0 NULL ACCOUNTING
  213. CLARK MANAGER -- 2450.0 NULL RESEARCH
  214. CLARK MANAGER -- 2450.0 NULL SALES
  215. CLARK MANAGER -- 2450.0 NULL OPERATIONS
  216. SCOTT ANALYST -- 3000.0 NULL ACCOUNTING
  217. SCOTT ANALYST -- 3000.0 NULL RESEARCH
  218. SCOTT ANALYST -- 3000.0 NULL SALES
  219. SCOTT ANALYST -- 3000.0 NULL OPERATIONS
  220. SCOTT ANALYST -- 3000.0 NULL ACCOUNTING
  221. SCOTT ANALYST -- 3000.0 NULL RESEARCH
  222. SCOTT ANALYST -- 3000.0 NULL SALES
  223. SCOTT ANALYST -- 3000.0 NULL OPERATIONS
  224. KING PRESIDENT NULL -- 5000.0 NULL ACCOUNTING
  225. KING PRESIDENT NULL -- 5000.0 NULL RESEARCH
  226. KING PRESIDENT NULL -- 5000.0 NULL SALES
  227. KING PRESIDENT NULL -- 5000.0 NULL OPERATIONS
  228. KING PRESIDENT NULL -- 5000.0 NULL ACCOUNTING
  229. KING PRESIDENT NULL -- 5000.0 NULL RESEARCH
  230. KING PRESIDENT NULL -- 5000.0 NULL SALES
  231. KING PRESIDENT NULL -- 5000.0 NULL OPERATIONS
  232. TURNER SALESMAN -- 1500.0 0.0 ACCOUNTING
  233. TURNER SALESMAN -- 1500.0 0.0 RESEARCH
  234. TURNER SALESMAN -- 1500.0 0.0 SALES
  235. TURNER SALESMAN -- 1500.0 0.0 OPERATIONS
  236. TURNER SALESMAN -- 1500.0 0.0 ACCOUNTING
  237. TURNER SALESMAN -- 1500.0 0.0 RESEARCH
  238. TURNER SALESMAN -- 1500.0 0.0 SALES
  239. TURNER SALESMAN -- 1500.0 0.0 OPERATIONS
  240. ADAMS CLERK -- 1100.0 NULL ACCOUNTING
  241. ADAMS CLERK -- 1100.0 NULL RESEARCH
  242. ADAMS CLERK -- 1100.0 NULL SALES
  243. ADAMS CLERK -- 1100.0 NULL OPERATIONS
  244. ADAMS CLERK -- 1100.0 NULL ACCOUNTING
  245. ADAMS CLERK -- 1100.0 NULL RESEARCH
  246. ADAMS CLERK -- 1100.0 NULL SALES
  247. ADAMS CLERK -- 1100.0 NULL OPERATIONS
  248. JAMES CLERK -- 950.0 NULL ACCOUNTING
  249. JAMES CLERK -- 950.0 NULL RESEARCH
  250. JAMES CLERK -- 950.0 NULL SALES
  251. JAMES CLERK -- 950.0 NULL OPERATIONS
  252. JAMES CLERK -- 950.0 NULL ACCOUNTING
  253. JAMES CLERK -- 950.0 NULL RESEARCH
  254. JAMES CLERK -- 950.0 NULL SALES
  255. JAMES CLERK -- 950.0 NULL OPERATIONS
  256. FORD ANALYST -- 3000.0 NULL ACCOUNTING
  257. FORD ANALYST -- 3000.0 NULL RESEARCH
  258. FORD ANALYST -- 3000.0 NULL SALES
  259. FORD ANALYST -- 3000.0 NULL OPERATIONS
  260. FORD ANALYST -- 3000.0 NULL ACCOUNTING
  261. FORD ANALYST -- 3000.0 NULL RESEARCH
  262. FORD ANALYST -- 3000.0 NULL SALES
  263. FORD ANALYST -- 3000.0 NULL OPERATIONS
  264. MILLER CLERK -- 1300.0 NULL ACCOUNTING
  265. MILLER CLERK -- 1300.0 NULL RESEARCH
  266. MILLER CLERK -- 1300.0 NULL SALES
  267. MILLER CLERK -- 1300.0 NULL OPERATIONS
  268. MILLER CLERK -- 1300.0 NULL ACCOUNTING
  269. MILLER CLERK -- 1300.0 NULL RESEARCH
  270. MILLER CLERK -- 1300.0 NULL SALES
  271. MILLER CLERK -- 1300.0 NULL OPERATIONS
  272. Time taken: 52.698 seconds, Fetched: row(s)
  273. hive (yinzhengjie)>

JOIN statements - Cartesian product, not recommended. Avoid Cartesian-product queries: in a real production environment they put enormous pressure on the hadoop cluster, and on a poorly provisioned cluster they can easily bring the whole thing down!!! (hive (yinzhengjie)> select * from emp, dept;)
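
In most cases the fix for an accidental Cartesian product is simply to supply the missing join condition. A minimal sketch, using the emp and dept tables from above:

    -- Keep strict mode on so accidental cross joins fail fast.
    set hive.mapred.mode=strict;

    -- The intended query: an equi-join instead of a cross product.
    SELECT e.empno, e.ename, d.dname
    FROM emp e
    JOIN dept d ON e.deptno = d.deptno;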

  1. 排序-全局排序(hive (yinzhengjie)> select * from emp order by sal desc;)
  2. Order By: global ordering, performed by a single Reducer (one MapReduce job)
  3. >.Use the ORDER BY clause to sort
  4. ASC (ascend): ascending order (the default)
  5. DESC (descend): descending order
  6. >.The ORDER BY clause goes at the end of the SELECT statement.
  7.  
  8. hive (yinzhengjie)> select * from emp order by sal; #查询员工信息按工资升序排列,默认就是升序排列
  9. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  10. Query ID = yinzhengjie_20180810001838_6c529433-c84b-447d-89e0-16af47dc89eb
  11. Total jobs =
  12. Launching Job out of
  13. Number of reduce tasks determined at compile time:
  14. In order to change the average load for a reducer (in bytes):
  15. set hive.exec.reducers.bytes.per.reducer=<number>
  16. In order to limit the maximum number of reducers:
  17. set hive.exec.reducers.max=<number>
  18. In order to set a constant number of reducers:
  19. set mapreduce.job.reduces=<number>
  20. Starting Job = job_1533789743141_0039, Tracking URL = http://s101:8088/proxy/application_1533789743141_0039/
  21. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0039
  22. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  23. -- ::, Stage- map = %, reduce = %
  24. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.66 sec
  25. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.41 sec
  26. MapReduce Total cumulative CPU time: seconds msec
  27. Ended Job = job_1533789743141_0039
  28. MapReduce Jobs Launched:
  29. Stage-Stage-: Map: Reduce: Cumulative CPU: 4.41 sec HDFS Read: HDFS Write: SUCCESS
  30. Total MapReduce CPU Time Spent: seconds msec
  31. OK
  32. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  33. SMITH CLERK -- 800.0 NULL
  34. SMITH CLERK -- 800.0 NULL
  35. JAMES CLERK -- 950.0 NULL
  36. JAMES CLERK -- 950.0 NULL
  37. ADAMS CLERK -- 1100.0 NULL
  38. ADAMS CLERK -- 1100.0 NULL
  39. WARD SALESMAN -- 1250.0 500.0
  40. WARD SALESMAN -- 1250.0 500.0
  41. MARTIN SALESMAN -- 1250.0 1400.0
  42. MARTIN SALESMAN -- 1250.0 1400.0
  43. MILLER CLERK -- 1300.0 NULL
  44. MILLER CLERK -- 1300.0 NULL
  45. TURNER SALESMAN -- 1500.0 0.0
  46. TURNER SALESMAN -- 1500.0 0.0
  47. ALLEN SALESMAN -- 1600.0 300.0
  48. ALLEN SALESMAN -- 1600.0 300.0
  49. CLARK MANAGER -- 2450.0 NULL
  50. CLARK MANAGER -- 2450.0 NULL
  51. BLAKE MANAGER -- 2850.0 NULL
  52. BLAKE MANAGER -- 2850.0 NULL
  53. JONES MANAGER -- 2975.0 NULL
  54. JONES MANAGER -- 2975.0 NULL
  55. SCOTT ANALYST -- 3000.0 NULL
  56. SCOTT ANALYST -- 3000.0 NULL
  57. FORD ANALYST -- 3000.0 NULL
  58. FORD ANALYST -- 3000.0 NULL
  59. KING PRESIDENT NULL -- 5000.0 NULL
  60. KING PRESIDENT NULL -- 5000.0 NULL
  61. Time taken: 82.564 seconds, Fetched: row(s)
  62. hive (yinzhengjie)>
  63. hive (yinzhengjie)> select * from emp order by sal desc; #查询员工信息按工资降序排列
  64. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  65. Query ID = yinzhengjie_20180810002012_ebf1251c-c92b--bea7-bb8a2c34ebdb
  66. Total jobs =
  67. Launching Job out of
  68. Number of reduce tasks determined at compile time:
  69. In order to change the average load for a reducer (in bytes):
  70. set hive.exec.reducers.bytes.per.reducer=<number>
  71. In order to limit the maximum number of reducers:
  72. set hive.exec.reducers.max=<number>
  73. In order to set a constant number of reducers:
  74. set mapreduce.job.reduces=<number>
  75. Starting Job = job_1533789743141_0040, Tracking URL = http://s101:8088/proxy/application_1533789743141_0040/
  76. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0040
  77. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  78. -- ::, Stage- map = %, reduce = %
  79. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.47 sec
  80. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.31 sec
  81. MapReduce Total cumulative CPU time: seconds msec
  82. Ended Job = job_1533789743141_0040
  83. MapReduce Jobs Launched:
  84. Stage-Stage-: Map: Reduce: Cumulative CPU: 5.31 sec HDFS Read: HDFS Write: SUCCESS
  85. Total MapReduce CPU Time Spent: seconds msec
  86. OK
  87. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  88. KING PRESIDENT NULL -- 5000.0 NULL
  89. KING PRESIDENT NULL -- 5000.0 NULL
  90. FORD ANALYST -- 3000.0 NULL
  91. SCOTT ANALYST -- 3000.0 NULL
  92. SCOTT ANALYST -- 3000.0 NULL
  93. FORD ANALYST -- 3000.0 NULL
  94. JONES MANAGER -- 2975.0 NULL
  95. JONES MANAGER -- 2975.0 NULL
  96. BLAKE MANAGER -- 2850.0 NULL
  97. BLAKE MANAGER -- 2850.0 NULL
  98. CLARK MANAGER -- 2450.0 NULL
  99. CLARK MANAGER -- 2450.0 NULL
  100. ALLEN SALESMAN -- 1600.0 300.0
  101. ALLEN SALESMAN -- 1600.0 300.0
  102. TURNER SALESMAN -- 1500.0 0.0
  103. TURNER SALESMAN -- 1500.0 0.0
  104. MILLER CLERK -- 1300.0 NULL
  105. MILLER CLERK -- 1300.0 NULL
  106. WARD SALESMAN -- 1250.0 500.0
  107. MARTIN SALESMAN -- 1250.0 1400.0
  108. MARTIN SALESMAN -- 1250.0 1400.0
  109. WARD SALESMAN -- 1250.0 500.0
  110. ADAMS CLERK -- 1100.0 NULL
  111. ADAMS CLERK -- 1100.0 NULL
  112. JAMES CLERK -- 950.0 NULL
  113. JAMES CLERK -- 950.0 NULL
  114. SMITH CLERK -- 800.0 NULL
  115. SMITH CLERK -- 800.0 NULL
  116. Time taken: 51.103 seconds, Fetched: row(s)
  117. hive (yinzhengjie)>

排序-全局排序(hive (yinzhengjie)> select * from emp order by sal desc;)
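
Because ORDER BY pushes all rows through a single reducer, it can become a bottleneck on large tables. A common mitigation, sketched below, is to combine it with LIMIT so the reducer only has to emit the top N rows:

    -- Global sort plus LIMIT: still one reducer, but far less output to write.
    SELECT empno, ename, sal
    FROM emp
    ORDER BY sal DESC
    LIMIT 5;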

  1. Sorting - ordering by an alias (hive (yinzhengjie)> select ename, sal*2 twosal from emp order by twosal;)
  2.  
  3. hive (yinzhengjie)> select ename, sal*2 twosal from emp order by twosal; #order by twice each employee's salary
  4. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  5. Query ID = yinzhengjie_20180810002258_b9f73ab7-2a29-459a-9b27-119eb56f1dde
  6. Total jobs =
  7. Launching Job out of
  8. Number of reduce tasks determined at compile time:
  9. In order to change the average load for a reducer (in bytes):
  10. set hive.exec.reducers.bytes.per.reducer=<number>
  11. In order to limit the maximum number of reducers:
  12. set hive.exec.reducers.max=<number>
  13. In order to set a constant number of reducers:
  14. set mapreduce.job.reduces=<number>
  15. Starting Job = job_1533789743141_0041, Tracking URL = http://s101:8088/proxy/application_1533789743141_0041/
  16. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0041
  17. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  18. -- ::, Stage- map = %, reduce = %
  19. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.6 sec
  20. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.99 sec
  21. MapReduce Total cumulative CPU time: seconds msec
  22. Ended Job = job_1533789743141_0041
  23. MapReduce Jobs Launched:
  24. Stage-Stage-: Map: Reduce: Cumulative CPU: 4.99 sec HDFS Read: HDFS Write: SUCCESS
  25. Total MapReduce CPU Time Spent: seconds msec
  26. OK
  27. ename twosal
  28. SMITH 1600.0
  29. SMITH 1600.0
  30. JAMES 1900.0
  31. JAMES 1900.0
  32. ADAMS 2200.0
  33. ADAMS 2200.0
  34. WARD 2500.0
  35. WARD 2500.0
  36. MARTIN 2500.0
  37. MARTIN 2500.0
  38. MILLER 2600.0
  39. MILLER 2600.0
  40. TURNER 3000.0
  41. TURNER 3000.0
  42. ALLEN 3200.0
  43. ALLEN 3200.0
  44. CLARK 4900.0
  45. CLARK 4900.0
  46. BLAKE 5700.0
  47. BLAKE 5700.0
  48. JONES 5950.0
  49. JONES 5950.0
  50. SCOTT 6000.0
  51. SCOTT 6000.0
  52. FORD 6000.0
  53. FORD 6000.0
  54. KING 10000.0
  55. KING 10000.0
  56. Time taken: 44.517 seconds, Fetched: row(s)
  57. hive (yinzhengjie)>

排序-按照别名排序(hive (yinzhengjie)> select ename, sal*2 twosal from emp order by twosal;)
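
The alias defined in the SELECT list can be referenced directly in ORDER BY, and sorting by the expression itself is equivalent. A small sketch (descending this time):

    -- Two equivalent ways to sort by twice the salary.
    SELECT ename, sal * 2 AS twosal FROM emp ORDER BY twosal DESC;
    SELECT ename, sal * 2 AS twosal FROM emp ORDER BY sal * 2 DESC;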

  1. 排序-多个列排序(hive (yinzhengjie)> select ename, deptno, sal from emp order by deptno, sal ;)
  2.  
  3. hive (yinzhengjie)> select ename, deptno, sal from emp order by deptno, sal ; #按照部门和工资升序排序
  4. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  5. Query ID = yinzhengjie_20180810002405_c29a1508--4d7c-9b50-e2fc04c8bdbc
  6. Total jobs =
  7. Launching Job out of
  8. Number of reduce tasks determined at compile time:
  9. In order to change the average load for a reducer (in bytes):
  10. set hive.exec.reducers.bytes.per.reducer=<number>
  11. In order to limit the maximum number of reducers:
  12. set hive.exec.reducers.max=<number>
  13. In order to set a constant number of reducers:
  14. set mapreduce.job.reduces=<number>
  15. Starting Job = job_1533789743141_0042, Tracking URL = http://s101:8088/proxy/application_1533789743141_0042/
  16. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0042
  17. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  18. -- ::, Stage- map = %, reduce = %
  19. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.77 sec
  20. -- ::, Stage- map = %, reduce = %, Cumulative CPU 3.85 sec
  21. MapReduce Total cumulative CPU time: seconds msec
  22. Ended Job = job_1533789743141_0042
  23. MapReduce Jobs Launched:
  24. Stage-Stage-: Map: Reduce: Cumulative CPU: 3.85 sec HDFS Read: HDFS Write: SUCCESS
  25. Total MapReduce CPU Time Spent: seconds msec
  26. OK
  27. ename deptno sal
  28. MILLER 1300.0
  29. MILLER 1300.0
  30. CLARK 2450.0
  31. CLARK 2450.0
  32. KING 5000.0
  33. KING 5000.0
  34. SMITH 800.0
  35. SMITH 800.0
  36. ADAMS 1100.0
  37. ADAMS 1100.0
  38. JONES 2975.0
  39. JONES 2975.0
  40. FORD 3000.0
  41. SCOTT 3000.0
  42. FORD 3000.0
  43. SCOTT 3000.0
  44. JAMES 950.0
  45. JAMES 950.0
  46. WARD 1250.0
  47. MARTIN 1250.0
  48. MARTIN 1250.0
  49. WARD 1250.0
  50. TURNER 1500.0
  51. TURNER 1500.0
  52. ALLEN 1600.0
  53. ALLEN 1600.0
  54. BLAKE 2850.0
  55. BLAKE 2850.0
  56. Time taken: 39.975 seconds, Fetched: row(s)
  57. hive (yinzhengjie)>

排序-多个列排序(hive (yinzhengjie)> select ename, deptno, sal from emp order by deptno, sal ;)
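
Each column in ORDER BY can carry its own direction, so "department ascending, salary descending within each department" is a one-liner; a sketch:

    -- Departments in ascending order, highest-paid employees first within each.
    SELECT ename, deptno, sal
    FROM emp
    ORDER BY deptno ASC, sal DESC;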

  1. 排序-每个MapReduce内部排序(hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp sort by deptno desc;)
  2.  
  3. hive (yinzhengjie)> set mapreduce.job.reduces=; #设置reduce个数
  4. hive (yinzhengjie)> set mapreduce.job.reduces; #查看设置reduce个数
  5. mapreduce.job.reduces=
  6. hive (yinzhengjie)>
  7. hive (yinzhengjie)> select * from emp sort by empno desc; #view employee records sorted by employee number in descending order (within each reducer)
  8. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  9. Query ID = yinzhengjie_20180810002752_cd4d7e0d-be26---9379c1632a3a
  10. Total jobs =
  11. Launching Job out of
  12. Number of reduce tasks not specified. Defaulting to jobconf value of:
  13. In order to change the average load for a reducer (in bytes):
  14. set hive.exec.reducers.bytes.per.reducer=<number>
  15. In order to limit the maximum number of reducers:
  16. set hive.exec.reducers.max=<number>
  17. In order to set a constant number of reducers:
  18. set mapreduce.job.reduces=<number>
  19. Starting Job = job_1533789743141_0043, Tracking URL = http://s101:8088/proxy/application_1533789743141_0043/
  20. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0043
  21. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  22. -- ::, Stage- map = %, reduce = %
  23. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.02 sec
  24. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.45 sec
  25. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.76 sec
  26. -- ::, Stage- map = %, reduce = %, Cumulative CPU 7.48 sec
  27. -- ::, Stage- map = %, reduce = %, Cumulative CPU 10.02 sec
  28. -- ::, Stage- map = %, reduce = %, Cumulative CPU 10.69 sec
  29. MapReduce Total cumulative CPU time: seconds msec
  30. Ended Job = job_1533789743141_0043
  31. MapReduce Jobs Launched:
  32. Stage-Stage-: Map: Reduce: Cumulative CPU: 10.69 sec HDFS Read: HDFS Write: SUCCESS
  33. Total MapReduce CPU Time Spent: seconds msec
  34. OK
  35. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  36. ADAMS CLERK -- 1100.0 NULL
  37. TURNER SALESMAN -- 1500.0 0.0
  38. TURNER SALESMAN -- 1500.0 0.0
  39. KING PRESIDENT NULL -- 5000.0 NULL
  40. SCOTT ANALYST -- 3000.0 NULL
  41. SCOTT ANALYST -- 3000.0 NULL
  42. CLARK MANAGER -- 2450.0 NULL
  43. BLAKE MANAGER -- 2850.0 NULL
  44. MARTIN SALESMAN -- 1250.0 1400.0
  45. MARTIN SALESMAN -- 1250.0 1400.0
  46. JONES MANAGER -- 2975.0 NULL
  47. SMITH CLERK -- 800.0 NULL
  48. MILLER CLERK -- 1300.0 NULL
  49. FORD ANALYST -- 3000.0 NULL
  50. JAMES CLERK -- 950.0 NULL
  51. JAMES CLERK -- 950.0 NULL
  52. ADAMS CLERK -- 1100.0 NULL
  53. BLAKE MANAGER -- 2850.0 NULL
  54. JONES MANAGER -- 2975.0 NULL
  55. WARD SALESMAN -- 1250.0 500.0
  56. WARD SALESMAN -- 1250.0 500.0
  57. ALLEN SALESMAN -- 1600.0 300.0
  58. MILLER CLERK -- 1300.0 NULL
  59. FORD ANALYST -- 3000.0 NULL
  60. KING PRESIDENT NULL -- 5000.0 NULL
  61. CLARK MANAGER -- 2450.0 NULL
  62. ALLEN SALESMAN -- 1600.0 300.0
  63. SMITH CLERK -- 800.0 NULL
  64. Time taken: 67.599 seconds, Fetched: row(s)
  65. hive (yinzhengjie)>
  66. hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp sort by deptno desc; #将查询结果导入到文件中(按照部门编号降序排序)
  67. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  68. Query ID = yinzhengjie_20180810003404_42a220b7-02c7-42ae-bf8a-566c6300f4c3
  69. Total jobs =
  70. Launching Job out of
  71. Number of reduce tasks not specified. Defaulting to jobconf value of:
  72. In order to change the average load for a reducer (in bytes):
  73. set hive.exec.reducers.bytes.per.reducer=<number>
  74. In order to limit the maximum number of reducers:
  75. set hive.exec.reducers.max=<number>
  76. In order to set a constant number of reducers:
  77. set mapreduce.job.reduces=<number>
  78. Starting Job = job_1533789743141_0045, Tracking URL = http://s101:8088/proxy/application_1533789743141_0045/
  79. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0045
  80. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  81. -- ::, Stage- map = %, reduce = %
  82. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.22 sec
  83. -- ::, Stage- map = %, reduce = %, Cumulative CPU 3.35 sec
  84. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.71 sec
  85. -- ::, Stage- map = %, reduce = %, Cumulative CPU 7.57 sec
  86. MapReduce Total cumulative CPU time: seconds msec
  87. Ended Job = job_1533789743141_0045
  88. Moving data to local directory /home/yinzhengjie/download/sortby-result
  89. MapReduce Jobs Launched:
  90. Stage-Stage-: Map: Reduce: Cumulative CPU: 7.57 sec HDFS Read: HDFS Write: SUCCESS
  91. Total MapReduce CPU Time Spent: seconds msec
  92. OK
  93. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  94. Time taken: 62.425 seconds
  95. hive (yinzhengjie)>

排序-每个MapReduce内部排序(hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp sort by deptno desc;)
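
SORT BY only guarantees ordering within each reducer, so its effect is easiest to see with several reducers, where each output file is sorted independently. A minimal sketch; the output directory name below is just an example path:

    -- Use several reducers; each result file will be sorted by sal on its own.
    set mapreduce.job.reduces=3;

    insert overwrite local directory '/home/yinzhengjie/download/sortby-sal'
    row format delimited fields terminated by '\t'
    select * from emp sort by sal desc;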

  1. 排序-分区排序(hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp distribute by deptno sort by empno desc;)
  2. Distribute By: similar to a partition in MapReduce; it partitions the data and is used together with sort by.
  3. Note: Hive requires the DISTRIBUTE BY clause to be written before the SORT BY clause. When testing distribute by, be sure to allocate multiple reducers, otherwise the effect of distribute by cannot be observed.
  4.  
  5. hive (yinzhengjie)> set mapreduce.job.reduces;
  6. mapreduce.job.reduces=
  7. hive (yinzhengjie)>
  8. hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp distribute by deptno sort by empno desc; #先按照部门编号分区,再按照员工编号降序排序。
  9. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  10. Query ID = yinzhengjie_20180810003826_af885657-4f0a-4e2a-83f3-62cbdabda4f3
  11. Total jobs =
  12. Launching Job out of
  13. Number of reduce tasks not specified. Defaulting to jobconf value of:
  14. In order to change the average load for a reducer (in bytes):
  15. set hive.exec.reducers.bytes.per.reducer=<number>
  16. In order to limit the maximum number of reducers:
  17. set hive.exec.reducers.max=<number>
  18. In order to set a constant number of reducers:
  19. set mapreduce.job.reduces=<number>
  20. Starting Job = job_1533789743141_0046, Tracking URL = http://s101:8088/proxy/application_1533789743141_0046/
  21. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0046
  22. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  23. -- ::, Stage- map = %, reduce = %
  24. -- ::, Stage- map = %, reduce = %, Cumulative CPU 2.07 sec
  25. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.54 sec
  26. -- ::, Stage- map = %, reduce = %, Cumulative CPU 6.44 sec
  27. -- ::, Stage- map = %, reduce = %, Cumulative CPU 8.78 sec
  28. MapReduce Total cumulative CPU time: seconds msec
  29. Ended Job = job_1533789743141_0046
  30. Moving data to local directory /home/yinzhengjie/download/sortby-result
  31. MapReduce Jobs Launched:
  32. Stage-Stage-: Map: Reduce: Cumulative CPU: 8.78 sec HDFS Read: HDFS Write: SUCCESS
  33. Total MapReduce CPU Time Spent: seconds msec
  34. OK
  35. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  36. Time taken: 86.59 seconds
  37. hive (yinzhengjie)>

排序-分区排序(hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp distribute by deptno sort by empno desc;)
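
DISTRIBUTE BY decides which reducer a row goes to (much like a MapReduce partitioner), while SORT BY orders rows inside each reducer, so combining them yields per-department sorted output. A sketch, again assuming multiple reducers:

    set mapreduce.job.reduces=3;

    -- Rows of the same department land on the same reducer,
    -- and within that reducer they are sorted by salary descending.
    SELECT deptno, empno, ename, sal
    FROM emp
    DISTRIBUTE BY deptno
    SORT BY sal DESC;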

  1. 排序-Cluster By(hive (yinzhengjie)> select * from emp cluster by deptno;)
  2. When the distribute by and sort by columns are the same, cluster by can be used instead.
  3. cluster by provides the functionality of distribute by plus that of sort by. However, the ordering is ascending only; ASC or DESC cannot be specified.
  4.  
  5. 我们可以看以下两个案例,以下两种写法等价:
  6.  
  7. hive (yinzhengjie)> select * from emp cluster by deptno;
  8. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  9. Query ID = yinzhengjie_20180810004115_0faf59ba-950a-4f86-885a-00865338c95c
  10. Total jobs =
  11. Launching Job out of
  12. Number of reduce tasks not specified. Defaulting to jobconf value of:
  13. In order to change the average load for a reducer (in bytes):
  14. set hive.exec.reducers.bytes.per.reducer=<number>
  15. In order to limit the maximum number of reducers:
  16. set hive.exec.reducers.max=<number>
  17. In order to set a constant number of reducers:
  18. set mapreduce.job.reduces=<number>
  19. Starting Job = job_1533789743141_0047, Tracking URL = http://s101:8088/proxy/application_1533789743141_0047/
  20. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0047
  21. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  22. -- ::, Stage- map = %, reduce = %
  23. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.21 sec
  24. -- ::, Stage- map = %, reduce = %, Cumulative CPU 3.64 sec
  25. -- ::, Stage- map = %, reduce = %, Cumulative CPU 5.93 sec
  26. -- ::, Stage- map = %, reduce = %, Cumulative CPU 8.2 sec
  27. -- ::, Stage- map = %, reduce = %, Cumulative CPU 8.97 sec
  28. MapReduce Total cumulative CPU time: seconds msec
  29. Ended Job = job_1533789743141_0047
  30. MapReduce Jobs Launched:
  31. Stage-Stage-: Map: Reduce: Cumulative CPU: 8.97 sec HDFS Read: HDFS Write: SUCCESS
  32. Total MapReduce CPU Time Spent: seconds msec
  33. OK
  34. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  35. ALLEN SALESMAN -- 1600.0 300.0
  36. TURNER SALESMAN -- 1500.0 0.0
  37. WARD SALESMAN -- 1250.0 500.0
  38. JAMES CLERK -- 950.0 NULL
  39. TURNER SALESMAN -- 1500.0 0.0
  40. ALLEN SALESMAN -- 1600.0 300.0
  41. MARTIN SALESMAN -- 1250.0 1400.0
  42. JAMES CLERK -- 950.0 NULL
  43. BLAKE MANAGER -- 2850.0 NULL
  44. MARTIN SALESMAN -- 1250.0 1400.0
  45. BLAKE MANAGER -- 2850.0 NULL
  46. WARD SALESMAN -- 1250.0 500.0
  47. MILLER CLERK -- 1300.0 NULL
  48. KING PRESIDENT NULL -- 5000.0 NULL
  49. CLARK MANAGER -- 2450.0 NULL
  50. MILLER CLERK -- 1300.0 NULL
  51. KING PRESIDENT NULL -- 5000.0 NULL
  52. CLARK MANAGER -- 2450.0 NULL
  53. SMITH CLERK -- 800.0 NULL
  54. FORD ANALYST -- 3000.0 NULL
  55. SCOTT ANALYST -- 3000.0 NULL
  56. SMITH CLERK -- 800.0 NULL
  57. JONES MANAGER -- 2975.0 NULL
  58. SCOTT ANALYST -- 3000.0 NULL
  59. JONES MANAGER -- 2975.0 NULL
  60. ADAMS CLERK -- 1100.0 NULL
  61. FORD ANALYST -- 3000.0 NULL
  62. ADAMS CLERK -- 1100.0 NULL
  63. Time taken: 64.632 seconds, Fetched: row(s)
  64. hive (yinzhengjie)> select * from emp distribute by deptno sort by deptno;
  65. WARNING: Hive-on-MR is deprecated in Hive and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive .X releases.
  66. Query ID = yinzhengjie_20180810004343_d5ce078f-80a7--8a00-a75b6a97f7b2
  67. Total jobs =
  68. Launching Job out of
  69. Number of reduce tasks not specified. Defaulting to jobconf value of:
  70. In order to change the average load for a reducer (in bytes):
  71. set hive.exec.reducers.bytes.per.reducer=<number>
  72. In order to limit the maximum number of reducers:
  73. set hive.exec.reducers.max=<number>
  74. In order to set a constant number of reducers:
  75. set mapreduce.job.reduces=<number>
  76. Starting Job = job_1533789743141_0048, Tracking URL = http://s101:8088/proxy/application_1533789743141_0048/
  77. Kill Command = /soft/hadoop-2.7./bin/hadoop job -kill job_1533789743141_0048
  78. Hadoop job information for Stage-: number of mappers: ; number of reducers:
  79. -- ::, Stage- map = %, reduce = %
  80. -- ::, Stage- map = %, reduce = %, Cumulative CPU 1.51 sec
  81. -- ::, Stage- map = %, reduce = %, Cumulative CPU 4.62 sec
  82. -- ::, Stage- map = %, reduce = %, Cumulative CPU 10.22 sec
  83. MapReduce Total cumulative CPU time: seconds msec
  84. Ended Job = job_1533789743141_0048
  85. MapReduce Jobs Launched:
  86. Stage-Stage-: Map: Reduce: Cumulative CPU: 10.22 sec HDFS Read: HDFS Write: SUCCESS
  87. Total MapReduce CPU Time Spent: seconds msec
  88. OK
  89. emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
  90. ALLEN SALESMAN -- 1600.0 300.0
  91. TURNER SALESMAN -- 1500.0 0.0
  92. WARD SALESMAN -- 1250.0 500.0
  93. JAMES CLERK -- 950.0 NULL
  94. TURNER SALESMAN -- 1500.0 0.0
  95. ALLEN SALESMAN -- 1600.0 300.0
  96. MARTIN SALESMAN -- 1250.0 1400.0
  97. JAMES CLERK -- 950.0 NULL
  98. BLAKE MANAGER -- 2850.0 NULL
  99. MARTIN SALESMAN -- 1250.0 1400.0
  100. BLAKE MANAGER -- 2850.0 NULL
  101. WARD SALESMAN -- 1250.0 500.0
  102. MILLER CLERK -- 1300.0 NULL
  103. KING PRESIDENT NULL -- 5000.0 NULL
  104. CLARK MANAGER -- 2450.0 NULL
  105. MILLER CLERK -- 1300.0 NULL
  106. KING PRESIDENT NULL -- 5000.0 NULL
  107. CLARK MANAGER -- 2450.0 NULL
  108. SMITH CLERK -- 800.0 NULL
  109. FORD ANALYST -- 3000.0 NULL
  110. SCOTT ANALYST -- 3000.0 NULL
  111. SMITH CLERK -- 800.0 NULL
  112. JONES MANAGER -- 2975.0 NULL
  113. SCOTT ANALYST -- 3000.0 NULL
  114. JONES MANAGER -- 2975.0 NULL
  115. ADAMS CLERK -- 1100.0 NULL
  116. FORD ANALYST -- 3000.0 NULL
  117. ADAMS CLERK -- 1100.0 NULL
  118. Time taken: 48.312 seconds, Fetched: row(s)
  119. hive (yinzhengjie)>

排序-Cluster By(hive (yinzhengjie)> select * from emp cluster by deptno;)
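
To restate the equivalence shown above in one place: CLUSTER BY deptno is shorthand for DISTRIBUTE BY deptno SORT BY deptno, and the sort is always ascending, so no direction can be attached to it.

    -- These two queries distribute and order the data the same way.
    SELECT * FROM emp CLUSTER BY deptno;
    SELECT * FROM emp DISTRIBUTE BY deptno SORT BY deptno;
    -- Not allowed: SELECT * FROM emp CLUSTER BY deptno DESC;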

  1. Bucketed tables - bucket sampling query (hive (yinzhengjie)> select * from stu_buck tablesample(bucket 1 out of 4 on id);)
  2. 对于非常大的数据集,有时用户需要使用的是一个具有代表性的查询结果而不是全部结果。Hive可以通过对表进行抽样来满足这个需求。
  3.  
  4. hive (yinzhengjie)> select * from stu_buck;
  5. OK
  6. stu_buck.id stu_buck.name
  7. ss16
  8. ss12
  9. ss8
  10. ss4
  11. ss1
  12. ss13
  13. ss5
  14. ss9
  15. ss14
  16. ss10
  17. ss6
  18. ss2
  19. ss15
  20. ss7
  21. ss3
  22. ss11
  23. Time taken: 0.073 seconds, Fetched: row(s)
  24. hive (yinzhengjie)> select * from stu_buck tablesample(bucket 1 out of 4 on id); #sample one of the buckets of table stu_buck
  25. OK
  26. stu_buck.id stu_buck.name
  27. ss16
  28. ss12
  29. ss8
  30. ss4
  31. Time taken: 0.088 seconds, Fetched: row(s)
  32. hive (yinzhengjie)>
  33.  
  34. Note: tablesample is the sampling clause; the syntax is TABLESAMPLE(BUCKET x OUT OF y).
  35. y must be a multiple or a factor of the table's total bucket count. Hive decides the sampling ratio from y. For example, if the table is split into 4 buckets, then with y=2 it samples (4/2=) 2 buckets' worth of data, and with y=8 it samples (4/8=) 1/2 of one bucket's data.
  36. x indicates which bucket to start sampling from. For example, with 4 buckets in total, tablesample(bucket 4 out of 4) samples (4/4=) 1 bucket of data, namely the 4th bucket.
  37. Note: x must be less than or equal to y, otherwise an exception is thrown: FAILED: SemanticException [Error ]: Numerator should not be bigger than denominator in sample clause for table stu_buck

分桶表-分桶抽样查询(hive (yinzhengjie)> select * from stu_buck tablesample(bucket 1 out of 4 on id);)
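
With stu_buck split into 4 buckets on id, varying x and y changes which bucket(s) come back. A couple of hedged examples of the arithmetic described above:

    -- One bucket (4/4 = 1), starting from bucket 2: returns only bucket 2.
    SELECT * FROM stu_buck TABLESAMPLE(BUCKET 2 OUT OF 4 ON id);

    -- Half the table (4/2 = 2 buckets), starting from bucket 1: buckets 1 and 3.
    SELECT * FROM stu_buck TABLESAMPLE(BUCKET 1 OUT OF 2 ON id);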

  1. Bucketed tables - block sampling (hive (yinzhengjie)> select * from stu_buck tablesample(0.1 percent);)
  2. Hive also provides percentage-based sampling. This form is based on data size rather than row count: it samples the given percentage of the data blocks under the input path.
  3.  
  4. 温馨提示:
  5. 这种抽样方式不一定适用于所有的文件格式。另外,这种抽样的最小抽样单元是一个HDFS数据块。因此,如果表的数据大小小于普通的块大小128M的话,那么将会返回所有行。
  6.  
  7. hive (yinzhengjie)> select * from stu_buck;
  8. OK
  9. stu_buck.id stu_buck.name
  10. ss16
  11. ss12
  12. ss8
  13. ss4
  14. ss1
  15. ss13
  16. ss5
  17. ss9
  18. ss14
  19. ss10
  20. ss6
  21. ss2
  22. ss15
  23. ss7
  24. ss3
  25. ss11
  26. Time taken: 0.078 seconds, Fetched: row(s)
  27. hive (yinzhengjie)> select * from stu_buck tablesample(0.1 percent) ; #Note: stu_buck is bucketed into 4 buckets (4 files), so this does not return the whole table; the minimum sampling unit is one HDFS block, which here amounts to the data of a single bucket file
  28. OK
  29. stu_buck.id stu_buck.name
  30. ss16
  31. ss12
  32. ss8
  33. ss4
  34. Time taken: 0.04 seconds, Fetched: row(s)
  35. hive (yinzhengjie)> select * from stu tablesample(0.1 percent) ;
  36. OK
  37. stu.id stu.name
  38. ss1
  39. ss2
  40. ss3
  41. ss4
  42. ss5
  43. ss6
  44. ss7
  45. ss8
  46. ss9
  47. ss10
  48. ss11
  49. ss12
  50. ss13
  51. ss14
  52. ss15
  53. ss16
  54. Time taken: 0.059 seconds, Fetched: row(s)
  55. hive (yinzhengjie)>

分桶表-数据块抽样(hive (yinzhengjie)> select * from stu_buck tablesample(0.1 percent);)
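
Besides the percent form, Hive's block sampling also accepts an absolute data size or a per-split row count; these are sketches only, and as noted above, on a table smaller than one HDFS block the percent and size forms still return every row:

    -- Roughly 10% of the input data, rounded to whole HDFS blocks.
    SELECT * FROM stu_buck TABLESAMPLE(10 PERCENT);

    -- At least 1M worth of input data.
    SELECT * FROM stu_buck TABLESAMPLE(1M);

    -- 5 rows taken from each input split.
    SELECT * FROM stu_buck TABLESAMPLE(5 ROWS);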

 5>.函数

  1. hive (yinzhengjie)> show functions; #查看系统自带的函数
  2.  
  3. hive (yinzhengjie)> desc function xpath; #显示自带的函数的用法
  4.  
  5. hive (yinzhengjie)> desc function extended xpath; #详细显示自带的函数的用法
  6.  
  7. 关于自定义函数,可以参考:https://www.cnblogs.com/yinzhengjie/p/9154359.html
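
A couple of hedged examples of working with the built-in functions; upper and nvl are standard Hive built-ins, and emp is the table used throughout this post:

    -- Look up how a built-in works, then use it in a query.
    desc function extended upper;

    -- nvl(col, default) substitutes a value for NULL, e.g. treat a missing comm as 0.
    select ename, upper(job) AS job_uc, sal + nvl(comm, 0) AS total_pay from emp;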
