我们在创建表的时候可以指定external关键字创建外部表,外部表对应的文件存储在location指定的目录下,向该目录添加新文件的同时,该表也会读取到该文件(当然文件格式必须跟表定义的一致),删除外部表的同时并不会删除location指定目录下的文件.

1.查看hdfs系统目录/user/hadoop1/myfile下文件

[hadoop1@node1]$ hadoop fs -ls /user/hadoop1/myfile/
Found 1 items
-rw-r--r-- 3 hadoop1 supergroup 567839 2014-10-29 16:50 /user/hadoop1/myfile/tb_class.txt

2.创建外部表指向myfile目录下的文件

hive (hxl)> create external table tb_class_info_external
> (id int,
> class_name string,
> createtime timestamp ,
> modifytime timestamp)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> location '/user/hadoop1/myfile';
OK
Time taken: 0.083 seconds

注意这里的location指向的是hdfs系统上的路径,而不是本地机器上的路径,这里表tb_class_info_external会读取myfile目录下的所有文件

3.查看外部表

hive (hxl)> select count(1) from tb_class_info_external;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201410300915_0009, Tracking URL = http://node1:50030/jobdetails.jsp?jobid=job_201410300915_0009
Kill Command = /usr1/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=http://192.168.56.101:9001 -kill job_201410300915_0009
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-10-30 15:25:10,652 Stage-1 map = 0%, reduce = 0%
2014-10-30 15:25:12,664 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:13,671 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:14,682 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:15,690 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:16,697 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:17,704 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:18,710 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:19,718 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:25:20,725 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.21 sec
2014-10-30 15:25:21,730 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.21 sec
2014-10-30 15:25:22,737 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.21 sec
MapReduce Total cumulative CPU time: 1 seconds 210 msec
Ended Job = job_201410300915_0009
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.21 sec HDFS Read: 568052 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 210 msec
OK
10001
Time taken: 14.742 seconds

可以看到这里表记录数是10001,下面我们在myfile目录下添加另外一个文件tb_class_bak.txt

4.在myfile目录下添加文本

$hadoop fs -cp /user/hadoop1/myfile/tb_class.txt /user/hadoop1/myfile/tb_class_bak.txt

5.再次查询表记录数

hive (hxl)> select count(1) from tb_class_info_external;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201410300915_0010, Tracking URL = http://node1:50030/jobdetails.jsp?jobid=job_201410300915_0010
Kill Command = /usr1/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=http://192.168.56.101:9001 -kill job_201410300915_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-10-30 15:32:02,275 Stage-1 map = 0%, reduce = 0%
2014-10-30 15:32:04,286 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:05,292 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:06,300 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:07,306 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:08,313 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:09,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:10,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:11,331 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.48 sec
2014-10-30 15:32:12,338 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.16 sec
2014-10-30 15:32:13,343 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.16 sec
2014-10-30 15:32:14,350 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.16 sec
MapReduce Total cumulative CPU time: 1 seconds 160 msec
Ended Job = job_201410300915_0010
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.16 sec HDFS Read: 1135971 HDFS Write: 6 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 160 msec
OK
20002
Time taken: 14.665 seconds

可以看到记录数加倍了,那就说明表已经读取了新增加的文件

6.删除表

hive (hxl)> drop table tb_class_info_external;
OK
Time taken: 1.7 seconds

表对应的文件并没有删除

[hadoop1@node1]$ hadoop fs -ls /user/hadoop1/myfile/
Found 2 items
-rw-r--r-- 3 hadoop1 supergroup 567839 2014-10-29 16:50 /user/hadoop1/myfile/tb_class.txt
-rw-r--r-- 3 hadoop1 supergroup 567839 2014-10-30 15:28 /user/hadoop1/myfile/tb_class_bak.txt

hive外部表自动读取文件夹里的数据的更多相关文章

  1. Matlab 读取文件夹里所有的文件

    (image = dir('D:\gesture\*.*'); % dir是指定文件夹得位置,他与dos下的dir用法相同. 用法有三种: 1. dir 是指工作在当前文件夹里 2. dir name ...

  2. Matlab 读取文件夹中所有的bmp文件

    将srcimg文件下的bmp文件转为jpg图像,存放在dstimg文件夹下 str = 'srcimg'; dst = 'dstimg'; file=dir([str,'\*.bmp']); :len ...

  3. diff两个文件夹里的东西

    diff --help -x, --exclude=PAT               exclude files that match PAT 排除某个类型的文件 -u, -U NUM, --uni ...

  4. R8—批量生成文件夹,批量读取文件夹名称+R文件管理系统操作函数

    一. 批量生成文件夹,批量读取文件夹名称 今日,工作中遇到这样一个问题:boss给我们提供了200多家公司的ID代码(如6007.7920等),需要根据这些ID号去搜索下载新闻,从而将下载到的新闻存到 ...

  5. c++读取文件夹及子文件夹数据

    这里有两种情况:读取文件夹下所有嵌套的子文件夹里的所有文件  和 读取文件夹下的指定子文件夹(或所有子文件夹里指定的文件名) <ps,里面和file文件有关的结构体类型和方法在 <io.h ...

  6. Matlab批量读取文件夹文件

    现在有一个文件夹 里面有50个左右的txt文件 每个文件大概三万行 两列 第一列是字符串 第二列是浮点数字 我只需要读第二列 现在我想写一个.M文件 批量读取这个文件夹里的txt文件 读取完以后的数组 ...

  7. php 读取网页源码 , 导出成txt文件, 读取xls,读取文件夹下的所有文件的文件名

    <?php // 读取网页源码$curl = curl_init();curl_setopt($curl, CURLOPT_URL, $url);curl_setopt($curl, CURLO ...

  8. C#读取文件夹大小

    今天需要做一个读取文件夹大小的功能,为了避免遍历文件夹下所有文件并求出总大小,找到如下的好方法: 首先要在项目中引用一个COM组件:Microsoft Scripting Runtime,这个在Ref ...

  9. su认证失败&文件夹里打开终端的方法&atom安装

    很久没用笔记本上的ubuntu,用不顺手,比在公司调教了半年多的电脑差远了.一步一步来.先解决最不顺手的三件事 1.su认证失败. 新安装的ubuntu系统是无法切换到root账户的,得做一番修改 s ...

随机推荐

  1. HDU-4627 The Unsolvable Problem 简单数学

    题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=4627 对n除个2,然后考虑下奇偶... //STATUS:C++_AC_15MS_228KB #inc ...

  2. Linux Pmap 命令:查看进程用了多少内存

    Pmap 提供了进程的内存映射,pmap命令用于显示一个或多个进程的内存状态.其报告进程的地址空间和内存状态信息.Pmap实际上是一个Sun OS上的命令,linux仅支持其有限的功能.但是它还是对查 ...

  3. SecureCRT配置显示的字符集

  4. weblogic11g 安装——linux 无图形界面

    weblogic11g 安装——linux下无weblogic安装图形界面 注意:此次安装,没做server.ip .系统规划 目的:学习weblogic11g 在linux下  无图形安装的过程 j ...

  5. jquery完成带单选按钮的表格行高亮显示

    jquery完成带单选按钮的表格行高亮显示 上篇博客写的是复选框的,这次写的是单选框的,有时查询的时候,只能选择一条记录,如果将选中的这条记录的行高亮显示,同时该行的单选按钮也被选中了,这样会提高用户 ...

  6. Codeforces 390Div2-754D. Fedor and coupons(贪心+优先队列)

    D. Fedor and coupons time limit per test 4 seconds memory limit per test 256 megabytes input standar ...

  7. SQLite DBHelp

    c#连接SQLite SQLite这个精巧的小数据库,无需安装软件,只需要一个System.Data.SQLite.DLL文件即可操作SQLite数据库.SQLite是一个开源数据库,现在已变得越来越 ...

  8. 【11】在operator=中处理“自我赋值”

    1.自我赋值,看起来愚蠢,但是却合法.有些自我赋值一眼就可看出来.有些自我赋值是潜在的.比如:a[i] = a[j]; *px = *py; 甚至不同类型的指针,都指向同一个地址,也是自我赋值,这一类 ...

  9. js判断字符在另一个字符串中出现次数

    经过搜索验证,提供两个方法. 1. 通过分割获取长度原理 var s = 'www.51qdq.com';var n = (s.split('.')).length-1;alert(n);  //弹出 ...

  10. POJ 3692 最大独立集

    题意:有G个女生,B个男生,所有的女生都互相认识,所有的男生都互相认识,还有N对男女,他们互相认识. 问从中选出最多的人数,是的他们全部互相认识. 思路:这道题的构图很巧妙,对于他的补图构图,对于所有 ...