Review:

  Hive advantages
  1. SQL syntax stays close to relational databases and supports user-defined functions, which adds extensibility, eases development, and lowers the MapReduce learning curve.
  2. Hive translates SQL statements into MapReduce programs; MapReduce is the underlying execution engine.
  3. Hive is built on Hadoop HDFS and stores its data there, so Hive's storage scales along with HDFS.

Hive installation and deployment

  1. Unpack the installation package.
  2. Go to the conf directory, copy (back up) the relevant configuration files, then edit them:
  hive-env.sh
  --> HADOOP_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
  --> export HIVE_CONF_DIR=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/conf
  hive-log4j.properties
  --> create a logs folder under the Hive root directory to hold Hive's runtime logs
  --> hive.log.threshold=ALL
  hive.root.logger=INFO,DRFA
  hive.log.dir=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs
  hive.log.file=hive.log
  hive-site.xml (see the sketch below)
  --> javax.jdo.option.ConnectionURL --- jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true
  --> javax.jdo.option.ConnectionDriverName --- com.mysql.jdbc.Driver
  --> javax.jdo.option.ConnectionUserName --- root
  --> javax.jdo.option.ConnectionPassword --- root
  --> hive.cli.print.header --- true # whether the CLI prints column headers (optional)
  --> hive.cli.print.current.db --- true # whether the CLI prompt shows the current database (optional)
  --> hive.fetch.task.conversion --- minimal # controls which queries can skip MapReduce; valid values are minimal/more, covered in a later section (optional)
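  A minimal hive-site.xml sketch matching the metastore properties above (host, database name, and credentials are the values used in these notes; adjust for your own cluster):

  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>root</value>
    </property>
    <property>
      <name>hive.cli.print.header</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.cli.print.current.db</name>
      <value>true</value>
    </property>
  </configuration>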

Hive architecture

  1. metastore
  --> embedded Derby storage: Hive creates derby files and a metastore_db directory in the working directory; the drawback is that starting a second Hive session from the same directory fails
  --> metadata stored in a local MySQL database
  --> metadata stored in a remote MySQL database
  2. client
  --> CLI/JDBC, Driver, SQL Parser, Query Optimizer, Physical Plan, Execution

Ways to create a table in Hive

  1. Plain CREATE TABLE (a concrete example follows this list)
  create table if not exists tablename(...)
  row format delimited fields terminated by '\t'
  stored as textfile;
  2. Subquery (CTAS)
  create table if not exists tablename as select * from tablename2;
  3. LIKE
  create table if not exists tablename like tablename2; # copies only the table structure of tablename2
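  A concrete sketch of the first form, assuming a tab-delimited emp table like the one used throughout these notes (the column types are assumptions):
  create table if not exists emp(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int)
  row format delimited fields terminated by '\t'
  stored as textfile;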

Table types

  1. Managed table (the default type)
  2. External table (external; lets multiple users share one table's data; see the DDL sketch after this list)
  3. Partitioned table (partition; speeds up analytic queries)
  --> list partitions: show partitions tablename; # shows the partitions of tablename
  Adding a partition by hand
  1. Create the partition directory and load the data
  hive (workdb)> dfs -mkdir /user/hive/warehouse/workdb.db/emp_part/date=20161029
  dfs -put /home/liuwl/opt/datas/emp.txt /user/hive/warehouse/workdb.db/emp_part/date=20161029
  --- note that show partitions emp_part; does not yet list the partition just added by hand
  2. Fix: alter table emp_part add partition (date='20161029');
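  A DDL sketch of the external and partitioned variants (column lists are illustrative; the emp_part layout matches the dfs commands above):
  create external table if not exists emp_ext(
  empno int, ename string, sal double, deptno int)
  row format delimited fields terminated by '\t'
  location '/user/hive/warehouse/ext/emp';

  create table if not exists emp_part(
  empno int, ename string, sal double, deptno int)
  partitioned by (date string)
  row format delimited fields terminated by '\t';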

Analysis functions and window functions (important)

  1. Analysis functions
  All employees in department 20, ordered by salary descending:
  select * from emp where emp.deptno='20' order by sal desc;
  Per-department maximum salary, ordered by salary descending within each department:
  select empno,ename,deptno,sal,max(sal) over (partition by deptno order by sal desc) as maxsal from emp;
  Result:
  empno ename deptno sal maxsal
  7839 KING 10 5000.0 5000.0
  7782 CLARK 10 2450.0 5000.0
  7934 MILLER 10 1300.0 5000.0
  7788 SCOTT 20 3000.0 3000.0
  7902 FORD 20 3000.0 3000.0
  7566 JONES 20 2975.0 3000.0
  7876 ADAMS 20 1100.0 3000.0
  7369 SMITH 20 800.0 3000.0
  7698 BLAKE 30 2850.0 2850.0
  7499 ALLEN 30 1600.0 2850.0
  7844 TURNER 30 1500.0 2850.0
  7654 MARTIN 30 1250.0 2850.0
  7521 WARD 30 1250.0 2850.0
  7900 JAMES 30 950.0 2850.0
  Generating row numbers:
  select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rownum from emp;
  empno ename deptno sal rownum
  7839 KING 10 5000.0 1
  7782 CLARK 10 2450.0 2
  7934 MILLER 10 1300.0 3
  7788 SCOTT 20 3000.0 1
  7902 FORD 20 3000.0 2
  7566 JONES 20 2975.0 3
  7876 ADAMS 20 1100.0 4
  7369 SMITH 20 800.0 5
  7698 BLAKE 30 2850.0 1
  7499 ALLEN 30 1600.0 2
  7844 TURNER 30 1500.0 3
  7654 MARTIN 30 1250.0 4
  7521 WARD 30 1250.0 5
  7900 JAMES 30 950.0 6
  ROW_NUMBER(): a sequential row number within each partition
  Get the top two earners of each department:
  select * from (select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rownum from emp) t where t.rownum < 3;
  t.empno t.ename t.deptno t.sal t.rownum
  7839 KING 10 5000.0 1
  7782 CLARK 10 2450.0 2
  7788 SCOTT 20 3000.0 1
  7902 FORD 20 3000.0 2
  7698 BLAKE 30 2850.0 1
  7499 ALLEN 30 1600.0 2
  RANK(): ties share a rank and leave gaps (two tied at the top both get 1, the next gets 3)
  DENSE_RANK(): ties share a rank without gaps (two tied at the top both get 1, the next gets 2)
  Group by department, order by salary descending, and rank:
  select empno,ename,deptno,sal,rank() over (partition by deptno order by sal desc) ranksal from emp;
  empno ename deptno sal ranksal
  7839 KING 10 5000.0 1
  7782 CLARK 10 2450.0 2
  7934 MILLER 10 1300.0 3
  7788 SCOTT 20 3000.0 1
  7902 FORD 20 3000.0 1
  7566 JONES 20 2975.0 3
  7876 ADAMS 20 1100.0 4
  7369 SMITH 20 800.0 5
  7698 BLAKE 30 2850.0 1
  7499 ALLEN 30 1600.0 2
  7844 TURNER 30 1500.0 3
  7654 MARTIN 30 1250.0 4
  7521 WARD 30 1250.0 4
  7900 JAMES 30 950.0 6
  select empno,ename,deptno,sal,dense_rank() over (partition by deptno order by sal desc) dense_ranksal from emp;
  empno ename deptno sal dense_ranksal
  7839 KING 10 5000.0 1
  7782 CLARK 10 2450.0 2
  7934 MILLER 10 1300.0 3
  7788 SCOTT 20 3000.0 1
  7902 FORD 20 3000.0 1
  7566 JONES 20 2975.0 2
  7876 ADAMS 20 1100.0 3
  7369 SMITH 20 800.0 4
  7698 BLAKE 30 2850.0 1
  7499 ALLEN 30 1600.0 2
  7844 TURNER 30 1500.0 3
  7654 MARTIN 30 1250.0 4
  7521 WARD 30 1250.0 4
  7900 JAMES 30 950.0 5
  NTILE(): distributes the rows into n buckets
  Example: employees in the top third by salary (bucket 1):
  select empno,ename,sal,ntile(3) over (order by sal desc) ntile from emp;
  empno ename sal ntile
  7839 KING 5000.0 1
  7902 FORD 3000.0 1
  7788 SCOTT 3000.0 1
  7566 JONES 2975.0 1
  7698 BLAKE 2850.0 1
  7782 CLARK 2450.0 2
  7499 ALLEN 1600.0 2
  7844 TURNER 1500.0 2
  7934 MILLER 1300.0 2
  7654 MARTIN 1250.0 2
  7521 WARD 1250.0 3
  7876 ADAMS 1100.0 3
  7900 JAMES 950.0 3
  7369 SMITH 800.0 3
  2. Window functions LAG (value from a preceding row) and LEAD (value from a following row)
  select empno,ename,sal,lag(ename,4,0) over (order by sal desc) lagvalue from emp;
  Result:
  empno ename sal lagvalue
  7839 KING 5000.0 0
  7902 FORD 3000.0 0
  7788 SCOTT 3000.0 0
  7566 JONES 2975.0 0
  7698 BLAKE 2850.0 KING
  7782 CLARK 2450.0 FORD
  7499 ALLEN 1600.0 SCOTT
  7844 TURNER 1500.0 JONES
  7934 MILLER 1300.0 BLAKE
  7654 MARTIN 1250.0 CLARK
  7521 WARD 1250.0 ALLEN
  7876 ADAMS 1100.0 TURNER
  7900 JAMES 950.0 MILLER
  7369 SMITH 800.0 MARTIN
  select empno,ename,sal,lead(ename,4,0) over (order by sal desc) leadvalue from emp;
  Result:
  empno ename sal leadvalue
  7839 KING 5000.0 BLAKE
  7902 FORD 3000.0 CLARK
  7788 SCOTT 3000.0 ALLEN
  7566 JONES 2975.0 TURNER
  7698 BLAKE 2850.0 MILLER
  7782 CLARK 2450.0 MARTIN
  7499 ALLEN 1600.0 WARD
  7844 TURNER 1500.0 ADAMS
  7934 MILLER 1300.0 JAMES
  7654 MARTIN 1250.0 SMITH
  7521 WARD 1250.0 0
  7876 ADAMS 1100.0 0
  7900 JAMES 950.0 0
  7369 SMITH 800.0 0
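  One more analysis-function sketch in the same spirit, assuming the same emp table: comparing each salary with its department average (avg() works over a partition just like max() above):
  select empno, ename, deptno, sal,
         avg(sal) over (partition by deptno) avgsal,
         sal - avg(sal) over (partition by deptno) diff
  from emp;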

CASE WHEN THEN in Hive

  1. case key
       when value1 then ''
       when value2 then ''
     else ''
     end
  2. case
       when key='value1' then ''
       when key='value2' then ''
     else ''
     end
  Example:
  select empno,ename,sal,deptno,
  case when deptno=10 then 'U deptno is 10' when deptno=20 then 'U deptno is 20' else 'U deptno is 30' end from emp;
  Type casting in Hive: cast(key as type)
  Unix timestamps in Hive: unix_timestamp()
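  A small sketch combining the three, assuming the emp table from earlier (the aliases and labels are illustrative):
  select ename,
         cast(sal as int) sal_int,                                  -- double -> int
         unix_timestamp() query_time,                               -- current time in epoch seconds
         case when deptno=10 then 'HQ' else 'branch' end site
  from emp;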

Importing data into Hive (important)

  1. From the local filesystem
  load data local inpath 'filepath' into table tbname;
  2. From HDFS
  load data inpath 'hdfs_filepath' into table tbname;
  3. Overwriting load
  load data local inpath 'filepath' overwrite into table tbname;
  load data inpath 'hdfs_filepath' overwrite into table tbname;
  4. Subquery (CTAS)
  create table tb2 as select * from tb1; # the default delimiter is ^A
  5. insert into table tbname select ...; # uses the delimiter declared for the target table
  --> create table emp_insert like emp;
  --> insert into table emp_insert select * from emp;
  6. LOCATION (sketch below)
  create table if not exists tbname(...) location 'hdfs_path';
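  A sketch of the LOCATION form, assuming tab-delimited files already sit in an HDFS directory (path and columns are illustrative):
  create table if not exists emp_loc(
  empno int, ename string, sal double, deptno int)
  row format delimited fields terminated by '\t'
  location '/user/hive/warehouse/emp_loc';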

Exporting data from Hive (important)

  1. insert (when exporting to the local filesystem this way: 1. the target folder needs the proper permissions; 2. use a dedicated folder, because everything already under the target directory gets overwritten)
  --> insert overwrite [local] directory 'path' select ...;
  Example: insert overwrite local directory '/tmp' row format delimited fields terminated by '\t' select * from emp;
  2. bin/hdfs dfs -get
  3. Running HQL from the Linux command line (sketch below):
  -> -e
  -> -f
  -> output redirection
  4. Sqoop: for importing/exporting between HDFS and relational databases
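  A sketch of the command-line options in item 3 (paths are illustrative):
  bin/hive -e "select * from emp;"                   # run an inline statement
  bin/hive -f /home/liuwl/opt/datas/emp.sql          # run a script file
  bin/hive -e "select * from emp;" > /tmp/emp.out    # redirect the result to a file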

Hive export and import (the paths must be HDFS paths)

  -> export
  export table tb_name to 'hdfs_path';
  Example: export table emp to '/export_emp';
  -> import
  import table tb_name from 'hdfs_path';
  Example: import table emp_im from '/export_emp';

Hive HQL

  1. Column projection
  -> select empno,ename from emp;
  2. where / limit / distinct
  -> select * from emp where sal > 3000;
  -> select * from emp limit 5;
  -> select distinct deptno from emp;
  3. between and, >, <, =, is null, is not null, in
  -> select * from emp where sal between 2000 and 3000;
  -> select * from emp where comm is null;
  -> select * from emp where sal in (2000,3000,4000);
  4. count(), sum(), avg(), max(), min()
  -> select count(1) from emp;
  -> select sum(sal) from emp;
  -> select avg(sal) from emp;
  5. group by, having
  -> select deptno,avg(sal) from emp group by deptno;
  -> select deptno,avg(sal) avgsal from emp group by deptno having avgsal >= 3000;
  6. join
  -> equijoin (keeps the records matched in both tables)
  select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e join dept d on e.deptno=d.deptno;
  -> left join (keeps all rows from the left table)
  select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e left join dept d on e.deptno=d.deptno;
  -> right join (keeps all rows from the right table)
  select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e right join dept d on e.deptno=d.deptno;
  -> full join (keeps all rows from both tables)
  select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e full join dept d on e.deptno=d.deptno;

Hive MapReduce-related settings (important)

  1. Amount of data handled by each reducer
  -> set hive.exec.reducers.bytes.per.reducer;
  Defaults to 1 GB: with 10 GB of input, 10 reducers start and each handles 1 GB.
  2. Maximum number of reducers
  -> set hive.exec.reducers.max;
  Defaults to 999.
  3. Actual number of reducers for the job (sketch below)
  -> set mapreduce.job.reduces;
  Hive prints -1 (meaning "decide automatically"); the Hadoop default is 1.
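  A sketch of overriding these in a session (the values are illustrative):
  set hive.exec.reducers.bytes.per.reducer=536870912;   -- 512 MB per reducer
  set hive.exec.reducers.max=10;
  set mapreduce.job.reduces=3;                           -- force exactly 3 reducers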

Hive sort clauses (important)

  1. order by (a single global sort; if several reduce tasks each wrote their own file, a global order could not hold)
  -> select * from emp order by sal desc;
  2. sort by (sorts within each output file; note that the target directory must be an absolute path)
  -> set mapreduce.job.reduces=3;
  -> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp sort by sal;
  Note that with order by, no matter how many reduce tasks are configured, only one file is produced, and that file is sorted.
  3. distribute by (maps to the MapReduce partitioner; usually combined with sort by)
  -> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp distribute by deptno sort by sal;
  4. cluster by (equivalent to distribute by xx sort by xx, where xx is the same column)
  -> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp cluster by sal;

Hive UDFs (user-defined functions let users extend HiveQL) (important)

  1. UDF: one row in, one value out, e.g. upper/lower/day
  2. UDAF: many rows in, one value out, e.g. count/max/min
  3. UDTF: one row in, many rows out, e.g. lateral view / explode
  UDF development steps:
  --> extend org.apache.hadoop.hive.ql.exec.UDF
  --> implement an evaluate function; evaluate supports overloading
  Note: a UDF must have a return type. It may return NULL, but the return type cannot be void.
  UDFs usually work with types such as Text/LongWritable; plain Java types are not recommended.
  --> code (the package, imports, and class wrapper are filled in here so the example compiles):
  package com.hive.udf;

  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;

  public class UDFTest extends UDF {
    // no flag given: default to lowercasing
    public Text evaluate(Text str){
      return this.evaluate(str, new IntWritable(0));
    }
    // flag 0 = lowercase, flag 1 = uppercase
    public Text evaluate(Text str, IntWritable flag){
      if(str != null){
        if(flag.get() == 0){
          return new Text(str.toString().toLowerCase());
        }else if(flag.get() == 1){
          return new Text(str.toString().toUpperCase());
        }else return null;
      }else return null;
    }
  }
  --> build the jar
  --> register it in Hive
  ---> add the jar
  add jar jar_path;
  ---> create the function
  create temporary function tolower as 'com.hive.udf.UDFTest';
  ---> test
  select ename, tolower(ename) lowername, tolower(tolower(ename),1) uppername from emp;
  ename lowername uppername
  SMITH smith SMITH
  ALLEN allen ALLEN
  WARD ward WARD
  JONES jones JONES
  MARTIN martin MARTIN
  BLAKE blake BLAKE
  CLARK clark CLARK
  SCOTT scott SCOTT
  KING king KING
  TURNER turner TURNER
  ADAMS adams ADAMS
  JAMES james JAMES
  FORD ford FORD
  MILLER miller MILLER
  --> Case 2: strip all double quotes
  --> code (same package and imports as case 1; the class wrapper is filled in):
  public class RMQuotes extends UDF {
    public Text evaluate(Text str){
      if(str != null){
        // drop every double quote in the value
        return new Text(str.toString().replaceAll("\"", ""));
      }else return null;
    }
  }
  --> add the jar
  add jar jar_path;
  --> create the function
  create temporary function rmquotes as 'com.hive.udf.RMQuotes';
  --> test
  select dname,rmquotes(dname) rmquotes from dept_quotes;
  dname rmquotes
  "ACCOUNTING" ACCOUNTING
  "RESEARCH" RESEARCH
  "SALES" SALES
  "OPERATIONS" OPERATIONS
  --> Case 3: strip all quotes, reformat the timestamp, and keep only the path after GET
  e.g. "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/styles.php/bootstrap/1427679483/all HTTP/1.1"
  becomes 116.216.17.0 201508310019 /theme/styles.php/bootstrap/1427679483/all
  --> sample data:
  moodle.log
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/styles.php/bootstrap/1427679483/all HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/theme/1427679483/fp/logo HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/expanded HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/theme/1427679483/fp/search_btn HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/collapsed HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/theme/1427679483/fp/footerbg HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/yui_combo.php?m/1427679483/theme_bootstrap/bootstrap/bootstrap-min.js HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/yui_combo.php?m/1427679483/block_navigation/navigation/navigation-min.js HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/yui_combo.php?m/1427679483/theme_bootstrap/zoom/zoom-min.js HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/yui_combo.php?3.17.2/cssbutton/cssbutton-min.css HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/yui_combo.php?m/1427679483/core/lockscroll/lockscroll-min.js HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/block_to_dock HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/switch_plus HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/switch_minus HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:45 +0800" "GET /course/view.php?id=27&section=4 HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:46 +0800" "GET /theme/image.php/bootstrap/page/1427679483/icon HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:46 +0800" "GET /theme/image.php/bootstrap/core/1427679483/spacer HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:46 +0800" "GET /theme/yui_combo.php?m/1427679483/core/formautosubmit/formautosubmit-min.js HTTP/1.1"
  "116.216.17.0" "31/Aug/2015:00:19:54 +0800" "GET /mod/page/view.php?id=11187&section=4 HTTP/1.1"
  --> create the moodle table
  create table if not exists moodle(
  ip string,
  date string,
  url string) row format delimited fields terminated by '\t';
  --> load the data
  load data local inpath '/home/liuwl/opt/datas/dd/moodle.log' into table moodle;
  --> code (the package, imports, and class wrapper are filled in so the example compiles):
  package com.hive.udf;

  import java.text.ParseException;
  import java.text.SimpleDateFormat;
  import java.util.Date;
  import java.util.Locale;

  import org.apache.hadoop.hive.ql.exec.UDF;
  import org.apache.hadoop.io.Text;

  public class mymoodle extends UDF {
    public Text evaluate(Text text){
      if(text != null){
        String strs = text.toString().replaceAll("\"", "");
        String str = "";
        boolean isDate = false;
        try{
          // source format: 31/Aug/2015:00:19:42 +0800
          SimpleDateFormat sdf = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
          // target format: 201508310019
          SimpleDateFormat sdf1 = new SimpleDateFormat("yyyyMMddHHmm", Locale.CHINA);
          Date date = sdf.parse(strs);
          str = sdf1.format(date);
          isDate = true;
        }catch(ParseException p){
          isDate = false;
        }
        if(isDate){
          // the column was the timestamp
          return new Text(str);
        }else{
          if(strs.indexOf("HTTP/1.1") > 0){
            // request line "GET /path HTTP/1.1": keep only the path
            return new Text(strs.split(" ")[1]);
          }else{
            // a plain column such as the IP
            return new Text(strs.split(" ")[0]);
          }
        }
      }else return null;
    }
  }
  --> add the jar
  add jar jar_path;
  --> create the function
  create temporary function mymoodle as 'com.hive.udf.mymoodle';
  --> test
  select mymoodle(ip) ip,mymoodle(date) date,mymoodle(url) url from moodle;
  ip date url
  116.216.17.0 201508310019 /theme/styles.php/bootstrap/1427679483/all
  116.216.17.0 201508310019 /theme/image.php/bootstrap/theme/1427679483/fp/logo
  116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/expanded
  116.216.17.0 201508310019 /theme/image.php/bootstrap/theme/1427679483/fp/search_btn
  116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/collapsed
  116.216.17.0 201508310019 /theme/image.php/bootstrap/theme/1427679483/fp/footerbg
  116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/theme_bootstrap/bootstrap/bootstrap-min.js
  116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/block_navigation/navigation/navigation-min.js
  116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/theme_bootstrap/zoom/zoom-min.js
  116.216.17.0 201508310019 /theme/yui_combo.php?3.17.2/cssbutton/cssbutton-min.css
  116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/core/lockscroll/lockscroll-min.js
  116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/block_to_dock
  116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/switch_plus
  116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/switch_minus
  116.216.17.0 201508310019 /course/view.php?id=27&section=4
  116.216.17.0 201508310019 /theme/image.php/bootstrap/page/1427679483/icon
  116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/spacer
  116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/core/formautosubmit/formautosubmit-min.js
  116.216.17.0 201508310019 /mod/page/view.php?id=11187&section=4

Hive hiveserver2, beeline, and the Java client

  --> starting hiveserver2: bin/hiveserver2, or bin/hive --service hiveserver2
  --> starting beeline: bin/beeline -u jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb -n liuwl -p liuwl
  --> or: bin/beeline
  --> then: !connect jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb
  --> Java client
  --> start hiveserver2 first
  --> code (the imports and a class wrapper, name illustrative, are filled in so the example compiles):
  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.sql.Statement;
  import java.util.ArrayList;
  import java.util.List;

  public class HiveJdbcClient {

    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb";
    private static String username = "root";
    private static String password = "root";

    public static Connection getConnection(){
      try {
        Class.forName(driverName);
        Connection con = DriverManager.getConnection(url, username, password);
        return con;
      } catch (ClassNotFoundException e) {
        e.printStackTrace();
        System.exit(1);
      } catch (SQLException e) {
        e.printStackTrace();
      }
      return null;
    }

    // run a query and collect the first column of every row
    public static List<Object> querySql(Statement stmt, String sql) throws SQLException{
      ResultSet res = stmt.executeQuery(sql);
      List<Object> objectList = new ArrayList<Object>();
      while (res.next()) {
        objectList.add(res.getString(1));
      }
      return objectList;
    }

    public static void main(String[] args) throws SQLException {
      Connection con = getConnection();
      Statement stmt = con.createStatement();
      List<Object> objectList = new ArrayList<Object>();
      // query
      String qsql = "select * from emp";
      objectList = querySql(stmt, qsql);
      for(int i = 0; i < objectList.size(); i++){
        System.out.println(objectList.get(i));
      }
      // aggregate query
      String rsql = "select count(1) from emp";
      objectList = querySql(stmt, rsql);
      for(int i = 0; i < objectList.size(); i++){
        System.out.println(objectList.get(i));
      }
      // create (note the escaped \\t so Hive receives the two characters \t)
      String csql = "create table if not exists test (key int, value string) row format delimited fields terminated by '\\t'";
      stmt.execute(csql);
      // load
      //String lsql = "load data local inpath '/home/liuwl/opt/datas/test.txt' into table test";
      // update is supported starting with Hive 0.14
      //String usql = "update test set key = 4 where value='uuuu'";
      //stmt.executeUpdate(usql);
      // drop
      String dsql = "drop table if exists test";
      if(!stmt.execute(dsql)){
        System.out.println("success");
      }
    }
  }

Hive local mode (for local testing; runs only on the current node)

  hive.exec.mode.local.auto=true
  Conditions for local mode to kick in:
    the job's input size must not exceed the default threshold
      (hive.exec.mode.local.auto.inputbytes.max, default 128 MB)
    the number of map tasks in the job
      at most 4
    the number of reduce tasks in the job
      0 or 1 (at most 1)
  Local mode speeds up job execution for small jobs.

Hive tuning (selected points)

  1. Split big tables into small ones (keep only the columns the analysis needs)
  2. Store data grouped by category fields
  3. Use external tables (dropping one removes only the metadata, not the data files; many users can share a single source table, which saves storage)
  4. Use partitioned tables (partitions are folders on HDFS and speed up queries; they can be added manually or dynamically)
  5. Combine external tables with partitioned tables
  6. Storage format: columnar storage plus compression
  7. SQL optimization (filter first, then join)
  8. MapReduce tuning (parallel execution, off by default):
      hive.exec.parallel=true
      hive.exec.parallel.thread.number=8
    JVM reuse:
      mapreduce.job.jvm.numtasks=$number
    Speculative execution:
      mapreduce.map.speculative=true
      mapreduce.reduce.speculative=true
      hive.mapred.reduce.tasks.speculative.execution=true
    Number of map/reduce tasks:
    the number of maps depends on
    the HDFS block size: dfs.blocksize=128M
    and the split size: minsize/maxsize
    mapreduce.input.fileinputformat.split.minsize
    In practice:
    large files (say 200 MB x 100): maps follow the block size by default
    small files (say 40 MB x 400): maps follow the split size

Hive's two execution modes for SQL (fetch task vs. MapReduce)

  --> hive.fetch.task.conversion=minimal # SELECT STAR, FILTER on partition columns, LIMIT only: select *, filters on partition columns, and limit run without MapReduce
  hive.fetch.task.conversion=more # SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns): all selects, sampling, and virtual columns also run without MapReduce
  --> virtual columns (note the double underscores; sketch below)
  --> input__file__name # which file the row came from
  --> block__offset__inside__file # the record's offset within the file
  --> row__offset__inside__block # the row offset within its block (disabled by default; enable via hive.exec.rowoffset)
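  A quick sketch of querying the virtual columns, assuming the emp table from earlier:
  set hive.exec.rowoffset=true;  -- needed only for row__offset__inside__block
  select input__file__name, block__offset__inside__file, row__offset__inside__block from emp limit 3;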

Hive strict mode

  --> hive.mapred.mode (defaults to nonstrict)
  Note: strict mode
  rejects these risky SQL statements:
  Cartesian-product queries (a join with no on/where condition)
  queries on a partitioned table that do not specify a partition
  order by without limit
  comparisons between bigint and string/double
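  A sketch of the effect, assuming the emp and emp_part tables from earlier:
  set hive.mapred.mode=strict;
  -- rejected: order by without limit
  -- select * from emp order by sal desc;
  select * from emp order by sal desc limit 10;             -- allowed
  select * from emp_part where date='20161029' limit 10;    -- allowed: the partition is specified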
