Hive Advanced
Review:
- Hive's advantages
- 1. SQL-like syntax close to relational databases, plus user-defined functions; this improves extensibility, eases development, and lowers the MapReduce learning curve
- 2. Hive translates SQL statements into MapReduce programs, with MapReduce as the underlying execution layer
- 3. Hive stores its data on Hadoop HDFS, so Hive's storage scales along with HDFS
Hive installation and deployment
- 1. Unpack the installation tarball
- 2. Enter the conf directory, copy (back up) the relevant config files, then edit them
- hive-env.sh
- --> HADOOP_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
- --> export HIVE_CONF_DIR=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/conf
- hive-log4j.properties
- --> create a logs directory under the Hive root to hold Hive's runtime logs
- --> hive.log.threshold=ALL
- hive.root.logger=INFO,DRFA
- hive.log.dir=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs
- hive.log.file=hive.log
- hive-site.xml
- --> javax.jdo.option.ConnectionURL --- jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true
- --> javax.jdo.option.ConnectionDriverName --- com.mysql.jdbc.Driver
- --> javax.jdo.option.ConnectionUserName --- root
- --> javax.jdo.option.ConnectionPassword --- root
- --> hive.cli.print.header --- true # whether to print column headers in query results (optional)
- --> hive.cli.print.current.db --- true # whether to show the current database name in the prompt (optional)
- --> hive.fetch.task.conversion --- more # whether simple queries run as fetch tasks instead of MapReduce (optional; takes minimal/more, see the fetch-task section below)
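- These CLI properties can also be toggled per session from the Hive shell instead of hive-site.xml; a quick sketch:
- hive> set hive.cli.print.header=true;
- hive> set hive.cli.print.current.db=true;
- hive> set hive.fetch.task.conversion=more;
- hive> set hive.cli.print.header; -- with no value, prints the current setting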
Hive architecture
- 1. metastore
- --> stored in the embedded Derby database: Hive creates a derby log file and a metastore_db directory in the working directory; the drawback is that starting a second Hive session from the same directory fails
- --> metadata stored in a local MySQL database
- --> metadata stored in a remote MySQL database
- 2. client
- --> CLI/JDBC, Driver, SQL Parser, Query Optimizer, Physical Plan, Execution
Ways to create a table in Hive (a concrete sketch follows the list)
- 1. Plain CREATE TABLE
- create table if not exists tablename(...)
- row format delimited fields terminated by '\t'
- stored as textfile;
- 2. Subquery (CREATE TABLE AS)
- create table if not exists tablename as select * from tablename2;
- 3. LIKE
- create table if not exists tablename like tablename2; # copies only tablename2's table structure, no data
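- To make the three forms concrete, a small sketch against a hypothetical emp table (the column list is an assumption for illustration):
- create table if not exists emp(
- empno int, ename string, sal double, deptno int)
- row format delimited fields terminated by '\t'
- stored as textfile;
- create table if not exists emp_ctas as select * from emp; -- copies structure and data
- create table if not exists emp_like like emp; -- copies structure only, no data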
Table types
- 1. Managed tables (the default type)
- 2. External tables (external; let multiple users work off the same table)
- 3. Partitioned tables (partition; speed up analytic queries over the table)
- --> list partitions: show partitions tablename; # shows tablename's partitions
- Adding a partition by hand
- 1. Create the partition directory and upload the data
- hive (workdb)> dfs -mkdir /user/hive/warehouse/workdb.db/emp_part/date=20161029
- dfs -put /home/liuwl/opt/datas/emp.txt /user/hive/warehouse/workdb.db/emp_part/date=20161029
- --- note that show partitions emp_part; does not find the partition just added by hand, because the metastore knows nothing about it
- 2. Fix: alter table emp_part add partition (date='20161029');
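- For comparison, loading through Hive keeps the metastore in sync automatically; a minimal sketch (the emp_part column list is an assumption):
- create table if not exists emp_part(
- empno int, ename string, sal double)
- partitioned by (date string)
- row format delimited fields terminated by '\t';
- load data local inpath '/home/liuwl/opt/datas/emp.txt'
- into table emp_part partition (date='20161029'); -- registers the partition as part of the load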
Analytic functions and window functions (important)
- 1. Analytic functions
- All employees in department 20, ordered by salary descending
- select * from emp where emp.deptno='20' order by sal desc;
- Partition by department, ordered by salary descending within each department
- select empno,ename,deptno,sal,max(sal) over (partition by deptno order by sal desc) as maxsal from emp;
- Result:
- empno ename deptno sal maxsal
- 7839 KING 10 5000.0 5000.0
- 7782 CLARK 10 2450.0 5000.0
- 7934 MILLER 10 1300.0 5000.0
- 7788 SCOTT 20 3000.0 3000.0
- 7902 FORD 20 3000.0 3000.0
- 7566 JONES 20 2975.0 3000.0
- 7876 ADAMS 20 1100.0 3000.0
- 7369 SMITH 20 800.0 3000.0
- 7698 BLAKE 30 2850.0 2850.0
- 7499 ALLEN 30 1600.0 2850.0
- 7844 TURNER 30 1500.0 2850.0
- 7654 MARTIN 30 1250.0 2850.0
- 7521 WARD 30 1250.0 2850.0
- 7900 JAMES 30 950.0 2850.0
- Generating row numbers within each department
- select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rownum from emp;
- empno ename deptno sal rownum
- 7839 KING 10 5000.0 1
- 7782 CLARK 10 2450.0 2
- 7934 MILLER 10 1300.0 3
- 7788 SCOTT 20 3000.0 1
- 7902 FORD 20 3000.0 2
- 7566 JONES 20 2975.0 3
- 7876 ADAMS 20 1100.0 4
- 7369 SMITH 20 800.0 5
- 7698 BLAKE 30 2850.0 1
- 7499 ALLEN 30 1600.0 2
- 7844 TURNER 30 1500.0 3
- 7654 MARTIN 30 1250.0 4
- 7521 WARD 30 1250.0 5
- 7900 JAMES 30 950.0 6
- ROW_NUMBER(): sequential row number within the partition
- Get the two highest-paid employees in each department
- select * from (select empno,ename,deptno,sal,row_number() over (partition by deptno order by sal desc) as rownum from emp) t where t.rownum < 3;
- t.empno t.ename t.deptno t.sal t.rownum
- 7839 KING 10 5000.0 1
- 7782 CLARK 10 2450.0 2
- 7788 SCOTT 20 3000.0 1
- 7902 FORD 20 3000.0 2
- 7698 BLAKE 30 2850.0 1
- 7499 ALLEN 30 1600.0 2
- RANK(): ranking with gaps (rows tied for first both get 1, and the third row gets 3)
- DENSE_RANK(): ranking without gaps (rows tied for first both get 1, and the third row gets 2)
- Partition by department, order by salary descending, and rank within each department
- select empno,ename,deptno,sal,rank() over (partition by deptno order by sal desc) ranksal from emp;
- empno ename deptno sal ranksal
- 7839 KING 10 5000.0 1
- 7782 CLARK 10 2450.0 2
- 7934 MILLER 10 1300.0 3
- 7788 SCOTT 20 3000.0 1
- 7902 FORD 20 3000.0 1
- 7566 JONES 20 2975.0 3
- 7876 ADAMS 20 1100.0 4
- 7369 SMITH 20 800.0 5
- 7698 BLAKE 30 2850.0 1
- 7499 ALLEN 30 1600.0 2
- 7844 TURNER 30 1500.0 3
- 7654 MARTIN 30 1250.0 4
- 7521 WARD 30 1250.0 4
- 7900 JAMES 30 950.0 6
- select empno,ename,deptno,sal,dense_rank() over (partition by deptno order by sal desc) ranksal from emp;
- empno ename deptno sal ranksal
- 7839 KING 10 5000.0 1
- 7782 CLARK 10 2450.0 2
- 7934 MILLER 10 1300.0 3
- 7788 SCOTT 20 3000.0 1
- 7902 FORD 20 3000.0 1
- 7566 JONES 20 2975.0 2
- 7876 ADAMS 20 1100.0 3
- 7369 SMITH 20 800.0 4
- 7698 BLAKE 30 2850.0 1
- 7499 ALLEN 30 1600.0 2
- 7844 TURNER 30 1500.0 3
- 7654 MARTIN 30 1250.0 4
- 7521 WARD 30 1250.0 4
- 7900 JAMES 30 950.0 5
- NTILE(): tiled (bucketed) ranking
- Example: find the employees in the top 1/3 by salary
- select empno,ename,sal,ntile(3) over (order by sal desc) ntile from emp;
- empno ename sal ntile
- 7839 KING 5000.0 1
- 7902 FORD 3000.0 1
- 7788 SCOTT 3000.0 1
- 7566 JONES 2975.0 1
- 7698 BLAKE 2850.0 1
- 7782 CLARK 2450.0 2
- 7499 ALLEN 1600.0 2
- 7844 TURNER 1500.0 2
- 7934 MILLER 1300.0 2
- 7654 MARTIN 1250.0 2
- 7521 WARD 1250.0 3
- 7876 ADAMS 1100.0 3
- 7900 JAMES 950.0 3
- 7369 SMITH 800.0 3
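- Note the query above only assigns each row to a tile; to actually keep the top 1/3, filter on the tile in an outer query, e.g.:
- select * from (select empno,ename,sal,ntile(3) over (order by sal desc) nt from emp) t where t.nt = 1;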
- 2. Window functions: LAG() (reach back to earlier rows) and LEAD() (reach forward to later rows)
- select empno,ename,sal,lag(ename,4,0) over (order by sal desc) lagvalue from emp;
- Result:
- empno ename sal lagvalue
- 7839 KING 5000.0 0
- 7902 FORD 3000.0 0
- 7788 SCOTT 3000.0 0
- 7566 JONES 2975.0 0
- 7698 BLAKE 2850.0 KING
- 7782 CLARK 2450.0 FORD
- 7499 ALLEN 1600.0 SCOTT
- 7844 TURNER 1500.0 JONES
- 7934 MILLER 1300.0 BLAKE
- 7654 MARTIN 1250.0 CLARK
- 7521 WARD 1250.0 ALLEN
- 7876 ADAMS 1100.0 TURNER
- 7900 JAMES 950.0 MILLER
- 7369 SMITH 800.0 MARTIN
- select empno,ename,sal,lead(ename,4,0) over (order by sal desc) leadvalue from emp;
- Result:
- empno ename sal leadvalue
- 7839 KING 5000.0 BLAKE
- 7902 FORD 3000.0 CLARK
- 7788 SCOTT 3000.0 ALLEN
- 7566 JONES 2975.0 TURNER
- 7698 BLAKE 2850.0 MILLER
- 7782 CLARK 2450.0 MARTIN
- 7499 ALLEN 1600.0 WARD
- 7844 TURNER 1500.0 ADAMS
- 7934 MILLER 1300.0 JAMES
- 7654 MARTIN 1250.0 SMITH
- 7521 WARD 1250.0 0
- 7876 ADAMS 1100.0 0
- 7900 JAMES 950.0 0
- 7369 SMITH 800.0 0
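- A typical LEAD use is comparing each row with the next one, for instance pairing each salary with the next lower one (a sketch):
- select empno,ename,sal,lead(sal,1,0) over (order by sal desc) next_sal from emp;
- An outer query over this result can then compute sal - next_sal to get the gap between adjacent earners.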
Hive's CASE WHEN THEN
- 1. case key
- when value1 then ''
- when value2 then ''
- else ''
- end
- 2. case
- when key='value1' then ''
- when key='value2' then ''
- else ''
- end
- Example:
- select empno,ename,sal,deptno,
- case when deptno=10 then 'U deptno is 10' when deptno=20 then 'U deptno is 20' else 'U deptno is 30' end from emp;
- Type conversion in Hive: cast(key as type)
- Unix timestamps in Hive: unix_timestamp()
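- Quick sketches of both (run against any existing table such as emp):
- select cast('123' as int) + 1 from emp limit 1; -- 124; an invalid cast yields NULL
- select unix_timestamp() from emp limit 1; -- current time in seconds since the epoch
- select unix_timestamp('2015-08-31 00:19:42') from emp limit 1; -- parses the string to epoch seconds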
Importing data into Hive (important)
- 1. From the local filesystem
- load data local inpath 'filepath' into table tbname;
- 2. From HDFS
- load data inpath 'hdfs_filepath' into table tbname;
- 3. LOAD with overwrite
- load data local inpath 'filepath' overwrite into table tbname;
- load data inpath 'hdfs_filepath' overwrite into table tbname;
- 4. Subquery (CTAS)
- create table tb2 as select * from tb1; # the default field delimiter is ^A
- 5. insert into table ... select ...; # uses the delimiter defined on the target table
- --> create table emp_insert like emp;
- --> insert into table emp_insert select * from emp;
- 6. LOCATION clause
- create table if not exists tbname(...) location 'hdfs_path'; # points the table at an existing HDFS directory (see the sketch below)
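- An end-to-end sketch of the LOCATION approach from the Hive CLI (paths and columns are assumptions):
- dfs -mkdir -p /user/hive/data/emp_loc;
- dfs -put /home/liuwl/opt/datas/emp.txt /user/hive/data/emp_loc;
- create table if not exists emp_loc(
- empno int, ename string, sal double, deptno int)
- row format delimited fields terminated by '\t'
- location '/user/hive/data/emp_loc';
- select * from emp_loc limit 3; -- reads the files already sitting in that directory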
Exporting data from Hive (important)
- 1. INSERT ... DIRECTORY (when exporting to a local directory: 1) the directory needs the right permissions; 2) use a dedicated directory, because everything already under the target directory gets overwritten)
- --> insert overwrite [local] directory 'path' select ...;
- Example: insert overwrite local directory '/tmp' row format delimited fields terminated by '\t' select * from emp;
- 2. bin/hdfs dfs -get
- 3. Running HQL from the Linux command line:
- -> bin/hive -e "ql" # run a query string
- -> bin/hive -f script.sql # run a query file
- -> redirect the output, e.g. bin/hive -e "select * from emp" > /tmp/emp.txt
- 4. Sqoop: for import/export between HDFS and relational databases
Hive EXPORT and IMPORT (the paths must be HDFS paths)
- -> export
- export table tb_name to 'hdfs_path';
- Example: export table emp to '/export_emp';
- -> import
- import table tb_name from 'hdfs_path';
- Example: import table emp_im from '/export_emp';
Hive HQL
- 1. Selecting columns
- -> select empno,ename from emp;
- 2. where, limit, distinct
- -> select * from emp where sal > 3000;
- -> select * from emp limit 5;
- -> select distinct deptno from emp;
- 3. between ... and, >, <, =, is null, is not null, in
- -> select * from emp where sal between 2000 and 3000;
- -> select * from emp where comm is null;
- -> select * from emp where sal in (2000,3000,4000);
- 4. count(),sum(),avg(),max(),min()
- -> select count(1) from emp;
- -> select sum(sal) from emp;
- -> select avg(sal) from emp;
- 5. group by,having
- -> select deptno,avg(sal) from emp group by deptno;
- -> select deptno,avg(sal) avgsal from emp group by deptno having avgsal >= 3000;
- 6. join
- -> inner (equi) join: returns only the records matched on both sides
- select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e join dept d on e.deptno=d.deptno;
- -> left join (the left-hand table is the small one)
- select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e left join dept d on e.deptno=d.deptno;
- -> right join (the right-hand table is the small one)
- select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e right join dept d on e.deptno=d.deptno;
- -> full join
- select e.empno,e.deptno,e.ename,e.sal,e.mgr from emp e full join dept d on e.deptno=d.deptno;
Hive MapReduce-related settings (important)
- 1. Data volume handled by each reducer
- -> set hive.exec.reducers.bytes.per.reducer;
- Defaults to 1G: with 10G of input, 10 reducers are launched, each processing 1G
- 2. Maximum number of reducers
- -> set hive.exec.reducers.max;
- Defaults to 999
- 3. Actual number of reducers for the job
- -> set mapreduce.job.reduces;
- Hive prints -1 (meaning "decide automatically"); Hadoop's own default is 1
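- Running set with just the property name prints its current value; assigning a value overrides it for the session, e.g.:
- set hive.exec.reducers.bytes.per.reducer=536870912; -- 512MB per reducer
- set mapreduce.job.reduces=3; -- force 3 reducers for subsequent jobs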
Hive's sort clauses (important)
- 1. order by (sorts a single output file; if several reduce tasks produce several files, the global ordering is lost)
- -> select * from emp order by sal desc;
- 2. sort by (sorts within each output file; note the output directory must be an absolute path)
- -> set mapreduce.job.reduces=3;
- -> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp sort by sal;
- Note that with order by, no matter how many reduce tasks are set, only one output file is produced and it is fully sorted
- 3. distribute by (maps to MapReduce partitioning; normally combined with sort by)
- -> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp distribute by deptno sort by sal;
- 4. cluster by (shorthand for distribute by xx sort by xx) # xx being the same column
- -> insert overwrite local directory '/home/liuwl/opt/datas/sortData' row format delimited fields terminated by '\t' select * from emp cluster by sal;
Hive UDFs (user-defined functions that let users extend HiveQL) (important)
- 1. UDF: one row in, one row out, e.g. upper/lower/day
- 2. UDAF: many rows in, one row out, e.g. count/max/min
- 3. UDTF: one row in, many rows out, e.g. explode with lateral view
- Steps to write a UDF
- --> extend org.apache.hadoop.hive.ql.exec.UDF
- --> implement an evaluate() method; evaluate() can be overloaded
- Note: a UDF must have a return type; it may return NULL, but the return type cannot be void
- UDFs usually take Hadoop writable types such as Text/LongWritable; plain Java types are not recommended
- --> code
- package com.hive.udf;
- import org.apache.hadoop.hive.ql.exec.UDF;
- import org.apache.hadoop.io.IntWritable;
- import org.apache.hadoop.io.Text;
- // Lowercases the input by default; pass flag=1 to uppercase instead
- public class UDFTest extends UDF {
- public Text evaluate(Text str){
- return this.evaluate(str, new IntWritable(0));
- }
- public Text evaluate(Text str, IntWritable flag){
- if(str != null){
- if(flag.get() == 0){
- return new Text(str.toString().toLowerCase());
- }else if(flag.get() == 1){
- return new Text(str.toString().toUpperCase());
- }else return null;
- }else return null;
- }
- }
- --> package it as a jar
- --> register it in Hive
- ---> add the jar
- add jar jar_path;
- ---> create the function
- create temporary function tolower as 'com.hive.udf.UDFTest';
- ---> test
- select ename, tolower(ename) lowername, tolower(ename,1) uppername from emp;
- ename lowername uppername
- SMITH smith SMITH
- ALLEN allen ALLEN
- WARD ward WARD
- JONES jones JONES
- MARTIN martin MARTIN
- BLAKE blake BLAKE
- CLARK clark CLARK
- SCOTT scott SCOTT
- KING king KING
- TURNER turner TURNER
- ADAMS adams ADAMS
- JAMES james JAMES
- FORD ford FORD
- MILLER miller MILLER
- --> Case 2: strip all double quotes
- --> code (again an evaluate() in a class extending UDF; the class name RMQuotes is taken from the registration below)
- public Text evaluate(Text str){
- if(str != null){
- return new Text(str.toString().replaceAll("\"", ""));
- }else return null;
- }
- --> add the jar
- add jar jar_path;
- --> create the function
- create temporary function rmquotes as 'com.hive.udf.RMQuotes';
- --> test
- select dname,rmquotes(dname) rmquotes from dept_quotes;
- dname rmquotes
- "ACCOUNTING" ACCOUNTING
- "RESEARCH" RESEARCH
- "SALES" SALES
- "OPERATIONS" OPERATIONS
- --> Case 3: strip all quotes, reformat the timestamp, and pull out the request path after GET
- e.g. "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/styles.php/bootstrap/1427679483/all HTTP/1.1"
- becomes: 116.216.17.0 201508310019 /theme/styles.php/bootstrap/1427679483/all
- --> prepare the data:
- moodle.log
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/styles.php/bootstrap/1427679483/all HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/theme/1427679483/fp/logo HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/expanded HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/theme/1427679483/fp/search_btn HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/collapsed HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/image.php/bootstrap/theme/1427679483/fp/footerbg HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/yui_combo.php?m/1427679483/theme_bootstrap/bootstrap/bootstrap-min.js HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:42 +0800" "GET /theme/yui_combo.php?m/1427679483/block_navigation/navigation/navigation-min.js HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/yui_combo.php?m/1427679483/theme_bootstrap/zoom/zoom-min.js HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/yui_combo.php?3.17.2/cssbutton/cssbutton-min.css HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/yui_combo.php?m/1427679483/core/lockscroll/lockscroll-min.js HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/block_to_dock HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/switch_plus HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:43 +0800" "GET /theme/image.php/bootstrap/core/1427679483/t/switch_minus HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:45 +0800" "GET /course/view.php?id=27§ion=4 HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:46 +0800" "GET /theme/image.php/bootstrap/page/1427679483/icon HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:46 +0800" "GET /theme/image.php/bootstrap/core/1427679483/spacer HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:46 +0800" "GET /theme/yui_combo.php?m/1427679483/core/formautosubmit/formautosubmit-min.js HTTP/1.1"
- "116.216.17.0" "31/Aug/2015:00:19:54 +0800" "GET /mod/page/view.php?id=11187§ion=4 HTTP/1.1"
- --> create the moodle table
- create table if not exists moodle(
- ip string,
- date string,
- url string) row format delimited fields terminated by '\t';
- --> load the data
- load data local inpath '/home/liuwl/opt/datas/dd/moodle.log' into table moodle;
- --> code (an evaluate() in a class extending UDF, registered as mymoodle below; needs imports java.text.SimpleDateFormat, java.text.ParseException, java.util.Date, java.util.Locale)
- public Text evaluate(Text text){
- if(text != null){
- String strs = text.toString().replaceAll("\"", "");
- String str = "";
- boolean isDate = false;
- try{
- // the log's timestamp format, e.g. 31/Aug/2015:00:19:42 +0800 (English month names)
- SimpleDateFormat sdf = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
- SimpleDateFormat sdf1 = new SimpleDateFormat("yyyyMMddHHmm");
- Date date = sdf.parse(strs);
- str = sdf1.format(date);
- isDate = true;
- }catch(ParseException p){
- isDate = false; // not the timestamp column; fall through
- }
- if(isDate){
- return new Text(str); // timestamp column, reformatted
- }else{
- if(strs.indexOf("HTTP/1.1") > 0){
- return new Text(strs.split(" ")[1]); // "GET /path HTTP/1.1" -> /path
- }else{
- return new Text(strs.split(" ")[0]); // plain field such as the IP
- }
- }
- }else return null;
- }
- --> add the jar
- add jar jar_path;
- --> create the function
- create temporary function mymoodle as 'com.hive.udf.mymoodle';
- --> test
- select mymoodle(ip) ip,mymoodle(date) date,mymoodle(url) url from moodle;
- ip date url
- 116.216.17.0 201508310019 /theme/styles.php/bootstrap/1427679483/all
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/theme/1427679483/fp/logo
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/expanded
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/theme/1427679483/fp/search_btn
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/collapsed
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/theme/1427679483/fp/footerbg
- 116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/theme_bootstrap/bootstrap/bootstrap-min.js
- 116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/block_navigation/navigation/navigation-min.js
- 116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/theme_bootstrap/zoom/zoom-min.js
- 116.216.17.0 201508310019 /theme/yui_combo.php?3.17.2/cssbutton/cssbutton-min.css
- 116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/core/lockscroll/lockscroll-min.js
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/block_to_dock
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/switch_plus
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/t/switch_minus
- 116.216.17.0 201508310019 /course/view.php?id=27&section=4
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/page/1427679483/icon
- 116.216.17.0 201508310019 /theme/image.php/bootstrap/core/1427679483/spacer
- 116.216.17.0 201508310019 /theme/yui_combo.php?m/1427679483/core/formautosubmit/formautosubmit-min.js
- 116.216.17.0 201508310019 /mod/page/view.php?id=11187&section=4
Hive's hiveserver2, beeline, and Java client
- --> starting hiveserver2: bin/hiveserver2, or bin/hive --service hiveserver2
- --> starting beeline: bin/beeline -u jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb -n liuwl -p liuwl
- --> or: bin/beeline
- --> then: !connect jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb
- --> Java client
- --> start hiveserver2 first
- --> code:
- import java.sql.*;
- import java.util.ArrayList;
- import java.util.List;
- public class HiveJdbcClient {
- private static String driverName = "org.apache.hive.jdbc.HiveDriver";
- private static String url = "jdbc:hive2://hadoop09-linux-01.ibeifeng.com:10000/workdb";
- private static String username = "root";
- private static String password = "root";
- public static Connection getConnection(){
- try {
- Class.forName(driverName);
- Connection con = DriverManager.getConnection(url, username, password);
- return con;
- } catch (ClassNotFoundException e) {
- e.printStackTrace();
- System.exit(1);
- } catch (SQLException e) {
- e.printStackTrace();
- }
- return null;
- }
- public static List<Object> querySql(Statement stmt, String sql) throws SQLException{
- ResultSet res = stmt.executeQuery(sql);
- List<Object> objectList = new ArrayList<Object>();
- while (res.next()) {
- objectList.add(res.getString(1)); // collect the first column of each row
- }
- return objectList;
- }
- public static void main(String[] args) throws SQLException {
- Connection con = getConnection();
- Statement stmt = con.createStatement();
- List<Object> objectList = new ArrayList<Object>();
- // query
- String qsql = "select * from emp";
- objectList = querySql(stmt, qsql);
- for(int i = 0; i < objectList.size(); i++){
- System.out.println(objectList.get(i));
- }
- // aggregate query
- String rsql = "select count(1) from emp";
- objectList = querySql(stmt, rsql);
- for(int i = 0; i < objectList.size(); i++){
- System.out.println(objectList.get(i));
- }
- // create
- String csql = "create table if not exists test (key int, value string) row format delimited fields terminated by '\t'";
- stmt.execute(csql);
- // load
- //String lsql = "load data local inpath '/home/liuwl/opt/datas/test.txt' into table test";
- // update is supported starting with Hive 0.14
- //String usql = "update table test set key =4 where value='uuuu'";
- //stmt.executeUpdate(usql);
- // drop
- String dsql = "drop table if exists test";
- if(!stmt.execute(dsql)){
- System.out.println("success");
- }
- }
- }
Hive local mode (for local testing; runs only on the current node)
- hive.exec.mode.local.auto=true
- Conditions for it to kick in
- the job's total input size must stay below the threshold
- hive.exec.mode.local.auto.inputbytes.max (128M by default)
- the job's number of map tasks
- at most 4
- the job's number of reduce tasks
- 0 or 1 (at most 1)
- local mode speeds up small jobs
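- A quick sketch of enabling it for a session (the threshold here is the usual 128M default):
- set hive.exec.mode.local.auto=true;
- set hive.exec.mode.local.auto.inputbytes.max=134217728;
- select count(1) from emp; -- small enough to run as a local job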
Hive tuning (selected points)
- 1. Split large tables into smaller ones (keep only the columns the analysis needs)
- 2. Store data grouped by category
- 3. Use external tables (dropping the table removes only the metadata, not the data files; many users can share one source table, reducing storage)
- 4. Use partitioned tables (each partition is an HDFS directory, which speeds up queries; partitions can be added manually or dynamically)
- 5. Combine external tables with partitioning
- 6. Storage format: columnar storage plus compression
- 7. SQL optimization (filter first, then join)
- 8. MapReduce tuning (parallel execution, off by default)
- hive.exec.parallel=true
- hive.exec.parallel.thread.number=8
- JVM reuse
- mapreduce.job.jvm.numtasks=$number
- Speculative execution
- mapreduce.map.speculative=true
- mapreduce.reduce.speculative=true
- hive.mapred.reduce.tasks.speculative.execution=true
- Numbers of map and reduce tasks
- number of maps
- HDFS block size: dfs.blocksize=128M
- split size: minsize/maxsize
- mapreduce.input.fileinputformat.split.minsize
- in practice
- few large files (e.g. 100 files of 200M each): maps default to one per block
- many small files (e.g. 400 files of 40M each): maps follow the split size
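- For example, to make each map task take on splits of at least 256M (a sketch; the right value depends on the cluster):
- set mapreduce.input.fileinputformat.split.minsize=268435456;
- set mapreduce.input.fileinputformat.split.maxsize=268435456;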
Hive's two execution modes for SQL (fetch task vs. MapReduce)
- --> hive.fetch.task.conversion=minimal # SELECT STAR, FILTER on partition columns, LIMIT only — i.e. only select *, filters on partition columns, and limit skip MapReduce
- hive.fetch.task.conversion=more # SELECT, FILTER, LIMIT only (TABLESAMPLE, virtual columns) — i.e. any select, plus table sampling and virtual columns, runs as a fetch task
- --> virtual columns (note the double underscores)
- --> INPUT__FILE__NAME # which file the row came from
- --> BLOCK__OFFSET__INSIDE__FILE # the record's offset within the file
- --> ROW__OFFSET__INSIDE__BLOCK # the row's offset within the block (off by default; enable via hive.exec.rowoffset)
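- A quick sketch of querying the virtual columns:
- set hive.exec.rowoffset=true; -- only needed for ROW__OFFSET__INSIDE__BLOCK
- select INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE from emp limit 3;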
Hive strict mode
- --> hive.mapred.mode (defaults to nonstrict)
- Note: strict mode rejects the following risky statements:
- Cartesian-product queries (a join without on or where)
- queries on a partitioned table without a partition filter
- order by without a limit
- comparisons between bigint and string/double
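- For example, with strict mode on, scanning a partitioned table without a partition filter is rejected (a sketch, reusing the emp_part table from above):
- set hive.mapred.mode=strict;
- select * from emp_part; -- fails: no partition predicate
- select * from emp_part where date='20161029'; -- allowed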