Joins in Hive

Creating the tables

0: jdbc:hive2://localhost:10000> create database myjoin;

No rows affected (3.78 seconds)

0: jdbc:hive2://localhost:10000> use myjoin;

No rows affected (0.419 seconds)

0: jdbc:hive2://localhost:10000> create table a(id int,name string) row format delimited fields terminated by ',';

No rows affected (2.08 seconds)

0: jdbc:hive2://localhost:10000> create table b(id int,name string) row format delimited fields terminated by ',';

0: jdbc:hive2://localhost:10000> select * from a;

+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
|      | qq      |
|      | ww      |
|      | ee      |
|      | rr      |
|      | tt      |
|      | yy      |
|      | aa      |
|      | ss      |
|     | zz      |
+-------+---------+--+

9 rows selected (1.881 seconds)

0: jdbc:hive2://localhost:10000> select * from b;

+-------+---------+--+
| b.id  | b.name  |
+-------+---------+--+
|      | qq      |
|      |       |
|      | dd      |
|      | rr      |
|      | fgf     |
|      | as      |
|      |       |
|     | ww      |
|     |        |
|     |       |
|     |       |
|     | 4r      |
+-------+---------+--+

12 rows selected (0.147 seconds)

Result of an inner join (a plain join is the same thing):
0: jdbc:hive2://localhost:10000> select a.*,b.* from a inner join b on a.id = b.id;

INFO  : Execution completed successfully

INFO  : MapredLocal task succeeded

INFO  : Number of reduce tasks is set to 0 since there's no reduce operator

INFO  : number of splits:

INFO  : Submitting tokens for job: job_1496277833427_0007

INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0007/

INFO  : Starting Job = job_1496277833427_0007, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0007/

INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0007

INFO  : Hadoop job information for Stage-: number of mappers: ; number of reducers:

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 5.05 sec

INFO  : MapReduce Total cumulative CPU time:  seconds  msec

INFO  : Ended Job = job_1496277833427_0007

+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
|      | qq      |      | qq      |
|      | ww      |      |       |
|      | ee      |      | dd      |
|      | rr      |      | rr      |
|      | yy      |      | fgf     |
|      | aa      |      | as      |
+-------+---------+-------+---------+--+
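The inner join semantics can be modelled in a few lines of Python. This is only a sketch of what the result contains, not of how Hive executes the query; the sample rows and the helper name `inner_join` are made up for illustration, since the id values in the tables above did not survive.

```python
# Hypothetical sample rows in (id, name) form; not the post's actual data.
a = [(1, "qq"), (2, "ww"), (5, "tt")]
b = [(1, "qq"), (2, "xx"), (7, "zz")]


def inner_join(left, right):
    """Return the columns of both sides for every pair of rows whose ids are equal."""
    return [l + r for l in left for r in right if l[0] == r[0]]


# Only ids 1 and 2 exist on both sides, so the rows with ids 5 and 7 are dropped.
for row in inner_join(a, b):
    print(row)
```

Rows that have no partner on the other side simply disappear, which is why the `tt` row of table a is absent from the result above.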

full outer join: rows from both tables are returned; where the on condition finds no match, the columns of the other side are shown as NULL.

0: jdbc:hive2://localhost:10000> select a.*,b.* from a full outer join b on a.id = b.id;

INFO  : Number of reduce tasks not specified. Estimated from input data size:

INFO  : In order to change the average load for a reducer (in bytes):

INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>

INFO  : In order to limit the maximum number of reducers:

INFO  :   set hive.exec.reducers.max=<number>

INFO  : In order to set a constant number of reducers:

INFO  :   set mapreduce.job.reduces=<number>

INFO  : number of splits:

INFO  : Submitting tokens for job: job_1496277833427_0008

INFO  : The url to track the job: http://mini2:8088/proxy/application_1496277833427_0008/

INFO  : Starting Job = job_1496277833427_0008, Tracking URL = http://mini2:8088/proxy/application_1496277833427_0008/

INFO  : Kill Command = /home/hadoop/xxxxxx/hadoop265/bin/hadoop job  -kill job_1496277833427_0008

INFO  : Hadoop job information for Stage-: number of mappers: ; number of reducers:

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 6.52 sec

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 9.17 sec

INFO  : -- ::, Stage- map = %,  reduce = %, Cumulative CPU 12.65 sec

INFO  : MapReduce Total cumulative CPU time:  seconds  msec

INFO  : Ended Job = job_1496277833427_0008

+-------+---------+-------+---------+--+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+--+
|      | qq      |      | qq      |
|      | ww      |      |       |
|      | ee      |      | dd      |
|      | rr      |      | rr      |
|      | tt      | NULL  | NULL    |
|      | yy      |      | fgf     |
|      | aa      |      | as      |
|      | ss      | NULL  | NULL    |
| NULL  | NULL    |      |       |
|     | zz      | NULL  | NULL    |
| NULL  | NULL    |     |       |
| NULL  | NULL    |     | ww      |
| NULL  | NULL    |     |       |
| NULL  | NULL    |     | 4r      |
| NULL  | NULL    |     |        |
+-------+---------+-------+---------+--+

15 rows selected (371.304 seconds)
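A full outer join can be modelled the same way. The following is a minimal sketch under the assumption of hypothetical sample rows; `full_outer_join` is an illustrative helper, and Python's `None` stands in for Hive's NULL.

```python
# Hypothetical sample rows in (id, name) form; not the post's actual data.
a = [(1, "qq"), (5, "tt")]
b = [(1, "qq"), (7, "zz")]


def full_outer_join(left, right):
    """Keep every row of both sides, padding the side without a match with None."""
    rows = []
    matched_right = set()  # positions of right-hand rows that found a partner
    for l in left:
        hit = False
        for i, r in enumerate(right):
            if l[0] == r[0]:
                rows.append(l + r)
                matched_right.add(i)
                hit = True
        if not hit:
            rows.append(l + (None, None))  # left row with no partner
    for i, r in enumerate(right):
        if i not in matched_right:
            rows.append((None, None) + r)  # right row with no partner
    return rows


for row in full_outer_join(a, b):
    print(row)
```

If the ids are unique on each side, the result has as many rows as both tables together minus the matched pairs. That is consistent with the output above: 9 rows in a, 12 rows in b, and 6 matches give 15 rows.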

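select a.* from a left semi join b on a.id = b.id; -- b.* must not appear before from, otherwise the statement fails (Error while compiling statement: FAILED: SemanticException [Error 10009]: Line 1:11 Invalid table alias 'b' (state=42000,code=10009))

left semi join replaces exists / in subqueries. Its result is only the left half of the inner join result, that is, the columns of table a: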

+-------+---------+--+
| a.id  | a.name  |
+-------+---------+--+
|      | qq      |
|      | ww      |
|      | ee      |
|      | rr      |
|      | yy      |
|      | aa      |
+-------+---------+--+

Hive has no right semi join.

left semi join is an efficient implementation of exists / in, and is more efficient than an inner join.
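The difference between a left semi join and an inner join shows up when the right-hand table contains the same id more than once. The sketch below models the semantics only, not Hive's execution plan; the sample rows and the helper name `left_semi_join` are assumptions made for illustration.

```python
# Hypothetical sample rows in (id, name) form; id 1 appears twice in b on purpose.
a = [(1, "qq"), (2, "ww"), (5, "tt")]
b = [(1, "x"), (1, "y"), (2, "z")]


def left_semi_join(left, right):
    """Keep each left row once if its id exists on the right; emit no right columns."""
    right_ids = {r[0] for r in right}  # only existence matters, like exists / in
    return [l for l in left if l[0] in right_ids]


# Left-hand columns of an inner join, for comparison: one row per matching pair.
inner_left = [l for l in a for r in b if l[0] == r[0]]

print(left_semi_join(a, b))
print(inner_left)
```

Because only the existence of a match is checked, the row with id 1 appears once in the semi join result but twice in the inner join result.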
