Hash Join: Basic Steps
Joins https://docs.oracle.com/database/121/TGSQL/tgsql_join.htm#TGSQL242
tidb/index_lookup_hash_join.go at master · pingcap/tidb https://github.com/pingcap/tidb/blob/master/executor/index_lookup_hash_join.go
Hash Join: Basic Steps
The optimizer uses the smaller data source to build a hash table on the join key in memory, and then scans the larger table to find the joined rows.
The basic steps are as follows:
The database performs a full scan of the smaller data set, called the build table, and then applies a hash function to the join key in each row to build a hash table in the PGA.
In pseudocode, the algorithm might look as follows:
FOR small_table_row IN (SELECT * FROM small_table)
LOOP
slot_number := HASH(small_table_row.join_key);
INSERT_HASH_TABLE(slot_number,small_table_row);
END LOOP;The database probes the second data set, called the probe table, using whichever access mechanism has the lowest cost.
Typically, the database performs a full scan of both the smaller and larger data set. The algorithm in pseudocode might look as follows:
FOR large_table_row IN (SELECT * FROM large_table)
LOOP
slot_number := HASH(large_table_row.join_key);
small_table_row = LOOKUP_HASH_TABLE(slot_number,large_table_row.join_key);
IF small_table_row FOUND
THEN
output small_table_row + large_table_row;
END IF;
END LOOP;For each row retrieved from the larger data set, the database does the following:
Applies the same hash function to the join column or columns to calculate the number of the relevant slot in the hash table.
For example, to probe the hash table for department ID
30
, the database applies the hash function to30
, which generates the hash value4
.Probes the hash table to determine whether rows exists in the slot.
If no rows exist, then the database processes the next row in the larger data set. If rows exist, then the database proceeds to the next step.
Checks the join column or columns for a match. If a match occurs, then the database either reports the rows or passes them to the next step in the plan, and then processes the next row in the larger data set.
If multiple rows exist in the hash table slot, the database walks through the linked list of rows, checking each one. For example, if department
30
hashes to slot4
, then the database checks each row until it finds30
.
Example 9-4 Hash Joins
An application queries the oe.orders
and oe.order_items
tables, joining on the order_id
column.
SELECT o.customer_id, l.unit_price * l.quantity
FROM orders o, order_items l
WHERE l.order_id = o.order_id;
The execution plan is as follows:
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 665 | 13300 | 8 (25)|
|* 1 | HASH JOIN | | 665 | 13300 | 8 (25)|
| 2 | TABLE ACCESS FULL | ORDERS | 105 | 840 | 4 (25)|
| 3 | TABLE ACCESS FULL | ORDER_ITEMS | 665 | 7980 | 4 (25)|
-------------------------------------------------------------------------- Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("L"."ORDER_ID"="O"."ORDER_ID")
Because the orders
table is small relative to the order_items
table, which is 6 times larger, the database hashes orders
. In a hash join, the data set for the build table always appears first in the list of operations (Step 2). In Step 3, the database performs a full scan of the larger order_items
later, probing the hash table for each row.
How Hash Joins Work When the Hash Table Does Not Fit in the PGA
The database must use a different technique when the hash table does not fit entirely in the PGA. In this case, the database uses a temporary space to hold portions (called partitions) of the hash table, and sometimes portions of the larger table that probes the hash table.
The basic process is as follows:
The database performs a full scan of the smaller data set, and then builds an array of hash buckets in both the PGA and on disk.
When the PGA hash area fills up, the database finds the largest partition within the hash table and writes it to temporary space on disk. The database stores any new row that belongs to this on-disk partition on disk, and all other rows in the PGA. Thus, part of the hash table is in memory and part of it on disk.
The database takes a first pass at reading the other data set.
For each row, the database does the following:
Applies the same hash function to the join column or columns to calculate the number of the relevant hash bucket.
Probes the hash table to determine whether rows exist in the bucket in memory.
If the hashed value points to a row in memory, then the database completes the join and returns the row. If the value points to a hash partition on disk, however, then the database stores this row in the temporary tablespace, using the same partitioning scheme used for the original data set.
The database reads each on-disk temporary partition one by one
The database joins each partition row to the row in the corresponding on-disk temporary partition.
Hash Join Controls
The USE_HASH
hint instructs the optimizer to use a hash join when joining two tables together.
See Also:
Hash Join: Basic Steps的更多相关文章
- SQL Tuning 基础概述06 - 表的关联方式:Nested Loops Join,Merge Sort Join & Hash Join
nested loops join(嵌套循环) 驱动表返回几条结果集,被驱动表访问多少次,有驱动顺序,无须排序,无任何限制. 驱动表限制条件有索引,被驱动表连接条件有索引. hints:use_n ...
- Sort merge join、Nested loops、Hash join(三种连接类型)
目前为止,典型的连接类型有3种: Sort merge join(SMJ排序-合并连接):首先生产driving table需要的数据,然后对这些数据按照连接操作关联列进行排序:然后生产probed ...
- 视图合并、hash join连接列数据分布不均匀引发的惨案
表大小 SQL> select count(*) from agent.TB_AGENT_INFO; COUNT(*) ---------- 1751 SQL> select count( ...
- 最新电Call记录统计-full hash join用法
declare @time datetime set @time='2016-07-01' --最新的电Call记录统计查询--SELECT t.zuoxi1,t.PhoneCount,t.Phone ...
- Sql优化(一) Merge Join vs. Hash Join vs. Nested Loop
原创文章,首发自本人个人博客站点,转载请务必注明出自http://www.jasongj.com Nested Loop,Hash Join,Merge Join介绍 Nested Loop: 对于被 ...
- Oracle 表的连接方式(2)-----HASH JOIN的基本机制3
HASH JOIN的模式 hash join有三种工作模式,分别是optimal模式,onepass模式和multipass模式,分别在v$sysstat里面有对应的统计信息: SQL> sel ...
- Oracle 表的连接方式(2)-----HASH JOIN的基本机制2
Hash算法原理 对于什么是Hash算法原理?这个问题有点难度,不是很好说清楚,来做一个比喻吧:我们有很多的小猪,每个的体重都不一样,假设体重分布比较平均(我们考虑到公斤级别),我们按照体重来分,划分 ...
- Oracle 表的连接方式(2)-----HASH JOIN的基本机制1
我们对hash join的常见误解,一般包括两个: 第一个误解:是我们经常以为hash join需要对两个做join的表都做全表扫描 第二个误解:是经常以为hash join会选择比较小的表做buil ...
- SQL Server的三种物理连接之Hash Join(三)
简介 在 SQL Server 2012 在一些特殊的例子下会看到下面的图标: Hash Join分为两个阶段,分别为生成和探测阶段. 首先是生成阶段,将输入源中的每一个条目经过散列函数的计算都放到不 ...
随机推荐
- three.js WebGLRenderTarget
今天郭先生说一说WebGLRenderTarget,它是一个缓冲,就是在这个缓冲中,视频卡为正在后台渲染的场景绘制像素. 它用于不同的效果,例如把它做为贴图使用或者图像后期处理.线案例请点击博客原文. ...
- 粉丝投稿!从2月份的面试被拒到如今的阿里P7,说一说自己学java以来的经验!
个人近期面试情况 今年二月以来,我的面试除了一个用友的,基本其他都被毙了,可以说是非常残酷的.其中有很多自己觉得还面的不错的岗位,比如百度.跟谁学.好未来等公司.说实话,打击比较大. 情况基本上是从三 ...
- ESXi 中重新启动管理代理
使用直接控制台用户界面 (DCUI)重启管理代理: 连接到您的 ESXi 主机的控制台. 按 F2 自定义系统. 以 root 身份登录. 使用上下箭头导航至故障排除选项>重新启动管理代理. 按 ...
- python的二维数组操作--坑
用到python list的二维数组,发现有一些需要注意的地方. 第一种赋值方法: list0 = [[0]*3]*4 list0[0][1] = 1 print(list0) 输出结果为: [[0, ...
- TurtleBot3 Waffle (tx2版华夫)(4)笔记本与TX2的通信
4.1. 使用vnc控制华夫Turbot3-Tx2开发板 1) 电脑端安装vnc viewer,您可以选择应用商城下载安装即可: 2) 下载后打开,键入Turbot3的ip à回车à选择连接: 3) ...
- 如何利用Typora编写博客,快速发布到多平台?
在不同的平台发布同样的文章,最让人头疼的就是图片问题,如果要手动一个个去重新上传,耗时耗力,还容易搞错.下面分享的方法,可以将Typora编写的文章快速发布到CSDN.微信公众号.博客园.简书等平台. ...
- 【老孟Flutter】Flutter 中与平台相关的生命周期
老孟导读:关于生命周期的文章共有2篇,一篇(此篇)是介绍 Flutter 中Stateful 组件的生命周期. 第二篇是 Flutter 中与平台相关的生命周期, 博客地址:http://laomen ...
- HarmonyOS(LiteOs_m) 官方例程移植到STM32初体验
HarmonyOS(LiteOs_m) 官方例程移植到STM32初体验 硬件平台 基于正点原子战舰V3开发板 MCU:STM32F103ZET6 片上SRAM大小:64KBytes 片上FLASH大小 ...
- Linux服务器以及系统性能排查常用命令
一.在Linux系统中排查CPU故障的方法和技巧 1.top命令 Linux内部命令,可以查看实时的CPU的使用情况,也可以查看CPU最近一段时间CPU的使用情况 Linux下常用的性能分析工具,能够 ...
- MQ for linux安装与卸载【转】
MQ for linux安装与卸载[转] 一.安装步骤:1. 用root帐号登录系统2. MQ安装程序需将代码安装到目录/opt/mqm下,将数据保存到目录/var/mqm下,需确保相关目录下有足够的 ...