[20181130] Slow query caused by hash collisions.txt
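--// Yesterday I read https://jonathanlewis.wordpress.com/2018/11/26/shrink-space-2/, which demonstrates a case where
--// shrink space makes a statement run slowly. I repeated the test myself. In practice I think the odds of hitting this
--// are quite low, and whether shrink space is a good idea still has to be decided case by case.
1. Environment: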
SCOTT@book> @ ver1
PORT_STRING VERSION BANNER
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
SCOTT@book> rename emp to empxxx;
Table renamed.
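--// The author's test table name clashes with the EMP table in the SCOTT schema, so I renamed the existing table first.
2. Build the test script: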
create table emp(
dept_no not null,
sal,
emp_no not null,
padding,
constraint e_pk primary key(emp_no)
)
as
with generator as (
select null
from dual
connect by
level <= 1e4 -- > comment to avoid wordpress format issue
)
select
mod(rownum,6),
rownum,
rownum,
rpad('x',60)
from
generator v1,
generator v2
where
rownum <= 2e4 -- > comment to avoid wordpress format issue
;
insert into emp values(432, 20001, 20001, rpad('x',60));
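delete /*+ full(emp) */ from emp where emp_no <= 1000 -- > comment to avoid wordpress format issue
;
--// Note: in the original post the semicolon comes before the trailing comment, which raises an error in SQL*Plus.
--// The semicolon is moved to the end here so that the statement runs.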
commit;
begin
dbms_stats.gather_table_stats(
ownname => user,
tabname => 'EMP',
method_opt => 'for all columns size 1'
);
end;
/
3. Test:
SCOTT@book> alter session set statistics_level = all;
Session altered.
select
/*+ gather_plan_statistics pre-shrink */
count(*)
from (
select /*+ no_merge */
outer.*
from
emp outer
where
outer.sal > (
select /*+ no_unnest */
avg(inner.sal)
from
emp inner
where
inner.dept_no = outer.dept_no
)
)
;
COUNT(*)
----------
9998
SCOTT@book> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 9bkx1f5cpcv14, child number 1
-------------------------------------
select /*+ gather_plan_statistics pre-shrink */
count(*) from ( select /*+ no_merge */
outer.* from emp outer where
outer.sal > ( select /*+ no_unnest */
avg(inner.sal) from
emp inner where
inner.dept_no = outer.dept_no
) )
Plan hash value: 322796046
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 569 (100)| | 1 |00:00:00.04 | 1912 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |00:00:00.04 | 1912 |
| 2 | VIEW | | 1 | 143 | | 569 (1)| 00:00:07 | 9998 |00:00:00.04 | 1912 |
|* 3 | FILTER | | 1 | | | | | 9998 |00:00:00.03 | 1912 |
| 4 | TABLE ACCESS FULL | EMP | 1 | 20001 | 156K| 71 (0)| 00:00:01 | 20001 |00:00:00.01 | 239 |
| 5 | SORT AGGREGATE | | 7 | 1 | 8 | | | 7 |00:00:00.02 | 1673 |
|* 6 | TABLE ACCESS FULL| EMP | 7 | 2857 | 22856 | 71 (0)| 00:00:01 | 20001 |00:00:00.01 | 1673 |
------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$2 / from$_subquery$_001@SEL$1
3 - SEL$2
4 - SEL$2 / OUTER@SEL$2
5 - SEL$3
6 - SEL$3 / INNER@SEL$3
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("OUTER"."SAL">)
6 - filter("INNER"."DEPT_NO"=:B1)
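--// The subquery scanned EMP 7 times (Starts=7 at id 5 and 6), once for each of the 7 departments, because Oracle
--// cached the result of each execution. The last row, dept_no=432, does hit a hash collision, but it is a single row,
--// so the impact is negligible.
4. Test after shrink space: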
SCOTT@book> alter table emp enable row movement;
Table altered.
SCOTT@book> alter table emp shrink space compact;
Table altered.
SCOTT@book> select * from emp where rownum<=4;
DEPT_NO SAL EMP_NO PADDING
---------- ---------- ---------- ---------
432 20001 20001 x
4 19978 19978 x
5 19979 19979 x
0 19980 19980 x
--// The dept_no=432 row has been moved to the front of the table.
select
/*+ gather_plan_statistics post-shrink */
count(*)
from (
select /*+ no_merge */
outer.*
from emp outer
where outer.sal >
(
select /*+ no_unnest */ avg(inner.sal)
from emp inner
where inner.dept_no = outer.dept_no
)
)
;
COUNT(*)
----------
9498
SCOTT@book> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID gx7xb7rhfd2zf, child number 0
-------------------------------------
select /*+ gather_plan_statistics post-shrink */
count(*) from ( select /*+ no_merge */
outer.* from emp outer where outer.sal >
( select /*+ no_unnest */ avg(inner.sal)
from emp inner where
inner.dept_no = outer.dept_no ) )
Plan hash value: 322796046
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 569 (100)| | 1 |00:00:03.43 | 783K|
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |00:00:03.43 | 783K|
| 2 | VIEW | | 1 | 143 | | 569 (1)| 00:00:07 | 9498 |00:00:03.43 | 783K|
|* 3 | FILTER | | 1 | | | | | 9498 |00:00:03.43 | 783K|
| 4 | TABLE ACCESS FULL | EMP | 1 | 20001 | 156K| 71 (0)| 00:00:01 | 19001 |00:00:00.01 | 247 |
| 5 | SORT AGGREGATE | | 3172 | 1 | 8 | | | 3172 |00:00:03.42 | 783K|
|* 6 | TABLE ACCESS FULL| EMP | 3172 | 2857 | 22856 | 71 (0)| 00:00:01 | 10M|00:00:02.71 | 783K|
------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$2 / from$_subquery$_001@SEL$1
3 - SEL$2
4 - SEL$2 / OUTER@SEL$2
5 - SEL$3
6 - SEL$3 / INNER@SEL$3
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("OUTER"."SAL">)
6 - filter("INNER"."DEPT_NO"=:B1)
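--// Note: my test machine is fairly fast, so the query did not take the 9 seconds the author saw. It finished in just
--// under 4 seconds (A-Time 00:00:03.43), but that is still clearly slower than before. Look at id=6: Starts is 3172.
--// In other words, dept_no=432 collides with one of dept_no=0,1,2,3,4,5 in the hash table, so the subquery with
--// dept_no=:B1 has to be re-executed for every row of the colliding department.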
SCOTT@book> select dept_no,count(*) from emp group by dept_no order by 1;
DEPT_NO COUNT(*)
---------- ----------
0 3167
1 3167
2 3167
3 3166
4 3166
5 3167
432 1
7 rows selected.
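--// Suppose the collision is with dept_no=1:
--//dept_no=432: 1 execution
--//dept_no=0: 1 execution
--//dept_no=1: 3167 executions
--//dept_no=2: 1 execution
--//dept_no=3: 1 execution
--//dept_no=4: 1 execution
--//dept_no=5: 1 execution
--// Total: 1+1+3167+1+1+1+1 = 3173, which is one more than the observed 3172, so dept_no=1 is not the colliding value.
--// Only dept_no=3 and dept_no=4 have 3166 rows (1+1+3166+1+1+1+1 = 3172). Further tests show the collision is with dept_no=4.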
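--// The arithmetic above can be checked with a small simulation. This is a minimal sketch, not Oracle's implementation.
--// It assumes one entry per hash bucket, that a colliding key is never stored, and that a row reuses the result of the
--// immediately preceding row when dept_no matches. The real hash function is not known, so collide_with() is a
--// hypothetical mapping that forces 432 into a chosen bucket, and the row order is simplified to emp_no order.

```python
def count_subquery_runs(dept_sequence, bucket_of):
    """Count how often the subquery runs under a simplified cache model.

    Assumptions (not confirmed Oracle internals): each bucket holds at most
    one key, a colliding key is never stored, and the key of the immediately
    preceding row is always remembered.
    """
    buckets = {}      # bucket number -> cached key
    previous = None   # key of the previous row
    runs = 0
    for key in dept_sequence:
        if key != previous:
            slot = bucket_of(key)
            if buckets.get(slot) != key:
                runs += 1                      # cache miss: run the subquery
                buckets.setdefault(slot, key)  # store only if the bucket is free
        previous = key
    return runs


def collide_with(dept):
    # Hypothetical bucket function that puts 432 in the same bucket as `dept`.
    return lambda key: dept if key == 432 else key


# Remaining rows are emp_no 1001..20000 with dept_no = mod(emp_no, 6).
base = [emp_no % 6 for emp_no in range(1001, 20001)]
pre_shrink = base + [432]    # the 432 row is read last
post_shrink = [432] + base   # simplified: the 432 row is read first

runs_pre = count_subquery_runs(pre_shrink, collide_with(4))
runs_if_1 = count_subquery_runs(post_shrink, collide_with(1))
runs_if_4 = count_subquery_runs(post_shrink, collide_with(4))
print(runs_pre, runs_if_1, runs_if_4)  # 7 3173 3172
```

--// Under these assumptions only a collision with a 3166-row department reproduces Starts=3172, and the same collision
--// costs just one extra execution when the 432 row is read last.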
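5. Other interesting tests:
--// Run the following. Whenever the dept_no IN list contains both 4 and 432 and names at least 3 departments,
--// the query is slow. This shows that dept_no=4 is the value that collides.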
select
/*+ gather_plan_statistics post-shrink */
count(*)
from (
select /*+ no_merge */
outer.*
from emp outer
where outer.sal >
(
select /*+ no_unnest */ avg(inner.sal)
from emp inner
where inner.dept_no = outer.dept_no
)
) where dept_no in (432,4,5)
;
--// But if you run the following, you will find it is fast:
select
/*+ gather_plan_statistics post-shrink */
count(*)
from (
select /*+ no_merge */
outer.*
from emp outer
where outer.sal >
(
select /*+ no_unnest */ avg(inner.sal)
from emp inner
where inner.dept_no = outer.dept_no
)
) where dept_no in (432,4)
;
SCOTT@book> @ dpc '' ''
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID 9v9984wd6k9t5, child number 0
-------------------------------------
select /*+ gather_plan_statistics post-shrink */
count(*) from ( select /*+ no_merge */
outer.* from emp outer where outer.sal >
( select /*+ no_unnest */ avg(inner.sal)
from emp inner where
inner.dept_no = outer.dept_no ) ) where dept_no
in (432,4)
Plan hash value: 322796046
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 214 (100)| | 1 |00:00:00.01 | 741 |
| 1 | SORT AGGREGATE | | 1 | 1 | | | | 1 |00:00:00.01 | 741 |
| 2 | VIEW | | 1 | 143 | | 214 (1)| 00:00:03 | 1583 |00:00:00.01 | 741 |
|* 3 | FILTER | | 1 | | | | | 1583 |00:00:00.01 | 741 |
|* 4 | TABLE ACCESS FULL | EMP | 1 | 5715 | 45720 | 71 (0)| 00:00:01 | 3167 |00:00:00.01 | 247 |
| 5 | SORT AGGREGATE | | 2 | 1 | 8 | | | 2 |00:00:00.01 | 494 |
|* 6 | TABLE ACCESS FULL| EMP | 2 | 2857 | 22856 | 71 (0)| 00:00:01 | 3167 |00:00:00.01 | 494 |
------------------------------------------------------------------------------------------------------------------------
--// The subquery ran only 2 times. For the reason, see these links:
http://blog.itpub.net/267265/viewspace-2155927/
https://blogs.oracle.com/oraclemagazine/on-caching-and-evangelizing-sql
--// Excerpt:
When you're using a scalar subquery, Oracle Database will set up a small in-memory hash table for the subquery and its
results each time it runs the query. So, when you run the previous query, Oracle Database sets up in memory a hash table
that looks like this:
Oracle Database will use this hash table to remember the scalar subquery and the inputs to it—just :DEPTNO in this case
—and the output from it. At the beginning of every query execution, this cache is empty, but suppose you run the query
and the first PROJECTS row you retrieve has a DEPTNO value of 10. Oracle Database will assign the number 10 to a hash
value between 1 and 255 (the size of the hash table cache in Oracle Database 10g and Oracle Database 11g currently) and
will look in that hash table slot to see if the answer exists. In this case, it will not, so Oracle Database must run
the scalar subquery with the input of 10 to get the answer. If that answer (count) is 42, the hash table may look
something like this:
--// Note: in my own test on 10.2.0.5 the number of buckets was 512, not 255. I will test 11.2.0.4 when I get the chance.
Select count(*) from emp where emp.deptno = :deptno
:deptno Count(*)
You'll have saved the DEPTNO value of 10 and the answer (count) of 42 in some slot—probably not the first or last slot,
but whatever slot the hash value 10 is assigned to. Now suppose the second row you get back from the PROJECTS table
includes a DEPTNO value of 20. Oracle Database will again look in the hash table after assigning the value 20, and it
will discover "no result in the cache yet." So it will run the scalar subquery, get the result, and put it into the hash
table cache. Now the cache may look like this:
Select count(*) from emp where emp.deptno = :deptno
:deptno    Count(*)
…          …
10         42
Now suppose the query returns a third row and it again includes a DEPTNO value of 10. This time, Oracle Database will
see DEPTNO = 10, find that it already has that value in the hash table cache, and will simply return 42 from the cache
instead of executing the scalar subquery. In fact, it will never have to run that scalar subquery for the DEPTNO values
of 10 or 20 again for that query—it will already have the answer.
What happens if the number of unique DEPTNO values exceeds the size of the hash table? What if there are more than 255
values? Or, more generally, if more than one DEPTNO value is assigned to the same slot in the hash table, what happens
in a hash collision?
The answer is the same for all these questions and is rather simple: Oracle Database will not be able to cache the
second or nth value to that slot in the hash table. For example, what if the third row returned by the query contains
the DEPTNO = 30 value? Further, suppose that DEPTNO = 30 is to be assigned to exactly the same hash table slot as DEPTNO
= 10. The database won't be able to effectively cache DEPTNO = 30 in this case—the value will never make it into the
hash table. It will, however, be "partially cached." Oracle Database still has the hash table with all the previous
executions, but it also keeps the last scalar subquery result it had "next to" the hash table. That is, if the fourth
row also includes a DEPTNO = 30 value, Oracle Database will discover that the result is not in the hash table but is
"next to" the hash table, because the last time it ran the scalar subquery, it was run with an input of 30. On the other
hand, if the fourth row includes a DEPTNO = 40 value, Oracle Database will run the scalar subquery with the DEPTNO = 40
value (because it hasn't seen that value yet during this query execution) and overwrite the DEPTNO = 30 result. The next
time Oracle Database sees DEPTNO = 30 in the result set, it'll have to run that scalar subquery again.
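--// The behaviour described in the excerpt can be sketched as follows. This is a simplified model, not Oracle's code.
--// The bucket numbers in `slots` are hypothetical, chosen only so that 30 lands in the same bucket as 10, and
--// remembering the most recent input on every row is an assumption (the excerpt only says the last result is kept).

```python
def trace_cache(inputs, bucket_of):
    """Label each input as 'run', 'bucket hit' or 'previous hit'.

    Simplified model of the excerpt: one entry per bucket, a colliding input
    is never stored, and the most recent input is remembered next to the
    hash table.
    """
    buckets = {}      # bucket number -> cached input
    previous = None   # most recent input
    outcomes = []
    for value in inputs:
        slot = bucket_of(value)
        if value == previous:
            outcomes.append("previous hit")
        elif buckets.get(slot) == value:
            outcomes.append("bucket hit")
        else:
            outcomes.append("run")
            buckets.setdefault(slot, value)  # keep the first occupant on collision
        previous = value
    return outcomes


# Hypothetical bucket numbers: 30 collides with 10, as in the excerpt.
slots = {10: 1, 20: 2, 30: 1, 40: 3}
deptnos = [10, 20, 10, 30, 30, 40, 30]
outcomes = trace_cache(deptnos, slots.get)
for deptno, outcome in zip(deptnos, outcomes):
    print(deptno, outcome)
```

--// In this sketch the final DEPTNO = 30 runs the subquery again, because 30 never made it into its bucket and the
--// remembered input was overwritten by 40.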
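--// The answer is in the last paragraph of the excerpt: if the input is the same as for the immediately preceding
--// evaluation, Oracle reuses that result instead of running the subquery again. With dept_no in (432,4), every row
--// after the first has dept_no=4, so each one reuses the previous result. Looking at how the author laid out the
--// table data makes this clear: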
SCOTT@book> select * from emp where rownum<=10;
DEPT_NO SAL EMP_NO PADDING
---------- ---------- ---------- --------
432 20001 20001 x
4 19978 19978 x
5 19979 19979 x
0 19980 19980 x
1 19981 19981 x
2 19982 19982 x
3 19983 19983 x
4 19984 19984 x
5 19985 19985 x
0 19986 19986 x
10 rows selected.
--// Rows with the same dept_no are not clustered together.
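--// Why the IN list and the row order matter can be sketched with a small simulation. It is only a model under stated
--// assumptions: one key per bucket, colliding keys are not stored, and a row reuses the previous row's result when
--// dept_no matches. bucket_of() is a hypothetical hash that forces 432 into the bucket of dept_no=4, and the row
--// order is simplified to the 432 row followed by emp_no 1001..20000.

```python
def subquery_runs(depts, bucket_of):
    # Simplified model (an assumption): one key per bucket, colliding keys are
    # not stored, and a row reuses the previous row's result if dept_no matches.
    buckets, previous, runs = {}, None, 0
    for dept in depts:
        if dept != previous and buckets.get(bucket_of(dept)) != dept:
            runs += 1
            buckets.setdefault(bucket_of(dept), dept)
        previous = dept
    return runs


def bucket_of(dept):
    # Hypothetical hash: force dept_no=432 into the same bucket as dept_no=4.
    return 4 if dept == 432 else dept


# Simplified post-shrink order: the 432 row first, then emp_no 1001..20000.
rows = [432] + [emp_no % 6 for emp_no in range(1001, 20001)]

two_depts = subquery_runs([d for d in rows if d in (432, 4)], bucket_of)
three_depts = subquery_runs([d for d in rows if d in (432, 4, 5)], bucket_of)
clustered = subquery_runs(sorted(rows), bucket_of)
print(two_depts, three_depts, clustered)  # 2 3168 7
```

--// In the model, in (432,4) needs 2 runs, matching Starts=2 in the plan above. Adding dept_no=5 interleaves the rows,
--// so dept_no=4 misses on every row. If the rows were clustered by dept_no, the model needs only 7 runs despite the
--// collision. The 3168 and 7 figures are model output, not measured Starts values.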
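Summary:
--// As I recall, the author mentions this example in his book Cost-Based Oracle Fundamentals. At the time I wondered
--// how he knew which value would collide. Impressive.
--// I will write a separate post on how to guess which values have hash collisions.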