Postgres Indexes 01

I. Index types in PG 9.3

1.b-tree

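1.1 Supports left-anchored pattern matching, such as LIKE 'xxx%' or ~ '^xxx' (in a non-C locale the index must be created with the text_pattern_ops or varchar_pattern_ops operator class)
1.2 Supports case-insensitive left-anchored matching such as ILIKE 'XXX%' or ~* '^xxx', but only when the pattern starts with characters unaffected by case conversion (e.g. digits)
1.3 Supports the common comparison operators <, <=, =, >=, >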
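A left-anchored pattern can use a b-tree because it is equivalent to a range condition on the sorted keys. The sketch below illustrates that idea, using a sorted Python list as a stand-in for the b-tree's leaf level. The helper names are hypothetical. It assumes plain code-point ordering (roughly what the C locale gives) and is not PostgreSQL's implementation.

```python
import bisect


def prefix_upper_bound(prefix: str) -> str:
    """Smallest string greater than every string starting with `prefix`
    under code-point ordering: increment the last character."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_scan(sorted_keys: list[str], prefix: str) -> list[str]:
    """Emulate LIKE 'prefix%' as the range prefix <= key < upper bound."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, prefix_upper_bound(prefix))
    return sorted_keys[lo:hi]


keys = sorted(["abc", "abcd", "abd", "ab", "xyz", "abc123"])
print(prefix_scan(keys, "abc"))  # ['abc', 'abc123', 'abcd']
```

A pattern such as '%xxx' has no fixed prefix, so there is no range to search. This is why only left-anchored patterns benefit from a b-tree.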

2.hash

Supports only the = operator
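The toy model below shows why a hash index supports only =. Python's built-in `hash` stands in for PostgreSQL's hash functions, and the class is hypothetical. An equality lookup inspects a single bucket. Bucket contents bear no relation to key order, though, so a range predicate such as id < 100 would have to visit every bucket.

```python
from collections import defaultdict


class ToyHashIndex:
    """Toy hash index: key -> row ids, grouped into buckets (hypothetical)."""

    def __init__(self, nbuckets: int = 8):
        self.nbuckets = nbuckets
        self.buckets = defaultdict(list)  # bucket number -> [(key, row_id)]

    def insert(self, key, row_id):
        self.buckets[hash(key) % self.nbuckets].append((key, row_id))

    def lookup_eq(self, key):
        # Equality: only the bucket the key hashes to is examined.
        bucket = self.buckets[hash(key) % self.nbuckets]
        return [rid for k, rid in bucket if k == key]

    def lookup_lt(self, bound):
        # Range: hashing destroys ordering, so every bucket must be scanned.
        return sorted(rid for b in self.buckets.values()
                      for k, rid in b if k < bound)


idx = ToyHashIndex()
for row_id, key in enumerate([42, 7, 99, 7, 3]):
    idx.insert(key, row_id)
print(idx.lookup_eq(7))   # [1, 3]
print(idx.lookup_lt(10))  # [1, 3, 4]
```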

3.gin

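Indexes multi-valued columns, such as array types and full-text search types (tsvector). For arrays it supports:
<@ is contained by: array[2,3] <@ array[1,2,3]
@> contains: array[1,2,3] @> array[2]
= equals: array[1,2,3] = array[1,2,3]
&& overlaps: array[1,2,3] && array[2]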
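A GIN index is essentially an inverted index. Each array element (or each lexeme, for full-text search) points to the set of rows that contain it. The simplified Python model below illustrates that idea and is not GIN's on-disk layout. It answers @> by intersecting posting lists and && by taking their union.

```python
from collections import defaultdict


def build_inverted_index(rows: dict[int, list[int]]) -> dict[int, set[int]]:
    """Map each element to the set of row ids whose array contains it."""
    index = defaultdict(set)
    for row_id, arr in rows.items():
        for elem in arr:
            index[elem].add(row_id)
    return index


def contains(index, query: list[int]) -> set[int]:
    """Rows where column @> query (query assumed non-empty):
    intersect the posting lists of all query elements."""
    postings = [index.get(e, set()) for e in query]
    return set.intersection(*postings)


def overlaps(index, query: list[int]) -> set[int]:
    """Rows where column && query: union of the posting lists."""
    return set().union(*(index.get(e, set()) for e in query))


rows = {1: [1, 2, 3], 2: [2, 3, 4], 3: [5]}
idx = build_inverted_index(rows)
print(sorted(contains(idx, [2, 3])))  # [1, 2]
print(sorted(overlaps(idx, [1, 5])))  # [1, 3]
```

Each query touches only the posting lists of its own elements, not every row. This is where the speed-up for multi-valued columns comes from.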

4.gist

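Not a single kind of index but an indexing framework: it supports many different indexing strategies, and custom operators can be defined for it.
Supports nearest-neighbor ordering, e.g. fetching the 10 nearest neighbors of a point:

select * from places order by location <-> point '(101,456)' limit 10;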
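The query above asks for the rows closest to a point, where <-> on two points is their Euclidean distance. Without an index, answering it means computing the distance to every row and keeping the k smallest. The brute-force Python sketch below shows that computation using made-up sample points. A GiST index instead lets the executor return rows in distance order without visiting every row.

```python
import heapq
import math


def k_nearest(points, target, k=10):
    """Brute-force equivalent of ORDER BY p <-> target LIMIT k."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, target))


places = [(100, 450), (0, 0), (101, 457), (300, 300), (90, 460)]
print(k_nearest(places, (101, 456), k=2))  # [(101, 457), (100, 450)]
```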

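<< -- strictly left of, e.g. circle '((0,0),1)' << circle '((5,0),1)'
&< -- the left operand does not extend to the right of the right operand, e.g. box '((0,0),(1,1))' &< box '((0,0),(2,2))'
&> -- the left operand does not extend to the left of the right operand, e.g. box '((0,0),(3,3))' &> box '((0,0),(2,2))'
>> -- strictly right of
<<| -- strictly below
&<| -- does not extend above
|&> -- does not extend below
|>> -- strictly above
@> -- contains
<@ -- contained in or on
~= -- same as
&& -- overlaps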

http://www.postgresql.org/docs/9.3/static/functions-geometry.html
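Several of these operators reduce to simple coordinate comparisons. The Python sketch below models a box by its low and high corners (x1 <= x2, y1 <= y2), following the operator descriptions in the geometric-functions documentation linked above. It treats a circle via its bounding box, which is sufficient for the left/right/above/below tests. It is an illustration only, not PostgreSQL's code, and it reproduces the examples from the list.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x1: float  # low corner
    y1: float
    x2: float  # high corner
    y2: float


def circle_bbox(cx, cy, r):
    """Bounding box of a circle (enough for left/right/above/below)."""
    return Box(cx - r, cy - r, cx + r, cy + r)


def strictly_left(a, b):      # a << b
    return a.x2 < b.x1


def not_extend_right(a, b):   # a &< b
    return a.x2 <= b.x2


def not_extend_left(a, b):    # a &> b
    return a.x1 >= b.x1


def strictly_right(a, b):     # a >> b
    return a.x1 > b.x2


def strictly_below(a, b):     # a <<| b
    return a.y2 < b.y1


def strictly_above(a, b):     # a |>> b
    return a.y1 > b.y2


def contains(a, b):           # a @> b
    return a.x1 <= b.x1 and a.y1 <= b.y1 and a.x2 >= b.x2 and a.y2 >= b.y2


def overlaps(a, b):           # a && b (touching counts as overlapping)
    return a.x1 <= b.x2 and b.x1 <= a.x2 and a.y1 <= b.y2 and b.y1 <= a.y2


print(strictly_left(circle_bbox(0, 0, 1), circle_bbox(5, 0, 1)))  # True
print(not_extend_right(Box(0, 0, 1, 1), Box(0, 0, 2, 2)))         # True
print(not_extend_left(Box(0, 0, 3, 3), Box(0, 0, 2, 2)))          # True
```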

5.sp-gist

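Like GiST, it is an indexing framework; it supports non-balanced, disk-based data structures such as quadtrees, k-d trees and radix trees (tries).
Supported operators include << >> ~= <@, plus:
<^ -- is below: circle '((0,0),1)' <^ circle '((0,5),1)', the left circle is below the right one
>^ -- is above: circle '((0,5),1)' >^ circle '((0,0),1)', the left circle is above the right one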
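To make "space-partitioning tree" concrete, here is a minimal in-memory point quadtree in Python. It is not SP-GiST's disk layout. Each inserted point becomes the centre that splits its region into four quadrants. A box query descends only into the quadrants that can intersect the box. The class name is hypothetical.

```python
class QuadNode:
    """Point quadtree node: the stored point splits space into 4 quadrants."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.children = [None, None, None, None]

    def _quadrant(self, x, y):
        # 0: x >= cx, y >= cy   1: x < cx, y >= cy
        # 2: x < cx,  y < cy    3: x >= cx, y < cy
        if x >= self.x:
            return 0 if y >= self.y else 3
        return 1 if y >= self.y else 2

    def insert(self, x, y):
        q = self._quadrant(x, y)
        if self.children[q] is None:
            self.children[q] = QuadNode(x, y)
        else:
            self.children[q].insert(x, y)

    def query(self, xlo, ylo, xhi, yhi, out=None):
        """Collect stored points inside the box [xlo, xhi] x [ylo, yhi]."""
        if out is None:
            out = []
        if xlo <= self.x <= xhi and ylo <= self.y <= yhi:
            out.append((self.x, self.y))
        right, left = xhi >= self.x, xlo < self.x
        up, down = yhi >= self.y, ylo < self.y
        # Skip quadrants whose region cannot overlap the query box.
        wanted = [right and up, left and up, left and down, right and down]
        for child, visit in zip(self.children, wanted):
            if child is not None and visit:
                child.query(xlo, ylo, xhi, yhi, out)
        return out


root = QuadNode(50, 50)
for p in [(10, 10), (70, 80), (60, 20), (30, 90), (55, 55)]:
    root.insert(*p)
print(sorted(root.query(40, 40, 80, 100)))  # [(50, 50), (55, 55), (70, 80)]
```

The tree is not balanced: its shape depends on insertion order. This is the kind of structure SP-GiST is designed to store.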

II. Benefits of indexes

1. Returning rows in index order avoids a sort and reduces CPU cost

1.1 The filter is on the indexed column

postgres=# \c db1
You are now connected to database "db1" as user "yzw".

db1=# create table test(id int,info text,crt_time timestamp);
CREATE TABLE

db1=# insert into test select generate_series(1,10000), md5(random()::text),clock_timestamp();
INSERT 0 10000

db1=# create index idx_test_1 on test(id);
CREATE INDEX

db1=# explain analyze select * from test where id<100 order by id;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=396.80..405.13 rows=3333 width=44) (actual time=0.106..0.111 rows=99 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 32kB
   ->  Bitmap Heap Scan on test  (cost=66.12..201.78 rows=3333 width=44) (actual time=0.050..0.059 rows=99 loops=1)
         Recheck Cond: (id < 100)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..65.28 rows=3333 width=0) (actual time=0.036..0.036 rows=99 loops=1)
               Index Cond: (id < 100)
 Planning time: 0.520 ms
 Execution time: 0.178 ms
(10 rows)

1.2 The filter is on a non-indexed column

db1=# explain analyze select * from test where info='c969799412fed1c8f91eff5e65353a85' order by id;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Sort  (cost=219.01..219.01 rows=1 width=45) (actual time=1.112..1.112 rows=1 loops=1)
   Sort Key: id
   Sort Method: quicksort  Memory: 25kB
   ->  Seq Scan on test  (cost=0.00..219.00 rows=1 width=45) (actual time=0.011..1.104 rows=1 loops=1)
         Filter: (info = 'c969799412fed1c8f91eff5e65353a85'::text)
         Rows Removed by Filter: 9999
 Planning time: 0.081 ms
 Execution time: 1.129 ms
(8 rows)

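> Why do both plans contain a Sort node (Sort Key: id)?

In 1.2 the filter is on info, which has no index, so the rows come from a Seq Scan and must be sorted afterwards. In 1.1 the planner estimated rows=3333 and chose a Bitmap Heap Scan. This is most likely the default one-third selectivity for a range condition, since the freshly loaded table had no statistics yet. A bitmap scan returns rows in physical block order rather than index order, so a Sort is still needed.

# After turning off enable_seqscan, the query on the indexed column no longer has a Sort node: a plain Index Scan returns rows in index order. Note that the estimate is now rows=100, which suggests statistics had been collected in the meantime (e.g. by autovacuum). The better estimate may therefore be as much the reason for the Index Scan as the setting itself.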

db1=# set enable_seqscan=off;
SET

db1=# explain analyze select * from test where id<100 order by id;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test_1 on test  (cost=0.29..10.04 rows=100 width=45) (actual time=0.005..0.016 rows=99 loops=1)
   Index Cond: (id < 100)
 Planning time: 0.119 ms
 Execution time: 0.034 ms
(4 rows)

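Note: enable_seqscan defaults to on in both 9.3 and 9.4. Setting it to off does not forbid sequential scans; it only makes the planner avoid them when another plan is available, and it is intended for testing rather than production use.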
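The difference between the two plans for id < 100 is where the ordering comes from. A b-tree keeps its entries sorted by key, so an Index Scan already yields rows ordered by id. A bitmap scan visits the matching rows in physical (ctid) order, so a separate Sort is required. The Python model below uses a hypothetical heap of (ctid, id) rows. Unlike the test table, which was loaded in id order, it shuffles the rows so the difference is visible.

```python
import bisect
import random

random.seed(0)
ids = list(range(1, 10001))
random.shuffle(ids)                                # arbitrary physical order
heap = list(enumerate(ids))                        # [(ctid, id)]; position = ctid
btree = sorted((id_, ctid) for ctid, id_ in heap)  # index entries sorted by key


def index_scan_lt(bound):
    """Index Scan: walk index entries in key order; output is already sorted."""
    end = bisect.bisect_left(btree, (bound,))
    return [id_ for id_, _ctid in btree[:end]]


def bitmap_scan_lt(bound):
    """Bitmap scan: collect matching ctids, read the heap in ctid order,
    then sort (the extra Sort node). Returns (heap order, sorted rows)."""
    end = bisect.bisect_left(btree, (bound,))
    ctids = sorted(ctid for _id, ctid in btree[:end])
    heap_order = [heap[c][1] for c in ctids]
    return heap_order, sorted(heap_order)


print(index_scan_lt(5))          # [1, 2, 3, 4]
heap_order, ordered = bitmap_scan_lt(5)
print(ordered)                   # [1, 2, 3, 4]
```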

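2. Speeding up queries, deletes and updates that have conditions

2.1 With both sequential scans and index scans enabled (the normal setting), a lookup on an indexed column uses the index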

db1=# set enable_seqscan=on;
SET

db1=# explain analyze select * from test where id=1;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test_1 on test  (cost=0.29..8.30 rows=1 width=45) (actual time=0.014..0.015 rows=1 loops=1)
   Index Cond: (id = 1)
 Planning time: 0.067 ms
 Execution time: 0.032 ms
(4 rows)

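2.2 Without index access: with index and bitmap scans disabled, even a lookup on an indexed column falls back to a sequential scan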

db1=# show enable_indexscan;
 enable_indexscan
------------------
 on
(1 row)

db1=# show enable_bitmapscan;
 enable_bitmapscan
-------------------
 on
(1 row)

db1=# set enable_indexscan=off;set enable_bitmapscan=off;
SET
SET

db1=# show enable_indexscan;show enable_bitmapscan;
 enable_indexscan
------------------
 off
(1 row)

 enable_bitmapscan
-------------------
 off
(1 row)

# With index scans disabled, the plan becomes a sequential scan

db1=# explain analyze select * from test where id=1;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..219.00 rows=1 width=45) (actual time=0.012..0.943 rows=1 loops=1)
   Filter: (id = 1)
   Rows Removed by Filter: 9999
 Planning time: 0.138 ms
 Execution time: 0.971 ms
(5 rows)
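The "Rows Removed by Filter: 9999" line shows what the sequential scan costs: every row is read and tested. The sketch below counts how many keys are examined by a linear scan and by a binary search over a sorted key list. The binary search is a rough stand-in for descending a b-tree, and the counts are illustrative only, not PostgreSQL's cost model.

```python
def seq_scan_eq(rows, key):
    """Read and test every row, like Seq Scan + Filter."""
    examined = 0
    hits = []
    for r in rows:
        examined += 1
        if r == key:
            hits.append(r)
    return hits, examined


def index_lookup_eq(sorted_keys, key):
    """Binary search; key comparisons stand in for index page visits."""
    lo, hi, examined = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(sorted_keys) and sorted_keys[lo] == key:
        hits.append(sorted_keys[lo])
        lo += 1
    return hits, examined


rows = list(range(1, 10001))
print(seq_scan_eq(rows, 1))      # ([1], 10000)
print(index_lookup_eq(rows, 1))  # ([1], 14)
```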

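2.3 Speeding up joins

db1=# set enable_indexscan=on;set enable_bitmapscan=on;
SET
SET

The creation of test1 is not shown; it is assumed to have the same structure as test, i.e. create table test1(id int,info text,crt_time timestamp);

db1=# insert into test1 select generate_series(1,10000), md5(random()::text),clock_timestamp();
INSERT 0 10000

test1 has no index, so it is read with a Seq Scan, while test uses its id index; the two are joined with a Nested Loop.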

db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on(t1.id=t2.id and t2.id=1);
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..227.31 rows=1 width=90) (actual time=0.032..0.896 rows=1 loops=1)
   ->  Index Scan using idx_test_1 on test t1  (cost=0.29..8.30 rows=1 width=45) (actual time=0.019..0.020 rows=1 loops=1)
         Index Cond: (id = 1)
   ->  Seq Scan on test1 t2  (cost=0.00..219.00 rows=1 width=45) (actual time=0.010..0.873 rows=1 loops=1)
         Filter: (id = 1)
         Rows Removed by Filter: 9999
 Planning time: 0.124 ms
 Execution time: 0.927 ms
(8 rows)

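After an index is added on test1, that side also uses an Index Scan. Instead of reading all 10000 rows of test1 and discarding 9999 of them, it fetches the one matching row directly, so execution time drops from 0.927 ms to 0.059 ms.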

db1=# create index idx_test1_id on test1(id);
CREATE INDEX

db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on(t1.id=t2.id and t2.id=1);
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.57..16.62 rows=1 width=90) (actual time=0.033..0.034 rows=1 loops=1)
   ->  Index Scan using idx_test_1 on test t1  (cost=0.29..8.30 rows=1 width=45) (actual time=0.011..0.012 rows=1 loops=1)
         Index Cond: (id = 1)
   ->  Index Scan using idx_test1_id on test1 t2  (cost=0.29..8.30 rows=1 width=45) (actual time=0.020..0.020 rows=1 loops=1)
         Index Cond: (id = 1)
 Planning time: 0.240 ms
 Execution time: 0.059 ms
(7 rows)
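The two plans differ only in how the inner side of the Nested Loop finds its matches. The Python sketch below runs the same loop with two hypothetical inner-lookup strategies: a full scan of the inner rows, and a dict standing in for idx_test1_id. For each, it counts how many inner rows had to be examined.

```python
from collections import defaultdict


def seq_lookup(rows, key, stats):
    """Inner side without an index: Seq Scan + Filter over all rows."""
    stats["examined"] += len(rows)
    return [r for r in rows if r["id"] == key]


def make_index_lookup(rows):
    """Inner side with an index: a dict stands in for idx_test1_id."""
    index = defaultdict(list)
    for r in rows:
        index[r["id"]].append(r)

    def lookup(_rows, key, stats):
        matches = index.get(key, [])
        stats["examined"] += len(matches)  # only matching rows are fetched
        return matches

    return lookup


def nested_loop(outer, inner, lookup):
    """For each outer row, fetch the matching inner rows with `lookup`."""
    stats = {"examined": 0}
    result = [(o, i) for o in outer for i in lookup(inner, o["id"], stats)]
    return result, stats["examined"]


test1 = [{"id": k} for k in range(1, 10001)]
outer = [{"id": 1}]  # what the Index Scan on test (id = 1) returns
rows, examined = nested_loop(outer, test1, seq_lookup)
print(len(rows), examined)  # 1 10000
rows, examined = nested_loop(outer, test1, make_index_lookup(test1))
print(len(rows), examined)  # 1 1
```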

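Merge join: both inputs are sorted on the join column and then merged in a single pass. An index on the join column can supply the sorted order, so the sort step is avoided. Where a merge join is possible, a hash join is often faster, although this depends on table sizes, available memory and whether the inputs are already sorted.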
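The merge step described above can be sketched in Python as follows. This is a generic sort-merge join, not PostgreSQL's executor code. Both inputs are sorted first (an index scan would provide this order for free), and then two cursors advance in step. Duplicate keys on both sides are handled by pairing every matching row.

```python
def merge_join(left, right, key=lambda r: r):
    """Sort-merge join of two lists on `key`."""
    left = sorted(left, key=key)    # an Index Scan would already be ordered
    right = sorted(right, key=key)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Pair every left row with this key against every right row with it.
            j_start = j
            while i < len(left) and key(left[i]) == kl:
                j = j_start
                while j < len(right) and key(right[j]) == kl:
                    out.append((left[i], right[j]))
                    j += 1
                i += 1
    return out


print(merge_join([3, 1, 2, 2], [2, 3, 3, 4]))  # [(2, 2), (2, 2), (3, 3), (3, 3)]
```

Once the inputs are sorted, the merge itself is a single linear pass. This is why the plan that reads both tables through their id indexes needs no Sort nodes.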

db1=# show enable_hashjoin;
 enable_hashjoin
-----------------
 on
(1 row)

db1=# show enable_mergejoin;
 enable_mergejoin
------------------
 on
(1 row)

# Turn off hash join

db1=# set enable_hashjoin=off;

db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on t1.id=t2.id;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.57..884.57 rows=10000 width=90) (actual time=0.020..10.837 rows=10000 loops=1)
   Merge Cond: (t1.id = t2.id)
   ->  Index Scan using idx_test_1 on test t1  (cost=0.29..367.29 rows=10000 width=45) (actual time=0.006..2.453 rows=10000 loops=1)
   ->  Index Scan using idx_test1_id on test1 t2  (cost=0.29..367.29 rows=10000 width=45) (actual time=0.006..3.625 rows=10000 loops=1)
 Planning time: 0.309 ms
 Execution time: 11.304 ms
(6 rows)

# Without index access (the step that disabled index scans is not shown), both tables are read with a Seq Scan, each side is sorted, and the sorted inputs are then merged. This plan has the highest estimated cost of the three (1916.77). However, in this small test its measured execution time (7.614 ms) was actually lower than that of the index-based merge join above (11.304 ms).

db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on t1.id=t2.id;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1716.77..1916.77 rows=10000 width=90) (actual time=3.090..7.286 rows=10000 loops=1)
   Merge Cond: (t1.id = t2.id)
   ->  Sort  (cost=858.39..883.39 rows=10000 width=45) (actual time=1.571..2.007 rows=10000 loops=1)
         Sort Key: t1.id
         Sort Method: quicksort  Memory: 1166kB
         ->  Seq Scan on test t1  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.005..0.789 rows=10000 loops=1)
   ->  Sort  (cost=858.39..883.39 rows=10000 width=45) (actual time=1.514..2.039 rows=10000 loops=1)
         Sort Key: t2.id
         Sort Method: quicksort  Memory: 1166kB
         ->  Seq Scan on test1 t2  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.003..0.748 rows=10000 loops=1)
 Planning time: 0.171 ms
 Execution time: 7.614 ms
(12 rows)

# Turn hash join (and index and bitmap scans) back on; the planner now chooses a Hash Join by itself

db1=# set enable_hashjoin=on;set enable_indexscan=on;set enable_bitmapscan=on;
SET

db1=# explain analyze select t1.*,t2.* from test t1 join test1 t2 on t1.id=t2.id;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=319.00..763.00 rows=10000 width=90) (actual time=2.208..7.150 rows=10000 loops=1)
   Hash Cond: (t1.id = t2.id)
   ->  Seq Scan on test t1  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.005..0.966 rows=10000 loops=1)
   ->  Hash  (cost=194.00..194.00 rows=10000 width=45) (actual time=2.160..2.160 rows=10000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 782kB
         ->  Seq Scan on test1 t2  (cost=0.00..194.00 rows=10000 width=45) (actual time=0.003..0.959 rows=10000 loops=1)
 Planning time: 0.211 ms
 Execution time: 7.502 ms
(8 rows)
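The Hash Join plan above builds a hash table on test1 (the Hash node) and probes it once for each row of test. Neither input needs to be sorted or indexed. The Python sketch below shows that build/probe structure with made-up (id, info) tuples. It is a generic in-memory hash join and does not model batching to disk.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows, probe_key, build_key):
    """Build phase: hash build_rows by key. Probe phase: look up each probe row."""
    table = defaultdict(list)
    for b in build_rows:
        table[build_key(b)].append(b)
    result = []
    for p in probe_rows:
        for b in table.get(probe_key(p), []):
            result.append((p, b))
    return result


test = [(i, f"t{i}") for i in range(1, 6)]    # (id, info)
test1 = [(i, f"u{i}") for i in range(3, 8)]
pairs = hash_join(test, test1, lambda r: r[0], lambda r: r[0])
print([p[0] for p, _ in pairs])  # [3, 4, 5]
```

Each input is read once, and each probe is a constant-time lookup on average. On unsorted inputs this typically beats sorting both sides, which matches the author's observation above.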

3. Speeding up updates and deletes on tables referenced by foreign keys

create table p(id int primary key, info text, crt_time timestamp);

create table f(id int primary key, p_id int references p(id) on delete cascade on update cascade, info text, crt_time timestamp);

insert into p select generate_series(1,10000), md5(random()::text), clock_timestamp();

insert into f select generate_series(1,10000), generate_series(1,10000), md5(random()::text), clock_timestamp();

Without an index on f.p_id (a row with p.id = 0 already existed at this point, from an earlier update that is not shown):

db1=# explain (analyze,verbose,costs,buffers,timing) update p set id=1 where id=0;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Update on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.053..0.053 rows=0 loops=1)
   Buffers: shared hit=7
   ->  Index Scan using p_pkey on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.019..0.019 rows=1 loops=1)
         Output: 1, info, crt_time, ctid
         Index Cond: (p.id = 0)
         Buffers: shared hit=3
 Planning time: 0.068 ms
 Trigger RI_ConstraintTrigger_a_16424 for constraint f_p_id_fkey on p: time=1.225 calls=1   # slow: the cascade must scan f to find rows with p_id = 0
 Trigger RI_ConstraintTrigger_c_16426 for constraint f_p_id_fkey on f: time=0.068 calls=1
 Execution time: 1.377 ms
(10 rows)

After adding an index on f.p_id:

create index idx_f_1 on f(p_id);

db1=# explain (analyze,verbose,costs,buffers,timing) update p set id=0 where id=1;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Update on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.055..0.055 rows=0 loops=1)
   Buffers: shared hit=7
   ->  Index Scan using p_pkey on public.p  (cost=0.29..8.30 rows=1 width=47) (actual time=0.022..0.023 rows=1 loops=1)
         Output: 0, info, crt_time, ctid
         Index Cond: (p.id = 1)
         Buffers: shared hit=3
 Planning time: 0.079 ms
 Trigger RI_ConstraintTrigger_a_16424 for constraint f_p_id_fkey on p: time=0.132 calls=1   # fast: the cascade now finds the f rows through idx_f_1
 Trigger RI_ConstraintTrigger_c_16426 for constraint f_p_id_fkey on f: time=0.085 calls=1
 Execution time: 0.307 ms
(10 rows)
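The trigger times differ because ON UPDATE CASCADE must locate every row of f that references the old key. The Python sketch below models that search with and without a hypothetical p_id lookup table standing in for idx_f_1. It returns how many f rows were examined. It is a simplified model, not the RI trigger's actual code.

```python
def build_pid_index(f_rows):
    """Hypothetical stand-in for idx_f_1: p_id -> list of f rows."""
    index = {}
    for r in f_rows:
        index.setdefault(r["p_id"], []).append(r)
    return index


def cascade_update(f_rows, old_id, new_id, pid_index=None):
    """ON UPDATE CASCADE: rewrite f.p_id from old_id to new_id.
    Returns how many f rows had to be examined."""
    if pid_index is None:
        targets = [r for r in f_rows if r["p_id"] == old_id]  # full scan of f
        examined = len(f_rows)
    else:
        targets = pid_index.pop(old_id, [])                  # direct lookup
        examined = len(targets)
        pid_index.setdefault(new_id, []).extend(targets)     # keep index in sync
    for r in targets:
        r["p_id"] = new_id
    return examined


f = [{"id": i, "p_id": i} for i in range(1, 10001)]
print(cascade_update(f, 1, 0))                 # 10000
idx = build_pid_index(f)
print(cascade_update(f, 0, 1, pid_index=idx))  # 1
```

The same reasoning applies to ON DELETE CASCADE, and to the plain "no action" check when a referenced row is deleted.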

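4. Using indexes in exclusion constraints

The operator must be commutative: swapping its operands must not change the result, e.g. x = y and y = x are both true (or both unknown).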

db1=# CREATE TABLE test2(id int,geo point,EXCLUDE USING btree (id WITH pg_catalog.=));
CREATE TABLE

db1=# insert into test2 (id) values (1);
INSERT 0 1

db1=# insert into test2 (id) values (1);
ERROR:  conflicting key value violates exclusion constraint "test2_id_excl"
DETAIL:  Key (id)=(1) conflicts with existing key (id)=(1).

> With = on a b-tree, this exclusion constraint emulates a unique constraint.
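An exclusion constraint rejects a new row if some existing row makes every listed operator comparison true. The Python sketch below checks that rule by brute force. PostgreSQL instead uses the constraint's index to find candidate rows. A None value plays the role of SQL NULL, which yields "unknown" and so never conflicts. With = as the only operator, the check behaves like UNIQUE, as test2 shows. The interval example uses a made-up overlap operator, which is commutative as required.

```python
import operator


class ExclusionViolation(Exception):
    """Raised when a new row conflicts with an existing one."""


def conflicts(existing, new_row, constraint):
    """True only if every (column, op) comparison is true."""
    for col, op in constraint:
        a, b = existing[col], new_row[col]
        if a is None or b is None or not op(a, b):
            return False
    return True


def insert_with_exclusion(table, new_row, constraint):
    for existing in table:
        if conflicts(existing, new_row, constraint):
            raise ExclusionViolation(f"{new_row} conflicts with {existing}")
    table.append(new_row)


# EXCLUDE USING btree (id WITH =) behaves like UNIQUE:
test2 = []
insert_with_exclusion(test2, {"id": 1}, [("id", operator.eq)])
try:
    insert_with_exclusion(test2, {"id": 1}, [("id", operator.eq)])
except ExclusionViolation as e:
    print("rejected:", e)


def overlaps(a, b):
    """Half-open intervals [start, end) overlap; commutative, as required."""
    return a[0] < b[1] and b[0] < a[1]


bookings = []
insert_with_exclusion(bookings, {"during": (1, 5)}, [("during", overlaps)])
insert_with_exclusion(bookings, {"during": (5, 8)}, [("during", overlaps)])  # touches only
```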

5. Speeding up unique and exclusion constraints

Primary keys and unique keys are enforced through a unique b-tree index that PostgreSQL creates automatically. An exclusion constraint likewise creates an index of the access method it names, here SP-GiST:

CREATE TABLE test3(id int,geo point,EXCLUDE USING spGIST (geo WITH pg_catalog.~=));

db1=# select * from pg_indexes where tablename='test3';
 schemaname | tablename |   indexname    | tablespace |                        indexdef
------------+-----------+----------------+------------+---------------------------------------------------------
 public     | test3     | test3_geo_excl |            | CREATE INDEX test3_geo_excl ON test3 USING spgist (geo)
(1 row)

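III. Drawbacks of indexes

Indexes must be maintained on every write: an insert adds index entries, and an update whose new row version lands in a different block needs a new entry in every index, which slows these operations down.
If the new row version stays in the same block and no indexed column is changed, PostgreSQL can perform a HOT (heap-only tuple) update, and the indexes do not need to be updated.
In write-heavy, read-light workloads, the cost of indexes can outweigh their benefits.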
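The two HOT conditions above can be summarised as a small predicate. The function and its inputs are hypothetical: the set of indexed columns, the columns an update changes, and the free space on the row's current page. PostgreSQL's real decision involves further details, so treat this only as an illustration of the conditions stated above, not the server's logic.

```python
def can_use_hot(indexed_columns: set[str], changed_columns: set[str],
                page_free_bytes: int, new_tuple_bytes: int) -> bool:
    """Simplified HOT check: no indexed column changes, and the new row
    version fits on the same heap page as the old one."""
    touches_index = bool(indexed_columns & changed_columns)
    fits_on_page = new_tuple_bytes <= page_free_bytes
    return not touches_index and fits_on_page


print(can_use_hot({"id"}, {"info"}, 200, 60))  # True: index untouched, fits
print(can_use_hot({"id"}, {"id"}, 200, 60))    # False: indexed column changed
```

This is also why a lower fillfactor, which leaves free space on each page, can make HOT updates more likely on frequently updated tables.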

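IV. Caveats

1. A normal CREATE INDEX blocks writes (INSERT, UPDATE, DELETE) on the table for the duration of the build; only reads are allowed.
2. With CREATE INDEX CONCURRENTLY, DML on the table can continue during the build, but on tables with heavy DML the build can take a very long time.
3. Some indexes are not WAL-logged (in 9.3, hash indexes), so they are not restored by WAL-based mechanisms such as crash recovery, streaming replication or warm standby. Such an index must be rebuilt (REINDEX) after these events before it can be used.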
