1. 索引的特性

1.1 加快条件的检索的特性

当表数据量越来越大时查询速度会下降，在表的条件字段上使用索引，快速定位到可能满足条件的记录，不需要遍历所有记录。

create table t(id int, info text);

insert into t select generate_series(1,10000),'lottu'||generate_series(1,10000);

create table t1 as select * from t;

create table t2 as select * from t;

create index ind_t2_id on t2(id);

lottu=# analyze t1;

ANALYZE

lottu=# analyze t2;

ANALYZE

# 没有索引

lottu=# explain (analyze,buffers,verbose) select * from t1 where id < 10;

                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------

 Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9 width=13) (actual time=0.073..5.650 rows=9 loops=1)

   Output: id, info

   Filter: (t1.id < 10)

   Rows Removed by Filter: 9991

   Buffers: shared hit=55

 Planning time: 25.904 ms

 Execution time: 5.741 ms

(7 rows)

# 有索引

lottu=# explain (analyze,verbose,buffers) select * from t2 where id < 10;

                                                     QUERY PLAN

---------------------------------------------------------------------------------------------------------------------

 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..8.44 rows=9 width=13) (actual time=0.008..0.014 rows=9 loops=1)

   Output: id, info

   Index Cond: (t2.id < 10)

   Buffers: shared hit=3

 Planning time: 0.400 ms

 Execution time: 0.052 ms

(6 rows)

#在这个案例中：执行同一条SQL。t2有索引的执行数据是0.052 ms；t1没有索引的是：5.741 ms;

1.2 有序的特性

索引本身就是有序的。

#没有索引

lottu=# explain (analyze,verbose,buffers) select * from t1 where id > 2 order by id;

                                                   QUERY PLAN

-----------------------------------------------------------------------------------------------------------------

Sort  (cost=844.31..869.31 rows=9999 width=13) (actual time=8.737..11.995 rows=9998 loops=1)

   Output: id, info

   Sort Key: t1.id

   Sort Method: quicksort  Memory: 853kB

   Buffers: shared hit=55

   ->  Seq Scan on lottu.t1  (cost=0.00..180.00 rows=9999 width=13) (actual time=0.038..5.133 rows=9998 loops=1)

         Output: id, info

         Filter: (t1.id > 2)

         Rows Removed by Filter: 2

         Buffers: shared hit=55

 Planning time: 0.116 ms

 Execution time: 15.205 ms

(12 rows)

 #有索引

lottu=# explain (analyze,verbose,buffers) select * from t2 where id > 2 order by id;

                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------

 Index Scan using ind_t2_id on lottu.t2  (cost=0.29..353.27 rows=9999 width=13) (actual time=0.030..5.304 rows=9998 loops=1)

   Output: id, info

   Index Cond: (t2.id > 2)

   Buffers: shared hit=84

 Planning time: 0.295 ms

 Execution time: 7.027 ms

(6 rows)

#在这个案例中：执行同一条SQL。

t2有索引的执行数据是7.027 ms；t1没有索引的是：15.205 ms;
t1没有索引执行还占用了 Memory: 853kB。

2. 索引扫描方式

索引的扫描方式有3种

2.1 Indexscan

先查索引找到匹配记录的ctid，再通过ctid查堆表

2.2 bitmapscan

先查索引找到匹配记录的ctid集合，把ctid通过bitmap做集合运算和排序后再查堆表

2.3 Indexonlyscan

如果索引字段中包含了所有返回字段，对可见性映射 (vm)中全为可见的数据块，不查堆表直接返回索引中的值。

这里谈谈Indexscan扫描方式和Indexonlyscan扫描方式
对这两种扫描方式区别；借用oracle中索引扫描方式来讲；Indexscan扫描方式会产生回表读。根据上面解释来说；Indexscan扫描方式：查完索引之后还需要查表。 Indexonlyscan扫描方式只需要查索引。也就是说：Indexonlyscan扫描方式要优于Indexscan扫描方式？我们来看看

现有表t；在字段id上面建来ind_t_id索引

1. t表没有VM文件。

lottu=# \d+ t

                           Table "lottu.t"

 Column |  Type   | Modifiers | Storage  | Stats target | Description

--------+---------+-----------+----------+--------------+-------------

 id     | integer |           | plain    |              |

 info   | text    |           | extended |              |

Indexes:

    "ind_t_id" btree (id)

lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;

                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------

 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.009..0.015 rows=9 loops=1)

   Output: id

   Index Cond: (t.id < 10)

   Heap Fetches: 9

   Buffers: shared hit=3

 Planning time: 0.177 ms

 Execution time: 0.050 ms

(7 rows)

#人为更改执行计划

lottu=# set enable_indexonlyscan = off;

SET

lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;

                                                    QUERY PLAN

------------------------------------------------------------------------------------------------------------------

 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.008..0.014 rows=9 loops=1)

   Output: id

   Index Cond: (t.id < 10)

   Buffers: shared hit=3

 Planning time: 0.188 ms

 Execution time: 0.050 ms

(6 rows)

# 可以发现两者几乎没有差异；唯一不同的是Indexonlyscan扫描方式存在扫描的Heap Fetches时间。 这个时间是不在Execution time里面的。

2. t表有VM文件

lottu=# delete from t where id >200 and id < 500;

DELETE 299

lottu=# vacuum t;

VACUUM

lottu=# analyze t;

ANALYZE

lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;

                                                      QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------

 Index Only Scan using ind_t_id on lottu.t  (cost=0.29..4.44 rows=9 width=4) (actual time=0.008..0.012 rows=9 loops=1)

   Output: id

   Index Cond: (t.id < 10)

   Heap Fetches: 0

   Buffers: shared hit=3

 Planning time: 0.174 ms

 Execution time: 0.048 ms

(7 rows)

lottu=# set enable_indexonlyscan = off;

SET

lottu=# explain (analyze,buffers,verbose) select id from t where id < 10;

                                                    QUERY PLAN

------------------------------------------------------------------------------------------------------------------

 Index Scan using ind_t_id on lottu.t  (cost=0.29..8.44 rows=9 width=4) (actual time=0.012..0.022 rows=9 loops=1)

   Output: id

   Index Cond: (t.id < 10)

   Buffers: shared hit=3

 Planning time: 0.179 ms

 Execution time: 0.077 ms

(6 rows)

总结：

Index Only Scan在没有VM文件的情况下, 速度比Index Scan要慢, 因为要扫描所有的Heap page。差异几乎不大。
Index Only Scan存在VM文件的情况下，是要比Index Scan要快。

知识点1：

VM文件：称为可见性映射文件；该文件存在表示：该数据块没有需要清理的行。即已经做了vaccum操作。

知识点2：

人为选择执行计划。可设置enable_xxx参数有

enable_bitmapscan
enable_hashagg
enable_hashjoin
enable_indexonlyscan
enable_indexscan
enable_material
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan

参考文献

参考德哥：《PostgreSQL 性能优化培训 3 DAY.pdf》
https://www.postgresql.org/docs/9.6/static/runtime-config-query.html

3. 索引的类型

PostgreSQL 支持索引类型有: B-tree, Hash, GiST, SP-GiST, GIN and BRIN。

postgresql----Btree索引:http://www.cnblogs.com/alianbog/p/5621749.html
postgresql----hash索引：一般只用于简单等值查询。不常用。
postgresql----Gist索引:http://www.cnblogs.com/alianbog/p/5628543.html

4. 索引的管理

4.1 创建索引

创建索引语法：

lottu=# \h create index

Command:     CREATE INDEX

Description: define a new index

Syntax:

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]

    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )

    [ WITH ( storage_parameter = value [, ... ] ) ]

    [ TABLESPACE tablespace_name ]

    [ WHERE predicate ]

接下来我们以t表为例。

1. 关键字【UNIQUE】

#创建唯一索引；主键就是一种唯一索引

CREATE UNIQUE INDEX ind_t_id_1 on t (id);

2. 关键字【CONCURRENTLY】

# 这是并发创建索引。跟oracle的online创建索引作用是一样的。创建索引过程中；不会阻塞表更新，插入，删除操作。当然创建的时间就会很漫长。

CREATE INDEX CONCURRENTLY ind_t_id_2 on t (id);

3. 关键字【IF NOT EXISTS】

#用该命令是用于确认索引名是否存在。若存在；也不会报错。

CREATE INDEX IF NOT EXISTS ind_t_id_3 on t (id);

4. 关键字【USING】

# 创建哪种类型的索引。 默认是B-tree。

CREATE INDEX ind_t_id_4 on t using btree (id);

5 关键字【[ ASC | DESC ] [ NULLS { FIRST | LAST]】

# 创建索引是采用降序还是升序。 若字段存在null值，是把null值放在前面还是最后：例如采用降序，null放在前面。

CREATE INDEX ind_t_id_5 on t (id desc nulls first)

6. 关键字【WITH ( storage_parameter = value)】

#索引的填充因子设为。例如创建索引的填充因子设为75

CREATE INDEX ind_t_id_6 on t (id) with (fillfactor = 75);

7. 关键字【TABLESPACE】

#是把索引创建在哪个表空间。

CREATE INDEX ind_t_id_7 on t (id) TABLESPACE tsp_lottu;

8. 关键字【WHERE】

#只在自己感兴趣的那部分数据上创建索引，而不是对每一行数据都创建索引，此种方式创建索引就需要使用WHERE条件了。

CREATE INDEX ind_t_id_8 on t (id) WHERE id < 1000;

4.2 修改索引

修改索引语法

lottu=# \h alter index

Command:     ALTER INDEX

Description: change the definition of an index

Syntax:

#把索引重新命名

ALTER INDEX [ IF EXISTS ] name RENAME TO new_name

#把索引迁移表空间

ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name

#把索引重设置填充因子

ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter = value [, ... ] )

#把索引的填充因子设置为默认值

ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )

#把表空间TSP1中索引迁移到新表空间

ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]

    SET TABLESPACE new_tablespace [ NOWAIT ]

4.3 删除索引

删除索引语法

lottu=# \h drop index

Command:     DROP INDEX

Description: remove an index

Syntax:

DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] name [, ...] [ CASCADE | RESTRICT ]

5. 索引的维护

索引能带来加快对表中记录的查询，排序，以及唯一约束的作用。索引也是有代价

索引需要增加数据库的存储空间。
在表记录执行插入，更新，删除操作。索引也要更新。

5.1 查看索引的大小

select pg_size_pretty(pg_relation_size('ind_t_id'));

5.2 索引的利用率

--通过pg_stat_user_indexes.idx_scan可检查利用索引进行扫描的次数；这样可以确认那些索引可以清理掉。

select idx_scan from pg_stat_user_indexes where indexrelname = 'ind_t_id';

5.3 索引的重建

--如果一个表经过频繁更新之后，索引性能不好；需要重建索引。

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));

 pg_size_pretty

----------------

 2200 kB

(1 row)

lottu=# delete from t where id > 1000;

DELETE 99000

lottu=# analyze t;

ANALYZE

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));

 pg_size_pretty

----------------

 2200 kB

lottu=# insert into t select generate_series(2000,100000),'lottu';

INSERT 0 98001

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));

 pg_size_pretty

----------------

 4336 kB

(1 row)

lottu=# vacuum full t;

VACUUM

lottu=# select pg_size_pretty(pg_relation_size('ind_t_id_1'));

 pg_size_pretty

----------------

 2176 kB

重建方法：

1. reindex：reindex不支持并行重建【CONCURRENTLY】;索引会锁表；会进行阻塞。

2. vacuum full; 对表进行重构；索引也会重建；同样也会锁表。

3. 创建一个新索引(索引名不同)；再删除旧索引。

（转）浅谈PostgreSQL的索引的更多相关文章

浅谈PostgreSQL的索引
1. 索引的特性 1.1 加快条件的检索的特性当表数据量越来越大时查询速度会下降,在表的条件字段上使用索引,快速定位到可能满足条件的记录,不需要遍历所有记录. create table t(id i ...
浅谈B+树索引的分裂优化(转)
http://www.tamabc.com/article/85038.html 从MySQL Bug#67718浅谈B+树索引的分裂优化原文链接:http://hedengcheng.com/ ...
浅谈postgresql的GIN索引(通用倒排索引)
1.倒排索引原理倒排索引来源于搜索引擎的技术,可以说是搜索引擎的基石.正是有了倒排索引技术,搜索引擎才能有效率的进行数据库查找.删除等操作.在详细说明倒排索引之前,我们说一下与之相关的正排索引并与之 ...
从MySQL Bug#67718浅谈B+树索引的分裂优化（转）
原文链接:http://hedengcheng.com/?p=525 问题背景今天,看到Twitter的DBA团队发布了其最新的MySQL分支:Changes in Twitter MySQL 5. ...
MySQL Bug#67718 浅谈B+树索引的分裂优化
原文链接:http://hedengcheng.com/?p=525 问题背景今天,看到Twitter的DBA团队发布了其最新的MySQL分支:Changes in Twitter MySQL 5. ...
浅谈PostgreSQL用户权限
问题经常在PG群里看到有人在问“为什么我对表赋予了权限:但是还是不能访问表” 解析若你看懂德哥这篇文章PostgreSQL逻辑结构和权限体系介绍:上面对你就不是困扰你的问题解决这个问题很简单:在 ...
浅谈MYSQL的索引以及它的数据结构
什么是索引 mysql的数据是持久化到磁盘的,写SQL查询数据也就是在磁盘的某个位置查找符合条件的数据,但是磁盘IO比起内存效率是极慢的,特别是数据量大的时候,这时候就需要引入索引来提高查询效率: 在 ...
浅谈SQL优化入门：3、利用索引
0.写在前面的话关于索引的内容本来是想写的,大概收集了下资料,发现并没有想象中的简单,又不想总结了,纠结了一下,决定就大概写点浅显的,好吧,就是懒,先挖个浅坑,以后再挖深一点.最基本的使用很简单,直 ...
c#Winform程序调用app.config文件配置数据库连接字符串 SQL Server文章目录浅谈SQL Server中统计对于查询的影响有关索引的DMV SQL Server中的执行引擎入门【译】表变量和临时表的比较对于表列数据类型选择的一点思考 SQL Server复制入门(一)----复制简介操作系统中的进程与线程
c#Winform程序调用app.config文件配置数据库连接字符串你新建winform项目的时候,会有一个app.config的配置文件,写在里面的<connectionStrings n ...

随机推荐

SVG介绍
SVG介绍 SVG是指可缩放矢量图(Scalable Vector Graphics).SVG使用XML格式来定义图形.SVG可以直接嵌入到HTML页面中. 位图和矢量图位图(Bitmap)是由很多 ...
Java 二叉树一些基本操作
求二叉树中节点个数: /*1. 求二叉树中的节点个数递归解法: (1)如果二叉树为空,节点个数为0 (2)如果二叉树不为空,二叉树节点个数 = 左子树节点个数 + 右子树节点个数 + 1 */ pu ...
poj 2105 IP Address
IP Address Time Limit: 1000MS Memory Limit: 30000K Total Submissions: 18951 Accepted: 10939 Desc ...
ddddddeeeessssssttttrrrrrrooooooyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
我遥远的 POI 计划啊 https://loj.ac/problems/search?keyword=POI2011 atcoder 一套动态 DP SAM 随便看 XSY 的题 UOJ Roun ...
MySQL 字段全部转换成小写
原因: 因为框架某些字段大写有时候不被正确识别,所以字段都修改成小写; 特别说明:因为这里只有表,没有视图,存储过程等等其它所以我可以直接这么写; 步骤: 1.导出结构语句 2. 执行C# 脚本,替换 ...
ThreadLocal介绍以及源码分析
ThreadLocal 线程主变量前面部分引用其他优秀博客,后面源码自己分析的,如有冒犯请私聊我. 用Java语言开发的同学对 ThreadLocal 应该都不会陌生,这个类的使用场景很多,特别是在 ...
使用SSH连接LINUX的命令
查看端口号是否被占用 netstat -tunlp|grep 端口号杀掉 kill-9 pid 后台运行 nohup 应用程序名 & disown -a && exit 屏幕 ...
mysql 查询近几天的结果按每天的粒度查询
),DATE_FORMAT(FROM_UNIXTIME(createtime), '%Y-%m-%d') as time from bskuser group by time
flask用session记录状态
html <form action="/login" method="POST"> <input type="text" ...
Python-模块3-re
今天我们就说一个模块,那就是re,不过想要了解re模块,我们得先了解一下什么是正则表达式,有助于我们更好的学习re模块一.正则表达式首先, 我们在网页上进行注册或者登陆的时候经常能看到一些格式上的 ...

（转）浅谈PostgreSQL的索引