GIN and RUM Index Performance Comparison

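GIN builds a tree over the indexed entries (lexemes). The posting tree or posting list at the leaf level stores only the row identifiers (ctids) of the rows that contain each entry, and nothing else. A RUM index is similar to GIN, but after each ctid (ItemPointer) in the posting list or tree it also stores additional attribute values, such as lexeme positions for a tsvector column. As a result, RUM can be much faster in some scenarios. The examples below compare the two.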
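
The structural difference can be pictured with a small toy model in Python. It is not the on-disk format of either index. The integer row ids stand in for ctids, and the regex word splitter is a rough stand-in for to_tsvector with no stemming and no stop words; both are assumptions for illustration. The GIN-like map keeps only row ids per lexeme. The RUM-like map also keeps the lexeme's positions next to each row id.

```python
import re
from collections import defaultdict


def lexemes(text):
    """Rough stand-in for to_tsvector: lowercase words with 1-based positions."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, pos) for pos, word in enumerate(words, start=1)]


def build_gin(rows):
    """GIN-like posting lists: lexeme -> sorted row ids, and nothing else."""
    postings = defaultdict(set)
    for row_id, text in rows.items():
        for word, _pos in lexemes(text):
            postings[word].add(row_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def build_rum(rows):
    """RUM-like posting lists: lexeme -> sorted (row id, positions) pairs."""
    postings = defaultdict(lambda: defaultdict(list))
    for row_id, text in rows.items():
        for word, pos in lexemes(text):
            postings[word][row_id].append(pos)
    return {word: sorted((row_id, tuple(p)) for row_id, p in by_row.items())
            for word, by_row in postings.items()}


# Two short_desc values from the test table; row ids 1 and 2 are hypothetical ctids.
rows = {
    1: "Top SQL number for kddm",
    2: "Sets the number of digits displayed for floating-point values.",
}
gin = build_gin(rows)
rum = build_rum(rows)
print(gin["number"])  # [1, 2]
print(rum["number"])  # [(1, (3,)), (2, (3,))]
print(rum["for"])     # [(1, (4,)), (2, (7,))]
```

The GIN-like entries hold only row ids, so they cannot tell whether "number" and "for" are adjacent. The RUM-like entries carry exactly that information.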

Note: KingbaseES v8r6c5b0023 ships with the rum extension.

I. Prepare the test data

create table t1 as select name,short_desc from pg_settings;

alter table t1 add column tsv tsvector;

update t1 set tsv=to_tsvector(short_desc);

-- Only one row has "number" immediately followed by "for":

test=# select short_desc from t1 where to_tsvector(short_desc) @@ to_tsquery('number <-> for');
       short_desc
-------------------------
 Top SQL number for kddm
(1 row)

-- Seven rows contain both "number" and "for":

test=# select short_desc from t1 where to_tsvector(short_desc) @@ to_tsquery('number & for');
                                  short_desc
-------------------------------------------------------------------------------
 Sets the number of digits displayed for floating-point values.
 Sets the maximum number of simultaneously open files for each server process.
 Sets the number of connection slots reserved for superusers.
 Top SQL number for kddm
 Sets the number of disk-page buffers in shared memory for WAL.
 Sets the number of WAL files held for standby servers.
 Sets the number of locks used for concurrent xlog insertions.
(7 rows)

II. Example 1: Distance query

1. GIN index

Create the GIN index:

create index ind_t1_gin on t1 using gin(tsv);

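Execution plan with the GIN index: the Bitmap Index Scan returns 7 rows, i.e. every row that contains both words. The GIN index holds no position information, so the phrase condition has to be rechecked against the table data, which removes 6 of the 7 rows.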

test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery('number <-> for');
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=12.32..30.68 rows=9 width=55) (actual time=0.111..0.195 rows=1 loops=1)
   Recheck Cond: (tsv @@ to_tsquery('number <-> for'::text))
   Rows Removed by Index Recheck: 6
   Heap Blocks: exact=4
   ->  Bitmap Index Scan on ind_t1_gin  (cost=0.00..12.32 rows=9 width=0) (actual time=0.058..0.058 rows=7 loops=1)
         Index Cond: (tsv @@ to_tsquery('number <-> for'::text))
 Planning Time: 0.199 ms
 Execution Time: 0.227 ms
(8 rows)
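
The GIN plan above can be mimicked with the same kind of toy model. The row ids are hypothetical and the word splitter is rough; the seven short_desc values come from the query output earlier. The index part can only intersect the posting lists of "number" and "for". The adjacency test then has to read every candidate row, which mirrors the 7 index rows and "Rows Removed by Index Recheck: 6" in the plan.

```python
import re


def lexemes(text):
    """Rough stand-in for to_tsvector: lowercase words with 1-based positions."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, pos) for pos, word in enumerate(words, start=1)]


# The seven short_desc values containing both "number" and "for" (from the output above).
# The integer keys are hypothetical stand-ins for ctids.
HEAP = {
    1: "Sets the number of digits displayed for floating-point values.",
    2: "Sets the maximum number of simultaneously open files for each server process.",
    3: "Sets the number of connection slots reserved for superusers.",
    4: "Top SQL number for kddm",
    5: "Sets the number of disk-page buffers in shared memory for WAL.",
    6: "Sets the number of WAL files held for standby servers.",
    7: "Sets the number of locks used for concurrent xlog insertions.",
}

# GIN-like index: lexeme -> set of row ids, with no positions.
GIN = {}
for row_id, text in HEAP.items():
    for word, _pos in lexemes(text):
        GIN.setdefault(word, set()).add(row_id)


def recheck_phrase(text, first, second):
    """Heap recheck for 'first <-> second': some occurrence of second directly follows first."""
    positions = {}
    for word, pos in lexemes(text):
        positions.setdefault(word, set()).add(pos)
    return any(p + 1 in positions.get(second, set()) for p in positions.get(first, set()))


# "Bitmap Index Scan": the index can only intersect posting lists.
candidates = sorted(GIN.get("number", set()) & GIN.get("for", set()))
# "Bitmap Heap Scan": every candidate row is read from the table and rechecked.
result = [row_id for row_id in candidates if recheck_phrase(HEAP[row_id], "number", "for")]
removed_by_recheck = len(candidates) - len(result)
print(len(candidates), result, removed_by_recheck)  # 7 [4] 6
```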

2. RUM index

Create the extension and the index:

test=# create extension rum;
CREATE EXTENSION
test=# create index ind_t1_rum on t1 using rum(tsv);
CREATE INDEX

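Execution plan with the RUM index: the Bitmap Index Scan returns only one row, because the RUM index stores position information and can evaluate the phrase condition itself. Only one heap block is read, and execution time drops from 0.227 ms to 0.068 ms.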

test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery('number <-> for');
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=12.32..30.68 rows=9 width=55) (actual time=0.041..0.042 rows=1 loops=1)
   Recheck Cond: (tsv @@ to_tsquery('number <-> for'::text))
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on ind_t1_rum  (cost=0.00..12.32 rows=9 width=0) (actual time=0.038..0.039 rows=1 loops=1)
         Index Cond: (tsv @@ to_tsquery('number <-> for'::text))
 Planning Time: 0.297 ms
 Execution Time: 0.068 ms
(7 rows)
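
Here is a matching toy sketch of the RUM side. Each posting carries positions, so the phrase test "number <-> for" can be done on the index entries alone: the position of "for" must be one more than that of "number". The row ids and the word splitter are again illustrative assumptions, not RUM internals.

```python
import re


def lexemes(text):
    """Rough stand-in for to_tsvector: lowercase words with 1-based positions."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, pos) for pos, word in enumerate(words, start=1)]


# Same seven short_desc values; integer keys are hypothetical stand-ins for ctids.
ROWS = {
    1: "Sets the number of digits displayed for floating-point values.",
    2: "Sets the maximum number of simultaneously open files for each server process.",
    3: "Sets the number of connection slots reserved for superusers.",
    4: "Top SQL number for kddm",
    5: "Sets the number of disk-page buffers in shared memory for WAL.",
    6: "Sets the number of WAL files held for standby servers.",
    7: "Sets the number of locks used for concurrent xlog insertions.",
}

# RUM-like index: lexeme -> {row id: set of positions}.
RUM = {}
for row_id, text in ROWS.items():
    for word, pos in lexemes(text):
        RUM.setdefault(word, {}).setdefault(row_id, set()).add(pos)


def rum_phrase_scan(index, first, second):
    """Evaluate 'first <-> second' using only the positions stored in the index."""
    first_postings = index.get(first, {})
    second_postings = index.get(second, {})
    hits = []
    for row_id in sorted(first_postings.keys() & second_postings.keys()):
        if any(p + 1 in second_postings[row_id] for p in first_postings[row_id]):
            hits.append(row_id)
    return hits


# The scan receives only the index; the ROWS "table" is never consulted.
print(rum_phrase_scan(RUM, "number", "for"))  # [4]
```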

III. Example 2: Relevance ranking

1. GIN index

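A GIN index cannot return rows in relevance order, so all matching rows must be fetched from the table first and then sorted: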

test=# create index ind_t1_gin on t1 using gin(tsv);
CREATE INDEX

test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery('number & for') order by tsv <=> to_tsquery('number & for') limit 1;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=33.00..33.00 rows=1 width=59) (actual time=0.134..0.135 rows=1 loops=1)
   ->  Sort  (cost=33.00..33.02 rows=9 width=59) (actual time=0.133..0.134 rows=1 loops=1)
         Sort Key: ((tsv <=> to_tsquery('number & for'::text)))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on t1  (cost=12.32..32.95 rows=9 width=59) (actual time=0.102..0.125 rows=7 loops=1)
               Recheck Cond: (tsv @@ to_tsquery('number & for'::text))
               Heap Blocks: exact=4
               ->  Bitmap Index Scan on ind_t1_gin  (cost=0.00..12.32 rows=9 width=0) (actual time=0.084..0.084 rows=7 loops=1)
                     Index Cond: (tsv @@ to_tsquery('number & for'::text))
 Planning Time: 0.237 ms
 Execution Time: 0.177 ms
(11 rows)
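
The shape of the GIN plan can be sketched as follows: fetch every match, score every match, then keep the top N. The distance function here is a hypothetical stand-in for tsv <=> query, defined as the smallest gap between the two terms' positions. It is not the formula RUM actually uses, and the row ids are made up.

```python
import heapq
import re


def lexemes(text):
    """Rough stand-in for to_tsvector: lowercase words with 1-based positions."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, pos) for pos, word in enumerate(words, start=1)]


# Same seven short_desc values; integer keys are hypothetical stand-ins for ctids.
ROWS = {
    1: "Sets the number of digits displayed for floating-point values.",
    2: "Sets the maximum number of simultaneously open files for each server process.",
    3: "Sets the number of connection slots reserved for superusers.",
    4: "Top SQL number for kddm",
    5: "Sets the number of disk-page buffers in shared memory for WAL.",
    6: "Sets the number of WAL files held for standby servers.",
    7: "Sets the number of locks used for concurrent xlog insertions.",
}

# GIN-like index: lexeme -> set of row ids (no positions).
GIN = {}
for row_id, text in ROWS.items():
    for word, _pos in lexemes(text):
        GIN.setdefault(word, set()).add(row_id)


def toy_distance(text, first, second):
    """Hypothetical stand-in for tsv <=> query: smallest position gap between
    the two terms (smaller means more relevant). Not RUM's actual formula."""
    positions = {}
    for word, pos in lexemes(text):
        positions.setdefault(word, []).append(pos)
    return min(abs(a - b) for a in positions[first] for b in positions[second])


def gin_ranked_limit(first, second, limit):
    """GIN-style plan for 'first & second' ORDER BY distance LIMIT limit."""
    candidates = GIN.get(first, set()) & GIN.get(second, set())   # Bitmap Index Scan
    scored = [(toy_distance(ROWS[row_id], first, second), row_id)  # Bitmap Heap Scan:
              for row_id in sorted(candidates)]                    # every match is read
    best = heapq.nsmallest(limit, scored)                          # Sort (top-N heapsort)
    return [row_id for _dist, row_id in best], len(scored)


top, rows_read = gin_ranked_limit("number", "for", 1)
print(top, rows_read)  # [4] 7
```

Even with LIMIT 1, all 7 matching rows are read and scored before the top-N selection. This matches rows=7 under the Sort node in the plan.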

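2. RUM index

With the RUM index, the ordering by the <=> distance happens inside the index scan (Order By in the plan). There is therefore no Sort node, and the Limit stops after the first row.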

test=# explain analyze select short_desc from t1 where tsv @@ to_tsquery('number & for') order by tsv <=> to_tsquery('number & for') limit 1;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.25..12.52 rows=1 width=59) (actual time=0.077..0.077 rows=1 loops=1)
   ->  Index Scan using ind_t1_rum on t1  (cost=8.25..46.68 rows=9 width=59) (actual time=0.076..0.076 rows=1 loops=1)
         Index Cond: (tsv @@ to_tsquery('number & for'::text))
         Order By: (tsv <=> to_tsquery('number & for'::text))
 Planning Time: 0.236 ms
 Execution Time: 0.101 ms
(6 rows)
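
For comparison, here is a toy sketch of the RUM plan. The positions needed to compute the distance are already in the index, so matches can be ranked from index entries alone. Only the row that survives LIMIT 1 is read from the table. The distance is the same hypothetical position-gap stand-in. The sketch simply sorts the index matches; it does not claim to reproduce how RUM orders results internally.

```python
import re
from itertools import islice


def lexemes(text):
    """Rough stand-in for to_tsvector: lowercase words with 1-based positions."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(word, pos) for pos, word in enumerate(words, start=1)]


# Same seven short_desc values; integer keys are hypothetical stand-ins for ctids.
ROWS = {
    1: "Sets the number of digits displayed for floating-point values.",
    2: "Sets the maximum number of simultaneously open files for each server process.",
    3: "Sets the number of connection slots reserved for superusers.",
    4: "Top SQL number for kddm",
    5: "Sets the number of disk-page buffers in shared memory for WAL.",
    6: "Sets the number of WAL files held for standby servers.",
    7: "Sets the number of locks used for concurrent xlog insertions.",
}

# RUM-like index: lexeme -> {row id: set of positions}.
RUM = {}
for row_id, text in ROWS.items():
    for word, pos in lexemes(text):
        RUM.setdefault(word, {}).setdefault(row_id, set()).add(pos)


def rum_ordered_scan(first, second):
    """Yield row ids matching 'first & second' in ascending toy distance,
    computed only from positions stored in the index. The distance (smallest
    position gap) is a hypothetical stand-in for RUM's <=> operator."""
    first_postings = RUM.get(first, {})
    second_postings = RUM.get(second, {})
    ranked = sorted(
        (min(abs(a - b) for a in first_postings[r] for b in second_postings[r]), r)
        for r in first_postings.keys() & second_postings.keys()
    )
    for _dist, row_id in ranked:
        yield row_id


# LIMIT 1: take one row id from the ordered scan; only that row is read from the "table".
fetched = [ROWS[row_id] for row_id in islice(rum_ordered_scan("number", "for"), 1)]
print(fetched)  # ['Top SQL number for kddm']
```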
