DATABASE SYSTEM CONCEPTS, SIXTH EDITION
11.1 Basic Concepts
An index for a file in a database system works in much the same way as the index
in this textbook. If we want to learn about a particular topic (specified by a word
or a phrase) in this textbook, we can search for the topic in the index at the back
of the book, find the pages where it occurs, and then read the pages to find the
information for which we are looking. The words in the index are in sorted order,
making it easy to find the word we want. Moreover, the index is much smaller
than the book, further reducing the effort needed.
Database-system indices play the same role as book indices in libraries. For
example, to retrieve a student record given an ID, the database system would look
up an index to find the disk block on which the corresponding record resides, and
then fetch that disk block to get the appropriate student record.
Keeping a sorted list of students’ IDs would not work well on very large
databases with thousands of students, since the index would itself be very big;
further, even though keeping the index sorted reduces the search time, finding a
student can still be rather time-consuming. Instead, more sophisticated indexing
techniques may be used. We shall discuss several of these techniques in this
chapter.
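A minimal sketch of the sorted-list approach just described, assuming the index is an in-memory Python list of (ID, disk block number) pairs. The IDs and block numbers are invented for illustration. Binary search finds an entry in O(log n) comparisons, but the whole list must still be stored and kept sorted, which is the cost that becomes a problem for very large files.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical index entries: (student ID, disk block number), sorted by ID.
# The IDs and block numbers are invented for illustration.
index: List[Tuple[str, int]] = [
    ("10101", 0),
    ("12121", 0),
    ("15151", 1),
    ("22222", 1),
    ("32343", 2),
    ("45565", 2),
]
keys = [k for k, _ in index]  # keys alone, so bisect can search them

def lookup_block(student_id: str) -> Optional[int]:
    """Return the disk block that holds student_id, or None if it is absent."""
    i = bisect_left(keys, student_id)  # binary search: O(log n) comparisons
    if i < len(keys) and keys[i] == student_id:
        return index[i][1]
    return None

print(lookup_block("22222"))  # 1
print(lookup_block("99999"))  # None
```

The database system would then fetch the returned block from disk to obtain the student record itself.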
There are two basic kinds of indices:

Ordered indices. Based on a sorted ordering of the values.

Hash indices. Based on a uniform distribution of values across a range of
buckets. The bucket to which a value is assigned is determined by a function,
called a hash function.
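A minimal sketch of a hash index, assuming a fixed number of buckets held in memory. The function `h` (sum of character codes modulo the bucket count), the bucket count, and the sample entries are all hypothetical; `h` is chosen only because it is deterministic and easy to follow, and it would distribute real keys poorly (for example, anagrams always collide).

```python
from typing import List, Tuple

NUM_BUCKETS = 4  # hypothetical bucket count

def h(key: str) -> int:
    """Illustrative hash function: sum of character codes modulo the bucket count."""
    return sum(ord(c) for c in key) % NUM_BUCKETS

# Each bucket holds (search-key value, disk block number) pairs.
buckets: List[List[Tuple[str, int]]] = [[] for _ in range(NUM_BUCKETS)]

def hash_insert(key: str, block: int) -> None:
    # The hash function alone decides which bucket receives the entry.
    buckets[h(key)].append((key, block))

def hash_lookup(key: str) -> List[int]:
    # Only the single bucket chosen by h(key) is scanned.
    return [b for k, b in buckets[h(key)] if k == key]

for sid, blk in [("10101", 0), ("12121", 0), ("22222", 1), ("45565", 2)]:
    hash_insert(sid, blk)

print(hash_lookup("22222"))  # [1]
```

Unlike the ordered index, nothing here keeps the keys in sorted order; a lookup touches one bucket regardless of how many entries the index holds, provided the hash function spreads values evenly.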

We shall consider several techniques for both ordered indexing and hashing.
No one technique is the best. Rather, each technique is best suited to particular
database applications. Each technique must be evaluated on the basis of these
factors:

Access types: The types of access that are supported efficiently. Access types
can include finding records with a specified attribute value and finding
records whose attribute values fall in a specified range.

Access time: The time it takes to find a particular data item, or set of items,
using the technique in question.

Insertion time: The time it takes to insert a new data item. This value includes
the time it takes to find the correct place to insert the new data item, as well
as the time it takes to update the index structure.

Deletion time: The time it takes to delete a data item. This value includes
the time it takes to find the item to be deleted, as well as the time it takes to
update the index structure.

Space overhead: The additional space occupied by an index structure. Provided
that the amount of additional space is moderate, it is usually worthwhile to
sacrifice the space to achieve improved performance.

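The "access types" factor separates the two kinds of indices most sharply. A minimal sketch, assuming a hypothetical ordered index on a numeric salary attribute stored as sorted (salary, record id) pairs: a range query locates the first qualifying entry by binary search and then reads consecutive entries. A hash index scatters neighbouring values across unrelated buckets, so it handles equality lookups efficiently but offers no comparable shortcut for ranges.

```python
from bisect import bisect_left, bisect_right
from typing import List

# Hypothetical ordered index: (salary, record id) pairs sorted by salary.
ordered = sorted([
    (65000, "r1"), (90000, "r2"), (40000, "r3"), (87000, "r4"), (60000, "r5"),
])
salaries = [s for s, _ in ordered]

def range_lookup(lo: int, hi: int) -> List[str]:
    """Return ids of records whose salary lies in [lo, hi], in salary order."""
    start = bisect_left(salaries, lo)   # first entry >= lo
    end = bisect_right(salaries, hi)    # one past the last entry <= hi
    return [rid for _, rid in ordered[start:end]]

print(range_lookup(60000, 88000))  # ['r5', 'r1', 'r4']
```

The cost is two binary searches plus the number of matching entries, which is why ordered indices are the usual choice when range queries matter.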
We often want to have more than one index for a file. For example, we may
wish to search for a book by author, by subject, or by title.
An attribute or set of attributes used to look up records in a file is called a
search key. Note that this definition of key differs from that used in primary key,
candidate key, and superkey. This duplicate meaning for key is (unfortunately) well
established in practice. Using our notion of a search key, we see that if there are
several indices on a file, there are several search keys.
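A minimal sketch of several indices on one file, each built on a different search key, following the book example above (author, subject, title). The records and the helper `build_index` are invented for illustration, and a record's list position stands in for a pointer to it. Note that a search key need not be unique: one author value maps to several records, which is one way a search key differs from a candidate key.

```python
from collections import defaultdict
from typing import Dict, List

# Invented book records; a record's list position stands in for a pointer to it.
books = [
    {"title": "Query Processing Basics", "author": "Kim", "subject": "databases"},
    {"title": "Storage Engines", "author": "Kim", "subject": "systems"},
    {"title": "Indexing in Practice", "author": "Patel", "subject": "databases"},
]

def build_index(records: List[Dict[str, str]], search_key: str) -> Dict[str, List[int]]:
    """Map each value of search_key to the positions of all records with that value."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[search_key]].append(pos)
    return dict(index)

# One file, three indices, each on a different search key.
indices = {key: build_index(books, key) for key in ("author", "subject", "title")}

print(indices["author"]["Kim"])              # [0, 1]
print(indices["subject"]["databases"])       # [0, 2]
print(indices["title"]["Storage Engines"])   # [1]
```

Because Python dictionaries are hash tables, each index here behaves like a simple hash index; the point of the sketch is only that every index carries its own search key over the same file.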