参考文档:http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/ddl/ddl_primary_index_c.html#concept_ds_vk2_dyz_zj

索引提供了一种手段通过属性来获取 Cassandra中数据而不是分区键。好处是提供了快速的、高效的按照指定条件找出数据的查询。

列的值的索引在一个与值分开的、隐藏的表中。Cassandra有很多技术用来防止出现不良的情况——数据可能检索不正确,查询的数据是旧的版本。

一、什么时候使用索引

Cassandra's built-in indexes are best on a table having many rows that contain the indexed value. The more unique values that exist in a particular column, the more overhead you will have, on average, to query and maintain the index. For example, suppose you had a playlists table with a billion songs and wanted to look up songs by the artist. Many songs will share the same column value for artist. The artist column is a good candidate for an index.

When not to use an index

Do not use an index in these situations:

•On high-cardinality columns because you then query a huge volume of records for a small number of results . . . more

•In tables that use a counter column

•On a frequently updated or deleted column . . . more

•To look for a row in a large partition unless narrowly queried . . . more

Problems using a high-cardinality column index

If you create an index on a high-cardinality column, which has many distinct values, a query between the fields will incur many seeks for very few results. In the table with a billion songs, looking up songs by writer (a value that is typically unique for each song) instead of by their artist, is likely to be very inefficient. It would probably be more efficient to manually maintain the table as a form of an index instead of using the Cassandra built-in index. For columns containing unique data, it is sometimes fine performance-wise to use an index for convenience, as long as the query volume to the table having an indexed column is moderate and not under constant load.

Conversely, creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each value in the index becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a multitude of indexed columns having foo = true and foo = false is not useful.

Problems using an index on a frequently updated or deleted column

Cassandra stores tombstones in the index until the tombstone limit reaches 100K cells. After exceeding the tombstone limit, the query that uses the indexed value will fail.

Problems using an index to look for a row in a large partition unless narrowly queried

A query on an indexed column in a large cluster typically requires collating responses from multiple data partitions. The query response slows down as more machines are added to the cluster. You can avoid a performance hit when looking for a row in a large partition by narrowing the search, as shown in the next section.

About indexing

An index in Cassandra refers to an index on column values. Cassandra implements an index as a hidden table, separate from the table that contains the values being indexed. Using CQL, you can create an index on a column after defining a table. The music service example shows how to create an index on the artists column of playlist, and then querying Cassandra for songs by a particular artist::

CREATE INDEX artist_names ON playlists( artist );

An index name is optional. If you provide an index name, such as artist_idx, the name must be unique within the keyspace. After creating an index for the artist column and inserting values into the playlists table, greater efficiency is achieved when you query Cassandra directly for artist by name, such as Fu Manchu:

SELECT * FROM playlists WHERE artist = 'Fu Manchu';

As mentioned earlier, when looking for a row in a large partition, narrow the search. This query, although a contrived example using so little data, narrows the search to a single id.

SELECT * FROM playlists WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204 AND artist = 'Fu Manchu';

The output is:

Using multiple indexes

For example purposes, let's say you can create multiple indexes, for example on album and title columns of the playlists table, and use multiple conditions in the WHERE clause to filter the results. In a real-world situation, these columns might not be good choices, depending on their cardinality, as described later:

CREATE INDEX album_name ON playlists ( album );

CREATE INDEX title_name ON playlists ( title );

SELECT * FROM playlists

WHERE album = 'Roll Away' AND title = 'Outside Woman Blues'

ALLOW FILTERING ;

When multiple occurrances of data match a condition in a WHERE clause, Cassandra selects the least-frequent occurrence of a condition for processing first for efficiency. For example, suppose data for Blind Joe Reynolds and Cream's versions of "Outside Woman Blues" were inserted into the playlists table. Cassandra queries on the album name first if there are fewer albums named Roll Away than there are songs called "Outside Woman Blues" in the database. When you attempt a potentially expensive query, such as searching a range of rows, Cassandra requires the ALLOW FILTERING directive.

Building and maintaining indexes

An advantage of indexes is the operational ease of populating and maintaining the index. Indexes are built in the background automatically, without blocking reads or writes. Client-maintained tables as indexes must be created manually; for example, if the state column had been indexed by creating a table such as users_by_state, your client application would have to populate the table with data from the users table.

Cassandra1.2文档学习(19)—— CQL索引的更多相关文章

  1. Cassandra1.2文档学习解读计划——为自己鼓劲

    最近想深入研究一下Cassandra,而Cassandra没有中文文档,仅有的一些参考书都是0.7/0.6版本的.因此有个计划,一边学习文档(地址:http://www.datastax.com/do ...

  2. Cassandra1.2文档学习(17)—— CQL数据模型(上)

    参考文档:http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/ddl/ddl_anatomy_table_c.ht ...

  3. Cassandra1.2文档学习(18)—— CQL数据模型(下)

    三.集合列 CQL 3 引入了一下集合类型: •set •list •map 在关系型数据库中,允许用户拥有多个email地址,你可以创建一个email_addresses表与users表存在一个多对 ...

  4. Cassandra1.2文档学习(15)—— 配置数据一致性

    参考文档:http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/dml/dml_config ...

  5. Cassandra1.2文档学习(13)—— 数据读取

    参考文档:http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/dml/dml_about_ ...

  6. Cassandra1.2文档学习(4)——分区器

    参考文档:http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/a ...

  7. Cassandra1.2文档学习(1)——Cassandra基本说明

    参考文档:http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/a ...

  8. Cassandra1.2文档学习(16)—— 模式的变化

    参考文档:http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/dml/dml_schema ...

  9. Cassandra1.2文档学习(9)—— 数据写入

    数据参考:http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/dml/manage_dml ...

随机推荐

  1. JQuery EasyUI 之 DataGrid

    1.动态创建datagrid 在页面上添加一个div或table标签,然后用jquery获取这个标签,并初始化一个datagrid.代码如下: (1)页面上添加div标签 <div id=&qu ...

  2. 用RSA实现Web单点登录密码的加密传输

    在使用通用权限管理系统(吉日嘎拉)的单点登录功能时,对登录密码使用了RSA加密(非对称加密),有使用这个权限管理系统的可参考下. 前端部分,请引用以下几个js文件: <script type=& ...

  3. javaScript中的原型

    最近在学习javaScript,学习到js面向对象中的原型时,感悟颇多.若有不对的地方,希望可以指正. js作为一门面向对象的语言,自然也拥有了继承这一概念,但js中没有类的概念,也就没有了类似于ja ...

  4. Objective-C ,ios,iphone开发基础:3分钟教你做一个iphone手机浏览器

    第一步:新建一个Single View工程: 第二步:新建好工程,关闭arc. 第三步:拖放一个Text Field 一个UIButton 和一个 UIWebView . Text Field 的ti ...

  5. Linux安装Oracle 11G过程(测试未写完)

    一.简介 Oracle数据库在系统运维中的重要性不言而喻,通过熟悉Oracle的安装来加深对操作系统和数据库知识的了解.Linux安装Oracle前期修改linux内核参数很重要,其实就是linux下 ...

  6. 关于相对路径和绝对路径及cd命令的使用

    cd (change directory) 目录    跳转到指定目录下 路径定义分为两种:绝对路径(absolute)和相对路径(relative) 绝对路径:从根目录(/)开始写去的文件名或目录名 ...

  7. ruby 疑难点之—— attr_accessor attr_reader attr_writer

    普通的实例变量 普通的实例变量,我们没法在 class 外面直接访问 #普通的实例变量,只能在 class 内部访问 class C1 def initialize(name) @name = nam ...

  8. 使用AccessibilityService模拟点击事件失败的分析

    使用AccessibilityService模拟点击事件的方法: AccessibilityNodeInfo.performAction(AccessibilityNodeInfo.ACTION_CL ...

  9. Linux下is not in the sudoers file解决方法

    最近在学习linux,在某个用户(xxx)下使用sudo的时候,提示以下错误:xxx is not in the sudoers file. This incident will be reporte ...

  10. saltstack实战4--综合练习4

    Saltstack配置管理-给minion增加Zabbix-agent zabbix-agent的包 [root@A ~]# rpm -qa |grep zabbix zabbix-2.4.8-1.e ...