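Cassandra Developer Getting-Started Guide, Part 4 (Collection Types, Tuple Types, Time Series, Counter Columns)
Cassandra provides three collection types: Set, List, and Map.
Set: a collection of unique elements of the same type. Queries return the elements in sorted order, because Cassandra keeps set elements ordered by value rather than by insertion order.
List: an ordered collection that may contain duplicate values. Queries return the elements in index order.
Map: key-value pairs that map each unique key to a value.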
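As a rough analogy only, the observable behavior of the three collection types can be mimicked in Python. The helper names below are made up for illustration, and the sketch says nothing about how Cassandra stores the data internally.

```python
# Rough Python analogy for the three CQL collection types.
# The function names are hypothetical; only the observable behavior is modeled.

def read_set(values):
    """A CQL set drops duplicates and comes back in sorted order."""
    return sorted(set(values))


def read_list(values):
    """A CQL list keeps duplicates and preserves position order."""
    return list(values)


def read_map(pairs):
    """A CQL map keeps one value per key; a later write to a key replaces the earlier one."""
    result = {}
    for key, value in pairs:
        result[key] = value
    # Entries are presented ordered by key.
    return dict(sorted(result.items()))


if __name__ == "__main__":
    print(read_set(["carol", "bob", "bob", "alice"]))
    print(read_list(["dave", "bob", "bob"]))
    print(read_map([("twitter", 353637), ("instagram", 9839025), ("twitter", 2725634)]))
```

The set example shows why a query can return elements in a different order from the one in which they were added, while the list example keeps the original positions.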
-- Start cqlsh:
bin/cqlsh localhost
-- List all keyspaces:
DESCRIBE KEYSPACES;
-- Switch to the keyspace created earlier:
USE myks;
-- List the existing tables:
DESCRIBE TABLES;
-- Show a table's schema:
DESCRIBE TABLE user_status_updates;
Set
-- Alter the table: add a column that records which users starred an update
ALTER TABLE "user_status_updates"
ADD "starred_by_users" text;
-- The new column is empty for the existing row
SELECT "starred_by_users"
FROM "user_status_updates"
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Update the row to add a starring user, serialized by hand into a text value
UPDATE "user_status_updates"
SET "starred_by_users" = '["bob"]'
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- The column can be declared as a collection directly instead of as text
ALTER TABLE "user_status_updates"
DROP "starred_by_users";
-- Note the SET<text> type. The column gets a new name ("starred_by_userss"),
-- presumably because Cassandra may refuse to re-add a dropped column name with a different type
ALTER TABLE "user_status_updates"
ADD "starred_by_userss" SET<text>;
-- Method 1: assign the whole set; a set literal is written with {}
UPDATE "user_status_updates"
SET "starred_by_userss" = {'bob'}
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Method 2: add elements with +
UPDATE "user_status_updates"
SET "starred_by_userss" = "starred_by_userss" + {'carol'}
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
UPDATE "user_status_updates"
SET "starred_by_userss" = "starred_by_userss" + {'dave'}
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Method 3: remove elements with -
UPDATE "user_status_updates"
SET "starred_by_userss" = "starred_by_userss" - {'dave'}
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Adding an element that is already present leaves the set unchanged
UPDATE "user_status_updates"
SET "starred_by_userss" = "starred_by_userss" + {'carol'}
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Add one more element to test the ordering
UPDATE "user_status_updates"
SET "starred_by_userss" = "starred_by_userss" + {'alice'}
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
SELECT "starred_by_userss"
FROM "user_status_updates"
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
The result comes back sorted:
starred_by_userss
---------------------------
{'alice', 'bob', 'carol'}
List
A list works much like a set, except that it allows duplicates and keeps elements in position order instead of sorting them.
ALTER TABLE "user_status_updates"
ADD "shared_by" LIST<text>;
-- Assign the whole list; a list literal is written with []
UPDATE "user_status_updates"
SET "shared_by" = ['bob']
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Append to the end of the list
UPDATE "user_status_updates"
SET "shared_by" = "shared_by" + ['carol']
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Prepend to the front of the list
UPDATE "user_status_updates"
SET "shared_by" = ['dave'] + "shared_by"
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Overwrite the element at a given index.
-- ASSUMPTION: the source omits the index values; 1 and 2 are used here because they reproduce the result shown below
UPDATE "user_status_updates"
SET "shared_by"[1] = 'robert'
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
UPDATE "user_status_updates"
SET "shared_by"[2] = 'maurice'
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Remove every occurrence of a value with -
UPDATE "user_status_updates"
SET "shared_by" = "shared_by" - ['carol']
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Delete a single element by its index (ASSUMPTION: index 2, for the same reason as above)
DELETE "shared_by"[2]
FROM "user_status_updates"
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
UPDATE "user_status_updates"
SET "shared_by" = "shared_by" + ['arol']
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- Query
SELECT "shared_by"
FROM "user_status_updates"
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
The result is not sorted:
shared_by
----------------------------
['dave', 'robert', 'arol']
Map
A map stores key-value pairs. Keys are unique, and entries are kept ordered by key.
ALTER TABLE "users"
ADD social_identities MAP<text,bigint>;
-- Assign the whole map
UPDATE "users"
SET "social_identities" = {'twitter': 353637}
WHERE "username" = 'alice';
-- Set individual keys
UPDATE "users"
SET "social_identities"['instagram'] = 9839025,
"social_identities"['yo'] = 25
WHERE "username" = 'alice';
-- Overwrite the value of an existing key
UPDATE "users"
SET "social_identities"['twitter'] = 2725634
WHERE "username" = 'alice';
-- Delete a single key
DELETE "social_identities"['instagram']
FROM "users"
WHERE "username" = 'alice';
-- Insert a new row with a map literal
INSERT INTO "users" (
"username", "email", "encrypted_password",
"social_identities", "version"
) VALUES (
'ivan',
'ivan@gmail.com',
0x48acb738ece5780f37b626a0cb64928b,
{'twitter': 875958, 'instagram': 109550},
NOW()
);
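Using TTL
-- <computed_ttl> is a placeholder for a TTL value in seconds
UPDATE users USING TTL <computed_ttl>
SET todo['2012-10-1'] = 'find water' WHERE user_id = 'frodo';
INSERT INTO users
(user_name, password)
VALUES ('cbrown', 'ch@ngem4a') USING TTL 86400;
Data written with a TTL is deleted automatically once the given number of seconds has passed. With USING TTL 86400, for example, the row expires after 24 hours.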
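The text does not say how the `<computed_ttl>` placeholder is obtained. One plausible approach, sketched here under that assumption, is for the application to compute the seconds remaining until a chosen expiry time. The helper name `compute_ttl_seconds` is hypothetical.

```python
from datetime import datetime, timezone


def compute_ttl_seconds(expires_at, now=None):
    """Return the whole seconds from `now` until `expires_at`.

    Hypothetical helper: raises if the expiry time has already passed,
    because a TTL of 0 in Cassandra means "no expiration" rather than
    "expire immediately".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    remaining = int((expires_at - now).total_seconds())
    if remaining <= 0:
        raise ValueError("expiry time is not in the future")
    return remaining


if __name__ == "__main__":
    now = datetime(2012, 10, 1, tzinfo=timezone.utc)
    expires = datetime(2012, 10, 2, tzinfo=timezone.utc)
    print(compute_ttl_seconds(expires, now))
```

The resulting integer would then be substituted for `<computed_ttl>` in the USING TTL clause.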
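Things to watch out for when using collections:
1. Each item in a collection can be at most 64 KB.
2. Keep collections small. Cassandra reads the entire collection on every query and does not page inside it, so a large collection increases query latency. Collections are meant for small amounts of data.
3. Do not insert more than 64K items into a collection. Otherwise only the first 64K items can be queried and the rest are lost.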
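Querying collections correctly
Filtering on a collection column in the WHERE clause fails when the column has no index:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
-- The right approach: create an index first
CREATE INDEX ON "user_status_updates" ("starred_by_userss");
SELECT * FROM "user_status_updates"
WHERE "starred_by_userss" CONTAINS 'alice';
-- The same applies to maps; KEYS(...) indexes the map keys
CREATE INDEX ON "users" (KEYS("social_identities"));
SELECT "username", "social_identities"
FROM users
WHERE "social_identities" CONTAINS KEY 'twitter';
-- Selecting a single collection element. Support depends on the Cassandra version, and older releases reject these queries.
-- ASSUMPTION: the source omits the list index; 0 is used here for illustration
SELECT "shared_by"[0]
FROM "user_status_updates"
WHERE "username" = 'alice'
AND "id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02;
SELECT "social_identities"['twitter']
FROM "users"
WHERE "username" = 'alice';
-- Order the rows of one partition by "id" and return the first two
SELECT * FROM "user_status_updates"
WHERE "username" = 'alice'
ORDER BY "id" ASC
LIMIT 2;
-- Clean up: drop the index, then the column
DROP INDEX user_social_identities_idx;
ALTER TABLE "users" DROP social_identities;
-- Re-adding a dropped column name with a different type may be rejected by Cassandra
ALTER TABLE "users" ADD social_identities set<text>;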
Tuples and user-defined types
-- Tuples
ALTER TABLE "users"
ADD "education" frozen <tuple<text, int>>;
ALTER TABLE "users"
DROP "education";
ALTER TABLE "users"
ADD "education" tuple<text, int>;
UPDATE "users"
SET "education" = ('Big Data University', 2019)
WHERE "username" = 'alice';
-- Three elements for a two-element tuple type; expect Cassandra to reject this literal
UPDATE "users"
SET "education" = ('Cassandra College', null, null)
WHERE "username" = 'bob';
-- A literal with only one element
UPDATE "users"
SET "education" = ('BDU')
WHERE "username" = 'alice';
UPDATE "users"
SET "education" = ('Big Data University', 2003)
WHERE "username" = 'alice';
-- Index the tuple column so that it can be used in WHERE
CREATE INDEX ON "users" ("education");
SELECT "username", "education" FROM users;
SELECT "username", "education" FROM users
WHERE "education" = ('Big Data University', 2003);
-- User-defined types
CREATE TYPE "education_information" (
"school_name" text,
"graduation_year" int
);
ALTER TABLE "users" DROP "education";
ALTER TABLE "users"
ADD "education" frozen <"education_information">;
UPDATE "users"
SET "education" = {
"school_name": 'Big Data University',
"graduation_year": 2003
}
WHERE "username" = 'alice';
CREATE INDEX ON "users" ("education");
SELECT "username", "education" FROM "users"
WHERE "education" = {
"school_name": 'Big Data University',
"graduation_year": 2003
};
-- Select a single field of the user-defined type
SELECT "username", "education"."school_name"
FROM "users"
WHERE "username" = 'alice';
-- A collection nested inside another collection must be frozen; expect the first statement to be rejected
ALTER TABLE "users"
ADD "telephone_numbers" map<text, set<text>>;
ALTER TABLE "users"
ADD "telephone_numbers" map<text, frozen<set<text>>>;
UPDATE "users"
SET "telephone_numbers"['home'] = {'', ''}
WHERE "username" = 'alice';
UPDATE "users"
SET "telephone_numbers"['office'] = {'', ''}
WHERE "username" = 'alice';
-- A set whose elements are user-defined type values
ALTER TABLE "users"
ADD "education_history" set<frozen<"education_information">>;
UPDATE "users"
SET "education_history" = {{
"school_name": 'Big Data University',
"graduation_year": 2003
},{
"school_name": 'Cassandra College',
"graduation_year": 2005
}}
WHERE "username" = 'alice';
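Time series databases
Time series databases in the industry currently fall into two groups: those built on top of an existing database, and those written specifically for time series data.
Many time series databases are built on Cassandra; KairosDB is one of the earlier ones. InfluxDB is a database written specifically for time series.
More than a dozen other time series databases are also built on Cassandra; see https://xephonhq.github.io/awesome-time-series-database/?language=All&backend=Cassandra
A simple time series schema
CREATE TABLE IF NOT EXISTS naive.metrics (
metric_name text, metric_timestamp timestamp, value double,
PRIMARY KEY (metric_name, metric_timestamp));
INSERT INTO naive.metrics (metric_name, metric_timestamp, value) VALUES ('cpu', '2017-03-17 13:24:00', 10.2);
INSERT INTO naive.metrics (metric_name, metric_timestamp, value) VALUES ('mem', '2017-03-17 13:24:00', 80.3);
The schema above is a naive way to store time series data in Cassandra: the clustering key holds the timestamp, and the value column holds the measurement. It is naive because each series maps one-to-one to a single physical row (partition) in Cassandra. Once a single series holds more data points than Cassandra's limit of 2 billion per partition, the design breaks down.
A more mature schema splits each time series into partitions by time range (KairosDB uses 3-week ranges, but variable-length ranges based on data volume are also possible). Storing the partition information requires an additional table. In addition, the series name in the naive schema is just a plain string; filtering on multiple criteria requires storing more key-value pairs, and these pairs need to be indexed to keep queries fast.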
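The time-range partitioning described above can be sketched as follows. This is a minimal illustration, not KairosDB's actual implementation: the fixed 3-week width comes from the text, while aligning buckets to the Unix epoch and the function names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed bucket width, mirroring the 3-week range mentioned for KairosDB.
BUCKET_WIDTH = timedelta(weeks=3)
# Assumption: buckets are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET_WIDTH):
    """Return the start of the fixed-width time bucket that contains `ts`."""
    index = (ts - EPOCH) // width  # timedelta // timedelta yields an int
    return EPOCH + index * width


def partition_key(metric_name, ts):
    """Composite partition key: the series name plus its time bucket."""
    return (metric_name, bucket_start(ts))


def buckets_for_range(start, end, width=BUCKET_WIDTH):
    """List the bucket starts that a query over [start, end) has to visit."""
    current = bucket_start(start, width)
    result = []
    while current < end:
        result.append(current)
        current += width
    return result


if __name__ == "__main__":
    ts = datetime(2017, 3, 17, 13, 24, tzinfo=timezone.utc)
    print(partition_key("cpu", ts))
```

With a key of this shape, each partition holds at most three weeks of points for one series, and a range query reads one partition per bucket returned by `buckets_for_range`.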
A more complex example:
This table has a composite partition key, ("status_update_username", "status_update_id"), and uses "observed_at" as its clustering column. "observed_at" is of type timeuuid and provides the time ordering of the series.
CREATE TABLE "status_update_views" (
"status_update_username" text,
"status_update_id" timeuuid,
"observed_at" timeuuid,
"client_type" text,
PRIMARY KEY (
("status_update_username", "status_update_id"),
"observed_at"
)
);
-- Insert data
INSERT INTO "status_update_views" (
"status_update_username", "status_update_id",
"observed_at", "client_type"
) VALUES (
'alice', 76e7a4d0-e796-11e3-90ce-5f98e903bf02,
85a53d10-4cc3-11e4-a7ff-5f98e903bf02,
'web'
);
-- Query the views observed during one day
SELECT "observed_at", "client_type"
FROM "status_update_views"
WHERE "status_update_username" = 'alice'
AND "status_update_id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02
AND "observed_at" >= MINTIMEUUID('2014-10-05 00:00:00+0000')
AND "observed_at" < MINTIMEUUID('2014-10-06 00:00:00+0000');
-- Count the views in the same range
SELECT COUNT(1)
FROM "status_update_views"
WHERE "status_update_username" = 'alice'
AND "status_update_id" = 76e7a4d0-e796-11e3-90ce-5f98e903bf02
AND "observed_at" >= MINTIMEUUID('2014-10-05 00:00:00+0000')
AND "observed_at" < MINTIMEUUID('2014-10-06 00:00:00+0000');
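To see why the inserted row matches the range queries above, it helps to decode the timestamp embedded in the timeuuid. The sketch below uses only Python's standard `uuid` module; the helper name `timeuuid_to_datetime` is made up for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-nanosecond intervals between the UUID epoch (1582-10-15)
# and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timeuuid_to_datetime(value):
    """Decode the timestamp embedded in a version 1 (time-based) UUID."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # UUID.time counts 100-ns intervals; integer division by 10 gives microseconds.
    microseconds = (parsed.time - UUID_EPOCH_OFFSET) // 10
    return UNIX_EPOCH + timedelta(microseconds=microseconds)


if __name__ == "__main__":
    observed_at = timeuuid_to_datetime("85a53d10-4cc3-11e4-a7ff-5f98e903bf02")
    start = datetime(2014, 10, 5, tzinfo=timezone.utc)
    end = datetime(2014, 10, 6, tzinfo=timezone.utc)
    print(observed_at.isoformat(), start <= observed_at < end)
```

The decoded time of the sample "observed_at" value falls on 2014-10-05 (UTC), inside the half-open range used in the WHERE clause.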
Counter tables
Some applications only need counts, such as how many times a page was clicked, or how many times status updates were viewed on each day of September. Generally, we want to store the total daily view counts in a structure that lets us easily retrieve the counts for a given time range. We do not need to store discrete information about each view event; knowing how many views occurred each day is enough. Cassandra handles this well.
Personally, I think this kind of high-performance, low-storage counting workload is better handled by Redis. Cassandra counters have quite a few limitations (http://rockthecode.io/blog/highly-available-counters-using-cassandra/), and Cassandra is better left to what it does well: column storage and time series.
-- Note the counter type
-- "year" is the partition key and "date" is the clustering column
CREATE TABLE "daily_status_update_views" (
"year" int,
"date" timestamp,
"total_views" counter,
"web_views" counter,
"mobile_views" counter,
"api_views" counter,
PRIMARY KEY (("year"), "date")
);
SELECT "date", "total_views"
FROM "daily_status_update_views"
WHERE "year" = 2014
AND "date" >= '2014-09-01'
AND "date" < '2014-09-30';
-- Counters are changed by incrementing them in an UPDATE
UPDATE "daily_status_update_views"
SET "total_views" = "total_views" + 1,
"web_views" = "web_views" + 1
WHERE "year" = 2014
AND "date" = '2014-10-05 00:00:00+0000';
SELECT * FROM "daily_status_update_views";
-- An INSERT fails, because counter tables accept only UPDATE, not INSERT
-- InvalidRequest: Error from server: code=2200 [Invalid query] message="INSERT statements are not allowed on counter tables, use UPDATE instead"
INSERT INTO "daily_status_update_views"
("year", "date", "total_views")
VALUES (2014, '2014-02-01 00:00:00+0000', 500);
-- The right way
UPDATE "daily_status_update_views"
SET "total_views" = "total_views" + 500
WHERE "year" = 2014
AND "date" = '2014-02-01 00:00:00+0000';
DELETE FROM "daily_status_update_views"
WHERE "year" = 2014
AND "date" = '2014-02-01 00:00:00+0000';
UPDATE "daily_status_update_views"
SET "total_views" = "total_views" + 100
WHERE "year" = 2014
AND "date" = '2014-02-01 00:00:00+0000';
-- Changing the table definition fails as well: only counter columns can be added to a counter table
-- ConfigurationException: Cannot add a non counter column (last_view_time) in a counter column family
ALTER TABLE "daily_status_update_views"
ADD "last_view_time" timestamp;
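As an illustration of the idea that only per-day totals need to be kept, the sketch below folds individual view events into the increments that would be applied to "daily_status_update_views". The column names come from the table definition; the function name, the event shape (timestamp, client type), and the use of UTC day boundaries are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timezone

# Column names taken from the daily_status_update_views table.
CLIENT_COLUMNS = {
    "web": "web_views",
    "mobile": "mobile_views",
    "api": "api_views",
}


def increments_for(events):
    """Fold (timestamp, client_type) view events into per-day counter deltas.

    Returns {(year, day): Counter({column: delta})}. Each entry corresponds
    to one UPDATE ... SET column = column + delta on the counter table.
    """
    deltas = {}
    for observed_at, client_type in events:
        # Assumption: days are cut at UTC midnight.
        day = observed_at.astimezone(timezone.utc).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        row = deltas.setdefault((day.year, day), Counter())
        row["total_views"] += 1
        row[CLIENT_COLUMNS[client_type]] += 1
    return deltas


if __name__ == "__main__":
    events = [
        (datetime(2014, 10, 5, 19, 12, tzinfo=timezone.utc), "web"),
        (datetime(2014, 10, 5, 8, 0, tzinfo=timezone.utc), "mobile"),
    ]
    for key, row in increments_for(events).items():
        print(key, dict(row))
```

Batching events like this before writing is optional; the same result can be reached by issuing one increment per event, as the UPDATE statements above do.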
User-defined functions
These are fairly simple, so I will not go into much detail. They do not seem to have many uses.
-- A function that handles null input itself
CREATE OR REPLACE FUNCTION selectCity(location text)
CALLED ON NULL INPUT
RETURNS text
LANGUAGE java
AS '
if (location == null)
return null;
else
return location.split(",")[0];
';
SELECT username, selectCity(location) FROM "users";
-- The same function, letting Cassandra return null for null input
CREATE OR REPLACE FUNCTION selectCity(location text)
RETURNS NULL ON NULL INPUT
RETURNS text
LANGUAGE java
AS '
return location.split(",")[0];
';
-- Sample view events for the aggregate below
INSERT INTO "status_update_views" ("status_update_username", "status_update_id", "observed_at", "client_type") VALUES ('alice', 76e7a4d0-e796-11e3-90ce-5f98e903bf02, NOW(), 'web');
INSERT INTO "status_update_views" ("status_update_username", "status_update_id", "observed_at", "client_type") VALUES ('alice', 76e7a4d0-e796-11e3-90ce-5f98e903bf02, NOW(), 'web');
INSERT INTO "status_update_views" ("status_update_username", "status_update_id", "observed_at", "client_type") VALUES ('alice', 76e7a4d0-e796-11e3-90ce-5f98e903bf02, NOW(), 'mobile');
INSERT INTO "status_update_views" ("status_update_username", "status_update_id", "observed_at", "client_type") VALUES ('alice', 76e7a4d0-e796-11e3-90ce-5f98e903bf02, NOW(), 'mobile');
INSERT INTO "status_update_views" ("status_update_username", "status_update_id", "observed_at", "client_type") VALUES ('alice', 76e7a4d0-e796-11e3-90ce-5f98e903bf02, NOW(), 'api');
-- State function: increment the count stored under the row's client type
CREATE OR REPLACE FUNCTION state_group_and_count (state map<text, int>, client_type text)
CALLED ON NULL INPUT
RETURNS map<text, int>
LANGUAGE java AS '
Integer count = (Integer) state.get(client_type);
if (count == null)
count = 1;
else
count++;
state.put(client_type, count);
return state;
';
-- Aggregate that starts from an empty map and applies the state function to every row
CREATE OR REPLACE AGGREGATE group_and_count (text)
SFUNC state_group_and_count
STYPE map<text, int>
INITCOND {};
SELECT status_update_username, status_update_id, group_and_count(client_type)
FROM status_update_views
WHERE status_update_username='alice' AND status_update_id=76e7a4d0-e796-11e3-90ce-5f98e903bf02;
-- The same aggregate restricted to one day of views
SELECT status_update_username, status_update_id, group_and_count(client_type)
FROM status_update_views
WHERE status_update_username='alice' AND status_update_id=76e7a4d0-e796-11e3-90ce-5f98e903bf02
AND "observed_at" >= MINTIMEUUID('2016-12-21 00:00:00+0000')
AND "observed_at" < MINTIMEUUID('2016-12-22 00:00:00+0000');
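The way the aggregate combines INITCOND, SFUNC, and the rows can be mirrored in plain Python. This is only a model of the fold that "group_and_count" performs, using the same function name for the state step; it does not talk to Cassandra.

```python
def state_group_and_count(state, client_type):
    """Python counterpart of the SFUNC: bump the count for one client type."""
    count = state.get(client_type)
    state[client_type] = 1 if count is None else count + 1
    return state


def group_and_count(client_types):
    """Model of the aggregate: start from INITCOND {} and apply the SFUNC per row."""
    state = {}
    for client_type in client_types:
        state = state_group_and_count(state, client_type)
    return state


if __name__ == "__main__":
    # The client types of the five sample rows inserted above.
    rows = ["web", "web", "mobile", "mobile", "api"]
    print(group_and_count(rows))
```

Applied to the five sample rows, the fold yields a count of 2 for 'web', 2 for 'mobile', and 1 for 'api', assuming no other rows exist for that status update.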