druid相关的时间序列数据库——也用到了倒排相关的优化技术

Cattell [6] maintains a great summary about existing Scalable SQL and NoSQL data stores. Hu [18] contributed another great summary for streaming databases. Druid feature-wise sits some-

where between Google’s Dremel [28] and PowerDrill [17]. Druid has most of the features implemented in Dremel (Dremel handles arbitrary nested data structures while Druid only allows for a single

level of array-based nesting) and many of the interesting compression algorithms mentioned in PowerDrill. Although Druid builds on many of the same principles as other distributed columnar data stores [15], many of these data stores are

designed to be more generic key-value stores [23] and do not sup

port computation directly in the storage layer. There are also other

data stores designed for some of the same data warehousing issues

that Druid is meant to solve. These systems include in-memory

databases such as SAP’s HANA [14] and VoltDB [43]. These data

stores lack Druid’slowlatency ingestion characteristics. Druidalso

has native analytical features baked in, similar to ParAccel [34],

however, Druid allows system wide rolling software updates with

no downtime.

Druid is similiar to C-Store [38] and LazyBase [8] in that it has

twosubsystems,aread-optimizedsubsysteminthehistoricalnodes

andawrite-optimizedsubsysteminreal-timenodes. Real-timenodes

are designed to ingest a high volume of append heavy data, and do

not support data updates. Unlike the two aforementioned systems,

Druid is meant for OLAP transactions and not OLTP transactions.

Druid’s low latency data ingestion features share some similar-

ities with Trident/Storm [27] and Spark Streaming [45], however,

both systems are focused on stream processing whereas Druid is

focused on ingestion and aggregation. Stream processors are great

complements to Druid as a means of pre-processing the data before

the data enters Druid.

There are a class of systems that specialize in queries on top of

cluster computing frameworks. Shark [13] is such a system for

queriesontopofSpark,andCloudera’sImpala[9]isanothersystem

focused on optimizing query performance on top of HDFS. Druid

historical nodes download data locally and only work with native

Druid indexes. We believe this setup allows for faster query laten

cies.

Druid leverages a unique combination of algorithms in its archi-

tecture. Although we believe no other data store has the same set

of functionality as Druid, some of Druid’s optimization techniques

suchas using inverted indices to perform fast filter sarealsousedin

other data stores [26].

druid白皮书：http://static.druid.io/docs/druid.pdf

druid相关的时间序列数据库——也用到了倒排相关的优化技术的更多相关文章

时间序列数据库(TSDB)初识与选择(InfluxDB、OpenTSDB、Druid、Elasticsearch对比)
背景这两年互联网行业掀着一股新风,总是听着各种高大上的新名词.大数据.人工智能.物联网.机器学习.商业智能.智能预警啊等等. 以前的系统,做数据可视化,信息管理,流程控制.现在业务已经不仅仅满足于这 ...
时间序列数据库(TSDB)初识与选择
时间序列数据库(TSDB)初识与选择本文作者由 MageByte 团队的「借来方向」编写,关注公众号给你更多硬核技术背景这两年互联网行业掀着一股新风,总是听着各种高大上的新名词.大数据.人工 ...
OpenTSDB介绍——基于Hbase的分布式的，可伸缩的时间序列数据库，而Hbase本质是列存储
原文链接:http://www.jianshu.com/p/0bafd0168647 OpenTSDB介绍 1.1.OpenTSDB是什么?主要用途是什么? 官方文档这样描述:OpenTSDB is ...
时间序列数据库武斗大会之 KairosDB 篇
[编者按] 刘斌,OneAPM后端研发工程师,拥有10多年编程经验,参与过大型金融.通信以及Android手机操作系的开发,熟悉Linux及后台开发技术.曾参与翻译过<第一本Docker书> ...
时间序列数据库概览——基于文件（RRD）、K/V数据库（influxDB）、关系型数据库
一般人们谈论时间序列数据库的时候指代的就是这一类存储.按照底层技术不同可以划分为三类. 直接基于文件的简单存储:RRD Tool,Graphite Whisper.这类工具附属于监控告警工具,底层没有 ...
[转帖]时间序列数据库 (TSDB)
时间序列数据库 (TSDB) https://www.jianshu.com/p/31afb8492eff 0.3392019.01.28 10:51:33字数 5598阅读 4030 背景 2017 ...
Akumuli时间序列数据库——列存储，LSM，MVCC
Features Column-oriented time-series database. Log-structured append-only B+tree with multiversion c ...
时间序列数据库选型——本质是列存储，B-tree索引，抑或是搜索引擎中的倒排索引
时间序列数据库最多,使用也最广泛.一般人们谈论时间序列数据库的时候指代的就是这一类存储.按照底层技术不同可以划分为三类. 直接基于文件的简单存储:RRD Tool,Graphite Whisper.这 ...
Python 数据分析（二本实验将学习利用 Python 数据聚合与分组运算，时间序列，金融与经济数据应用等相关知识
Python 数据分析(二) 本实验将学习利用 Python 数据聚合与分组运算,时间序列,金融与经济数据应用等相关知识第1节 groupby 技术第2节数据聚合第3节分组级运算和转换第4 ...

随机推荐

安装nagios
第二部分.apache的安装 615 tar zxvf httpd-2.2.9.tar.gz 616 cd httpd-2.2.9 617 ./configure --prefix=/u ...
使用FDTemplateLayout框架打造个性App
效果展示 project下载地址 · 进入构建结构首先我们新建一个project 接下来我们拖进来一个Table View Controller,将Storyboard Entry Point指向我 ...
shell循环，判断介绍，以及实例
shell的循环主要有3种,for,while,until shell的分支判断主要有2种,if,case 一,for循环 #!/bin/bash for file in $(ls /tmp/test ...
Build Your Hexo Blog (On Github)
超简单,比jekyll好多了! 看个Demo http://kevinjmh.github.io/ 了解Hexo Hexo是一个由Node.js驱动的,简单.快速.强大的Blog框架.可以快速的生成静 ...
java中的值传递和引用传递区别
值传递:(形式参数类型是基本数据类型):方法调用时,实际参数把它的值传递给对应的形式参数,形式参数只是用实际参数的值初始化自己的存储单元内容,是两个不同的存储单元,所以方法执行中形式参数值的改变不影响 ...
Oracle 优化器
http://blog.csdn.net/it_man/article/details/8185370一.优化器基本知识 Oracle在执行一个SQL之前,首先要分析一下语句的执行计划,然后再按执 ...
UIView属性的动画
//标记着动画块的开始,第一个参数表示动画的名字,起到标识作用 [UIView beginAnimations:nil context:NULL]; [UIView setAnimationDurat ...
ORACLE schedule job设置
--创建job begin DBMS_SCHEDULER.CREATE_JOB ( job_name => 'APICALL_LOG_INTERFACE_JOB', job_type => ...
小东和三个朋友一起在楼上抛小球，他们站在楼房的不同层，假设小东站的楼层距离地面N米，球从他手里自由落下，每次落地后反跳回上次下落高度的一半，并以此类推知道全部落到地面不跳，求4个小球一共经过了多少米？(数字都为整数) 给定四个整数A,B,C,D，请返回所求结果。
include #include<vector> using namespace std; class Balls { public: int calcDistance(int A, in ...
HDFS源码分析之EditLogTailer
在FSNamesystem中,有这么一个成员变量,定义如下: /** * Used when this NN is in standby state to read from the shared e ...

druid相关的时间序列数据库——也用到了倒排相关的优化技术

druid相关的时间序列数据库——也用到了倒排相关的优化技术的更多相关文章

随机推荐

热门专题