Queries are made using an HTTP REST style request to a Broker, Historical, or Realtime node. The query is expressed in JSON and each of these node types expose the same REST query interface.

We start by describing an example query with additional comments that mention possible variations. Query operators are also summarized in a table below.

Example Query "rand"

Here is the query in the examples/rand subproject (the file is query.body), followed by a commented walk-through of the same query.

{
  "queryType": "groupBy",
  "dataSource": "randSeq",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "doubleSum", "fieldName": "events", "name": "e" },
    { "type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum" }
  ],
  "postAggregations": [{
    "type": "arithmetic",
    "name": "avg_random",
    "fn": "/",
    "fields": [
      { "type": "fieldAccess", "fieldName": "randomNumberSum" },
      { "type": "fieldAccess", "fieldName": "rows" }
    ]
  }],
  "intervals": ["2012-10-01T00:00/2020-01-01T00"]
}

This query could be submitted via curl like so (assuming the query object is in a file "query.json"):

curl -X POST "http://host:port/druid/v2/?pretty" -H 'Content-Type: application/json' -d @query.json

The "pretty" query parameter returns the results in a more readable format.
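For scripted use, the same query can be submitted from Python's standard library; this is a minimal sketch, where the broker host and port are placeholders for a running Druid node:

```python
import json
from urllib import request

# The "rand" groupBy query as a Python dict, mirroring query.json.
query = {
    "queryType": "groupBy",
    "dataSource": "randSeq",
    "granularity": "all",
    "dimensions": [],
    "aggregations": [
        {"type": "count", "name": "rows"},
        {"type": "doubleSum", "fieldName": "events", "name": "e"},
        {"type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum"},
    ],
    "intervals": ["2012-10-01T00:00/2020-01-01T00"],
}

def submit(broker_url, query):
    """POST the query JSON to a Druid node and return the parsed response."""
    req = request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Requires a live broker, e.g.:
# results = submit("http://host:port/druid/v2/?pretty", query)
```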

Details of Example Query "rand"

The queryType JSON field identifies which kind of query operator is to be used. In this case it is groupBy, the most frequently used kind, which corresponds to the internal implementation class GroupByQuery registered as "groupBy". Each query type has its own set of required fields, some of which appear in this query. The queryType can also be, for example, "search" or "timeBoundary", whose required fields are summarized in the tables below:

{
  "queryType": "groupBy",

The dataSource JSON field shown next identifies where to apply the query. In this case, randSeq corresponds to the schema in the examples/rand/rand_realtime.spec file:

"dataSource": "randSeq",

The granularity JSON field specifies the bucket size for values. It can be a built-in time interval like "second", "minute", "fifteen_minute", "thirty_minute", "hour", or "day". It can also be an expression like {"type": "period", "period": "PT6M"}, meaning "6-minute buckets". See Granularities for more information on the different options for this field. In this example, it is set to the special value "all", which means all data points are bucketed together into a single time bucket:

"granularity": "all",
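To make the bucketing concrete, here is an illustrative Python sketch (not Druid code) that floors a timestamp to the start of its bucket, the way a "fifteen_minute" granularity would:

```python
from datetime import datetime

def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Floor a timestamp to the start of its bucket of the given size in minutes."""
    floored_minute = (ts.minute // minutes) * minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)

ts = datetime(2012, 10, 1, 12, 47, 31)
print(bucket_start(ts, 15))  # 2012-10-01 12:45:00
```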

The dimensions JSON field value is an array of zero or more fields, as defined in the dataSource spec file or defined in the input records and carried forward. These constrain the grouping. If empty, one value per time granularity bucket is requested in the groupBy:

"dimensions": [],

A groupBy also requires the JSON field "aggregations" (see Aggregations). Each aggregation is applied to the column specified by fieldName, and its output is named according to the value in the "name" field:

"aggregations": [
  { "type": "count", "name": "rows" },
  { "type": "doubleSum", "fieldName": "events", "name": "e" },
  { "type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum" }
],
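As an illustration of what these aggregators compute, here is a plain Python sketch (not Druid's implementation) over hypothetical input rows standing in for the randSeq data source:

```python
# Hypothetical input rows; "events" and "outColumn" values are made up.
rows = [
    {"events": 2.0, "outColumn": 0.25},
    {"events": 3.0, "outColumn": 0.75},
    {"events": 1.0, "outColumn": 0.50},
]

# "count" -> number of rows; "doubleSum" -> sum of the named column,
# each output keyed by the aggregator's "name".
aggregated = {
    "rows": len(rows),
    "e": sum(r["events"] for r in rows),
    "randomNumberSum": sum(r["outColumn"] for r in rows),
}
print(aggregated)  # {'rows': 3, 'e': 6.0, 'randomNumberSum': 1.5}
```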

You can also specify postAggregations, which are applied after data has been aggregated for the current granularity and dimensions bucket. See Post Aggregations for a detailed description. In the rand example, an arithmetic operation (division, as specified by "fn") is performed, with the result named "avg_random". The "fields" field specifies the inputs from the aggregation stage to this expression. Note that identifiers given in a "name" JSON field inside the "fieldAccess" entries are required but not used outside this expression, so in the original query.body they are prefixed with "dummy" for clarity:

"postAggregations": [{
  "type": "arithmetic",
  "name": "avg_random",
  "fn": "/",
  "fields": [
    { "type": "fieldAccess", "fieldName": "randomNumberSum" },
    { "type": "fieldAccess", "fieldName": "rows" }
  ]
}],
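The division itself is straightforward; in Python terms (an illustrative sketch with made-up aggregate values), the post-aggregation does:

```python
# Aggregated values as produced by the aggregation stage (made-up numbers).
aggregated = {"rows": 3, "randomNumberSum": 1.5}

# "arithmetic" post-aggregation with fn "/": divide the values
# referenced by the two fieldAccess inputs.
avg_random = aggregated["randomNumberSum"] / aggregated["rows"]
print(avg_random)  # 0.5
```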

The intervals JSON field gives the time range(s) of the query; data outside the specified intervals will not be used. This example covers October 1, 2012 through January 1, 2020:

"intervals": ["2012-10-01T00:00/2020-01-01T00"]
}

Query Operators

The following tables summarize query properties.

Properties shared by all query types:

| property   | description                                                                           | required? |
|------------|---------------------------------------------------------------------------------------|-----------|
| dataSource | query is applied to this data source                                                  | yes       |
| intervals  | range of time series to include in query                                              | yes       |
| context    | a key-value map used to alter some of the behavior of a query; see Query Context below | no        |

Properties by query type:

| query type                        | property         | description                                                                                                                    | required? |
|-----------------------------------|------------------|--------------------------------------------------------------------------------------------------------------------------------|-----------|
| timeseries, topN, groupBy, search | filter           | specifies the filter (the "WHERE" clause in SQL) for the query; see Filters                                                     | no        |
| timeseries, topN, groupBy, search | granularity      | the timestamp granularity to bucket results into (i.e. "hour"); see Granularities for more information                          | no        |
| timeseries, topN, groupBy         | aggregations     | aggregations that combine values in a bucket; see Aggregations                                                                  | yes       |
| timeseries, topN, groupBy         | postAggregations | aggregations of aggregations; see Post Aggregations                                                                             | yes       |
| groupBy                           | dimensions       | constrains the groupings; if empty, then one value per time granularity bucket                                                  | yes       |
| search                            | limit            | maximum number of results (default is 1000); a system-level maximum can also be set via com.metamx.query.search.maxSearchLimit  | no        |
| search                            | searchDimensions | dimensions to apply the search query to; if not specified, it will search through all dimensions                                | no        |
| search                            | query            | the query portion of the search query; essentially a predicate that specifies whether something matches                         | yes       |
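Combining the search-specific properties above, a minimal search query might look like the following sketch (the searchDimensions and search value are hypothetical placeholders; "insensitive_contains" is one of Druid's search query spec types):

```python
import json

# A hypothetical search query; the dimensions and search value are
# placeholders, not taken from the rand example.
search_query = {
    "queryType": "search",
    "dataSource": "randSeq",
    "granularity": "all",
    "searchDimensions": ["dim1", "dim2"],
    "query": {"type": "insensitive_contains", "value": "foo"},
    "intervals": ["2012-10-01T00:00/2020-01-01T00"],
}
print(json.dumps(search_query, indent=2))
```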

Query Context

| property      | default        | description |
|---------------|----------------|-------------|
| timeout       | 0 (no timeout) | query timeout in milliseconds, beyond which unfinished queries will be cancelled |
| priority      | 0              | query priority; queries with higher priority get precedence for computational resources |
| queryId       | auto-generated | unique identifier given to this query; if a query ID is set or known, it can be used to cancel the query |
| useCache      | true           | flag indicating whether to leverage the query cache for this query; may be overridden in the broker or historical node configuration |
| populateCache | true           | flag indicating whether to save the results of the query to the query cache; primarily used for debugging; may be overridden in the broker or historical node configuration |
| bySegment     | false          | return "by segment" results; primarily used for debugging; setting it to true returns results associated with the data segment they came from |
| finalize      | true           | flag indicating whether to "finalize" aggregation results; primarily used for debugging; for instance, the hyperUnique aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to false |
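As an illustration, a query can attach a context map like this (the values shown are illustrative, not defaults):

```python
import json

# Context fields attached to a query (illustrative values).
context = {
    "timeout": 60000,    # cancel if not finished within 60 seconds
    "priority": 1,       # higher values get computational precedence
    "queryId": "abc123", # a known ID allows later cancellation
    "useCache": False,   # bypass the query cache for this query
}
query_with_context = {
    "queryType": "timeBoundary",
    "dataSource": "randSeq",
    "context": context,
}
print(json.dumps(query_with_context, indent=2))
```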

Query Cancellation

Queries can be cancelled explicitly using their unique identifier. If the query identifier is set at the time of query, or is otherwise known, the following endpoint can be used on the broker or router to cancel the query:

DELETE /druid/v2/{queryId}

For example, if the query ID is abc123, the query can be cancelled as follows:

curl -X DELETE "http://host:port/druid/v2/abc123"
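The same cancellation can be issued from Python's standard library; in this sketch the broker URL and query ID are placeholders:

```python
from urllib import request

def cancel_query(broker_url: str, query_id: str) -> request.Request:
    """Build a DELETE request for /druid/v2/{queryId}; send it with urlopen."""
    return request.Request(f"{broker_url}/druid/v2/{query_id}", method="DELETE")

req = cancel_query("http://host:port", "abc123")
print(req.get_method())  # DELETE
# With a live broker: request.urlopen(req)
```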
