Queries are made using an HTTP REST style request to a Broker, Historical, or Realtime node. The query is expressed in JSON and each of these node types expose the same REST query interface.

We start by describing an example query with additional comments that mention possible variations. Query operators are also summarized in a table below.

Example Query "rand"

Here is the query in the examples/rand subproject (the file is query.body), followed by a commented walk-through of the same query.

{
  "queryType": "groupBy",
  "dataSource": "randSeq",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "doubleSum", "fieldName": "events", "name": "e" },
    { "type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum" }
  ],
  "postAggregations": [{
    "type": "arithmetic",
    "name": "avg_random",
    "fn": "/",
    "fields": [
      { "type": "fieldAccess", "fieldName": "randomNumberSum" },
      { "type": "fieldAccess", "fieldName": "rows" }
    ]
  }],
  "intervals": ["2012-10-01T00:00/2020-01-01T00"]
}

This query could be submitted via curl like so (assuming the query object is in a file "query.json"):

curl -X POST "http://host:port/druid/v2/?pretty" -H 'Content-Type: application/json' -d @query.json

The "pretty" query parameter returns the results in a more readable format.
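For scripted use, the same query can be submitted from Python's standard library; this is a minimal sketch, where the broker host and port are placeholders for a running Druid node:

```python
import json
from urllib import request

# The "rand" groupBy query as a Python dict, mirroring query.json.
query = {
    "queryType": "groupBy",
    "dataSource": "randSeq",
    "granularity": "all",
    "dimensions": [],
    "aggregations": [
        {"type": "count", "name": "rows"},
        {"type": "doubleSum", "fieldName": "events", "name": "e"},
        {"type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum"},
    ],
    "intervals": ["2012-10-01T00:00/2020-01-01T00"],
}

def submit(broker_url, query):
    """POST the query JSON to a Druid node and return the parsed response."""
    req = request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Requires a live broker, e.g.:
# results = submit("http://host:port/druid/v2/?pretty", query)
```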

Details of Example Query "rand"

The queryType JSON field identifies which kind of query operator is to be used. In this case it is groupBy, the most frequently used kind, which corresponds to the internal implementation class GroupByQuery registered as "groupBy". Each query type has its own set of required fields, some of which appear in this query. The queryType can also be, for example, "search" or "timeBoundary", whose required fields are summarized in the tables below:

{
  "queryType": "groupBy",

The dataSource JSON field shown next identifies where to apply the query. In this case, randSeq corresponds to the schema in the examples/rand/rand_realtime.spec file:

"dataSource": "randSeq",

The granularity JSON field specifies the bucket size for values. It can be a built-in time interval like "second", "minute", "fifteen_minute", "thirty_minute", "hour", or "day". It can also be an expression like {"type": "period", "period": "PT6M"}, meaning "6-minute buckets". See Granularities for more information on the different options for this field. In this example, it is set to the special value "all", which means all data points are bucketed together into a single time bucket:

"granularity": "all",
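To make the bucketing concrete, here is an illustrative Python sketch (not Druid code) that floors a timestamp to the start of its bucket, the way a "fifteen_minute" granularity would:

```python
from datetime import datetime

def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Floor a timestamp to the start of its bucket of the given size in minutes."""
    floored_minute = (ts.minute // minutes) * minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)

ts = datetime(2012, 10, 1, 12, 47, 31)
print(bucket_start(ts, 15))  # 2012-10-01 12:45:00
```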

The dimensions JSON field value is an array of zero or more fields, as defined in the dataSource spec file or defined in the input records and carried forward. These constrain the grouping. If empty, one value per time granularity bucket is requested in the groupBy:

"dimensions": [],

A groupBy also requires the JSON field "aggregations" (see Aggregations). Each aggregation is applied to the column specified by fieldName, and its output is named according to the value in the "name" field:

"aggregations": [
  { "type": "count", "name": "rows" },
  { "type": "doubleSum", "fieldName": "events", "name": "e" },
  { "type": "doubleSum", "fieldName": "outColumn", "name": "randomNumberSum" }
],
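As an illustration of what these aggregators compute, here is a plain Python sketch (not Druid's implementation) over hypothetical input rows standing in for the randSeq data source:

```python
# Hypothetical input rows; "events" and "outColumn" values are made up.
rows = [
    {"events": 2.0, "outColumn": 0.25},
    {"events": 3.0, "outColumn": 0.75},
    {"events": 1.0, "outColumn": 0.50},
]

# "count" -> number of rows; "doubleSum" -> sum of the named column,
# each output keyed by the aggregator's "name".
aggregated = {
    "rows": len(rows),
    "e": sum(r["events"] for r in rows),
    "randomNumberSum": sum(r["outColumn"] for r in rows),
}
print(aggregated)  # {'rows': 3, 'e': 6.0, 'randomNumberSum': 1.5}
```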

You can also specify postAggregations, which are applied after data has been aggregated for the current granularity and dimensions bucket. See Post Aggregations for a detailed description. In the rand example, an arithmetic operation (division, as specified by "fn") is performed, with the result named "avg_random". The "fields" field specifies the inputs from the aggregation stage to this expression. Note that identifiers given in a "name" JSON field inside the "fieldAccess" entries are required but not used outside this expression, so in the original query.body they are prefixed with "dummy" for clarity:

"postAggregations": [{
  "type": "arithmetic",
  "name": "avg_random",
  "fn": "/",
  "fields": [
    { "type": "fieldAccess", "fieldName": "randomNumberSum" },
    { "type": "fieldAccess", "fieldName": "rows" }
  ]
}],
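The division itself is straightforward; in Python terms (an illustrative sketch with made-up aggregate values), the post-aggregation does:

```python
# Aggregated values as produced by the aggregation stage (made-up numbers).
aggregated = {"rows": 3, "randomNumberSum": 1.5}

# "arithmetic" post-aggregation with fn "/": divide the values
# referenced by the two fieldAccess inputs.
avg_random = aggregated["randomNumberSum"] / aggregated["rows"]
print(avg_random)  # 0.5
```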

The intervals JSON field gives the time range(s) of the query; data outside the specified intervals will not be used. This example covers October 1, 2012 through January 1, 2020:

"intervals": ["2012-10-01T00:00/2020-01-01T00"]
}

Query Operators

The following tables summarize query properties.

Properties shared by all query types:

| property   | description                                                                           | required? |
|------------|---------------------------------------------------------------------------------------|-----------|
| dataSource | query is applied to this data source                                                  | yes       |
| intervals  | range of time series to include in query                                              | yes       |
| context    | a key-value map used to alter some of the behavior of a query; see Query Context below | no        |

Properties by query type:

| query type                        | property         | description                                                                                                                    | required? |
|-----------------------------------|------------------|--------------------------------------------------------------------------------------------------------------------------------|-----------|
| timeseries, topN, groupBy, search | filter           | specifies the filter (the "WHERE" clause in SQL) for the query; see Filters                                                     | no        |
| timeseries, topN, groupBy, search | granularity      | the timestamp granularity to bucket results into (i.e. "hour"); see Granularities for more information                          | no        |
| timeseries, topN, groupBy         | aggregations     | aggregations that combine values in a bucket; see Aggregations                                                                  | yes       |
| timeseries, topN, groupBy         | postAggregations | aggregations of aggregations; see Post Aggregations                                                                             | yes       |
| groupBy                           | dimensions       | constrains the groupings; if empty, then one value per time granularity bucket                                                  | yes       |
| search                            | limit            | maximum number of results (default is 1000); a system-level maximum can also be set via com.metamx.query.search.maxSearchLimit  | no        |
| search                            | searchDimensions | dimensions to apply the search query to; if not specified, it will search through all dimensions                                | no        |
| search                            | query            | the query portion of the search query; essentially a predicate that specifies whether something matches                         | yes       |
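Combining the search-specific properties above, a minimal search query might look like the following sketch (the searchDimensions and search value are hypothetical placeholders; "insensitive_contains" is one of Druid's search query spec types):

```python
import json

# A hypothetical search query; the dimensions and search value are
# placeholders, not taken from the rand example.
search_query = {
    "queryType": "search",
    "dataSource": "randSeq",
    "granularity": "all",
    "searchDimensions": ["dim1", "dim2"],
    "query": {"type": "insensitive_contains", "value": "foo"},
    "intervals": ["2012-10-01T00:00/2020-01-01T00"],
}
print(json.dumps(search_query, indent=2))
```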

Query Context

| property      | default        | description |
|---------------|----------------|-------------|
| timeout       | 0 (no timeout) | query timeout in milliseconds, beyond which unfinished queries will be cancelled |
| priority      | 0              | query priority; queries with higher priority get precedence for computational resources |
| queryId       | auto-generated | unique identifier given to this query; if a query ID is set or known, it can be used to cancel the query |
| useCache      | true           | flag indicating whether to leverage the query cache for this query; may be overridden in the broker or historical node configuration |
| populateCache | true           | flag indicating whether to save the results of the query to the query cache; primarily used for debugging; may be overridden in the broker or historical node configuration |
| bySegment     | false          | return "by segment" results; primarily used for debugging; setting it to true returns results associated with the data segment they came from |
| finalize      | true           | flag indicating whether to "finalize" aggregation results; primarily used for debugging; for instance, the hyperUnique aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to false |
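As an illustration, a query can attach a context map like this (the values shown are illustrative, not defaults):

```python
import json

# Context fields attached to a query (illustrative values).
context = {
    "timeout": 60000,    # cancel if not finished within 60 seconds
    "priority": 1,       # higher values get computational precedence
    "queryId": "abc123", # a known ID allows later cancellation
    "useCache": False,   # bypass the query cache for this query
}
query_with_context = {
    "queryType": "timeBoundary",
    "dataSource": "randSeq",
    "context": context,
}
print(json.dumps(query_with_context, indent=2))
```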

Query Cancellation

Queries can be cancelled explicitly using their unique identifier. If the query identifier is set at the time of query, or is otherwise known, the following endpoint can be used on the broker or router to cancel the query:

DELETE /druid/v2/{queryId}

For example, if the query ID is abc123, the query can be cancelled as follows:

curl -X DELETE "http://host:port/druid/v2/abc123"
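The same cancellation can be issued from Python's standard library; in this sketch the broker URL and query ID are placeholders:

```python
from urllib import request

def cancel_query(broker_url: str, query_id: str) -> request.Request:
    """Build a DELETE request for /druid/v2/{queryId}; send it with urlopen."""
    return request.Request(f"{broker_url}/druid/v2/{query_id}", method="DELETE")

req = cancel_query("http://host:port", "abc123")
print(req.get_method())  # DELETE
# With a live broker: request.urlopen(req)
```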
