Druid 0.17入门(4)—— 数据查询方式大全
本文介绍Druid查询数据的方式,首先我们保证数据已经成功载入。
Druid查询基于HTTP,Druid提供了查询视图,并对结果进行了格式化。
Druid提供了三种查询方式,SQL,原生JSON,CURL。
一、SQL查询
我们用wiki的数据为例
查询10条最多的页面编辑
SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10
我们在Query视图中操作
会有提示
选择Smart query limit会自动限制行数
Druid还提供了命令行查询sql 可以运行bin/dsql进行操作
Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
dsql>
提交sql
dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung │ 33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │ 28 │
│ Jeremy Corbyn │ 27 │
│ Wikipedia:Administrators' noticeboard/Incidents │ 21 │
│ Flavia Pennetta │ 20 │
│ Total Drama Presents: The Ridonculous Race │ 18 │
│ User talk:Dudeperson176123 │ 18 │
│ Wikipédia:Le Bistro/12 septembre 2015 │ 18 │
│ Wikipedia:In the news/Candidates │ 17 │
│ Wikipedia:Requests for page protection │ 17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.
还可以通过Http发送SQL
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql
可以得到如下结果
[
{
"page": "Wikipedia:Vandalismusmeldung",
"Edits": 33
},
{
"page": "User:Cyde/List of candidates for speedy deletion/Subpage",
"Edits": 28
},
{
"page": "Jeremy Corbyn",
"Edits": 27
},
{
"page": "Wikipedia:Administrators' noticeboard/Incidents",
"Edits": 21
},
{
"page": "Flavia Pennetta",
"Edits": 20
},
{
"page": "Total Drama Presents: The Ridonculous Race",
"Edits": 18
},
{
"page": "User talk:Dudeperson176123",
"Edits": 18
},
{
"page": "Wikipédia:Le Bistro/12 septembre 2015",
"Edits": 18
},
{
"page": "Wikipedia:In the news/Candidates",
"Edits": 17
},
{
"page": "Wikipedia:Requests for page protection",
"Edits": 17
}
]
更多SQL示例
时间查询
SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1
分组查询
SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC
查询原始数据
SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5
定时查询
也可以在dsql里操作
dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
│ DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}]) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Retrieved 1 row in 0.03s.
二、原生JSON查询
Druid支持基于Json的查询
{
"queryType" : "topN",
"dataSource" : "wikipedia",
"intervals" : ["2015-09-12/2015-09-13"],
"granularity" : "all",
"dimension" : "page",
"metric" : "count",
"threshold" : 10,
"aggregations" : [
{
"type" : "count",
"name" : "count"
}
]
}
把json粘贴到json 查询模式窗口
Json查询是通过向router和broker发送请求
curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>
Druid提供了丰富的查询方式
Aggregation查询
Timeseries查询
{
"queryType": "timeseries",
"dataSource": "sample_datasource",
"granularity": "day",
"descending": "true",
"filter": {
"type": "and",
"fields": [
{ "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
{ "type": "or",
"fields": [
{ "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
{ "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
]
}
]
},
"aggregations": [
{ "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
{ "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
],
"postAggregations": [
{ "type": "arithmetic",
"name": "sample_divide",
"fn": "/",
"fields": [
{ "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
{ "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
]
}
],
"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
}
TopN查询
{
"queryType": "topN",
"dataSource": "sample_data",
"dimension": "sample_dim",
"threshold": 5,
"metric": "count",
"granularity": "all",
"filter": {
"type": "and",
"fields": [
{
"type": "selector",
"dimension": "dim1",
"value": "some_value"
},
{
"type": "selector",
"dimension": "dim2",
"value": "some_other_val"
}
]
},
"aggregations": [
{
"type": "longSum",
"name": "count",
"fieldName": "count"
},
{
"type": "doubleSum",
"name": "some_metric",
"fieldName": "some_metric"
}
],
"postAggregations": [
{
"type": "arithmetic",
"name": "average",
"fn": "/",
"fields": [
{
"type": "fieldAccess",
"name": "some_metric",
"fieldName": "some_metric"
},
{
"type": "fieldAccess",
"name": "count",
"fieldName": "count"
}
]
}
],
"intervals": [
"2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
]
}
GroupBy查询
{
"queryType": "groupBy",
"dataSource": "sample_datasource",
"granularity": "day",
"dimensions": ["country", "device"],
"limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
"filter": {
"type": "and",
"fields": [
{ "type": "selector", "dimension": "carrier", "value": "AT&T" },
{ "type": "or",
"fields": [
{ "type": "selector", "dimension": "make", "value": "Apple" },
{ "type": "selector", "dimension": "make", "value": "Samsung" }
]
}
]
},
"aggregations": [
{ "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
{ "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
],
"postAggregations": [
{ "type": "arithmetic",
"name": "avg_usage",
"fn": "/",
"fields": [
{ "type": "fieldAccess", "fieldName": "data_transfer" },
{ "type": "fieldAccess", "fieldName": "total_usage" }
]
}
],
"intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
"having": {
"type": "greaterThan",
"aggregation": "total_usage",
"value": 100
}
}
Metadata查询
TimeBoundary 查询
{
"queryType" : "timeBoundary",
"dataSource": "sample_datasource",
"bound" : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set
"filter" : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional
}
SegmentMetadata查询
{
"queryType":"segmentMetadata",
"dataSource":"sample_datasource",
"intervals":["2013-01-01/2014-01-01"]
}
DatasourceMetadata查询
{
"queryType" : "dataSourceMetadata",
"dataSource": "sample_datasource"
}
Search查询
{
"queryType": "search",
"dataSource": "sample_datasource",
"granularity": "day",
"searchDimensions": [
"dim1",
"dim2"
],
"query": {
"type": "insensitive_contains",
"value": "Ke"
},
"sort" : {
"type": "lexicographic"
},
"intervals": [
"2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"
]
}
查询建议
用Timeseries和TopN替代GroupBy
取消查询
DELETE /druid/v2/{queryId}
curl -X DELETE "http://host:port/druid/v2/abc123"
查询失败
{
"error" : "Query timeout",
"errorMessage" : "Timeout waiting for task.",
"errorClass" : "java.util.concurrent.TimeoutException",
"host" : "druid1.example.com:8083"
}
三、CURL
基于Http的查询
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty
四、客户端查询
客户端查询是基于json的
具体查看 https://druid.apache.org/libraries.html
比如python查询的pydruid
from pydruid.client import *
from pylab import plt
query = PyDruid(druid_url_goes_here, 'druid/v2')
ts = query.timeseries(
datasource='twitterstream',
granularity='day',
intervals='2014-02-02/p4w',
aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')},
post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))},
filter=Dimension('first_hashtag') == 'sochi2014'
)
df = query.export_pandas()
df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])
df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20,
title='Sochi 2014')
plt.ylabel('avg tweet length (chars)')
plt.show()
实时流式计算整理了Druid入门指南
持续更新中~
更多实时数据分析相关博文与科技资讯,欢迎关注 “实时流式计算”
获取《Druid实时大数据分析》电子书,请在公号后台回复 “Druid”
Druid 0.17入门(4)—— 数据查询方式大全的更多相关文章
- Druid 0.17 入门(3)—— 数据接入指南
在快速开始中,我们演示了接入本地示例数据方式,但Druid其实支持非常丰富的数据接入方式.比如批处理数据的接入和实时流数据的接入.本文我们将介绍这几种数据接入方式. 文件数据接入:从文件中加载批处理数 ...
- Druid 0.17 入门(2)—— 安装与部署
在Druid快速入门其实已经简单的介绍过最简化配置的单节点部署,本文我们将详细描述Druid的多种部署方式,对于测试开发环境可以选用轻量的单机部署方式,而生产环境我们最好选用集群部署的方式,确保系统的 ...
- 8种json数据查询方式
你有没有对“在复杂的JSON数据结构中查找匹配内容”而烦恼.这里有8种不同的方式可以做到: JsonSQL JsonSQL实现了使用SQL select语句在json数据结构中查询的功能. 例子: ? ...
- Dynamics CRM 2015/2016 Web API:新的数据查询方式
今天我们来看看Web API的数据查询功能,尽管之前介绍CRUD的文章里面提到过怎么去Read数据,可是并没有详细的去深究那些细节,今天我们就来详细看看吧.事实上呢,Web API的数据查询接口也是基 ...
- Django之ORM数据查询方式练习
单表查询 单表查询简单示例 # 字段 models.DateField(auto_now_add) models.DateField(auto_now) # auto_now 和auto_now_ad ...
- XML教程、语法手册、数据读取方式大全
XML简单易懂教程 本文提供全流程,中文翻译.Chinar坚持将简单的生活方式,带给世人!(拥有更好的阅读体验 -- 高分辨率用户请根据需求调整网页缩放比例) 一 XML --数据格式的写法 二 Re ...
- [转帖]Druid介绍及入门
Druid介绍及入门 2018-09-19 19:38:36 拿着核武器的程序员 阅读数 22552更多 分类专栏: Druid 版权声明:本文为博主原创文章,遵循CC 4.0 BY-SA版权协议 ...
- Hibernate 查询方式(HQL/QBC/QBE)汇总
作为老牌的 ORM 框架,Hibernate 在推动数据库持久化层所做出的贡献有目共睹. 它所提供的数据查询方式也越来越丰富,从 SQL 到自创的 HQL,再到面向对象的标准化查询. 虽然查询方式有点 ...
- hibernate框架学习之数据查询(HQL)
lHibernate共提供5种查询方式 •OID数据查询方式 •HQL数据查询方式 •QBC数据查询方式 •本地SQL查询方式 •OGN数据查询方式 OID数据查询方式 l前提:已经获取到了对象的OI ...
随机推荐
- 通过dockerfile制作镜像
Dockerfile是一个用于构建Docker镜像的文本文件,其中包含了创建Docker镜像的全部指令.就是将我们安装环境的每个步骤使用指令的形式存放在一个文件中,最后生成一个需要的环境. Docke ...
- css特效sh
1 opacity=0.5: 透明度 2 选择器 .btn1:ho ...
- 使用 selenium 实现谷歌以图搜图爬虫
使用selenium实现谷歌以图搜图 实现思路 原理非常简单,就是利用selenium去操作浏览器,获取到想要的链接,然后进行图片的下载,和一般的爬虫无异. 用到的技术:multiprocessing ...
- cmd 文件/文件夹的一切操作
dir // 列出目录下所有文件夹 rd dirname // 删除dirname文件夹(空文件夹) rd /s/q dirname // 删除dirname文件夹(非空)
- jarvisoj MISC 取证2
打开之后一个文件和一个镜像 TrueCrypt....记住他了,再看一眼那个文件,好的,TrueCrypt加密..找密码 把Truecrypt.exe直接dump下来,用efdd解密就行了
- iview使用之怎样给Page组件添加跳转按钮
在项目开发过程中,我们会经常遇到使用分页的表格,然而在ivieiw中,我们通常只能使用Page组件自带的功能,如下图: 切换每页条数这些基本的功能都不说了,有时候我们需要在输入框里输入想要跳转到的页数 ...
- 利用 PhpQuery 随机爬取妹子图
前言 运行下面的代码会随机得到妹子图的一张图片,代码中的phpQuery可以在这里下载:phpQuery-0.9.5.386.zip <?php require 'phpQuery.php'; ...
- c++指定输出小数的精度
在c++中,有的时候要对输出的double型或float型保留几位小数,这时可以使用setflags(ios::fixed),不过要先包含有文件<iomainp>,具体如下 例: #inc ...
- Auth认证中的think_auth_rule type字段干嘛用的?
昨晚认真研究了一下这个类,设计的很巧妙,但是你说的这个字段,我认为应该是作者想加功能但还没写,在session判断的地方可以看到,type这个字段实际是对应的 1-实时验证,2登陆验证 ,显然,这个字 ...
- linux下批量删除文件
1. 在linux批量删除多级目录下同一格式的文件,可采用find + exec命令组合: 如在删除old目录下的,所有子目录中,后缀为.l的文件方法为: find old -type f -name ...