This article walks through the ways to query data in Druid. Before starting, make sure the sample data has been loaded successfully.

Druid queries run over HTTP, and the web console provides a Query view that formats the results for you.

Druid offers three ways to issue queries: SQL, native JSON, and curl.

1. SQL Queries

Using the wiki sample data as an example, let's find the 10 most-edited pages:

    SELECT page, COUNT(*) AS Edits
    FROM wikipedia
    WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
    GROUP BY page
    ORDER BY Edits DESC
    LIMIT 10
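For readers more comfortable with standard SQL, the shape of this query can be tried against an in-memory SQLite table (a sketch with made-up rows; Druid's dialect differs in details such as the special "__time" column and TIMESTAMP literals):

```python
import sqlite3

# Tiny in-memory stand-in for the wikipedia datasource (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wikipedia (page TEXT, __time TEXT)")
conn.executemany(
    "INSERT INTO wikipedia VALUES (?, ?)",
    [("Jeremy Corbyn", "2015-09-12 10:00:00"),
     ("Jeremy Corbyn", "2015-09-12 11:00:00"),
     ("Flavia Pennetta", "2015-09-12 12:00:00")],
)

# Same aggregation shape as the Druid query above:
# count edits per page within the day, most-edited first.
rows = conn.execute(
    """SELECT page, COUNT(*) AS Edits
       FROM wikipedia
       WHERE '2015-09-12 00:00:00' <= __time AND __time < '2015-09-13 00:00:00'
       GROUP BY page
       ORDER BY Edits DESC
       LIMIT 10"""
).fetchall()
print(rows)  # [('Jeremy Corbyn', 2), ('Flavia Pennetta', 1)]
```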

Run this in the console's Query view. The editor offers auto-complete hints, and enabling Smart query limit automatically caps the number of rows returned.

Druid also ships with a command-line SQL client; start it by running bin/dsql:

    Welcome to dsql, the command-line client for Druid SQL.
    Type "\h" for help.
    dsql>

Submit the SQL:

    dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
    ┌──────────────────────────────────────────────────────────┬───────┐
    │ page                                                     │ Edits │
    ├──────────────────────────────────────────────────────────┼───────┤
    │ Wikipedia:Vandalismusmeldung                             │    33 │
    │ User:Cyde/List of candidates for speedy deletion/Subpage │    28 │
    │ Jeremy Corbyn                                            │    27 │
    │ Wikipedia:Administrators' noticeboard/Incidents          │    21 │
    │ Flavia Pennetta                                          │    20 │
    │ Total Drama Presents: The Ridonculous Race               │    18 │
    │ User talk:Dudeperson176123                               │    18 │
    │ Wikipédia:Le Bistro/12 septembre 2015                    │    18 │
    │ Wikipedia:In the news/Candidates                         │    17 │
    │ Wikipedia:Requests for page protection                   │    17 │
    └──────────────────────────────────────────────────────────┴───────┘
    Retrieved 10 rows in 0.06s.

You can also send SQL over HTTP:

    curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql

which returns the following result:

    [
      {
        "page": "Wikipedia:Vandalismusmeldung",
        "Edits": 33
      },
      {
        "page": "User:Cyde/List of candidates for speedy deletion/Subpage",
        "Edits": 28
      },
      {
        "page": "Jeremy Corbyn",
        "Edits": 27
      },
      {
        "page": "Wikipedia:Administrators' noticeboard/Incidents",
        "Edits": 21
      },
      {
        "page": "Flavia Pennetta",
        "Edits": 20
      },
      {
        "page": "Total Drama Presents: The Ridonculous Race",
        "Edits": 18
      },
      {
        "page": "User talk:Dudeperson176123",
        "Edits": 18
      },
      {
        "page": "Wikipédia:Le Bistro/12 septembre 2015",
        "Edits": 18
      },
      {
        "page": "Wikipedia:In the news/Candidates",
        "Edits": 17
      },
      {
        "page": "Wikipedia:Requests for page protection",
        "Edits": 17
      }
    ]
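The file passed with -d @… is just a JSON object whose "query" field holds the SQL string. A minimal sketch of building such a payload in Python (the "query"/"context" field names are the Druid SQL API's; the optional context example is an assumption for illustration):

```python
import json

def sql_payload(sql: str, **context) -> str:
    """Build the JSON body for POST /druid/v2/sql."""
    body = {"query": sql}
    if context:
        # Optional per-query settings, e.g. sqlTimeZone.
        body["context"] = context
    return json.dumps(body)

payload = sql_payload(
    "SELECT page, COUNT(*) AS Edits FROM wikipedia "
    "GROUP BY page ORDER BY Edits DESC LIMIT 10"
)
print(payload)
```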

More SQL examples follow.

Aggregating by time:

    SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
    FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
    GROUP BY 1
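FLOOR(__time TO HOUR) truncates each timestamp to the start of its hour, so rows are bucketed per hour. The same truncation in plain Python (a sketch, independent of Druid):

```python
from datetime import datetime

def floor_to_hour(ts: datetime) -> datetime:
    # Drop everything below the hour, like FLOOR(__time TO HOUR).
    return ts.replace(minute=0, second=0, microsecond=0)

print(floor_to_hour(datetime(2015, 9, 12, 14, 37, 22)))
# 2015-09-12 14:00:00
```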

Grouping by multiple dimensions:

    SELECT channel, page, SUM(added)
    FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
    GROUP BY channel, page
    ORDER BY SUM(added) DESC

Selecting raw rows:

    SELECT user, page
    FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
    LIMIT 5

Inspecting the query plan with EXPLAIN PLAN, which also works in dsql:

    dsql> EXPLAIN PLAN FOR SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
    DruidQueryRel(query=[{"queryType":"topN","dataSource":{"type":"table","name":"wikipedia"},"virtualColumns":[],"dimension":{"type":"default","dimension":"page","outputName":"d0","outputType":"STRING"},"metric":{"type":"numeric","metric":"a0"},"threshold":10,"intervals":{"type":"intervals","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.001Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"context":{},"descending":false}], signature=[{d0:STRING, a0:LONG}])
    Retrieved 1 row in 0.03s.

2. Native JSON Queries

Druid also supports native queries written in JSON:

    {
      "queryType" : "topN",
      "dataSource" : "wikipedia",
      "intervals" : ["2015-09-12/2015-09-13"],
      "granularity" : "all",
      "dimension" : "page",
      "metric" : "count",
      "threshold" : 10,
      "aggregations" : [
        {
          "type" : "count",
          "name" : "count"
        }
      ]
    }

Paste the JSON into the console's JSON query window.

JSON queries are submitted as HTTP requests to the Router or Broker:

    curl -X POST '<queryable_host>:<port>/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<query_json_file>

Druid provides a rich set of native query types.

Aggregation queries

Timeseries query
    {
      "queryType": "timeseries",
      "dataSource": "sample_datasource",
      "granularity": "day",
      "descending": "true",
      "filter": {
        "type": "and",
        "fields": [
          { "type": "selector", "dimension": "sample_dimension1", "value": "sample_value1" },
          { "type": "or",
            "fields": [
              { "type": "selector", "dimension": "sample_dimension2", "value": "sample_value2" },
              { "type": "selector", "dimension": "sample_dimension3", "value": "sample_value3" }
            ]
          }
        ]
      },
      "aggregations": [
        { "type": "longSum", "name": "sample_name1", "fieldName": "sample_fieldName1" },
        { "type": "doubleSum", "name": "sample_name2", "fieldName": "sample_fieldName2" }
      ],
      "postAggregations": [
        { "type": "arithmetic",
          "name": "sample_divide",
          "fn": "/",
          "fields": [
            { "type": "fieldAccess", "name": "postAgg__sample_name1", "fieldName": "sample_name1" },
            { "type": "fieldAccess", "name": "postAgg__sample_name2", "fieldName": "sample_name2" }
          ]
        }
      ],
      "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ]
    }
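The "arithmetic" post-aggregation above divides the two aggregated sums within each time bucket after aggregation has run. Its effect can be sketched in Python (the bucket values are hypothetical):

```python
def apply_divide_postagg(bucket: dict) -> dict:
    """Mimic the 'sample_divide' arithmetic post-aggregation:
    sample_name1 / sample_name2, computed per result bucket."""
    out = dict(bucket)
    out["sample_divide"] = bucket["sample_name1"] / bucket["sample_name2"]
    return out

bucket = {"sample_name1": 10, "sample_name2": 4.0}
print(apply_divide_postagg(bucket)["sample_divide"])  # 2.5
```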
TopN query
    {
      "queryType": "topN",
      "dataSource": "sample_data",
      "dimension": "sample_dim",
      "threshold": 5,
      "metric": "count",
      "granularity": "all",
      "filter": {
        "type": "and",
        "fields": [
          {
            "type": "selector",
            "dimension": "dim1",
            "value": "some_value"
          },
          {
            "type": "selector",
            "dimension": "dim2",
            "value": "some_other_val"
          }
        ]
      },
      "aggregations": [
        {
          "type": "longSum",
          "name": "count",
          "fieldName": "count"
        },
        {
          "type": "doubleSum",
          "name": "some_metric",
          "fieldName": "some_metric"
        }
      ],
      "postAggregations": [
        {
          "type": "arithmetic",
          "name": "average",
          "fn": "/",
          "fields": [
            {
              "type": "fieldAccess",
              "name": "some_metric",
              "fieldName": "some_metric"
            },
            {
              "type": "fieldAccess",
              "name": "count",
              "fieldName": "count"
            }
          ]
        }
      ],
      "intervals": [
        "2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"
      ]
    }
GroupBy query
    {
      "queryType": "groupBy",
      "dataSource": "sample_datasource",
      "granularity": "day",
      "dimensions": ["country", "device"],
      "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
      "filter": {
        "type": "and",
        "fields": [
          { "type": "selector", "dimension": "carrier", "value": "AT&T" },
          { "type": "or",
            "fields": [
              { "type": "selector", "dimension": "make", "value": "Apple" },
              { "type": "selector", "dimension": "make", "value": "Samsung" }
            ]
          }
        ]
      },
      "aggregations": [
        { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
        { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
      ],
      "postAggregations": [
        { "type": "arithmetic",
          "name": "avg_usage",
          "fn": "/",
          "fields": [
            { "type": "fieldAccess", "fieldName": "data_transfer" },
            { "type": "fieldAccess", "fieldName": "total_usage" }
          ]
        }
      ],
      "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
      "having": {
        "type": "greaterThan",
        "aggregation": "total_usage",
        "value": 100
      }
    }
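The "having" clause filters grouped rows after aggregation, like SQL's HAVING. A sketch of the "greaterThan" rule over hypothetical result rows:

```python
def having_greater_than(rows: list, aggregation: str, value: float) -> list:
    # Keep only grouped rows whose aggregate exceeds the threshold,
    # like the {"type": "greaterThan"} having spec above.
    return [r for r in rows if r[aggregation] > value]

rows = [
    {"country": "US", "total_usage": 250},
    {"country": "NZ", "total_usage": 40},
]
print(having_greater_than(rows, "total_usage", 100))
# [{'country': 'US', 'total_usage': 250}]
```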

Metadata queries

TimeBoundary query
    {
      "queryType" : "timeBoundary",
      "dataSource": "sample_datasource",
      "bound"     : < "maxTime" | "minTime" > # optional, defaults to returning both timestamps if not set
      "filter"    : { "type": "and", "fields": [<filter>, <filter>, ...] } # optional
    }
SegmentMetadata query
    {
      "queryType": "segmentMetadata",
      "dataSource": "sample_datasource",
      "intervals": ["2013-01-01/2014-01-01"]
    }
DatasourceMetadata query
    {
      "queryType" : "dataSourceMetadata",
      "dataSource": "sample_datasource"
    }

Search query

    {
      "queryType": "search",
      "dataSource": "sample_datasource",
      "granularity": "day",
      "searchDimensions": [
        "dim1",
        "dim2"
      ],
      "query": {
        "type": "insensitive_contains",
        "value": "Ke"
      },
      "sort" : {
        "type": "lexicographic"
      },
      "intervals": [
        "2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"
      ]
    }
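The "insensitive_contains" query matches any dimension value that contains the given string, ignoring case. The matching rule itself is simple to sketch:

```python
def insensitive_contains(dimension_value: str, query: str) -> bool:
    # Case-insensitive substring match, as used by the search query above.
    return query.lower() in dimension_value.lower()

print(insensitive_contains("Kenya", "Ke"))   # True
print(insensitive_contains("monkey", "KE"))  # True
print(insensitive_contains("cat", "Ke"))     # False
```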

Query tips

Prefer Timeseries and TopN over GroupBy wherever they can express the query; they are generally faster and use less memory.

Cancelling a query

    DELETE /druid/v2/{queryId}

    curl -X DELETE "http://host:port/druid/v2/abc123"
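To cancel a native query you must have assigned it an id yourself, via the "queryId" field in the query's "context". A sketch of tagging a query and building the matching cancel URL (host and port are placeholders):

```python
import uuid

def tag_query(query: dict):
    """Attach a queryId to the query context so it can be cancelled later."""
    query_id = str(uuid.uuid4())
    tagged = dict(query)
    tagged["context"] = {**query.get("context", {}), "queryId": query_id}
    return tagged, query_id

def cancel_url(host: str, port: int, query_id: str) -> str:
    # Send an HTTP DELETE to this URL to cancel the running query.
    return f"http://{host}:{port}/druid/v2/{query_id}"

q, qid = tag_query({"queryType": "topN", "dataSource": "wikipedia"})
print(cancel_url("localhost", 8888, qid))
```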

A failed query returns an error response such as:

    {
      "error" : "Query timeout",
      "errorMessage" : "Timeout waiting for task.",
      "errorClass" : "java.util.concurrent.TimeoutException",
      "host" : "druid1.example.com:8083"
    }
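A client can inspect these fields to decide how to react. A sketch that treats only timeouts as retryable (this retry policy is the sketch's own assumption, not a rule from Druid):

```python
import json

def is_retryable(error_body: str) -> bool:
    """Decide whether to retry based on the 'error' code.

    Assumption for this sketch: only query timeouts are worth retrying.
    """
    err = json.loads(error_body)
    return err.get("error") == "Query timeout"

body = '{"error": "Query timeout", "errorMessage": "Timeout waiting for task."}'
print(is_retryable(body))  # True
```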

3. curl

Queries can be issued directly over HTTP with curl:

    curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty

4. Client Libraries

Client libraries are built on the JSON API; see https://druid.apache.org/libraries.html for the full list. For example, querying from Python with pydruid:

    from pydruid.client import *
    from pylab import plt

    query = PyDruid(druid_url_goes_here, 'druid/v2')
    ts = query.timeseries(
        datasource='twitterstream',
        granularity='day',
        intervals='2014-02-02/p4w',
        aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')},
        post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))},
        filter=Dimension('first_hashtag') == 'sochi2014'
    )
    df = query.export_pandas()
    df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])
    df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20,
            title='Sochi 2014')
    plt.ylabel('avg tweet length (chars)')
    plt.show()

This article is part of the continuously updated Druid getting-started series compiled by the "实时流式计算" (Real-Time Stream Computing) blog.
