ElasticSearch 6.x Aggregation Analysis Notes

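ES divides aggregations into the following four main types:

  1. Bucket: bucketing, similar to GROUP BY in SQL
  2. Metric: metric analysis, such as computing the maximum, minimum, or average
  3. Pipeline: runs a further analysis on the results of a previous aggregation
  4. Matrix: matrix analysis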

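Metric Aggregations

There are two main kinds:

1. Single-value analysis, which outputs a single result:

   min, max, avg, sum

   cardinality

2. Multi-value analysis, which outputs several results:

   stats, extended stats

   percentile, percentile rank

   top hits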

Sample data used in the examples:

```
POST test_search_index/doc/_bulk
{"index":{"_id":"1"}}
{"username":"alfred way","job":"java engineer","age":18,"birth":"1990-01-02","isMarried":false,"salary":10000}
{"index":{"_id":"2"}}
{"username":"tom","job":"java senior engineer","age":28,"birth":"1980-05-07","isMarried":true,"salary":30000}
{"index":{"_id":"3"}}
{"username":"lee","job":"ruby engineer","age":22,"birth":"1985-08-07","isMarried":false,"salary":15000}
{"index":{"_id":"4"}}
{"username":"Nick","job":"web engineer","age":23,"birth":"1989-08-07","isMarried":false,"salary":8000}
{"index":{"_id":"5"}}
{"username":"Niko","job":"web engineer","age":18,"birth":"1994-08-07","isMarried":false,"salary":5000}
{"index":{"_id":"6"}}
{"username":"Michell","job":"ruby engineer","age":26,"birth":"1987-08-07","isMarried":false,"salary":12000}
```

Metric Aggregations - Min/Max/Avg/Sum

Return the minimum value of a numeric field:

```
GET test_search_index/_search
{
  # size 0: do not return the document list
  "size": 0,
  "aggs": {
    # aggregation name, chosen freely
    "min_age": {
      # aggregation type
      "min": {
        "field": "age"
      }
    }
  }
}
# Response
{
  "took": 5,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "min_age": {"value": 18}
  }
}
```

Return the maximum value of a numeric field:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "max_age": {
      "max": {
        "field": "age"
      }
    }
  }
}
# Response
{
  "took": 7,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "max_age": {"value": 28}
  }
}
```

Return the average value of a numeric field:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "avg_age": {
      "avg": {
        "field": "age"
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "avg_age": {"value": 22.5}
  }
}
```

Return the sum of a numeric field:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "sum_age": {
      "sum": {
        "field": "age"
      }
    }
  }
}
# Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "sum_age": {"value": 135}
  }
}
```

Return several results in one request:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "min_age": {
      "min": {"field": "age"}
    },
    "max_age": {
      "max": {"field": "age"}
    },
    "sum_age": {
      "sum": {"field": "age"}
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "max_age": {"value": 28},
    "sum_age": {"value": 135},
    "min_age": {"value": 18}
  }
}
```
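
To make the semantics concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code) that computes the same four single-value metrics locally. The ages are copied from the six sample documents, and the helper name single_value_metrics is hypothetical. ES additionally skips documents that have no value for the field; every sample document has an age, so the sketch does not model that.

```python
# Ages copied from the six sample documents indexed above.
ages = [18, 28, 22, 23, 18, 26]


def single_value_metrics(values):
    """Compute min/max/avg/sum, keyed like the aggregation names in the requests."""
    return {
        "min_age": min(values),
        "max_age": max(values),
        "avg_age": sum(values) / len(values),  # true division gives 22.5
        "sum_age": sum(values),
    }


result = single_value_metrics(ages)
print(result)  # {'min_age': 18, 'max_age': 28, 'avg_age': 22.5, 'sum_age': 135}
```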

Metric Aggregations - Cardinality

Cardinality, i.e. the cardinality (or "power") of a set, is the number of distinct values, similar to COUNT(DISTINCT ...) in SQL. ES computes it approximately with the HyperLogLog++ algorithm; on a data set this small the count is close to exact.

```
GET test_search_index/_search
{
  # size is 10 here, so the matching documents are returned together with
  # the aggregation result; use 0 to return only the aggregation
  "size": 10,
  "aggs": {
    "count_of_job": {
      "cardinality": {
        "field": "job.keyword"
      }
    }
  }
}
# Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {"_index": "test_search_index", "_type": "doc", "_id": "5", "_score": 1,
       "_source": {"username": "Niko", "job": "web engineer", "age": 18, "birth": "1994-08-07", "isMarried": false, "salary": 5000}},
      {"_index": "test_search_index", "_type": "doc", "_id": "2", "_score": 1,
       "_source": {"username": "tom", "job": "java senior engineer", "age": 28, "birth": "1980-05-07", "isMarried": true, "salary": 30000}},
      {"_index": "test_search_index", "_type": "doc", "_id": "4", "_score": 1,
       "_source": {"username": "Nick", "job": "web engineer", "age": 23, "birth": "1989-08-07", "isMarried": false, "salary": 8000}},
      {"_index": "test_search_index", "_type": "doc", "_id": "6", "_score": 1,
       "_source": {"username": "Michell", "job": "ruby engineer", "age": 26, "birth": "1987-08-07", "isMarried": false, "salary": 12000}},
      {"_index": "test_search_index", "_type": "doc", "_id": "1", "_score": 1,
       "_source": {"username": "alfred way", "job": "java engineer", "age": 18, "birth": "1990-01-02", "isMarried": false, "salary": 10000}},
      {"_index": "test_search_index", "_type": "doc", "_id": "3", "_score": 1,
       "_source": {"username": "lee", "job": "ruby engineer", "age": 22, "birth": "1985-08-07", "isMarried": false, "salary": 15000}}
    ]
  },
  "aggregations": {
    "count_of_job": {"value": 4}
  }
}
```
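
The distinct count itself is easy to state in code. This minimal Python sketch computes it exactly with a set over the job values of the sample documents; ES reaches the same number through its approximate HyperLogLog++ counter, which this sketch does not model. The helper name distinct_count is hypothetical.

```python
# job values copied from the six sample documents (job.keyword = whole string).
jobs = [
    "java engineer", "java senior engineer", "ruby engineer",
    "web engineer", "web engineer", "ruby engineer",
]


def distinct_count(values):
    """Exact number of distinct values, like SQL COUNT(DISTINCT ...)."""
    return len(set(values))


count_of_job = distinct_count(jobs)
print(count_of_job)  # 4
```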

Metric Aggregations - Stats

Returns a set of statistics for a numeric field: min, max, avg, sum and count.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "stats_age": {
      "stats": {
        "field": "age"
      }
    }
  }
}
# Response
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "stats_age": {
      "count": 6,
      "min": 18,
      "max": 28,
      "avg": 22.5,
      "sum": 135
    }
  }
}
```

Metric Aggregations - Extended Stats

An extension of stats that adds more statistics, such as the sum of squares, variance and standard deviation.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "stats_age": {
      "extended_stats": {
        "field": "age"
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "stats_age": {
      "count": 6,
      "min": 18,
      "max": 28,
      "avg": 22.5,
      "sum": 135,
      "sum_of_squares": 3121,
      "variance": 13.916666666666666,
      "std_deviation": 3.730504880933232,
      "std_deviation_bounds": {
        "upper": 29.961009761866464,
        "lower": 15.038990238133536
      }
    }
  }
}
```
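
The extra fields follow from a few formulas. The numbers above match the population variance (dividing by count, not count - 1), and std_deviation_bounds sit at avg plus or minus 2 standard deviations, which is the extended_stats default (configurable through its sigma parameter). The Python sketch below recomputes the response from the sample ages under exactly those assumptions; extended_stats here is a hypothetical helper name, not an ES client call.

```python
import math

# Ages copied from the six sample documents.
ages = [18, 28, 22, 23, 18, 26]


def extended_stats(values, sigma=2.0):
    """Recompute the extended_stats output locally (population variance)."""
    n = len(values)
    total = sum(values)
    avg = total / n
    sum_sq = sum(v * v for v in values)
    # Population variance: mean of the squares minus the square of the mean.
    variance = sum_sq / n - avg * avg
    std = math.sqrt(variance)
    return {
        "count": n,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_sq,
        "variance": variance,
        "std_deviation": std,
        # Bounds are avg +/- sigma standard deviations.
        "std_deviation_bounds": {"upper": avg + sigma * std, "lower": avg - sigma * std},
    }


stats = extended_stats(ages)
print(stats["variance"], stats["std_deviation"])
```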

Metric Aggregations - Percentile

Percentile statistics. By default ES returns the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles, and it computes them approximately (with the TDigest algorithm).

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "per_salary": {
      "percentiles": {
        "field": "salary"
      }
    }
  }
}
# Response
{
  "took": 6,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "per_salary": {
      "values": {
        "1.0": 5000,
        "5.0": 5000,
        "25.0": 8000,
        "50.0": 11000,
        "75.0": 15000,
        "95.0": 30000,
        "99.0": 30000
      }
    }
  }
}
```

Read the result as: about 1% of salaries are at or below 5000, 25% are at or below 8000, and so on.

Percentile Ranks works the other way round: given some values, it returns the percentage of documents whose field value is at or below each of them. Here 50% of salaries are at or below 11000 and 100% are at or below 30000.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "per_salary": {
      "percentile_ranks": {
        "field": "salary",
        "values": [
          11000,
          30000
        ]
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "per_salary": {
      "values": {
        "11000.0": 50,
        "30000.0": 100
      }
    }
  }
}
```
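
The percentile-rank idea can be written down directly. The Python sketch below uses the exact, naive definition (percentage of values at or below the target, no interpolation); the helper name percentile_rank is hypothetical. It happens to reproduce the 50 and 100 above, but because ES uses the TDigest approximation for both percentiles and percentile_ranks, results on other data can differ from this exact version.

```python
# Salaries copied from the six sample documents.
salaries = [10000, 30000, 15000, 8000, 5000, 12000]


def percentile_rank(values, target):
    """Percentage of values <= target (exact definition, no interpolation)."""
    at_or_below = sum(1 for v in values if v <= target)
    return 100.0 * at_or_below / len(values)


ranks = {t: percentile_rank(salaries, t) for t in (11000, 30000)}
print(ranks)  # {11000: 50.0, 30000: 100.0}
```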

Metric Aggregations - Top Hits

Usually used after bucketing, to fetch the top matching documents in each bucket, i.e. the detail records.

```
# Bucket by job first, then sort the documents inside each bucket by age (descending)
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "top_employee": {
          "top_hits": {
            "size": 10,
            "sort": [
              {
                "age": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
# Response
{
  "took": 42,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "jobs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "ruby engineer",
          "doc_count": 2,
          "top_employee": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {"_index": "test_search_index", "_type": "doc", "_id": "6", "_score": null,
                 "_source": {"username": "Michell", "job": "ruby engineer", "age": 26, "birth": "1987-08-07", "isMarried": false, "salary": 12000},
                 "sort": [26]},
                {"_index": "test_search_index", "_type": "doc", "_id": "3", "_score": null,
                 "_source": {"username": "lee", "job": "ruby engineer", "age": 22, "birth": "1985-08-07", "isMarried": false, "salary": 15000},
                 "sort": [22]}
              ]
            }
          }
        },
        {
          "key": "web engineer",
          "doc_count": 2,
          "top_employee": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {"_index": "test_search_index", "_type": "doc", "_id": "4", "_score": null,
                 "_source": {"username": "Nick", "job": "web engineer", "age": 23, "birth": "1989-08-07", "isMarried": false, "salary": 8000},
                 "sort": [23]},
                {"_index": "test_search_index", "_type": "doc", "_id": "5", "_score": null,
                 "_source": {"username": "Niko", "job": "web engineer", "age": 18, "birth": "1994-08-07", "isMarried": false, "salary": 5000},
                 "sort": [18]}
              ]
            }
          }
        },
        {
          "key": "java engineer",
          "doc_count": 1,
          "top_employee": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {"_index": "test_search_index", "_type": "doc", "_id": "1", "_score": null,
                 "_source": {"username": "alfred way", "job": "java engineer", "age": 18, "birth": "1990-01-02", "isMarried": false, "salary": 10000},
                 "sort": [18]}
              ]
            }
          }
        },
        {
          "key": "java senior engineer",
          "doc_count": 1,
          "top_employee": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {"_index": "test_search_index", "_type": "doc", "_id": "2", "_score": null,
                 "_source": {"username": "tom", "job": "java senior engineer", "age": 28, "birth": "1980-05-07", "isMarried": true, "salary": 30000},
                 "sort": [28]}
              ]
            }
          }
        }
      ]
    }
  }
}
```
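
Conceptually, terms + top_hits is "group, then sort and truncate inside each group". The Python sketch below does exactly that over (username, job, age) tuples taken from the sample data; top_hits_by_job is a hypothetical helper and returns only usernames rather than full hits.

```python
from collections import defaultdict

# (username, job, age) from the six sample documents.
docs = [
    ("alfred way", "java engineer", 18),
    ("tom", "java senior engineer", 28),
    ("lee", "ruby engineer", 22),
    ("Nick", "web engineer", 23),
    ("Niko", "web engineer", 18),
    ("Michell", "ruby engineer", 26),
]


def top_hits_by_job(rows, size=10):
    """Bucket by job, then keep the `size` oldest members of each bucket."""
    groups = defaultdict(list)
    for username, job, age in rows:
        groups[job].append((username, age))
    # Sort each bucket by age descending, mirroring "sort": [{"age": {"order": "desc"}}].
    return {
        job: [name for name, _ in sorted(members, key=lambda m: m[1], reverse=True)[:size]]
        for job, members in groups.items()
    }


top = top_hits_by_job(docs)
print(top["ruby engineer"])  # ['Michell', 'lee']
```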

Bucket Aggregations

Terms

The simplest bucketing strategy: documents are bucketed directly by term. For a text field, buckets are built from the terms produced by analysis (tokenization), not from the whole value, which is why the examples aggregate on the job.keyword sub-field. In 6.x, aggregating on a text field also requires fielddata to be enabled in its mapping.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "jobs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "ruby engineer", "doc_count": 2},
        {"key": "web engineer", "doc_count": 2},
        {"key": "java engineer", "doc_count": 1},
        {"key": "java senior engineer", "doc_count": 1}
      ]
    }
  }
}
```
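
The difference between bucketing a keyword and a text field is easiest to see side by side. The Python sketch below counts documents per term, ordered by doc_count descending with ties broken by key ascending, which matches the order in the response above. For the text case it assumes lowercase + whitespace splitting as a rough stand-in for the standard analyzer, and counts each token once per document; that text output is illustrative only, since the notes do not run such a request. terms_buckets is a hypothetical helper.

```python
from collections import Counter

# job values from the six sample documents.
jobs = [
    "java engineer", "java senior engineer", "ruby engineer",
    "web engineer", "web engineer", "ruby engineer",
]


def terms_buckets(values, size=10):
    """Count documents per term; order by count desc, then key asc."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size]


# job.keyword: the whole string is a single term.
keyword_buckets = terms_buckets(jobs)

# text field (approximation): each document contributes its distinct tokens.
text_buckets = terms_buckets(tok for job in jobs for tok in set(job.lower().split()))

print(keyword_buckets)
print(text_buckets)
```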

Range

Define buckets by specifying numeric ranges. In each range, from is inclusive and to is exclusive, so a salary of exactly 10000 falls into the second bucket.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          {
            "to": 10000
          },
          {
            "from": 10000,
            "to": 20000
          },
          {
            "from": 20000
          }
        ]
      }
    }
  }
}
# Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "salary_range": {
      "buckets": [
        {"key": "*-10000.0", "to": 10000, "doc_count": 2},
        {"key": "10000.0-20000.0", "from": 10000, "to": 20000, "doc_count": 3},
        {"key": "20000.0-*", "from": 20000, "doc_count": 1}
      ]
    }
  }
}
```
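
The bucketing rule fits in a few lines. This Python sketch applies the half-open [from, to) test to the sample salaries and builds keys in the same "*-10000.0" style as the response; None stands for an omitted from or to. range_buckets is a hypothetical helper.

```python
# Salaries from the six sample documents.
salaries = [10000, 30000, 15000, 8000, 5000, 12000]

# None means "unbounded", like omitting "from" or "to" in the request.
ranges = [(None, 10000), (10000, 20000), (20000, None)]


def range_buckets(values, ranges):
    """Count values per [from, to) range and label buckets like ES does."""
    buckets = []
    for lo, hi in ranges:
        # "from" is inclusive, "to" is exclusive.
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append((key, count))
    return buckets


print(range_buckets(salaries, ranges))
```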

Date Range

Define buckets by specifying date ranges. The example runs the range aggregation on the date field birth; format yyyy controls how the bounds "1980", "1990" and "2000" are parsed and how from_as_string/to_as_string are rendered, while from/to are returned as epoch milliseconds. ES also provides a dedicated date_range aggregation, which additionally accepts date math expressions such as now-10y.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "date_range": {
      "range": {
        "field": "birth",
        "format": "yyyy",
        "ranges": [
          {
            "from": "1980",
            "to": "1990"
          },
          {
            "from": "1990",
            "to": "2000"
          },
          {
            "from": "2000"
          }
        ]
      }
    }
  }
}
# Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "date_range": {
      "buckets": [
        {"key": "1980-1990", "from": 315532800000, "from_as_string": "1980",
         "to": 631152000000, "to_as_string": "1990", "doc_count": 4},
        {"key": "1990-2000", "from": 631152000000, "from_as_string": "1990",
         "to": 946684800000, "to_as_string": "2000", "doc_count": 2},
        {"key": "2000-*", "from": 946684800000, "from_as_string": "2000", "doc_count": 0}
      ]
    }
  }
}
```
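
The large from/to numbers are just the start of each year in epoch milliseconds. The Python sketch below reproduces them and the doc counts, assuming UTC (no time_zone was set in the request) and the same half-open [from, to) rule as the numeric range aggregation. The helper names are hypothetical.

```python
import calendar
from datetime import date

# birth values from the six sample documents.
births = ["1990-01-02", "1980-05-07", "1985-08-07", "1989-08-07", "1994-08-07", "1987-08-07"]


def year_start_millis(year):
    """Epoch milliseconds of Jan 1 of `year`, 00:00 UTC."""
    return calendar.timegm((year, 1, 1, 0, 0, 0)) * 1000


def to_millis(iso_day):
    """Epoch milliseconds of an ISO date at 00:00 UTC."""
    return calendar.timegm(date.fromisoformat(iso_day).timetuple()) * 1000


def date_range_buckets(days, year_ranges):
    """Return (from_ms, to_ms, doc_count) per [from, to) year range; to may be None."""
    stamps = [to_millis(d) for d in days]
    out = []
    for lo, hi in year_ranges:
        lo_ms = year_start_millis(lo)
        hi_ms = None if hi is None else year_start_millis(hi)
        n = sum(1 for s in stamps if s >= lo_ms and (hi_ms is None or s < hi_ms))
        out.append((lo_ms, hi_ms, n))
    return out


buckets = date_range_buckets(births, [(1980, 1990), (1990, 2000), (2000, None)])
print(buckets)
```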

Histogram

A histogram splits the data into buckets of a fixed interval.

```
# Bucket salaries at intervals of 5000; extended_bounds forces buckets
# from 0 up to 40000 even where they are empty
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "salary_hist": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 40000
        }
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "salary_hist": {
      "buckets": [
        {"key": 0, "doc_count": 0},
        {"key": 5000, "doc_count": 2},
        {"key": 10000, "doc_count": 2},
        {"key": 15000, "doc_count": 1},
        {"key": 20000, "doc_count": 0},
        {"key": 25000, "doc_count": 0},
        {"key": 30000, "doc_count": 1},
        {"key": 35000, "doc_count": 0},
        {"key": 40000, "doc_count": 0}
      ]
    }
  }
}
```
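
Each value lands in the bucket whose key is the value rounded down to a multiple of the interval (assuming the default offset of 0), and extended_bounds only widens the range of keys that get emitted. A minimal Python sketch of that rule, with histogram as a hypothetical helper name:

```python
# Salaries from the six sample documents.
salaries = [10000, 30000, 15000, 8000, 5000, 12000]


def histogram(values, interval, min_bound=None, max_bound=None):
    """Return (key, doc_count) pairs for every bucket between the lowest and highest key."""
    counts = {}
    for v in values:
        key = (v // interval) * interval  # floor to the start of the interval (offset 0)
        counts[key] = counts.get(key, 0) + 1
    keys = list(counts)
    # extended_bounds only widens the range of returned buckets.
    if min_bound is not None:
        keys.append((min_bound // interval) * interval)
    if max_bound is not None:
        keys.append((max_bound // interval) * interval)
    return [(k, counts.get(k, 0)) for k in range(min(keys), max(keys) + 1, interval)]


print(histogram(salaries, 5000, 0, 40000))
```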

Date Histogram

A histogram (bar chart) over dates, and one of the most commonly used aggregations for time-series analysis. Years between the first and last bucket that contain no documents are returned with doc_count 0.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "birth_hist": {
      "date_histogram": {
        "field": "birth",
        "interval": "year",
        "format": "yyyy"
      }
    }
  }
}
# Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "birth_hist": {
      "buckets": [
        {"key_as_string": "1980", "key": 315532800000, "doc_count": 1},
        {"key_as_string": "1981", "key": 347155200000, "doc_count": 0},
        {"key_as_string": "1982", "key": 378691200000, "doc_count": 0},
        {"key_as_string": "1983", "key": 410227200000, "doc_count": 0},
        {"key_as_string": "1984", "key": 441763200000, "doc_count": 0},
        {"key_as_string": "1985", "key": 473385600000, "doc_count": 1},
        {"key_as_string": "1986", "key": 504921600000, "doc_count": 0},
        {"key_as_string": "1987", "key": 536457600000, "doc_count": 1},
        {"key_as_string": "1988", "key": 567993600000, "doc_count": 0},
        {"key_as_string": "1989", "key": 599616000000, "doc_count": 1},
        {"key_as_string": "1990", "key": 631152000000, "doc_count": 1},
        {"key_as_string": "1991", "key": 662688000000, "doc_count": 0},
        {"key_as_string": "1992", "key": 694224000000, "doc_count": 0},
        {"key_as_string": "1993", "key": 725846400000, "doc_count": 0},
        {"key_as_string": "1994", "key": 757382400000, "doc_count": 1}
      ]
    }
  }
}
```
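
With a yearly interval, each bucket key is the start of a calendar year in epoch milliseconds, and every year between the earliest and latest birth year is emitted. The Python sketch below rebuilds that bucket list from the sample birth dates, assuming UTC; yearly_histogram is a hypothetical helper.

```python
import calendar
from collections import Counter

# birth values from the six sample documents.
births = ["1990-01-02", "1980-05-07", "1985-08-07", "1989-08-07", "1994-08-07", "1987-08-07"]


def year_start_millis(year):
    """Epoch milliseconds of Jan 1 of `year`, 00:00 UTC."""
    return calendar.timegm((year, 1, 1, 0, 0, 0)) * 1000


def yearly_histogram(days):
    """Return (key_as_string, key, doc_count) for every year from first to last."""
    years = Counter(int(d[:4]) for d in days)
    first, last = min(years), max(years)
    return [(str(y), year_start_millis(y), years.get(y, 0)) for y in range(first, last + 1)]


buckets = yearly_histogram(births)
print(buckets[:2])  # [('1980', 315532800000, 1), ('1981', 347155200000, 0)]
```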

Bucket + Metric Aggregations

1. Bucket again inside each bucket (sub-bucketing)

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "age_range": {
          "range": {
            "field": "age",
            "ranges": [
              {"to": 20},
              {"from": 20, "to": 30},
              {"from": 30}
            ]
          }
        }
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "jobs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "ruby engineer",
          "doc_count": 2,
          "age_range": {
            "buckets": [
              {"key": "*-20.0", "to": 20, "doc_count": 0},
              {"key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 2},
              {"key": "30.0-*", "from": 30, "doc_count": 0}
            ]
          }
        },
        {
          "key": "web engineer",
          "doc_count": 2,
          "age_range": {
            "buckets": [
              {"key": "*-20.0", "to": 20, "doc_count": 1},
              {"key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 1},
              {"key": "30.0-*", "from": 30, "doc_count": 0}
            ]
          }
        },
        {
          "key": "java engineer",
          "doc_count": 1,
          "age_range": {
            "buckets": [
              {"key": "*-20.0", "to": 20, "doc_count": 1},
              {"key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 0},
              {"key": "30.0-*", "from": 30, "doc_count": 0}
            ]
          }
        },
        {
          "key": "java senior engineer",
          "doc_count": 1,
          "age_range": {
            "buckets": [
              {"key": "*-20.0", "to": 20, "doc_count": 0},
              {"key": "20.0-30.0", "from": 20, "to": 30, "doc_count": 1},
              {"key": "30.0-*", "from": 30, "doc_count": 0}
            ]
          }
        }
      ]
    }
  }
}
```

2. Run a metric analysis inside each bucket

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "salary": {
          "stats": {
            "field": "salary"
          }
        }
      }
    }
  }
}
# Response
{
  "took": 7,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "jobs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "ruby engineer", "doc_count": 2,
         "salary": {"count": 2, "min": 12000, "max": 15000, "avg": 13500, "sum": 27000}},
        {"key": "web engineer", "doc_count": 2,
         "salary": {"count": 2, "min": 5000, "max": 8000, "avg": 6500, "sum": 13000}},
        {"key": "java engineer", "doc_count": 1,
         "salary": {"count": 1, "min": 10000, "max": 10000, "avg": 10000, "sum": 10000}},
        {"key": "java senior engineer", "doc_count": 1,
         "salary": {"count": 1, "min": 30000, "max": 30000, "avg": 30000, "sum": 30000}}
      ]
    }
  }
}
```
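
The key point of both requests is that a sub-aggregation only sees the documents of its parent bucket. The Python sketch below makes that explicit: it groups the sample documents by job, then evaluates the age ranges and the salary stats within each group. per_job and in_range are hypothetical helpers; age ranges are returned as plain counts in request order.

```python
from collections import defaultdict

# (job, age, salary) for the six sample documents.
docs = [
    ("java engineer", 18, 10000),
    ("java senior engineer", 28, 30000),
    ("ruby engineer", 22, 15000),
    ("web engineer", 23, 8000),
    ("web engineer", 18, 5000),
    ("ruby engineer", 26, 12000),
]

# Same ranges as the request: [*, 20), [20, 30), [30, *).
AGE_RANGES = [(None, 20), (20, 30), (30, None)]


def in_range(v, lo, hi):
    """Half-open range test; None means unbounded."""
    return (lo is None or v >= lo) and (hi is None or v < hi)


def per_job(rows):
    groups = defaultdict(list)
    for job, age, salary in rows:
        groups[job].append((age, salary))
    result = {}
    for job, members in groups.items():
        salaries = [s for _, s in members]
        result[job] = {
            "doc_count": len(members),
            # Sub-buckets: age ranges evaluated only within this job's documents.
            "age_range": [sum(1 for a, _ in members if in_range(a, lo, hi)) for lo, hi in AGE_RANGES],
            # Metric: stats over this job's salaries.
            "salary": {
                "count": len(salaries), "min": min(salaries), "max": max(salaries),
                "avg": sum(salaries) / len(salaries), "sum": sum(salaries),
            },
        }
    return result


r = per_job(docs)
print(r["web engineer"])
```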
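
Pipeline Aggregations

A pipeline aggregation runs a further aggregation on the results of other aggregations, and pipelines can be chained.

Pipeline results are written into the existing result. Depending on where the output is placed, there are two kinds:

1. Parent: the result is embedded inside the existing aggregation result (in each of its buckets)

   Derivative

   Moving Average

   Cumulative Sum

2. Sibling: the result sits at the same level as the existing aggregation result

   Max/Min/Avg/Sum Bucket

   Stats/Extended Stats Bucket

   Percentiles Bucket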

Sibling - Min Bucket

Find the bucket with the smallest value among all buckets, and return its key and value. The request below:

  1. uses a terms aggregation to bucket documents by job;
  2. nests an avg aggregation of salary inside each job bucket;
  3. adds a sibling min_bucket that picks the smallest of those average salaries. In buckets_path, > steps from an aggregation into its sub-aggregation (jobs>avg_salary).

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "min_salary_by_job": {
      "min_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
# Response
{
  "took": 5,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "jobs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "ruby engineer", "doc_count": 2, "avg_salary": {"value": 13500}},
        {"key": "web engineer", "doc_count": 2, "avg_salary": {"value": 6500}},
        {"key": "java engineer", "doc_count": 1, "avg_salary": {"value": 10000}},
        {"key": "java senior engineer", "doc_count": 1, "avg_salary": {"value": 30000}}
      ]
    },
    "min_salary_by_job": {
      "value": 6500,
      "keys": [
        "web engineer"
      ]
    }
  }
}
```

Find the bucket with the largest value among all buckets, and return its key and value:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "max_salary_by_job": {
      "max_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
```

Compute the average of all bucket values:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "avg_salary_by_job": {
      "avg_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
```

Compute Stats over all bucket values:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "stats_salary_by_job": {
      "stats_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
# Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "jobs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "ruby engineer", "doc_count": 2, "avg_salary": {"value": 13500}},
        {"key": "web engineer", "doc_count": 2, "avg_salary": {"value": 6500}},
        {"key": "java engineer", "doc_count": 1, "avg_salary": {"value": 10000}},
        {"key": "java senior engineer", "doc_count": 1, "avg_salary": {"value": 30000}}
      ]
    },
    "stats_salary_by_job": {
      "count": 4,
      "min": 6500,
      "max": 30000,
      "avg": 15000,
      "sum": 60000
    }
  }
}
```
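
A sibling pipeline works in two steps: the parent aggregation produces one value per bucket (here, the average salary per job), and the sibling then reduces that list of bucket values. The Python sketch below mimics both steps on the sample data; min_bucket/max_bucket return keys as a list because several buckets can tie. All function names are hypothetical, and the max and avg results are what the sketch predicts for the two requests shown without responses (they agree with the max and avg in the stats_bucket response).

```python
from collections import defaultdict

# (job, salary) for the six sample documents.
docs = [
    ("java engineer", 10000),
    ("java senior engineer", 30000),
    ("ruby engineer", 15000),
    ("web engineer", 8000),
    ("web engineer", 5000),
    ("ruby engineer", 12000),
]


def avg_salary_by_job(rows):
    """Parent step: terms on job with a nested avg of salary."""
    groups = defaultdict(list)
    for job, salary in rows:
        groups[job].append(salary)
    return {job: sum(s) / len(s) for job, s in groups.items()}


def extreme_bucket(values_by_key, pick):
    """min_bucket / max_bucket: the extreme value plus every key that has it."""
    best = pick(values_by_key.values())
    return {"value": best, "keys": [k for k, v in values_by_key.items() if v == best]}


def stats_bucket(values_by_key):
    """stats_bucket: count/min/max/avg/sum over the bucket values."""
    vals = list(values_by_key.values())
    return {"count": len(vals), "min": min(vals), "max": max(vals),
            "avg": sum(vals) / len(vals), "sum": sum(vals)}


buckets = avg_salary_by_job(docs)
min_salary_by_job = extreme_bucket(buckets, min)
max_salary_by_job = extreme_bucket(buckets, max)
avg_of_buckets = stats_bucket(buckets)["avg"]
print(min_salary_by_job, max_salary_by_job, avg_of_buckets)
```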

Compute percentiles over all bucket values:

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "percentiles_salary_by_job": {
      "percentiles_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}
# Response
{
  "took": 1,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "jobs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "ruby engineer", "doc_count": 2, "avg_salary": {"value": 13500}},
        {"key": "web engineer", "doc_count": 2, "avg_salary": {"value": 6500}},
        {"key": "java engineer", "doc_count": 1, "avg_salary": {"value": 10000}},
        {"key": "java senior engineer", "doc_count": 1, "avg_salary": {"value": 30000}}
      ]
    },
    "percentiles_salary_by_job": {
      "values": {
        "1.0": 6500,
        "5.0": 6500,
        "25.0": 10000,
        "50.0": 13500,
        "75.0": 13500,
        "95.0": 30000,
        "99.0": 30000
      }
    }
  }
}
```
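
Every value in this response is one of the four bucket values, with no interpolation between them. That is consistent with a nearest-rank rule: sort the values, scale the percent onto the index range, and round half up. The Python sketch below reproduces the response under that assumption; the rule is inferred from the output shown here, not taken from the ES source, and percentiles_bucket is a hypothetical helper.

```python
import math

# Per-job average salaries from the response above.
bucket_values = [13500.0, 6500.0, 10000.0, 30000.0]


def percentiles_bucket(values, percents=(1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0)):
    """Nearest-rank percentiles over bucket values (assumed rule, no interpolation)."""
    data = sorted(values)
    out = {}
    for p in percents:
        # Scale p onto the index range 0..len-1 and round half up.
        index = int(math.floor(p / 100.0 * (len(data) - 1) + 0.5))
        out[str(p)] = data[index]
    return out


result = percentiles_bucket(bucket_values)
print(result)
```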

Parent - Derivative

Computes the derivative of bucket values, i.e. the difference between each bucket's value and the previous bucket's value. The parent must be a histogram or date_histogram with min_doc_count set to 0.

```
GET test_search_index/_search
{
  "size": 0,
  "aggs": {
    "birth": {
      "date_histogram": {
        "field": "birth",
        "interval": "year",
        "min_doc_count": 0
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "derivative_avg_salary": {
          "derivative": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {"total": 6, "max_score": 0, "hits": []},
  "aggregations": {
    "birth": {
      "buckets": [
        {"key_as_string": "1980-01-01T00:00:00.000Z", "key": 315532800000, "doc_count": 1,
         "avg_salary": {"value": 30000}},
        {"key_as_string": "1981-01-01T00:00:00.000Z", "key": 347155200000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1982-01-01T00:00:00.000Z", "key": 378691200000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1983-01-01T00:00:00.000Z", "key": 410227200000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1984-01-01T00:00:00.000Z", "key": 441763200000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1985-01-01T00:00:00.000Z", "key": 473385600000, "doc_count": 1,
         "avg_salary": {"value": 15000}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1986-01-01T00:00:00.000Z", "key": 504921600000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1987-01-01T00:00:00.000Z", "key": 536457600000, "doc_count": 1,
         "avg_salary": {"value": 12000}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1988-01-01T00:00:00.000Z", "key": 567993600000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1989-01-01T00:00:00.000Z", "key": 599616000000, "doc_count": 1,
         "avg_salary": {"value": 8000}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1990-01-01T00:00:00.000Z", "key": 631152000000, "doc_count": 1,
         "avg_salary": {"value": 10000}, "derivative_avg_salary": {"value": 2000}},
        {"key_as_string": "1991-01-01T00:00:00.000Z", "key": 662688000000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1992-01-01T00:00:00.000Z", "key": 694224000000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1993-01-01T00:00:00.000Z", "key": 725846400000, "doc_count": 0,
         "avg_salary": {"value": null}, "derivative_avg_salary": {"value": null}},
        {"key_as_string": "1994-01-01T00:00:00.000Z", "key": 757382400000, "doc_count": 1,
         "avg_salary": {"value": 5000}, "derivative_avg_salary": {"value": null}}
      ]
    }
  }
}
```

Only 1990 gets a non-null derivative (10000 - 8000 = 2000), because 1989 and 1990 are the only adjacent years that both contain documents. The first bucket has no derivative at all, and, as the response shows, every difference that involves an empty year comes out as null under the default gap_policy (skip).
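
The response can be reproduced with one rule: each derivative is this bucket's value minus the previous bucket's, and an empty bucket counts as "no value", so any difference touching it is null. The Python sketch below encodes that rule over the yearly avg_salary values from the response, modelling empty buckets as NaN; this is an illustration inferred from the output, not ES's implementation, and derivative is a hypothetical helper. If gaps should count as 0 instead, ES offers gap_policy insert_zeros.

```python
import math

# avg_salary per birth year from the response above; None marks an empty bucket.
avg_salary = {year: None for year in range(1980, 1995)}
avg_salary.update({1980: 30000, 1985: 15000, 1987: 12000, 1989: 8000, 1990: 10000, 1994: 5000})


def derivative(series):
    """First difference of consecutive buckets; an empty bucket makes its neighbours null."""
    out = {}
    prev = None  # the first bucket has no predecessor, so it gets no derivative
    for key in sorted(series):
        value = series[key]
        current = math.nan if value is None else float(value)
        if prev is not None:
            diff = current - prev
            out[key] = None if math.isnan(diff) else diff
        prev = current
    return out


deriv = derivative(avg_salary)
print(deriv[1990])  # 2000.0
```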
