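I. Introduction to aggregation analysis

1. What is aggregation analysis in ES?

Aggregation analysis is a core database feature: it computes aggregate values over the data set returned by a query, such as the maximum, minimum, sum, or average of a field (or of a computed expression). As both a search engine and a database, ES (Elasticsearch) offers strong aggregation capabilities of its own.

Aggregations that compute a measure such as max, min, sum, or average over a data set are called metric aggregations in ES.

Relational databases go beyond aggregate functions: they can also group the queried rows with GROUP BY and then compute metrics per group. In ES, the counterpart of GROUP BY is the bucket aggregation (bucketing).

ES also provides matrix aggregations and pipeline aggregations, although these are still being refined.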

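2. How to write an aggregation query in ES

Aggregations are defined in the request body under the aggregations node, using the following syntax:

    "aggregations" : {
      "<aggregation_name>" : {                              <!-- name of the aggregation -->
        "<aggregation_type>" : {                            <!-- type of the aggregation -->
          <aggregation_body>                                <!-- body: which fields to aggregate -->
        }
        [,"meta" : { [<meta_data_body>] } ]?                <!-- metadata -->
        [,"aggregations" : { [<sub_aggregation>]+ } ]?      <!-- sub-aggregations defined inside this aggregation -->
      }
      [,"<aggregation_name_2>" : { ... } ]*                 <!-- further named aggregations -->
    }

Note:

aggregations can also be abbreviated to aggs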
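
The nesting rules in the syntax above can be sketched as a small Python helper that assembles a request body as a plain dictionary. The helper name `agg` and the aggregation names `age_terms` and `max_balance` are illustrative choices, not part of any ES client library; the sketch only builds JSON and sends nothing to a cluster.

```python
import json


def agg(agg_type, body, sub_aggs=None, meta=None):
    """Build one aggregation node: {<type>: <body>[, "meta": ...][, "aggs": ...]}."""
    node = {agg_type: body}
    if meta is not None:
        node["meta"] = meta
    if sub_aggs:
        # Sub-aggregations sit next to the type key, not inside the body.
        node["aggs"] = sub_aggs
    return node


# A terms bucket on "age" with a max metric nested inside each bucket.
request_body = {
    "size": 0,
    "aggs": {
        "age_terms": agg(
            "terms",
            {"field": "age"},
            sub_aggs={"max_balance": agg("max", {"field": "balance"})},
        )
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON can be pasted as the body of a `POST /bank/_search` request.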

3. Value sources for aggregations

An aggregation can take its values from a field, or from the result of a script.

II. Metric aggregations

1. max min sum avg

Example 1: find the maximum balance across all customers

    POST /bank/_search
    {
      "size": 0,
      "aggs": {
        "masssbalance": {
          "max": {
            "field": "balance"
          }
        }
      }
    }

Result 1:

    {
      "took": 2080,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "masssbalance": { "value": 49989 }
      }
    }

Example 2: find the maximum balance among customers aged 24

    POST /bank/_search
    {
      "size": 2,
      "query": {
        "match": {
          "age": 24
        }
      },
      "sort": [
        {
          "balance": {
            "order": "desc"
          }
        }
      ],
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        }
      }
    }

Result 2:

    {
      "took": 5,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": {
        "total": 42,
        "max_score": null,
        "hits": [
          {
            "_index": "bank", "_type": "_doc", "_id": "697", "_score": null,
            "_source": {
              "account_number": 697, "balance": 48745, "firstname": "Mallory", "lastname": "Emerson",
              "age": 24, "gender": "F", "address": "318 Dunne Court", "employer": "Exoplode",
              "email": "malloryemerson@exoplode.com", "city": "Montura", "state": "LA"
            },
            "sort": [ 48745 ]
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "917", "_score": null,
            "_source": {
              "account_number": 917, "balance": 47782, "firstname": "Parks", "lastname": "Hurst",
              "age": 24, "gender": "M", "address": "933 Cozine Avenue", "employer": "Pyramis",
              "email": "parkshurst@pyramis.com", "city": "Lindcove", "state": "GA"
            },
            "sort": [ 47782 ]
          }
        ]
      },
      "aggregations": {
        "max_balance": { "value": 48745 }
      }
    }

Example 3: values from a script; compute the average age of all customers, and the average of age plus 10

    POST /bank/_search?size=0
    {
      "aggs": {
        "avg_age": {
          "avg": {
            "script": {
              "source": "doc.age.value"
            }
          }
        },
        "avg_age10": {
          "avg": {
            "script": {
              "source": "doc.age.value + 10"
            }
          }
        }
      }
    }

Result 3:

    {
      "took": 86,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "avg_age": { "value": 30.171 },
        "avg_age10": { "value": 40.171 }
      }
    }

Example 4: specify a field and read its value in the script through _value

    POST /bank/_search?size=0
    {
      "aggs": {
        "sum_balance": {
          "sum": {
            "field": "balance",
            "script": {
              "source": "_value * 1.03"
            }
          }
        }
      }
    }

Result 4:

    {
      "took": 165,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "sum_balance": { "value": 26486282.11 }
      }
    }

Example 5: supply a value for documents that lack the field. If missing is not specified, documents without a value for the field are ignored.

    POST /bank/_search?size=0
    {
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age",
            "missing": 18
          }
        }
      }
    }
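
The effect of missing in Example 5 can be illustrated with a pure-Python emulation of the avg metric. This is a simplified sketch over in-memory dictionaries, not ES code; the function name `avg_with_missing` and the sample documents are made up for illustration.

```python
def avg_with_missing(docs, field, missing=None):
    """Average `field` over docs; substitute `missing` for absent values if given."""
    values = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            if missing is None:
                continue  # no default supplied: the document is ignored
            value = missing
        values.append(value)
    return sum(values) / len(values) if values else None


# Hypothetical sample: the third document has no "age" field.
docs = [{"age": 30}, {"age": 42}, {"firstname": "Mallory"}]

print(avg_with_missing(docs, "age"))              # 36.0 (two documents counted)
print(avg_with_missing(docs, "age", missing=18))  # 30.0 (three documents counted)
```

With the default supplied, the document without an age takes part in the average as if its age were 18; without it, the document is skipped.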

2. Document count (count)

Example 1: count the documents in the bank index whose age is 24

    POST /bank/_doc/_count
    {
      "query": {
        "match": {
          "age": 24
        }
      }
    }

Result 1:

    {
      "count": 42,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }
    }

3. Value count: count the values of a field (for a single-valued field, the number of documents that have a value)

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_count": {
          "value_count": {
            "field": "age"
          }
        }
      }
    }

Result 1:

    {
      "took": 2022,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_count": { "value": 1000 }
      }
    }

4. cardinality: count of distinct values

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_count": {
          "cardinality": {
            "field": "age"
          }
        },
        "state_count": {
          "cardinality": {
            "field": "state.keyword"
          }
        }
      }
    }

Note: for state, use its keyword sub-field (state.keyword), because a text field cannot be aggregated by default.

Result 1:

    {
      "took": 2074,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "state_count": { "value": 51 },
        "age_count": { "value": 21 }
      }
    }
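
The difference between value_count and cardinality can be sketched in plain Python. This is an exact in-memory emulation with made-up documents; ES itself computes cardinality approximately (with a HyperLogLog++ based algorithm), so on large, high-cardinality data its result may deviate slightly from an exact count like this one.

```python
def value_count(docs, field):
    """Count every value of `field`; a list-valued field contributes one per element."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        total += len(value) if isinstance(value, list) else 1
    return total


def cardinality(docs, field):
    """Count the distinct values of `field` exactly."""
    distinct = set()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        distinct.update(value if isinstance(value, list) else [value])
    return len(distinct)


# Hypothetical sample: the last document has no "age".
docs = [
    {"age": 24, "state": "LA"},
    {"age": 24, "state": "GA"},
    {"age": 31, "state": "LA"},
    {"state": "PA"},
]

print(value_count(docs, "age"), cardinality(docs, "age"))  # 3 2
```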

5. stats: returns five values: count, max, min, avg, sum

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_stats": {
          "stats": {
            "field": "age"
          }
        }
      }
    }

Result 1:

    {
      "took": 7,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_stats": {
          "count": 1000,
          "min": 20,
          "max": 40,
          "avg": 30.171,
          "sum": 30171
        }
      }
    }

6. Extended stats

Extended statistics add four results to those of stats: sum of squares, variance, standard deviation, and the interval of the mean plus/minus two standard deviations.

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_stats": {
          "extended_stats": {
            "field": "age"
          }
        }
      }
    }

Result 1:

    {
      "took": 7,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_stats": {
          "count": 1000,
          "min": 20,
          "max": 40,
          "avg": 30.171,
          "sum": 30171,
          "sum_of_squares": 946393,
          "variance": 36.10375899999996,
          "std_deviation": 6.008640362012022,
          "std_deviation_bounds": {
            "upper": 42.18828072402404,
            "lower": 18.153719275975956
          }
        }
      }
    }
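
The extra figures in this result can be re-derived from count, sum, and sum_of_squares alone. The sketch below assumes the population variance formula (sum of squares divided by count, minus the squared mean), which reproduces the numbers shown above; the function name `extended_stats_from_sums` is illustrative.

```python
import math


def extended_stats_from_sums(count, total, sum_of_squares, sigma=2.0):
    """Derive avg, variance, std deviation and bounds from running sums."""
    avg = total / count
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding noise.
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std_deviation = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# Inputs copied from the extended_stats result above.
stats = extended_stats_from_sums(count=1000, total=30171, sum_of_squares=946393)
print(stats)
```

Up to floating-point rounding, the output agrees with the variance (about 36.1038), the standard deviation (about 6.0086), and the bounds (about 18.15 to 42.19) returned by ES.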

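7. Percentiles: the values at given percentile ranks

The values of the chosen field (or script) are ordered from smallest to largest while the share of matching documents is accumulated along the way; the aggregation returns the value at which each requested cumulative percentage is reached. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The middle entry of the result below can be read as: 50% of the documents have age <= 31, or conversely, documents with age <= 31 make up 50% of all matching documents.

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_percents": {
          "percentiles": {
            "field": "age"
          }
        }
      }
    }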

Result 1:

    {
      "took": 87,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_percents": {
          "values": {
            "1.0": 20,
            "5.0": 21,
            "25.0": 25,
            "50.0": 31,
            "75.0": 35.00000000000001,
            "95.0": 39,
            "99.0": 40
          }
        }
      }
    }

Result notes:

50% of the documents have age <= 31, or conversely, documents with age <= 31 make up 50% of all matching documents. Percentile values are computed approximately, which likely explains a value such as 35.00000000000001.

Example 2: request specific percentiles

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_percents": {
          "percentiles": {
            "field": "age",
            "percents": [95, 99, 99.9]
          }
        }
      }
    }

Result 2:

    {
      "took": 8,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_percents": {
          "values": {
            "95.0": 39,
            "99.0": 40,
            "99.9": 40
          }
        }
      }
    }

8. Percentile ranks: the share of documents whose value is less than or equal to a given value

Example 1: find the share of documents with age at or below 25 and at or below 30; this is the inverse of item 7

    POST /bank/_search?size=0
    {
      "aggs": {
        "gge_perc_rank": {
          "percentile_ranks": {
            "field": "age",
            "values": [
              25,
              30
            ]
          }
        }
      }
    }

Result 1:

    {
      "took": 8,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "gge_perc_rank": {
          "values": {
            "25.0": 26.1,
            "30.0": 49.2
          }
        }
      }
    }

Result notes: about 26.1% of the documents have age <= 25, and about 49.2% have age <= 30. These figures are approximations and can differ slightly from exact counts.
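
The inverse relationship between percentiles and percentile ranks can be sketched with exact arithmetic. Note the assumptions: the sketch uses the simple nearest-rank definition of a percentile, whereas ES uses an approximate algorithm (TDigest by default) and may interpolate, so its numbers need not match an exact computation. The function names and the ten sample ages are made up.

```python
import math


def percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least `percent`% of values <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_rank(values, threshold):
    """Percentage of values that are <= threshold."""
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical sample of ten ages.
ages = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

print(percentile(ages, 60))       # 25: 60% of the ages are <= 25
print(percentile_rank(ages, 25))  # 60.0: the same fact read the other way round
```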

9. Geo Bounds aggregation: the bounding box of the geo points in the document set

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html

10. Geo Centroid aggregation: the centroid of the geo points

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html

III. Bucket aggregations

1. Terms Aggregation: group by the terms (distinct values) of a field

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age"
          }
        }
      }
    }

Result 1:

    {
      "took": 2000,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 463,
          "buckets": [
            { "key": 31, "doc_count": 61 },
            { "key": 39, "doc_count": 60 },
            { "key": 26, "doc_count": 59 },
            { "key": 32, "doc_count": 52 },
            { "key": 35, "doc_count": 52 },
            { "key": 36, "doc_count": 52 },
            { "key": 22, "doc_count": 51 },
            { "key": 28, "doc_count": 51 },
            { "key": 33, "doc_count": 50 },
            { "key": 34, "doc_count": 49 }
          ]
        }
      }
    }

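Result notes:

"doc_count_error_upper_bound": 0: the upper bound on the error of the document counts

"sum_other_doc_count": 463: the number of documents that fall into the groups not returned

By default, the top 10 groups are returned, ordered by document count from high to low:

    "buckets": [
      { "key": 31, "doc_count": 61 },
      { "key": 39, "doc_count": 60 },
      .............
    ]

There are 61 documents with age 31 and 60 documents with age 39.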
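
How the buckets and sum_other_doc_count fit together can be checked with an in-memory emulation of the terms aggregation. The per-age counts below are copied from the terms results in this article (the 20-group result plus age 29 from the ascending-count result); the function `terms_agg` is an illustrative single-process sketch and ignores the per-shard merging that ES performs.

```python
from collections import Counter


def terms_agg(values, size=10):
    """Group values, keep the `size` largest groups, and sum up the rest."""
    counts = Counter(values)
    # Highest doc_count first; ties are broken by ascending key.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "sum_other_doc_count": sum(count for _, count in ordered[size:]),
        "buckets": [{"key": key, "doc_count": count} for key, count in ordered[:size]],
    }


# Documents per age in the bank index, as reported by the results in this article.
age_counts = {
    20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
    27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
    34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45,
}
ages = [age for age, n in age_counts.items() for _ in range(n)]

result = terms_agg(ages)
print(result["buckets"][:2])          # the two largest groups: ages 31 and 39
print(result["sum_other_doc_count"])  # 463
```

The top 10 groups hold 537 of the 1000 documents, which leaves the 463 reported as sum_other_doc_count.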

size sets how many groups are returned:

Example 2: return 20 groups

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age",
            "size": 20
          }
        }
      }
    }

Result 2:

    {
      "took": 9,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 35,
          "buckets": [
            { "key": 31, "doc_count": 61 },
            { "key": 39, "doc_count": 60 },
            { "key": 26, "doc_count": 59 },
            { "key": 32, "doc_count": 52 },
            { "key": 35, "doc_count": 52 },
            { "key": 36, "doc_count": 52 },
            { "key": 22, "doc_count": 51 },
            { "key": 28, "doc_count": 51 },
            { "key": 33, "doc_count": 50 },
            { "key": 34, "doc_count": 49 },
            { "key": 30, "doc_count": 47 },
            { "key": 21, "doc_count": 46 },
            { "key": 40, "doc_count": 45 },
            { "key": 20, "doc_count": 44 },
            { "key": 23, "doc_count": 42 },
            { "key": 24, "doc_count": 42 },
            { "key": 25, "doc_count": 42 },
            { "key": 37, "doc_count": 42 },
            { "key": 27, "doc_count": 39 },
            { "key": 38, "doc_count": 39 }
          ]
        }
      }
    }

Example 3: show the count error bound on each group

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age",
            "size": 5,
            "shard_size": 20,
            "show_term_doc_count_error": true
          }
        }
      }
    }

Result 3:

    {
      "took": 8,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count_error_upper_bound": 25,
          "sum_other_doc_count": 716,
          "buckets": [
            { "key": 31, "doc_count": 61, "doc_count_error_upper_bound": 0 },
            { "key": 39, "doc_count": 60, "doc_count_error_upper_bound": 0 },
            { "key": 26, "doc_count": 59, "doc_count_error_upper_bound": 0 },
            { "key": 32, "doc_count": 52, "doc_count_error_upper_bound": 0 },
            { "key": 36, "doc_count": 52, "doc_count_error_upper_bound": 0 }
          ]
        }
      }
    }

Example 4: shard_size sets how many groups each shard returns

The default value of shard_size is:
index with a single shard: = size
multiple shards: = size * 1.5 + 10

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age",
            "size": 5,
            "shard_size": 20
          }
        }
      }
    }

Result 4:

    {
      "took": 8,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count_error_upper_bound": 25,
          "sum_other_doc_count": 716,
          "buckets": [
            { "key": 31, "doc_count": 61 },
            { "key": 39, "doc_count": 60 },
            { "key": 26, "doc_count": 59 },
            { "key": 32, "doc_count": 52 },
            { "key": 36, "doc_count": 52 }
          ]
        }
      }
    }
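
The default shard_size rule stated for Example 4 can be written down as a one-line function. This is only a sketch of the rule as described in the text; the function name is made up, and truncating the fractional part is an assumption about how the result is rounded.

```python
def default_shard_size(size, number_of_shards):
    """Default number of groups requested from each shard for a terms aggregation."""
    if number_of_shards == 1:
        return size  # a single shard already sees every document
    # Over-fetch on each shard so that the merged top `size` is more accurate.
    return int(size * 1.5 + 10)  # truncation is an assumption of this sketch


print(default_shard_size(10, 1))  # 10
print(default_shard_size(10, 5))  # 25
```

Setting shard_size explicitly, as in Example 4, simply overrides this default; larger values tend to reduce count errors at the cost of more work per shard.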

order sets how the groups are sorted

Example 5: sort by document count

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age",
            "order": { "_count": "asc" }
          }
        }
      }
    }

Result 5:

    {
      "took": 3,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 584,
          "buckets": [
            { "key": 29, "doc_count": 35 },
            { "key": 27, "doc_count": 39 },
            { "key": 38, "doc_count": 39 },
            { "key": 23, "doc_count": 42 },
            { "key": 24, "doc_count": 42 },
            { "key": 25, "doc_count": 42 },
            { "key": 37, "doc_count": 42 },
            { "key": 20, "doc_count": 44 },
            { "key": 40, "doc_count": 45 },
            { "key": 21, "doc_count": 46 }
          ]
        }
      }
    }

Example 6: sort by group key

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age",
            "order": { "_key": "asc" }
          }
        }
      }
    }

Result 6:

    {
      "took": 10,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 549,
          "buckets": [
            { "key": 20, "doc_count": 44 },
            { "key": 21, "doc_count": 46 },
            { "key": 22, "doc_count": 51 },
            { "key": 23, "doc_count": 42 },
            { "key": 24, "doc_count": 42 },
            { "key": 25, "doc_count": 42 },
            { "key": 26, "doc_count": 59 },
            { "key": 27, "doc_count": 39 },
            { "key": 28, "doc_count": 51 },
            { "key": 29, "doc_count": 35 }
          ]
        }
      }
    }

Example 7: sort by a metric computed for each group

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "terms": {
            "field": "age",
            "order": {
              "max_balance": "asc"
            }
          },
          "aggs": {
            "max_balance": {
              "max": {
                "field": "balance"
              }
            },
            "min_balance": {
              "min": {
                "field": "balance"
              }
            }
          }
        }
      }
    }

Result 7:

    {
      "took": 28,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 511,
          "buckets": [
            { "key": 27, "doc_count": 39, "min_balance": { "value": 1110 }, "max_balance": { "value": 46868 } },
            { "key": 39, "doc_count": 60, "min_balance": { "value": 3589 }, "max_balance": { "value": 47257 } },
            { "key": 37, "doc_count": 42, "min_balance": { "value": 1360 }, "max_balance": { "value": 47546 } },
            { "key": 32, "doc_count": 52, "min_balance": { "value": 1031 }, "max_balance": { "value": 48294 } },
            { "key": 26, "doc_count": 59, "min_balance": { "value": 1447 }, "max_balance": { "value": 48466 } },
            { "key": 33, "doc_count": 50, "min_balance": { "value": 1314 }, "max_balance": { "value": 48734 } },
            { "key": 24, "doc_count": 42, "min_balance": { "value": 1011 }, "max_balance": { "value": 48745 } },
            { "key": 31, "doc_count": 61, "min_balance": { "value": 2384 }, "max_balance": { "value": 48758 } },
            { "key": 34, "doc_count": 49, "min_balance": { "value": 3001 }, "max_balance": { "value": 48997 } },
            { "key": 29, "doc_count": 35, "min_balance": { "value": 3596 }, "max_balance": { "value": 49119 } }
          ]
        }
      }
    }

Example 8: filter groups by matching values against regular expressions

    GET /_search
    {
      "aggs": {
        "tags": {
          "terms": {
            "field": "tags",
            "include": ".*sport.*",
            "exclude": "water_.*"
          }
        }
      }
    }

Example 9: filter groups with explicit value lists

    GET /_search
    {
      "aggs": {
        "JapaneseCars": {
          "terms": {
            "field": "make",
            "include": ["mazda", "honda"]
          }
        },
        "ActiveCarManufacturers": {
          "terms": {
            "field": "make",
            "exclude": ["rover", "jensen"]
          }
        }
      }
    }
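
The include and exclude options of Examples 8 and 9 can be emulated as follows. The sketch assumes that a pattern must match the whole term and that exclude is applied after include; it uses Python's `re` syntax, which differs in detail from the regular expression syntax ES uses, so it is only meant for simple patterns like the ones above. The function names and sample terms are made up.

```python
import re


def _matches(rule, term):
    """A rule is either a list of exact values or a regex matched against the whole term."""
    if isinstance(rule, list):
        return term in rule
    return re.fullmatch(rule, term) is not None


def filter_terms(terms, include=None, exclude=None):
    """Keep terms accepted by `include` (if given) and not rejected by `exclude`."""
    kept = []
    for term in terms:
        if include is not None and not _matches(include, term):
            continue
        if exclude is not None and _matches(exclude, term):
            continue
        kept.append(term)
    return kept


# Hypothetical tag and make values.
tags = ["sport", "water_sports", "motorsport", "music", "sports_news"]
print(filter_terms(tags, include=".*sport.*", exclude="water_.*"))

makes = ["mazda", "honda", "rover", "jensen", "toyota"]
print(filter_terms(makes, include=["mazda", "honda"]))
print(filter_terms(makes, exclude=["rover", "jensen"]))
```

Only the terms that survive this filtering become buckets; documents holding other terms are simply not grouped.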

Example 10: group by a script-computed value

    GET /_search
    {
      "aggs": {
        "genres": {
          "terms": {
            "script": {
              "source": "doc['genre'].value",
              "lang": "painless"
            }
          }
        }
      }
    }

Example 11: handling missing values

    GET /_search
    {
      "aggs": {
        "tags": {
          "terms": {
            "field": "tags",
            "missing": "N/A"
          }
        }
      }
    }

Result 11 (the request goes to /_search without "size": 0, so it runs across all indices and also returns the first 10 hits):

    {
      "took": 2059,
      "timed_out": false,
      "_shards": { "total": 58, "successful": 58, "skipped": 0, "failed": 0 },
      "hits": {
        "total": 1015,
        "max_score": 1,
        "hits": [
          {
            "_index": "bank", "_type": "_doc", "_id": "25", "_score": 1,
            "_source": {
              "account_number": 25, "balance": 40540, "firstname": "Virginia", "lastname": "Ayala",
              "age": 39, "gender": "F", "address": "171 Putnam Avenue", "employer": "Filodyne",
              "email": "virginiaayala@filodyne.com", "city": "Nicholson", "state": "PA"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "44", "_score": 1,
            "_source": {
              "account_number": 44, "balance": 34487, "firstname": "Aurelia", "lastname": "Harding",
              "age": 37, "gender": "M", "address": "502 Baycliff Terrace", "employer": "Orbalix",
              "email": "aureliaharding@orbalix.com", "city": "Yardville", "state": "DE"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "99", "_score": 1,
            "_source": {
              "account_number": 99, "balance": 47159, "firstname": "Ratliff", "lastname": "Heath",
              "age": 39, "gender": "F", "address": "806 Rockwell Place", "employer": "Zappix",
              "email": "ratliffheath@zappix.com", "city": "Shaft", "state": "ND"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "119", "_score": 1,
            "_source": {
              "account_number": 119, "balance": 49222, "firstname": "Laverne", "lastname": "Johnson",
              "age": 28, "gender": "F", "address": "302 Howard Place", "employer": "Senmei",
              "email": "lavernejohnson@senmei.com", "city": "Herlong", "state": "DC"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "126", "_score": 1,
            "_source": {
              "account_number": 126, "balance": 3607, "firstname": "Effie", "lastname": "Gates",
              "age": 39, "gender": "F", "address": "620 National Drive", "employer": "Digitalus",
              "email": "effiegates@digitalus.com", "city": "Blodgett", "state": "MD"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "145", "_score": 1,
            "_source": {
              "account_number": 145, "balance": 47406, "firstname": "Rowena", "lastname": "Wilkinson",
              "age": 32, "gender": "M", "address": "891 Elton Street", "employer": "Asimiline",
              "email": "rowenawilkinson@asimiline.com", "city": "Ripley", "state": "NH"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "183", "_score": 1,
            "_source": {
              "account_number": 183, "balance": 14223, "firstname": "Hudson", "lastname": "English",
              "age": 26, "gender": "F", "address": "823 Herkimer Place", "employer": "Xinware",
              "email": "hudsonenglish@xinware.com", "city": "Robbins", "state": "ND"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "190", "_score": 1,
            "_source": {
              "account_number": 190, "balance": 3150, "firstname": "Blake", "lastname": "Davidson",
              "age": 30, "gender": "F", "address": "636 Diamond Street", "employer": "Quantasis",
              "email": "blakedavidson@quantasis.com", "city": "Crumpler", "state": "KY"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "208", "_score": 1,
            "_source": {
              "account_number": 208, "balance": 40760, "firstname": "Garcia", "lastname": "Hess",
              "age": 26, "gender": "F", "address": "810 Nostrand Avenue", "employer": "Quiltigen",
              "email": "garciahess@quiltigen.com", "city": "Brooktrails", "state": "GA"
            }
          },
          {
            "_index": "bank", "_type": "_doc", "_id": "222", "_score": 1,
            "_source": {
              "account_number": 222, "balance": 14764, "firstname": "Rachelle", "lastname": "Rice",
              "age": 36, "gender": "M", "address": "333 Narrows Avenue", "employer": "Enaut",
              "email": "rachellerice@enaut.com", "city": "Wright", "state": "AZ"
            }
          }
        ]
      },
      "aggregations": {
        "tags": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "N/A", "doc_count": 1014 },
            { "key": "red", "doc_count": 1 }
          ]
        }
      }
    }

2. Filter Aggregation: aggregate over the documents that match a filter query

Among the documents matched by the query, only those that satisfy the filter condition are aggregated: filter first, then aggregate.

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_terms": {
          "filter": { "match": { "gender": "F" } },
          "aggs": {
            "avg_age": {
              "avg": {
                "field": "age"
              }
            }
          }
        }
      }
    }

Result 1:

    {
      "took": 163,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_terms": {
          "doc_count": 493,
          "avg_age": { "value": 30.3184584178499 }
        }
      }
    }

3. Filters Aggregation: aggregate over several filter groups

Example 1:

Prepare the data:

    PUT /logs/_doc/_bulk?refresh
    {"index":{"_id":1}}
    {"body":"warning: page could not be rendered"}
    {"index":{"_id":2}}
    {"body":"authentication error"}
    {"index":{"_id":3}}
    {"body":"warning: connection timed out"}

Aggregate with the combined filters:

    GET logs/_search
    {
      "size": 0,
      "aggs": {
        "messages": {
          "filters": {
            "filters": {
              "errors": {
                "match": {
                  "body": "error"
                }
              },
              "warnings": {
                "match": {
                  "body": "warning"
                }
              }
            }
          }
        }
      }
    }

Result 1:

    {
      "took": 18,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 3, "max_score": 0, "hits": [] },
      "aggregations": {
        "messages": {
          "buckets": {
            "errors": { "doc_count": 1 },
            "warnings": { "doc_count": 2 }
          }
        }
      }
    }

Example 2: name the bucket that collects documents matching none of the filters

    GET logs/_search
    {
      "size": 0,
      "aggs": {
        "messages": {
          "filters": {
            "other_bucket_key": "other_messages",
            "filters": {
              "errors": {
                "match": {
                  "body": "error"
                }
              },
              "warnings": {
                "match": {
                  "body": "warning"
                }
              }
            }
          }
        }
      }
    }

Result 2:

    {
      "took": 5,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 3, "max_score": 0, "hits": [] },
      "aggregations": {
        "messages": {
          "buckets": {
            "errors": { "doc_count": 1 },
            "warnings": { "doc_count": 2 },
            "other_messages": { "doc_count": 0 }
          }
        }
      }
    }
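
The bucket counts above, including the empty other_messages bucket, can be reproduced with a small emulation over the three log documents indexed earlier. The sketch stands in for the match query with a crude word lookup (lowercase, split on non-word characters), which is only an approximation of real text analysis; `filters_agg` and `contains_word` are made-up names.

```python
import re


def contains_word(word):
    """Rough stand-in for a match query on the "body" field."""
    return lambda doc: word in re.findall(r"\w+", doc["body"].lower())


def filters_agg(docs, filters, other_bucket_key=None):
    """Count documents per named filter; unmatched ones go to the optional other bucket."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1
                matched = True  # a document may land in several buckets
        if not matched:
            other += 1
    if other_bucket_key is not None:
        buckets[other_bucket_key] = other
    return buckets


logs = [
    {"body": "warning: page could not be rendered"},
    {"body": "authentication error"},
    {"body": "warning: connection timed out"},
]
filters = {"errors": contains_word("error"), "warnings": contains_word("warning")}

print(filters_agg(logs, filters, other_bucket_key="other_messages"))
```

Every log line matches one of the two filters, so the other bucket stays at 0, as in Result 2.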

4. Range Aggregation: group by value ranges

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_range": {
          "range": {
            "field": "age",
            "ranges": [
              {
                "to": 25
              },
              {
                "from": 25,
                "to": 35
              },
              {
                "from": 35
              }
            ]
          },
          "aggs": {
            "bmax": {
              "max": {
                "field": "balance"
              }
            }
          }
        }
      }
    }

Result 1:

    {
      "took": 7,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_range": {
          "buckets": [
            {
              "key": "*-25.0",
              "to": 25,
              "doc_count": 225,
              "bmax": { "value": 49587 }
            },
            {
              "key": "25.0-35.0",
              "from": 25,
              "to": 35,
              "doc_count": 485,
              "bmax": { "value": 49795 }
            },
            {
              "key": "35.0-*",
              "from": 35,
              "doc_count": 290,
              "bmax": { "value": 49989 }
            }
          ]
        }
      }
    }

Example 2: assign a key to each group

    POST /bank/_search?size=0
    {
      "aggs": {
        "age_range": {
          "range": {
            "field": "age",
            "keyed": true,
            "ranges": [
              {
                "to": 25,
                "key": "Ld"
              },
              {
                "from": 25,
                "to": 35,
                "key": "Md"
              },
              {
                "from": 35,
                "key": "Od"
              }
            ]
          }
        }
      }
    }

Result 2:

    {
      "took": 2,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "age_range": {
          "buckets": {
            "Ld": { "to": 25, "doc_count": 225 },
            "Md": { "from": 25, "to": 35, "doc_count": 485 },
            "Od": { "from": 35, "doc_count": 290 }
          }
        }
      }
    }
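
The bucket boundaries of the range aggregation (from is inclusive, to is exclusive) can be checked against the counts above. The sketch below replays the three ranges over the per-age document counts reported by the terms results in this article; `range_agg` is an illustrative in-memory function, not ES code.

```python
def range_agg(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive."""
    buckets = {}
    for r in ranges:
        low, high = r.get("from"), r.get("to")
        buckets[r["key"]] = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
    return buckets


# Documents per age in the bank index, as reported by the terms results in this article.
age_counts = {
    20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
    27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
    34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45,
}
ages = [age for age, n in age_counts.items() for _ in range(n)]

ranges = [
    {"key": "Ld", "to": 25},
    {"key": "Md", "from": 25, "to": 35},
    {"key": "Od", "from": 35},
]
print(range_agg(ages, ranges))  # {'Ld': 225, 'Md': 485, 'Od': 290}
```

Age 25 lands in Md and age 35 in Od, which is what makes the three counts add up to 1000 without overlap.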

5. Date Range Aggregation: group by date ranges

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "range": {
          "date_range": {
            "field": "date",
            "format": "MM-yyy",
            "ranges": [
              {
                "to": "now-10M/M"
              },
              {
                "from": "now-10M/M"
              }
            ]
          }
        }
      }
    }

Result 1 (both buckets are empty, apparently because the bank documents have no date field):

    {
      "took": 115,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "range": {
          "buckets": [
            {
              "key": "*-2017-08-01T00:00:00.000Z",
              "to": 1501545600000,
              "to_as_string": "2017-08-01T00:00:00.000Z",
              "doc_count": 0
            },
            {
              "key": "2017-08-01T00:00:00.000Z-*",
              "from": 1501545600000,
              "from_as_string": "2017-08-01T00:00:00.000Z",
              "doc_count": 0
            }
          ]
        }
      }
    }

6. Date Histogram Aggregation: date histogram (bar chart) aggregation

This aggregates by day, month, year, and so on. The interval can be year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s), or a custom time interval.

Example 1:

    POST /bank/_search?size=0
    {
      "aggs": {
        "sales_over_time": {
          "date_histogram": {
            "field": "date",
            "interval": "month"
          }
        }
      }
    }

Result 1 (no buckets are returned, apparently because the bank documents have no date field):

    {
      "took": 9,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": { "total": 1000, "max_score": 0, "hits": [] },
      "aggregations": {
        "sales_over_time": {
          "buckets": []
        }
      }
    }
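
Since the result above contains no buckets, the following sketch shows what monthly bucketing amounts to on data that does carry dates. It is a simplified in-memory emulation: the sample timestamps are invented, time zones are ignored, and months without documents are simply omitted rather than returned as empty buckets.

```python
from collections import Counter
from datetime import datetime


def month_histogram(timestamps):
    """Count timestamps per calendar month, keyed by the first day of the month."""
    counts = Counter((ts.year, ts.month) for ts in timestamps)
    return [
        {"key_as_string": f"{year:04d}-{month:02d}-01", "doc_count": n}
        for (year, month), n in sorted(counts.items())
    ]


# Hypothetical sale dates.
sales = [datetime(2017, 8, 3), datetime(2017, 8, 29), datetime(2017, 10, 1)]

for bucket in month_histogram(sales):
    print(bucket)
```

Each timestamp is rounded down to the start of its month, and documents sharing that start fall into the same bucket.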

7. Missing Aggregation: a bucket for documents that lack a value for a field

    POST /bank/_search?size=0
    {
      "aggs": {
        "account_without_a_age": {
          "missing": { "field": "age" }
        }
      }
    }

8. Geo Distance Aggregation: group by distance from a geo point

See the official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html
