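Solr Study Notes, Part 5: Components and Handlers

I. Search

Spell checking (spellCheck)

Purpose: checks whether the terms the user searched for exist in the index; if they do not, it suggests close or similar terms.

Configuration: add the following to solrconfig.xml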

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <!-- The field whose index is used as the basis for spell checking; here, the field named Title -->
      <str name="field">Title</str>
      <!-- Directory of the spell-check index -->
      <str name="spellcheckIndexDir">spellchecker</str>
      <!-- Build the spell-check index on commit (spell checking only works once the index has been built) -->
      <!-- To build on optimize instead, replace "buildOnCommit" with "buildOnOptimize" -->
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <!-- Default parameters -->
    <lst name="defaults">
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <!-- Number of spell-check suggestions to return (increase as needed) -->
      <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

Example:

http://localhost:8080/solr/collection1/spell?q=Title:tests&spellcheck=true

The request returns the following result:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">119</int>
    </lst>
    <result name="response" start="0" numFound="0"/>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="tests">
          <int name="numFound">1</int>
          <int name="startOffset">6</int>
          <int name="endOffset">11</int>
          <int name="origFreq">0</int>
          <arr name="suggestion">
            <lst>
              <str name="word">test</str>
              <int name="freq">6</int>
            </lst>
          </arr>
        </lst>
        <bool name="correctlySpelled">false</bool>
        <lst name="collation">
          <str name="collationQuery">Title:test</str>
          <int name="hits">6</int>
          <lst name="misspellingsAndCorrections">
            <str name="tests">test</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </response>
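
On the client side, a response like the one above still has to be turned into something usable, such as a "did you mean" hint. The following is a minimal sketch in Python using only the standard library; the function name `parse_spellcheck` and the trimmed `SAMPLE` string are illustrative assumptions, not part of Solr.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the /spell response shown above (assumption: same layout).
SAMPLE = """<response>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="tests">
        <int name="numFound">1</int>
        <arr name="suggestion">
          <lst>
            <str name="word">test</str>
            <int name="freq">6</int>
          </lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">Title:test</str>
      </lst>
    </lst>
  </lst>
</response>"""


def parse_spellcheck(xml_text):
    """Return ({misspelled term: [(word, freq), ...]}, collation query or None)."""
    root = ET.fromstring(xml_text)
    node = root.find("./lst[@name='spellcheck']/lst[@name='suggestions']")
    suggestions = {}
    collation = None
    if node is None:
        return suggestions, collation
    for entry in node.findall("lst"):
        name = entry.get("name")
        if name == "collation":
            # The rewritten query Solr proposes for the whole input.
            collation = entry.findtext("str[@name='collationQuery']")
            continue
        words = []
        for item in entry.findall("arr[@name='suggestion']/lst"):
            word = item.findtext("str[@name='word']")
            freq = int(item.findtext("int[@name='freq']"))
            words.append((word, freq))
        suggestions[name] = words
    return suggestions, collation


suggestions, collation = parse_spellcheck(SAMPLE)
print(suggestions, collation)
```

The sketch assumes the extended layout shown above, where each suggestion is a `<lst>` with `word` and `freq`; a response with plain `<str>` suggestions would need a different inner loop.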

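Search suggestions (suggest)

Purpose: search suggestions show the user a list of prompts as soon as a search term is typed, and recommend the first similar term found as the suggested term. If nothing related to the input exists, no suggestion is shown. In a sense, then, you can use suggest while the user is typing and spell checking after the user clicks Search; combining the two gives users a better experience.

Configuration: add the following to solrconfig.xml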

  <!-- Search suggestions -->
  <searchComponent name="suggest" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text</str>
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">Title</str>
      <float name="threshold">0.0001</float>
      <!-- To use a custom suggestion dictionary, uncomment the two lines below:
      <str name="sourceLocation">suggest.txt</str>
      <str name="spellcheckIndexDir">spellchecker</str>
      -->

      <str name="comparatorClass">freq</str>
      <str name="buildOnOptimize">true</str>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.collate">true</str>
      <!--<str name="spellcheck.build">true</str> -->
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

Example:

http://localhost:8080/solr/collection1/suggest?wt=xml&indent=true&spellcheck=true&spellcheck.q=tes

http://localhost:8080/solr/collection1/suggest?q=Title:tes&wt=xml&indent=true

The request returns the following result:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="tes">
          <int name="numFound">1</int>
          <int name="startOffset">0</int>
          <int name="endOffset">3</int>
          <arr name="suggestion">
            <str>test</str>
          </arr>
        </lst>
        <str name="collation">test</str>
      </lst>
    </lst>
  </response>

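Faceted search (facet)

Purpose: faceting is one of Solr's advanced search features and gives users a friendlier search experience. While searching by keyword, the results can also be grouped by the facet fields and counted per group. Faceting is a component built into Solr by default.

Configuration: no additional configuration is required

Notes:

1. Fields suitable for faceting

  They generally represent a common attribute of the entities, such as a product's category, a product's manufacturer, or a book's publisher.

2. Requirements for facet fields

  A facet field must be indexed; in general it does not need to be tokenized or stored.

It does not need tokenizing because its value represents a single concept as a whole; the value also needs no processing such as case conversion and should be kept as is.

It does not need storing because users generally care about the field not for its specific value but as a means of grouping the results, and they usually drill further into the search along that grouping.

3. Special case

For ordinary queries, tokenizing and storing are both necessary. For example, splitting the CPU type "Intel Core 2 Duo P7570" into keywords such as "Intel", "Core", and "P7570" and indexing each one separately may give a better search experience. But if CPU is used as a facet field, it is best left untokenized, which creates a conflict. The solution is to make the CPU field untokenized and unstored, then create another field as a copy of it, and tokenize and store that copy.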
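
To see why a tokenized field makes a poor facet field, the sketch below counts documents per indexed term in two ways: once treating each value as a single term, and once splitting it into tokens. It is only an illustration in plain Python; the sample values, the function name `facet_counts`, and the whitespace split (standing in for a real analyzer) are all assumptions.

```python
from collections import Counter

# Hypothetical CPU field values for three documents.
cpu_values = [
    "Intel Core 2 Duo P7570",
    "Intel Core 2 Duo P7570",
    "Intel Core 2 Duo T6600",
]


def facet_counts(values, tokenized):
    """Count how many documents contain each indexed term."""
    counts = Counter()
    for value in values:
        # A whitespace split stands in for the analyzer of a tokenized field;
        # an untokenized field indexes the whole value as one term.
        terms = set(value.split()) if tokenized else {value}
        counts.update(terms)
    return counts


whole = facet_counts(cpu_values, tokenized=False)
split = facet_counts(cpu_values, tokenized=True)
print(whole)
print(split)
```

The untokenized counts group documents by complete CPU model, which is what a facet sidebar needs, while the tokenized counts report fragments such as "Intel" or "2" that are useful for matching keywords but not for grouping.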

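Parameters:

Field Facet: facet fields are declared by adding the facet.field parameter to the request; to facet on several fields, repeat the parameter once per field.

Facet fields do not affect one another, and query parameters can be set per facet field in the form f.<field name>.<parameter name>=<value>. A parameter given without a field name applies to all facet fields.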
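
The repeated facet.field parameter and the per-field f.<field name>.<parameter name> form can be assembled programmatically. Below is a minimal sketch using Python's standard library; the helper name `build_facet_url` is hypothetical, and `facet.mincount` is used only as one example of a per-field parameter.

```python
from urllib.parse import urlencode, parse_qsl, urlsplit


def build_facet_url(base_url, query, facet_fields, per_field=None):
    """Build a select URL with repeated facet.field and per-field overrides."""
    # A list of pairs (not a dict) lets the same key appear several times.
    params = [("q", query), ("facet", "on")]
    params += [("facet.field", field) for field in facet_fields]
    for field, overrides in (per_field or {}).items():
        for name, value in overrides.items():
            params.append((f"f.{field}.{name}", str(value)))
    return base_url + "?" + urlencode(params)


url = build_facet_url(
    "http://localhost:8080/solr/collection1/select",
    "*:*",
    ["ArticleTypeName", "EditorialOfficeName"],
    per_field={"ArticleTypeName": {"facet.mincount": 1}},
)
print(url)
```

Here only ArticleTypeName hides zero-count values; EditorialOfficeName keeps the default behavior because the override carries the field-name prefix.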

Example:

http://localhost:8080/solr/collection1/select/?q=ArticleId:5&indent=on&facet=on&facet.field=ArticleTypeName&facet.field=EditorialOfficeName

The request returns the following result:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
        <str name="facet">on</str>
        <str name="indent">on</str>
        <str name="q">ArticleId:5</str>
        <arr name="facet.field">
          <str>ArticleTypeName</str>
          <str>EditorialOfficeName</str>
        </arr>
      </lst>
    </lst>
    <result name="response" start="0" numFound="1">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">5</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content five</str>
        <date name="CreateDate">2014-03-24T16:00:00Z</date>
        <str name="Title">test title 5 five</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978500177920</long>
      </doc>
    </result>
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields">
        <lst name="ArticleTypeName">
          <int name="体育">1</int>
          <int name="财经">0</int>
        </lst>
        <lst name="EditorialOfficeName">
          <int name="燕赵都市报">1</int>
          <int name="光明日报">0</int>
          <int name="北京晚报">0</int>
        </lst>
      </lst>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
    </lst>
  </response>

Date Facet: Solr provides a more convenient way to compute counts over date fields; the field type must be DateField (or a subtype).

Note that Date Facet requires all four parameters: field name, start time, end time, and gap.

Example:

http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.date=CreateDate&facet.date.start=2014-3-10T0:0:0Z&facet.date.end=2014-3-26T0:0:0Z&facet.date.gap=%2B1DAY&facet.date.other=all

The request returns the following result:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">39</int>
      <lst name="params">
        <str name="facet.date.start">2014-3-10T0:0:0Z</str>
        <str name="facet">on</str>
        <str name="indent">on</str>
        <str name="q">*:*</str>
        <str name="facet.date">CreateDate</str>
        <str name="facet.date.other">all</str>
        <str name="facet.date.gap">+1DAY</str>
        <str name="facet.date.end">2014-3-26T0:0:0Z</str>
      </lst>
    </lst>
    <result name="response" start="0" numFound="6">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">5</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content five</str>
        <date name="CreateDate">2014-03-24T16:00:00Z</date>
        <str name="Title">test title 5 five</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978500177920</long>
      </doc>
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">6</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content six</str>
        <date name="CreateDate">2014-03-25T16:00:00Z</date>
        <str name="Title">test title 6 six</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978552606720</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">7</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">1</str>
        <str name="Content">content seven</str>
        <date name="CreateDate">2014-03-26T16:00:00Z</date>
        <str name="Title">test title 7 seven</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">光明日报</str>
        <long name="_version_">1463443978554703872</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">8</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content eight</str>
        <date name="CreateDate">2014-03-27T16:00:00Z</date>
        <str name="Title">test title 8 eight</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978556801024</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">9</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content nine</str>
        <date name="CreateDate">2014-03-28T16:00:00Z</date>
        <str name="Title">test title 9 nine</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978558898176</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">10</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content ten</str>
        <date name="CreateDate">2014-03-23T16:00:00Z</date>
        <str name="Title">test title 10 ten</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978559946752</long>
      </doc>
    </result>
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields"/>
      <lst name="facet_dates">
        <lst name="CreateDate">
          <int name="2014-03-10T00:00:00Z">0</int>
          <int name="2014-03-11T00:00:00Z">0</int>
          <int name="2014-03-12T00:00:00Z">0</int>
          <int name="2014-03-13T00:00:00Z">0</int>
          <int name="2014-03-14T00:00:00Z">0</int>
          <int name="2014-03-15T00:00:00Z">0</int>
          <int name="2014-03-16T00:00:00Z">0</int>
          <int name="2014-03-17T00:00:00Z">0</int>
          <int name="2014-03-18T00:00:00Z">0</int>
          <int name="2014-03-19T00:00:00Z">0</int>
          <int name="2014-03-20T00:00:00Z">0</int>
          <int name="2014-03-21T00:00:00Z">0</int>
          <int name="2014-03-22T00:00:00Z">0</int>
          <int name="2014-03-23T00:00:00Z">1</int>
          <int name="2014-03-24T00:00:00Z">1</int>
          <int name="2014-03-25T00:00:00Z">1</int>
          <str name="gap">+1DAY</str>
          <date name="start">2014-03-10T00:00:00Z</date>
          <date name="end">2014-03-26T00:00:00Z</date>
          <int name="before">0</int>
          <int name="after">3</int>
          <int name="between">3</int>
        </lst>
      </lst>
      <lst name="facet_ranges"/>
    </lst>
  </response>
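
The bucket counts and the before/after/between values in the response above can be reproduced by hand. The following is a rough simulation in plain Python, not Solr's implementation; the function name `date_facet` is hypothetical, and the treatment of values that fall exactly on a boundary is an assumption that does not matter for this data set.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# CreateDate values of the six documents in the response above.
create_dates = [
    datetime(2014, 3, 24, 16, tzinfo=UTC),
    datetime(2014, 3, 25, 16, tzinfo=UTC),
    datetime(2014, 3, 26, 16, tzinfo=UTC),
    datetime(2014, 3, 27, 16, tzinfo=UTC),
    datetime(2014, 3, 28, 16, tzinfo=UTC),
    datetime(2014, 3, 23, 16, tzinfo=UTC),
]


def date_facet(values, start, end, gap):
    """Count values per gap-sized bucket in [start, end), plus before/after."""
    buckets = {}
    cursor = start
    while cursor < end:
        buckets[cursor] = 0
        cursor += gap
    before = after = 0
    for value in values:
        if value < start:
            before += 1
        elif value >= end:  # assumption: a value equal to end counts as "after"
            after += 1
        else:
            index = (value - start) // gap  # whole gaps since start
            buckets[start + index * gap] += 1
    between = sum(buckets.values())
    return buckets, before, after, between


buckets, before, after, between = date_facet(
    create_dates,
    start=datetime(2014, 3, 10, tzinfo=UTC),
    end=datetime(2014, 3, 26, tzinfo=UTC),
    gap=timedelta(days=1),
)
for day, count in buckets.items():
    print(day.isoformat(), count)
print("before", before, "after", after, "between", between)
```

Each bucket is labeled by its start time, which is why a document created at 2014-03-24T16:00:00Z is counted under 2014-03-24T00:00:00Z.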

Facet Query: using syntax similar to a filter query, Facet Query offers more flexible faceting; with the facet.query parameter you can filter on any field.

Example:

http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.query=CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]

The request returns the following result:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
        <str name="facet">on</str>
        <str name="indent">on</str>
        <str name="facet.query">CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]</str>
        <str name="q">*:*</str>
      </lst>
    </lst>
    <result name="response" start="0" numFound="6">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">5</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content five</str>
        <date name="CreateDate">2014-03-24T16:00:00Z</date>
        <str name="Title">test title 5 five</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978500177920</long>
      </doc>
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">6</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content six</str>
        <date name="CreateDate">2014-03-25T16:00:00Z</date>
        <str name="Title">test title 6 six</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978552606720</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">7</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">1</str>
        <str name="Content">content seven</str>
        <date name="CreateDate">2014-03-26T16:00:00Z</date>
        <str name="Title">test title 7 seven</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">光明日报</str>
        <long name="_version_">1463443978554703872</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">8</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content eight</str>
        <date name="CreateDate">2014-03-27T16:00:00Z</date>
        <str name="Title">test title 8 eight</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978556801024</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">9</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content nine</str>
        <date name="CreateDate">2014-03-28T16:00:00Z</date>
        <str name="Title">test title 9 nine</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978558898176</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">10</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content ten</str>
        <date name="CreateDate">2014-03-23T16:00:00Z</date>
        <str name="Title">test title 10 ten</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978559946752</long>
      </doc>
    </result>
    <lst name="facet_counts">
      <lst name="facet_queries">
        <int name="CreateDate:[2014-3-24T0:0:0Z TO 2014-3-26T0:0:0Z]">2</int>
      </lst>
      <lst name="facet_fields"/>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
    </lst>
  </response>
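
The facet.query value in the request above contains spaces, brackets, and colons, which a browser encodes silently but an HTTP client library may not. The following sketch shows one way to encode it with Python's standard library; the zero-padded date format is my own choice for the illustration, not something the original request uses.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Range query on CreateDate; dates written in zero-padded form (assumption).
facet_query = "CreateDate:[2014-03-24T00:00:00Z TO 2014-03-26T00:00:00Z]"

params = {
    "q": "*:*",
    "indent": "on",
    "facet": "on",
    "facet.query": facet_query,
}

# urlencode percent-encodes the brackets and colons and turns spaces into "+".
url = "http://localhost:8080/solr/collection1/select?" + urlencode(params)
print(url)

# Decoding the query string gives back the original expression.
decoded = parse_qs(urlsplit(url).query)
print(decoded["facet.query"][0])
```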

Range Facet: counts over ranges. Example:

http://localhost:8080/solr/collection1/select/?q=*:*&indent=on&facet=on&facet.range=CreateDate&facet.range.start=2014-03-24T16:00:00Z&facet.range.end=2014-03-26T16:00:00Z&facet.range.gap=%2B1DAY

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">2</int>
      <lst name="params">
        <str name="facet">on</str>
        <str name="indent">on</str>
        <str name="q">*:*</str>
        <str name="facet.range.start">2014-03-24T16:00:00Z</str>
        <str name="facet.range">CreateDate</str>
        <str name="facet.range.gap">+1DAY</str>
        <str name="facet.range.end">2014-03-26T16:00:00Z</str>
      </lst>
    </lst>
    <result name="response" start="0" numFound="6">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">5</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content five</str>
        <date name="CreateDate">2014-03-24T16:00:00Z</date>
        <str name="Title">test title 5 five</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978500177920</long>
      </doc>
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">6</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content six</str>
        <date name="CreateDate">2014-03-25T16:00:00Z</date>
        <str name="Title">test title 6 six</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978552606720</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">7</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">1</str>
        <str name="Content">content seven</str>
        <date name="CreateDate">2014-03-26T16:00:00Z</date>
        <str name="Title">test title 7 seven</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">光明日报</str>
        <long name="_version_">1463443978554703872</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">8</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content eight</str>
        <date name="CreateDate">2014-03-27T16:00:00Z</date>
        <str name="Title">test title 8 eight</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978556801024</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">9</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content nine</str>
        <date name="CreateDate">2014-03-28T16:00:00Z</date>
        <str name="Title">test title 9 nine</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978558898176</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">10</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content ten</str>
        <date name="CreateDate">2014-03-23T16:00:00Z</date>
        <str name="Title">test title 10 ten</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978559946752</long>
      </doc>
    </result>
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields"/>
      <lst name="facet_dates"/>
      <lst name="facet_ranges">
        <lst name="CreateDate">
          <lst name="counts">
            <int name="2014-03-24T16:00:00Z">1</int>
            <int name="2014-03-25T16:00:00Z">1</int>
          </lst>
          <str name="gap">+1DAY</str>
          <date name="start">2014-03-24T16:00:00Z</date>
          <date name="end">2014-03-26T16:00:00Z</date>
        </lst>
      </lst>
    </lst>
  </response>

Grouping and statistics:

Grouping example (group):

http://localhost:8080/solr/collection1/select?q=*:*&wt=xml&indent=true&group=true&group.field=TypeId&group.ngroups=true

Statistics example (stats):

http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=Price&rows=10&indent=true

The request returns the following result:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">32</int>
      <lst name="params">
        <str name="indent">true</str>
        <str name="stats.field">Price</str>
        <str name="stats">true</str>
        <str name="q">*:*</str>
        <str name="rows">10</str>
      </lst>
    </lst>
    <result name="response" start="0" numFound="5">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">6</str>
        <double name="Price">6.0</double>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content six</str>
        <date name="CreateDate">2014-03-25T16:00:00Z</date>
        <str name="Title">test title 6 six</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463715628722421760</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">7</str>
        <double name="Price">7.0</double>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">1</str>
        <str name="Content">content seven</str>
        <date name="CreateDate">2014-03-26T16:00:00Z</date>
        <str name="Title">test title 7 seven</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">光明日报</str>
        <long name="_version_">1463715628782190592</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">8</str>
        <double name="Price">8.0</double>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content eight</str>
        <date name="CreateDate">2014-03-27T16:00:00Z</date>
        <str name="Title">test title 8 eight</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463715628784287744</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">9</str>
        <double name="Price">9.0</double>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content nine</str>
        <date name="CreateDate">2014-03-28T16:00:00Z</date>
        <str name="Title">test title 9 nine</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463715628786384896</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">10</str>
        <double name="Price">10.0</double>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content ten</str>
        <date name="CreateDate">2014-03-23T16:00:00Z</date>
        <str name="Title">test title 10 ten</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463715628788482048</long>
      </doc>
    </result>
    <lst name="stats">
      <lst name="stats_fields">
        <lst name="Price">
          <double name="min">6.0</double>
          <double name="max">10.0</double>
          <long name="count">5</long>
          <long name="missing">0</long>
          <double name="sum">40.0</double>
          <double name="sumOfSquares">330.0</double>
          <double name="mean">8.0</double>
          <double name="stddev">1.5811388300841898</double>
          <lst name="facets"/>
        </lst>
      </lst>
    </lst>
  </response>

Note: the stats field should be a numeric type; for a string field the statistics are incomplete.
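
The figures in the stats block above follow directly from the five Price values. The sketch below recomputes them in plain Python as a sanity check; the function name `field_stats` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is inferred from the fact that it reproduces the stddev value in the response.

```python
import math

# Price values of the five documents in the response above.
prices = [6.0, 7.0, 8.0, 9.0, 10.0]


def field_stats(values):
    """Recompute min, max, count, sum, sumOfSquares, mean, and stddev."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    mean = total / count
    # Sample standard deviation, derived from the running sums.
    if count > 1:
        variance = (sum_of_squares - count * mean * mean) / (count - 1)
        stddev = math.sqrt(max(variance, 0.0))
    else:
        stddev = 0.0
    return {
        "min": min(values),
        "max": max(values),
        "count": count,
        "sum": total,
        "sumOfSquares": sum_of_squares,
        "mean": mean,
        "stddev": stddev,
    }


stats = field_stats(prices)
for name, value in stats.items():
    print(name, value)
```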

Automatic clustering (clustering)

Purpose: automatically groups the retrieved results into categories.

Configuration: add the following to solrconfig.xml

  <config>
    <searchComponent name="clustering"
                     enable="${solr.clustering.enabled:true}"
                     class="solr.clustering.ClusteringComponent">
      <lst name="engine">
        <str name="name">lingo</str>
        <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
        <str name="carrot.resourcesDir">clustering/carrot2</str>
      </lst>

      <!-- An example definition for the STC clustering algorithm. -->
      <lst name="engine">
        <str name="name">stc</str>
        <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
      </lst>

      <!-- An example definition for the bisecting kmeans clustering algorithm. -->
      <lst name="engine">
        <str name="name">kmeans</str>
        <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
      </lst>
    </searchComponent>

    <requestHandler name="/clustering"
                    startup="lazy"
                    enable="${solr.clustering.enabled:true}"
                    class="solr.SearchHandler">
      <lst name="defaults">
        <bool name="clustering">true</bool>
        <bool name="clustering.results">true</bool>
        <str name="carrot.title">name</str>
        <str name="carrot.url">id</str>
        <str name="carrot.snippet">features</str>
        <bool name="carrot.produceSummary">true</bool>
        <bool name="carrot.outputSubClusters">false</bool>
        <str name="defType">edismax</str>
        <str name="qf">
          text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
        </str>
        <str name="q.alt">*:*</str>
        <str name="rows">10</str>
        <str name="fl">*,score</str>
      </lst>
      <arr name="last-components">
        <str>clustering</str>
      </arr>
    </requestHandler>
  </config>

Example:

http://localhost:8080/solr/clustering?q=*:*&rows=10&LingoClusteringAlgorithm.desiredClusterCountBase=20

The request returns the following result:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">12</int>
    </lst>
    <result name="response" start="0" numFound="6" maxScore="0.42292467">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">5</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content five</str>
        <date name="CreateDate">2014-03-24T16:00:00Z</date>
        <str name="Title">test title 5 five</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978500177920</long>
        <float name="score">0.42292467</float>
      </doc>
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">6</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content six</str>
        <date name="CreateDate">2014-03-25T16:00:00Z</date>
        <str name="Title">test title 6 six</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978552606720</long>
        <float name="score">0.42292467</float>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">7</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">1</str>
        <str name="Content">content seven</str>
        <date name="CreateDate">2014-03-26T16:00:00Z</date>
        <str name="Title">test title 7 seven</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">光明日报</str>
        <long name="_version_">1463443978554703872</long>
        <float name="score">0.42292467</float>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">8</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content eight</str>
        <date name="CreateDate">2014-03-27T16:00:00Z</date>
        <str name="Title">test title 8 eight</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978556801024</long>
        <float name="score">0.42292467</float>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">9</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content nine</str>
        <date name="CreateDate">2014-03-28T16:00:00Z</date>
        <str name="Title">test title 9 nine</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978558898176</long>
        <float name="score">0.42292467</float>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">10</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content ten</str>
        <date name="CreateDate">2014-03-23T16:00:00Z</date>
        <str name="Title">test title 10 ten</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978559946752</long>
        <float name="score">0.42292467</float>
      </doc>
    </result>
    <arr name="clusters">
      <lst>
        <arr name="labels">
          <str>Other Topics</str>
        </arr>
        <double name="score">0.0</double>
        <bool name="other-topics">true</bool>
        <arr name="docs">
          <str>5</str>
          <str>6</str>
          <str>7</str>
          <str>8</str>
          <str>9</str>
          <str>10</str>
        </arr>
      </lst>
    </arr>
  </response>
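
The clusters section at the end of the response above is what a client would actually consume. The following is a minimal parsing sketch with Python's standard library; the function name `parse_clusters` and the trimmed `SAMPLE` string are illustrative assumptions based on the layout shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the clusters part of the response shown above.
SAMPLE = """<response>
  <arr name="clusters">
    <lst>
      <arr name="labels">
        <str>Other Topics</str>
      </arr>
      <double name="score">0.0</double>
      <bool name="other-topics">true</bool>
      <arr name="docs">
        <str>5</str><str>6</str><str>7</str>
        <str>8</str><str>9</str><str>10</str>
      </arr>
    </lst>
  </arr>
</response>"""


def parse_clusters(xml_text):
    """Return one dict per cluster: labels, score, other_topics flag, doc ids."""
    root = ET.fromstring(xml_text)
    clusters = []
    for node in root.findall("./arr[@name='clusters']/lst"):
        clusters.append({
            "labels": [s.text for s in node.findall("arr[@name='labels']/str")],
            "score": float(node.findtext("double[@name='score']", default="0")),
            # The flag is only present on the catch-all cluster.
            "other_topics": node.findtext("bool[@name='other-topics']") == "true",
            "docs": [s.text for s in node.findall("arr[@name='docs']/str")],
        })
    return clusters


clusters = parse_clusters(SAMPLE)
for cluster in clusters:
    print(cluster["labels"], cluster["docs"])
```

In this sample every document lands in the single "Other Topics" cluster, which suggests the algorithm found no distinguishing topics in such a small, uniform data set.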

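Notes:

To use this feature, add the extension JARs to the %solr_home%/lib directory:

From the downloaded Solr package, copy

dist/apache-solr-clustering-*.jar,

all JARs under the contrib/clustering directory,

all JARs under the contrib/clustering/downloads directory

into %solr_home%/lib.

Shortcut: simply copy the dist and contrib folders from the package into %solr_home%/collection1/conf.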

Similarity matching (MoreLikeThis)

Purpose: finds similar documents

Configuration: add the following to solrconfig.xml

  <!-- Similarity query -->
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  </requestHandler>

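Parameters:
mlt: Boolean that turns MoreLikeThisComponent on or off at query time. (true|false)
mlt.count: Optional. The number of similar documents to retrieve for each result. (> 0)
mlt.fl: The fields used to build the MLT query. Any field in the schema that is stored or has term vectors.
mlt.maxqt: Optional. The maximum number of query terms. A long document may contain many significant terms, which can make the MLT query very large and lead to slow responses or the dreaded TooManyClausesException; this parameter keeps only the most significant terms. (> 0)

Example:

http://localhost:8080/solr/mlt?q=ArticleId:5&mlt=true&mlt.fl=Title&mlt.mintf=1&mlt.mindf=1

This request finds the document whose ArticleId is 5 and returns other documents similar to it on the Title field. Note that a field listed in mlt.fl must be stored or have termVectors=true in the schema to take effect.

The request returns the following result: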

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">34</int>
    </lst>
    <result name="match" start="0" numFound="1">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">5</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content five</str>
        <date name="CreateDate">2014-03-24T16:00:00Z</date>
        <str name="Title">test title 5 five</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978500177920</long>
      </doc>
    </result>
    <result name="response" start="0" numFound="5">
      <doc>
        <str name="TypeId">2</str>
        <str name="ArticleId">6</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content six</str>
        <date name="CreateDate">2014-03-25T16:00:00Z</date>
        <str name="Title">test title 6 six</str>
        <str name="ArticleTypeName">体育</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978552606720</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">7</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">1</str>
        <str name="Content">content seven</str>
        <date name="CreateDate">2014-03-26T16:00:00Z</date>
        <str name="Title">test title 7 seven</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">光明日报</str>
        <long name="_version_">1463443978554703872</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">8</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content eight</str>
        <date name="CreateDate">2014-03-27T16:00:00Z</date>
        <str name="Title">test title 8 eight</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978556801024</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">9</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content nine</str>
        <date name="CreateDate">2014-03-28T16:00:00Z</date>
        <str name="Title">test title 9 nine</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978558898176</long>
      </doc>
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">10</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">2</str>
        <str name="Content">content ten</str>
        <date name="CreateDate">2014-03-23T16:00:00Z</date>
        <str name="Title">test title 10 ten</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">燕赵都市报</str>
        <long name="_version_">1463443978559946752</long>
      </doc>
    </result>
  </response>
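The response carries two result blocks: "match" holds the source document and "response" holds the similar documents. A client has to tell them apart by the name attribute. The sketch below shows one way to do that with Python's standard library. The sample input is a trimmed copy of the response (two fields per document, two similar documents), and the helper name docs_in is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the MLT response: only ArticleId and Title are kept,
# and only two of the five similar documents are included.
SAMPLE = """
<response>
  <result name="match" start="0" numFound="1">
    <doc>
      <str name="ArticleId">5</str>
      <str name="Title">test title 5 five</str>
    </doc>
  </result>
  <result name="response" start="0" numFound="5">
    <doc>
      <str name="ArticleId">6</str>
      <str name="Title">test title 6 six</str>
    </doc>
    <doc>
      <str name="ArticleId">7</str>
      <str name="Title">test title 7 seven</str>
    </doc>
  </result>
</response>
"""


def docs_in(root, result_name):
    """Return each <doc> under the named <result> as a {field name: text} dict."""
    result = root.find(f"result[@name='{result_name}']")
    if result is None:
        return []
    return [
        {field.get("name"): field.text for field in doc}
        for doc in result.findall("doc")
    ]


root = ET.fromstring(SAMPLE)
source = docs_in(root, "match")      # the document the query matched
similar = docs_in(root, "response")  # documents similar to it

print("source:", source[0]["Title"])
for doc in similar:
    print("similar:", doc["ArticleId"], doc["Title"])
```

All field values are read as text here; a real client would likely convert dates and numbers based on the element tag (date, long, bool).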

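Highlighting

Purpose: highlights the parts of the results that match the search keywords.

Configuration: no additional configuration is required.

Parameters:

hl: whether to enable highlighting (true|false)

hl.fl: the fields to highlight; separate multiple fields with commas (hl.fl=name,name2,name3)

hl.simple.pre: the tag inserted before each highlighted term (default <em>)

hl.simple.post: the tag inserted after each highlighted term (default </em>)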
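When hl.simple.pre and hl.simple.post are overridden, the tag characters have to be percent-encoded in the query string. The following Python sketch illustrates this under assumptions: the helper name build_highlight_url is hypothetical, the base URL is taken from the select example in this section, and the <b> tags are just an illustrative choice.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust to your deployment.
SOLR_SELECT_URL = "http://localhost:8080/solr/select"


def build_highlight_url(query, fields, pre="<em>", post="</em>", start=0, rows=10):
    """Return a select URL with highlighting enabled (hypothetical helper)."""
    params = [
        ("q", query),
        ("start", str(start)),
        ("rows", str(rows)),
        ("hl", "true"),
        ("hl.fl", ",".join(fields)),  # comma-separated list of fields to highlight
        ("hl.simple.pre", pre),       # inserted before each highlighted term
        ("hl.simple.post", post),     # inserted after each highlighted term
    ]
    # urlencode escapes "<", ">" and "/" so the tags survive the query string.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_highlight_url("ArticleId:9", ["Title"], pre="<b>", post="</b>")
print(url)
```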

Example:

http://localhost:8080/solr/select?q=ArticleId:9&start=0&rows=10&hl=true&hl.fl=Title

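The request returns the following. The 9 in Title is highlighted even though the query targets ArticleId, because hl.requireFieldMatch defaults to false: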

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">2</int>
      <lst name="params">
        <str name="start">0</str>
        <str name="q">ArticleId:9</str>
        <str name="hl.fl">Title</str>
        <str name="hl">true</str>
        <str name="rows">10</str>
      </lst>
    </lst>
    <result name="response" start="0" numFound="1">
      <doc>
        <str name="TypeId">1</str>
        <str name="ArticleId">9</str>
        <bool name="IsDelete">false</bool>
        <str name="EditorialOfficeId">3</str>
        <str name="Content">content nine</str>
        <date name="CreateDate">2014-03-28T16:00:00Z</date>
        <str name="Title">test title 9 nine</str>
        <str name="ArticleTypeName">财经</str>
        <str name="EditorialOfficeName">北京晚报</str>
        <long name="_version_">1463443978558898176</long>
      </doc>
    </result>
    <lst name="highlighting">
      <lst name="9">
        <arr name="Title">
          <str>test title <em>9</em> nine</str>
        </arr>
      </lst>
    </lst>
  </response>
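The highlighted snippets are returned in a separate "highlighting" section rather than inside each document, keyed by what appears to be the document's unique key (9 here, matching ArticleId). The Python sketch below collects them into a dictionary. It assumes that the highlight tags arrive entity-escaped in the raw XML (the listing shows them as a browser would render them), and the helper name highlights is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the highlighting section. Assumption: the <em> tags are
# entity-escaped in the raw response, so they are written as &lt;em&gt; here.
SAMPLE = """
<response>
  <lst name="highlighting">
    <lst name="9">
      <arr name="Title">
        <str>test title &lt;em&gt;9&lt;/em&gt; nine</str>
      </arr>
    </lst>
  </lst>
</response>
"""


def highlights(root):
    """Map document key -> {field name: [snippets]} from the highlighting section."""
    section = root.find("lst[@name='highlighting']")
    out = {}
    if section is None:
        return out
    for doc in section.findall("lst"):
        out[doc.get("name")] = {
            arr.get("name"): [snippet.text for snippet in arr.findall("str")]
            for arr in doc.findall("arr")
        }
    return out


snippets = highlights(ET.fromstring(SAMPLE))
print(snippets["9"]["Title"][0])
```

To display results, a client would look up each document's key in this dictionary and substitute the snippet for the stored field value when one is present.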

II. Indexing

Updating the index (update)
