Solr 4.8.0 Source Code Analysis (13): Index Repair in Lucene Core

Preface: While working with elasticsearch at the office today, I came across a blog post saying that elasticsearch has an index repair feature. Curious, I took a look and found that the feature actually comes from Lucene Core. Truth be told, back when I was studying the Lucene file formats I wanted to build a tool to parse and verify index files, and had even written part of one; I never expected to find that such a tool already exists, so this is a good opportunity to study it side by side.

Index repair is implemented mainly by the CheckIndex.java class; its main() function is a good entry point for understanding it.
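Before turning to the command line, note that the same checks can be driven programmatically. Here is a minimal sketch (the index path is hypothetical; Lucene 4.8 APIs are assumed, and this simply mirrors what main() does in read-only mode):

    import java.io.File;

    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class CheckIndexDemo {
      public static void main(String[] args) throws Exception {
        // hypothetical index directory
        Directory dir = FSDirectory.open(new File("solr/Solr/test/data/index"));
        try {
          CheckIndex checker = new CheckIndex(dir);
          checker.setInfoStream(System.out);               // emit the same report as the CLI
          CheckIndex.Status status = checker.checkIndex(); // read-only check of all segments
          System.out.println(status.clean
              ? "No problems were detected with this index."
              : status.numBadSegments + " broken segment(s) detected");
        } finally {
          dir.close();
        }
      }
    }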

1. Using CheckIndex

First, run the following command to see how to use the lucene-core jar:

    :lib rcf$ java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex

    ERROR: index path not specified

    Usage: java org.apache.lucene.index.CheckIndex pathToIndex [-fix] [-crossCheckTermVectors] [-segment X] [-segment Y] [-dir-impl X]

      -fix: actually write a new segments_N file, removing any problematic segments
      -crossCheckTermVectors: verifies that term vectors match postings; THIS IS VERY SLOW!
      -codec X: when fixing, codec to write the new segments_N file with
      -verbose: print additional details
      -segment X: only check the specified segments. This can be specified multiple
          times, to check more than one segment, eg '-segment _2 -segment _a'.
          You can't use this with the -fix option
      -dir-impl X: use a specific FSDirectory implementation. If no package is specified the org.apache.lucene.store package will be used.

    **WARNING**: -fix should only be used on an emergency basis as it will cause
    documents (perhaps many) to be permanently removed from the index. Always make
    a backup copy of your index before running this! Do not run this tool on an index
    that is actively being written to. You have been warned!

    Run without -fix, this tool will open the index, report version information
    and report any exceptions it hits and what action it would take if -fix were
    specified. With -fix, this tool will remove any segments that have issues and
    write a new segments_N file. This means all documents contained in the affected
    segments will be removed.

    This tool exits with exit code 1 if the index cannot be opened or has any
    corruption, else 0.

Running java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex prints what is effectively a help message. But why such an odd-looking command? java -help explains both flags: -cp is the same as -classpath and supplies the search path for classes and jars, while -ea is the same as -enableassertions and controls whether assertions are enabled. The command can therefore be simplified to java -cp lucene-core-4.8-SNAPSHOT.jar org.apache.lucene.index.CheckIndex (without -ea assertions are off and the checks are somewhat less thorough; the tool itself prints a note to that effect).
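For example, to run the simplified form from a script and branch on the result (index path hypothetical; per the help text above, the tool exits with code 1 when the index cannot be opened or is corrupt, else 0):

    java -cp lucene-core-4.8-SNAPSHOT.jar org.apache.lucene.index.CheckIndex /path/to/index \
      && echo "index is clean" \
      || echo "index is corrupt or could not be opened"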

First, let's check a healthy index; the report is quite clear and readable:

    userdeMacBook-Pro:lib rcf$ java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene.index... org.apache.lucene.index.CheckIndex ../../../../../../solr/Solr/test/data/index

    Opening index @ ../../../../../../solr/Solr/test/data/index

    Segments file=segments_r numSegments=7 version=4.8 format= userData={commitTimeMSec=1411221019854}
      1 of 7: name=_k docCount=18001
        codec=Lucene46
        compound=false
        numFiles=10
        size (MB)=0.493
        diagnostics = {timestamp=1411221019346, os=Mac OS X, os.version=10.9.4, mergeFactor=10, source=merge, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, mergeMaxNumSegments=-1, java.version=1.7.0_60, java.vendor=Oracle Corporation}
        no deletions
        test: open reader.........OK
        test: check integrity.....OK
        test: check live docs.....OK
        test: fields..............OK [3 fields]
        test: field norms.........OK [1 fields]
        test: terms, freq, prox...OK [36091 terms; 54003 terms/docs pairs; 18001 tokens]
        test: stored fields.......OK [54003 total field count; avg 3 fields per doc]
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_SET]

      2 of 7: name=_l docCount=
        codec=Lucene46
        compound=false
        numFiles=
        size (MB)=0.028
        diagnostics = {timestamp=, os=Mac OS X, os.version=10.9.4, source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
        no deletions
        test: open reader.........OK
        test: check integrity.....OK
        test: check live docs.....OK
        test: fields..............OK [ fields]
        test: field norms.........OK [ fields]
        test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
        test: stored fields.......OK [ total field count; avg fields per doc]
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]

      3 of 7: name=_m docCount=
        codec=Lucene46
        compound=false
        numFiles=
        size (MB)=0.028
        diagnostics = {timestamp=, os=Mac OS X, os.version=10.9.4, source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
        no deletions
        test: open reader.........OK
        test: check integrity.....OK
        test: check live docs.....OK
        test: fields..............OK [ fields]
        test: field norms.........OK [ fields]
        test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
        test: stored fields.......OK [ total field count; avg fields per doc]
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]

      4 of 7: name=_n docCount=
        codec=Lucene46
        compound=false
        numFiles=
        size (MB)=0.028
        diagnostics = {timestamp=, os=Mac OS X, os.version=10.9.4, source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
        no deletions
        test: open reader.........OK
        test: check integrity.....OK
        test: check live docs.....OK
        test: fields..............OK [ fields]
        test: field norms.........OK [ fields]
        test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
        test: stored fields.......OK [ total field count; avg fields per doc]
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]

      5 of 7: name=_o docCount=
        codec=Lucene46
        compound=false
        numFiles=
        size (MB)=0.028
        diagnostics = {timestamp=, os=Mac OS X, os.version=10.9.4, source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
        no deletions
        test: open reader.........OK
        test: check integrity.....OK
        test: check live docs.....OK
        test: fields..............OK [ fields]
        test: field norms.........OK [ fields]
        test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
        test: stored fields.......OK [ total field count; avg fields per doc]
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]

      6 of 7: name=_p docCount=
        codec=Lucene46
        compound=false
        numFiles=
        size (MB)=0.028
        diagnostics = {timestamp=, os=Mac OS X, os.version=10.9.4, source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
        no deletions
        test: open reader.........OK
        test: check integrity.....OK
        test: check live docs.....OK
        test: fields..............OK [ fields]
        test: field norms.........OK [ fields]
        test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
        test: stored fields.......OK [ total field count; avg fields per doc]
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]

      7 of 7: name=_q docCount=
        codec=Lucene46
        compound=false
        numFiles=
        size (MB)=0.027
        diagnostics = {timestamp=, os=Mac OS X, os.version=10.9.4, source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
        no deletions
        test: open reader.........OK
        test: check integrity.....OK
        test: check live docs.....OK
        test: fields..............OK [ fields]
        test: field norms.........OK [ fields]
        test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
        test: stored fields.......OK [ total field count; avg fields per doc]
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]

    No problems were detected with this index.

Since my index files are healthy, let's borrow an example from the web to see what the output looks like for a corrupt index and what -fix does:

Example courtesy of http://blog.csdn.net/laigood/article/details/8296678

    Segments file=segments_2cg numSegments= version=3.6. format=FORMAT_3_1 [Lucene 3.1+] userData={translog_id=}
      of : name=_59ct docCount=
        compound=false
        hasProx=true
        numFiles=
        size (MB)=,233.694
        diagnostics = {mergeFactor=, os.version=2.6.-.el6.x86_64, os=Linux, lucene.version=3.6. - thetaphi - -- ::, source=merge, os.arch=amd64, mergeMaxNumSegments=-, java.version=1.6.0_24, java.vendor=Sun Microsystems Inc.}
        has deletions [delFileName=_59ct_1b.del]
        test: open reader.........OK [ deleted docs]
        test: fields..............OK [ fields]
        test: field norms.........OK [ fields]
        test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
        test: stored fields.......ERROR [read past EOF: MMapIndexInput(path="/usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/_59ct.fdt")]
        java.io.EOFException: read past EOF: MMapIndexInput(path="/usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/_59ct.fdt")
            at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readBytes(MMapDirectory.java)
            at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java)
            at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java)
            at org.apache.lucene.index.SegmentReader.document(SegmentReader.java)
            at org.apache.lucene.index.IndexReader.document(IndexReader.java)
            at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java)
            at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java)
            at org.apache.lucene.index.CheckIndex.main(CheckIndex.java)
        test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
        FAILED
        WARNING: fixIndex() would remove reference to this segment; full exception:
        java.lang.RuntimeException: Stored Field test failed
            at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java)
            at org.apache.lucene.index.CheckIndex.main(CheckIndex.java)

    WARNING: 1 broken segments (containing 4708135 documents) detected
    WARNING: 4708135 documents will be lost
The report shows that the index file _59ct.fdt in shard 5 is corrupt. Since .fdt files hold a Lucene index's stored fields, the error surfaces in the "test: stored fields" step.
The warning at the end says one broken segment was detected, containing 4,708,135 documents.
Adding -fix to the original command repairs the index (ps: back up the index before repairing, and never run the fix on an index that is actively being written to; a backup-first workflow is sketched after the output below):
    java -cp lucene-core-3.6..jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/ -fix
    NOTE: will write new segments file in 5 seconds; this will remove 4708135 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
      5...
      4...
      3...
      2...
      1...
    Writing...
    OK
    Wrote new segments file "segments_2ch"
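Since -fix permanently drops every document in a broken segment, a safer workflow is to stop all writers and copy the index directory aside before repairing. A sketch with hypothetical paths (the jar name depends on your Lucene version):

    # stop all writers first, then back up the index directory
    cp -a /usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index \
          /usr/local/sas/escluster/index-backup-$(date +%Y%m%d)

    # re-run the check without -fix and review what would be removed,
    # then repair in place
    java -cp lucene-core.jar org.apache.lucene.index.CheckIndex \
        /usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index -fix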

You can also check just a single segment:

    userdeMacBook-Pro:lib rcf$ java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene.index... org.apache.lucene.index.CheckIndex ../../../../../../solr/Solr/test/data/index -segment _9

    Opening index @ ../../../../../../solr/Solr/test/data/index

    Segments file=segments_r numSegments=7 version=4.8 format= userData={commitTimeMSec=1411221019854}

    Checking only these segments: _9:
    No problems were detected with this index.

You can also pass -verbose for more detail; I won't go into that here.

2. The CheckIndex Source Code

Next, let's look at how the CheckIndex source implements the behavior above. The checking logic is concentrated in the checkIndex() function:

    public Status checkIndex(List<String> onlySegments) throws IOException {
      ...
      final int numSegments = sis.size();                        // number of segments
      final String segmentsFileName = sis.getSegmentsFileName(); // name of the segments_N file
      // note: we only read the format byte (required preamble) here!
      IndexInput input = null;
      try {
        input = dir.openInput(segmentsFileName, IOContext.READONCE); // open segments_N
      } catch (Throwable t) {
        msg(infoStream, "ERROR: could not open segments file in directory");
        if (infoStream != null)
          t.printStackTrace(infoStream);
        result.cantOpenSegments = true;
        return result;
      }
      int format = 0;
      try {
        format = input.readInt(); // read the segments_N format
      } catch (Throwable t) {
        msg(infoStream, "ERROR: could not read segment file version in directory");
        if (infoStream != null)
          t.printStackTrace(infoStream);
        result.missingSegmentVersion = true;
        return result;
      } finally {
        if (input != null)
          input.close();
      }

      String sFormat = "";
      boolean skip = false;

      result.segmentsFileName = segmentsFileName; // segments_N file name
      result.numSegments = numSegments;           // segment count
      result.userData = sis.getUserData();        // user data, e.g. userData={commitTimeMSec=1411221019854}
      String userDataString;
      if (sis.getUserData().size() > 0) {
        userDataString = " userData=" + sis.getUserData();
      } else {
        userDataString = "";
      }
      // version information, e.g. version=4.8
      String versionString = null;
      if (oldSegs != null) {
        if (foundNonNullVersion) {
          versionString = "versions=[" + oldSegs + " .. " + newest + "]";
        } else {
          versionString = "version=" + oldSegs;
        }
      } else {
        versionString = oldest.equals(newest) ? ( "version=" + oldest ) : ("versions=[" + oldest + " .. " + newest + "]");
      }

      msg(infoStream, "Segments file=" + segmentsFileName + " numSegments=" + numSegments
          + " " + versionString + " format=" + sFormat + userDataString);

      if (onlySegments != null) {
        result.partial = true;
        if (infoStream != null) {
          infoStream.print("\nChecking only these segments:");
          for (String s : onlySegments) {
            infoStream.print(" " + s);
          }
        }
        result.segmentsChecked.addAll(onlySegments);
        msg(infoStream, ":");
      }

      if (skip) {
        msg(infoStream, "\nERROR: this index appears to be created by a newer version of Lucene than this tool was compiled on; please re-compile this tool on the matching version of Lucene; exiting");
        result.toolOutOfDate = true;
        return result;
      }

      result.newSegments = sis.clone();
      result.newSegments.clear();
      result.maxSegmentName = -1;
      // iterate over the segments, checking each one in turn
      for (int i = 0; i < numSegments; i++) {
        final SegmentCommitInfo info = sis.info(i); // this segment's metadata
        int segmentName = Integer.parseInt(info.info.name.substring(1), Character.MAX_RADIX);
        if (segmentName > result.maxSegmentName) {
          result.maxSegmentName = segmentName;
        }
        if (onlySegments != null && !onlySegments.contains(info.info.name)) {
          continue;
        }
        Status.SegmentInfoStatus segInfoStat = new Status.SegmentInfoStatus();
        result.segmentInfos.add(segInfoStat);
        // segment ordinal, name and document count, e.g.: 1 of 7: name=_k docCount=18001
        msg(infoStream, "  " + (1+i) + " of " + numSegments + ": name=" + info.info.name + " docCount=" + info.info.getDocCount());
        segInfoStat.name = info.info.name;
        segInfoStat.docCount = info.info.getDocCount();

        final String version = info.info.getVersion();
        if (info.info.getDocCount() <= 0 && version != null && versionComparator.compare(version, "4.5") >= 0) {
          throw new RuntimeException("illegal number of documents: maxDoc=" + info.info.getDocCount());
        }

        int toLoseDocCount = info.info.getDocCount();

        AtomicReader reader = null;

        try {
          final Codec codec = info.info.getCodec(); // the codec, e.g. codec=Lucene46
          msg(infoStream, "    codec=" + codec);
          segInfoStat.codec = codec;
          msg(infoStream, "    compound=" + info.info.getUseCompoundFile()); // compound-file flag, e.g. compound=false
          segInfoStat.compound = info.info.getUseCompoundFile();
          msg(infoStream, "    numFiles=" + info.files().size());
          segInfoStat.numFiles = info.files().size();            // number of files in the segment, e.g. numFiles=10
          segInfoStat.sizeMB = info.sizeInBytes()/(1024.*1024.); // segment size, e.g. size (MB)=0.493
          if (info.info.getAttribute(Lucene3xSegmentInfoFormat.DS_OFFSET_KEY) == null) {
            // don't print size in bytes if its a 3.0 segment with shared docstores
            msg(infoStream, "    size (MB)=" + nf.format(segInfoStat.sizeMB));
          }
          // diagnostics, e.g. diagnostics = {timestamp=1411221019346, os=Mac OS X, os.version=10.9.4, mergeFactor=10,
          // source=merge, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36,
          // os.arch=x86_64, mergeMaxNumSegments=-1, java.version=1.7.0_60, java.vendor=Oracle Corporation}
          Map<String,String> diagnostics = info.info.getDiagnostics();
          segInfoStat.diagnostics = diagnostics;
          if (diagnostics.size() > 0) {
            msg(infoStream, "    diagnostics = " + diagnostics);
          }
          // report whether documents were deleted, e.g. "no deletions" or "has deletions [delFileName=_59ct_1b.del]"
          if (!info.hasDeletions()) {
            msg(infoStream, "    no deletions");
            segInfoStat.hasDeletions = false;
          }
          else {
            msg(infoStream, "    has deletions [delGen=" + info.getDelGen() + "]");
            segInfoStat.hasDeletions = true;
            segInfoStat.deletionsGen = info.getDelGen();
          }

          // open a SegmentReader to verify the segment can be read; an exception here means it
          // cannot be opened. Prints e.g.: test: open reader.........OK
          if (infoStream != null)
            infoStream.print("    test: open reader.........");
          reader = new SegmentReader(info, DirectoryReader.DEFAULT_TERMS_INDEX_DIVISOR, IOContext.DEFAULT);
          msg(infoStream, "OK");

          segInfoStat.openReaderPassed = true;
          // verify file integrity, e.g.: test: check integrity.....OK
          // checkIntegrity() validates each file via CodecUtil.checksumEntireFile() (see the sketch after this listing)
          if (infoStream != null)
            infoStream.print("    test: check integrity.....");
          reader.checkIntegrity();
          msg(infoStream, "OK");

          // verify the document counts. When documents have been deleted, check that
          // reader.numDocs() == info.info.getDocCount() - info.getDelCount(), and that the number
          // of set bits in liveDocs matches reader.numDocs().
          // Note: when no documents were deleted, liveDocs is null and
          // reader.maxDoc() == info.info.getDocCount() is checked instead.
          // Prints e.g.: test: check live docs.....OK
          // (Solr's admin UI shows current, deleted and total document counts; they are obtained this way.)
          if (infoStream != null)
            infoStream.print("    test: check live docs.....");
          final int numDocs = reader.numDocs();
          toLoseDocCount = numDocs;
          if (reader.hasDeletions()) {
            if (reader.numDocs() != info.info.getDocCount() - info.getDelCount()) {
              throw new RuntimeException("delete count mismatch: info=" + (info.info.getDocCount() - info.getDelCount()) + " vs reader=" + reader.numDocs());
            }
            if ((info.info.getDocCount()-reader.numDocs()) > reader.maxDoc()) {
              throw new RuntimeException("too many deleted docs: maxDoc()=" + reader.maxDoc() + " vs del count=" + (info.info.getDocCount()-reader.numDocs()));
            }
            if (info.info.getDocCount() - numDocs != info.getDelCount()) {
              throw new RuntimeException("delete count mismatch: info=" + info.getDelCount() + " vs reader=" + (info.info.getDocCount() - numDocs));
            }
            Bits liveDocs = reader.getLiveDocs();
            if (liveDocs == null) {
              throw new RuntimeException("segment should have deletions, but liveDocs is null");
            } else {
              int numLive = 0;
              for (int j = 0; j < liveDocs.length(); j++) {
                if (liveDocs.get(j)) {
                  numLive++;
                }
              }
              if (numLive != numDocs) {
                throw new RuntimeException("liveDocs count mismatch: info=" + numDocs + ", vs bits=" + numLive);
              }
            }

            segInfoStat.numDeleted = info.info.getDocCount() - numDocs;
            msg(infoStream, "OK [" + (segInfoStat.numDeleted) + " deleted docs]");
          } else {
            if (info.getDelCount() != 0) {
              throw new RuntimeException("delete count mismatch: info=" + info.getDelCount() + " vs reader=" + (info.info.getDocCount() - numDocs));
            }
            Bits liveDocs = reader.getLiveDocs();
            if (liveDocs != null) {
              // its ok for it to be non-null here, as long as none are set right?
              // (this looks slightly off: when no documents were deleted, liveDocs should be null)
              for (int j = 0; j < liveDocs.length(); j++) {
                if (!liveDocs.get(j)) {
                  throw new RuntimeException("liveDocs mismatch: info says no deletions but doc " + j + " is deleted.");
                }
              }
            }
            msg(infoStream, "OK");
          }
          if (reader.maxDoc() != info.info.getDocCount()) {
            throw new RuntimeException("SegmentReader.maxDoc() " + reader.maxDoc() + " != SegmentInfos.docCount " + info.info.getDocCount());
          }

          // Test getFieldInfos()
          // field status and count, e.g.: test: fields..............OK [3 fields]
          if (infoStream != null) {
            infoStream.print("    test: fields..............");
          }
          FieldInfos fieldInfos = reader.getFieldInfos();
          msg(infoStream, "OK [" + fieldInfos.size() + " fields]");
          segInfoStat.numFields = fieldInfos.size();

          // Test Field Norms
          // field-norms status and count, e.g.: test: field norms.........OK [1 fields]
          segInfoStat.fieldNormStatus = testFieldNorms(reader, infoStream);

          // Test the Term Index
          // term index status and counts, e.g.: test: terms, freq, prox...OK [36091 terms; 54003 terms/docs pairs; 18001 tokens]
          segInfoStat.termIndexStatus = testPostings(reader, infoStream, verbose);

          // Test Stored Fields
          // stored-fields status, e.g.: test: stored fields.......OK [54003 total field count; avg 3 fields per doc]
          segInfoStat.storedFieldStatus = testStoredFields(reader, infoStream);

          // Test Term Vectors
          // term-vectors status, e.g.: test: term vectors........OK
          segInfoStat.termVectorStatus = testTermVectors(reader, infoStream, verbose, crossCheckTermVectors);

          // doc-values status, e.g.: test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_SET]
          segInfoStat.docValuesStatus = testDocValues(reader, infoStream);

          // Rethrow the first exception we encountered
          // This will cause stats for failed segments to be incremented properly
          if (segInfoStat.fieldNormStatus.error != null) {
            throw new RuntimeException("Field Norm test failed");
          } else if (segInfoStat.termIndexStatus.error != null) {
            throw new RuntimeException("Term Index test failed");
          } else if (segInfoStat.storedFieldStatus.error != null) {
            throw new RuntimeException("Stored Field test failed");
          } else if (segInfoStat.termVectorStatus.error != null) {
            throw new RuntimeException("Term Vector test failed");
          } else if (segInfoStat.docValuesStatus.error != null) {
            throw new RuntimeException("DocValues test failed");
          }

          msg(infoStream, "");

        } catch (Throwable t) {
          msg(infoStream, "FAILED");
          String comment;
          comment = "fixIndex() would remove reference to this segment";
          msg(infoStream, "    WARNING: " + comment + "; full exception:");
          if (infoStream != null)
            t.printStackTrace(infoStream);
          msg(infoStream, "");
          result.totLoseDocCount += toLoseDocCount;
          result.numBadSegments++;
          continue;
        } finally {
          if (reader != null)
            reader.close();
        }

        // Keeper
        result.newSegments.add(info.clone());
      }

      if (0 == result.numBadSegments) {
        result.clean = true;
      } else
        msg(infoStream, "WARNING: " + result.numBadSegments + " broken segments (containing " + result.totLoseDocCount + " documents) detected");

      if ( ! (result.validCounter = (result.maxSegmentName < sis.counter))) {
        result.clean = false;
        result.newSegments.counter = result.maxSegmentName + 1;
        msg(infoStream, "ERROR: Next segment name counter " + sis.counter + " is not greater than max segment name " + result.maxSegmentName);
      }

      if (result.clean) {
        msg(infoStream, "No problems were detected with this index.\n");
      }

      return result;
    }
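Notice how each healthy segment ends with result.newSegments.add(info.clone()), while a failed segment is skipped via continue: after the loop, result.newSegments holds exactly the good segments. The -fix path then only has to commit that trimmed list as a new segments_N generation. Roughly, from memory of the 4.8 source (the real method is also handed the codec chosen with -codec, so treat this as an approximate sketch rather than the exact signature):

    // approximate core of CheckIndex.fixIndex(); details are a sketch, not the exact 4.8 code
    public void fixIndex(Status result) throws IOException {
      if (result.partial) {
        // a partial check (-segment ...) cannot be fixed: segments that were
        // never checked would be dropped along with the bad ones
        throw new IllegalArgumentException("can only fix an index that was fully checked");
      }
      result.newSegments.changed();          // force a new segments_N generation
      result.newSegments.commit(result.dir); // write it; bad segments are simply absent
    }

This also explains the CLI warning: documents in the dropped segments are not recovered, they simply stop being referenced by the new segments_N file.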

I'll continue tomorrow with the source of testFieldNorms and the other per-section test methods.
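Before wrapping up, one step from the listing is worth a quick look: reader.checkIntegrity(). In 4.8 it boils down to CodecUtil.checksumEntireFile(), which re-reads a file end to end and validates the CRC32 footer that 4.8-era codecs append to each segment file. A hedged sketch of calling that utility directly on a single file (the path and file name are hypothetical):

    import java.io.File;

    import org.apache.lucene.codecs.CodecUtil;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.IOContext;
    import org.apache.lucene.store.IndexInput;

    public class VerifyFileChecksum {
      public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("solr/Solr/test/data/index"));
        IndexInput in = dir.openInput("_k.fdt", IOContext.READONCE); // hypothetical file
        try {
          // re-reads the whole file and throws CorruptIndexException on a CRC mismatch
          CodecUtil.checksumEntireFile(in);
          System.out.println("_k.fdt checksum OK");
        } finally {
          in.close();
          dir.close();
        }
      }
    }

(Files written by pre-4.8 codecs carry no checksum footer, so this applies only to segments written by 4.8 and later.)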
