Solr 4.8.0 Source Code Analysis (13): Index Repair in Lucene Core
Preface: While working with Elasticsearch at the office today, I ran across a blog post claiming that Elasticsearch can repair its indexes. Curious, I read on and found that the feature actually comes from Lucene Core. Back when I was studying the Lucene file formats I had wanted to build a tool that parses and verifies index files, and had even written part of one; I hadn't expected to find a ready-made tool, so it makes a perfect companion for study.
Index repair is implemented mainly by the CheckIndex class (CheckIndex.java); a look at its main() method is a good way to get oriented.
1. Using CheckIndex
First, run the following command to see how to use CheckIndex from lucene-core.jar:
- userdeMacBook-Pro:lib rcf$ java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
- ERROR: index path not specified
- Usage: java org.apache.lucene.index.CheckIndex pathToIndex [-fix] [-crossCheckTermVectors] [-segment X] [-segment Y] [-dir-impl X]
- -fix: actually write a new segments_N file, removing any problematic segments
- -crossCheckTermVectors: verifies that term vectors match postings; THIS IS VERY SLOW!
- -codec X: when fixing, codec to write the new segments_N file with
- -verbose: print additional details
- -segment X: only check the specified segments. This can be specified multiple
- times, to check more than one segment, eg '-segment _2 -segment _a'.
- You can't use this with the -fix option
- -dir-impl X: use a specific FSDirectory implementation. If no package is specified the org.apache.lucene.store package will be used.
- **WARNING**: -fix should only be used on an emergency basis as it will cause
- documents (perhaps many) to be permanently removed from the index. Always make
- a backup copy of your index before running this! Do not run this tool on an index
- that is actively being written to. You have been warned!
- Run without -fix, this tool will open the index, report version information
- and report any exceptions it hits and what action it would take if -fix were
- specified. With -fix, this tool will remove any segments that have issues and
- write a new segments_N file. This means all documents contained in the affected
- segments will be removed.
- This tool exits with exit code 1 if the index cannot be opened or has any
- corruption, else 0.
Running java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex prints what amounts to the help text. But why such an odd-looking command? Checking java -help shows that -cp is the same as -classpath (the search path for classes and jars), and -ea is the same as -enableassertions (turns assertions on). So the command above can be simplified to java -cp lucene-core-4.8-SNAPSHOT.jar org.apache.lucene.index.CheckIndex.
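CheckIndex can also be driven from Java code rather than the command line. Below is a minimal sketch, assuming the Lucene 4.8 API (CheckIndex(Directory), setInfoStream(PrintStream), checkIndex(), and the Status.clean flag); the index path is a placeholder:
- import java.io.File;
- import org.apache.lucene.index.CheckIndex;
- import org.apache.lucene.store.Directory;
- import org.apache.lucene.store.FSDirectory;
- 
- public class CheckIndexDemo {
-   public static void main(String[] args) throws Exception {
-     // placeholder path; point this at a real index directory
-     Directory dir = FSDirectory.open(new File("solr/Solr/test/data/index"));
-     CheckIndex checker = new CheckIndex(dir);
-     checker.setInfoStream(System.out);               // prints the same report as the CLI
-     CheckIndex.Status status = checker.checkIndex(); // read-only; nothing is modified
-     System.out.println(status.clean ? "index is clean" : "index has problems");
-     dir.close();
-   }
- }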
First, let's examine a healthy index; the report is quite self-explanatory:
- userdeMacBook-Pro:lib rcf$ java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene.index... org.apache.lucene.index.CheckIndex ../../../../../../solr/Solr/test/data/index
- Opening index @ ../../../../../../solr/Solr/test/data/index
- Segments file=segments_r numSegments=7 version=4.8 format= userData={commitTimeMSec=1411221019854}
- 1 of 7: name=_k docCount=18001
- codec=Lucene46
- compound=false
- numFiles=10
- size (MB)=0.493
- diagnostics = {timestamp=1411221019346, os=Mac OS X, os.version=10.9.4, mergeFactor=10, source=merge, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36, os.arch=x86_64, mergeMaxNumSegments=-1, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- no deletions
- test: open reader.........OK
- test: check integrity.....OK
- test: check live docs.....OK
- test: fields..............OK [3 fields]
- test: field norms.........OK [1 fields]
- test: terms, freq, prox...OK [36091 terms; 54003 terms/docs pairs; 18001 tokens]
- test: stored fields.......OK [54003 total field count; avg 3 fields per doc]
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_SET]
- 2 of 7: name=_l docCount=
- codec=Lucene46
- compound=false
- numFiles=
- size (MB)=0.028
- diagnostics = {timestamp=, os=Mac OS X, os.version=10.9., source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - -- ::, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- no deletions
- test: open reader.........OK
- test: check integrity.....OK
- test: check live docs.....OK
- test: fields..............OK [ fields]
- test: field norms.........OK [ fields]
- test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
- test: stored fields.......OK [ total field count; avg fields per doc]
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]
- 3 of 7: name=_m docCount=
- codec=Lucene46
- compound=false
- numFiles=
- size (MB)=0.028
- diagnostics = {timestamp=, os=Mac OS X, os.version=10.9., source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - -- ::, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- no deletions
- test: open reader.........OK
- test: check integrity.....OK
- test: check live docs.....OK
- test: fields..............OK [ fields]
- test: field norms.........OK [ fields]
- test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
- test: stored fields.......OK [ total field count; avg fields per doc]
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]
- 4 of 7: name=_n docCount=
- codec=Lucene46
- compound=false
- numFiles=
- size (MB)=0.028
- diagnostics = {timestamp=, os=Mac OS X, os.version=10.9., source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - -- ::, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- no deletions
- test: open reader.........OK
- test: check integrity.....OK
- test: check live docs.....OK
- test: fields..............OK [ fields]
- test: field norms.........OK [ fields]
- test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
- test: stored fields.......OK [ total field count; avg fields per doc]
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]
- 5 of 7: name=_o docCount=
- codec=Lucene46
- compound=false
- numFiles=
- size (MB)=0.028
- diagnostics = {timestamp=, os=Mac OS X, os.version=10.9., source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - -- ::, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- no deletions
- test: open reader.........OK
- test: check integrity.....OK
- test: check live docs.....OK
- test: fields..............OK [ fields]
- test: field norms.........OK [ fields]
- test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
- test: stored fields.......OK [ total field count; avg fields per doc]
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]
- 6 of 7: name=_p docCount=
- codec=Lucene46
- compound=false
- numFiles=
- size (MB)=0.028
- diagnostics = {timestamp=, os=Mac OS X, os.version=10.9., source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - -- ::, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- no deletions
- test: open reader.........OK
- test: check integrity.....OK
- test: check live docs.....OK
- test: fields..............OK [ fields]
- test: field norms.........OK [ fields]
- test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
- test: stored fields.......OK [ total field count; avg fields per doc]
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]
- 7 of 7: name=_q docCount=
- codec=Lucene46
- compound=false
- numFiles=
- size (MB)=0.027
- diagnostics = {timestamp=, os=Mac OS X, os.version=10.9., source=flush, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - -- ::, os.arch=x86_64, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- no deletions
- test: open reader.........OK
- test: check integrity.....OK
- test: check live docs.....OK
- test: fields..............OK [ fields]
- test: field norms.........OK [ fields]
- test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
- test: stored fields.......OK [ total field count; avg fields per doc]
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- test: docvalues...........OK [ docvalues fields; BINARY; NUMERIC; SORTED; SORTED_SET]
- No problems were detected with this index.
Since my index files are healthy, let's use an example from the web to see what the output looks like on a corrupted index and what -fix actually does:
From: http://blog.csdn.net/laigood/article/details/8296678
- Segments file=segments_2cg numSegments= version=3.6. format=FORMAT_3_1 [Lucene 3.1+] userData={translog_id=}
- of : name=_59ct docCount=
- compound=false
- hasProx=true
- numFiles=
- size (MB)=,233.694
- diagnostics = {mergeFactor=, os.version=2.6.-.el6.x86_64, os=Linux, lucene.version=3.6. - thetaphi - -- ::, source=merge, os.arch=amd64, mergeMaxNumSegments=-, java.version=1.6.0_24, java.vendor=Sun Microsystems Inc.}
- has deletions [delFileName=_59ct_1b.del]
- test: open reader.........OK [ deleted docs]
- test: fields..............OK [ fields]
- test: field norms.........OK [ fields]
- test: terms, freq, prox...OK [ terms; terms/docs pairs; tokens]
- test: stored fields.......ERROR [read past EOF: MMapIndexInput(path="/usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/_59ct.fdt")]
- java.io.EOFException: read past EOF: MMapIndexInput(path="/usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/_59ct.fdt")
- at org.apache.lucene.store.MMapDirectory$MMapIndexInput.readBytes(MMapDirectory.java:)
- at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:)
- at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:)
- at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:)
- at org.apache.lucene.index.IndexReader.document(IndexReader.java:)
- at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:)
- at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:)
- at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:)
- test: term vectors........OK [ total vector count; avg term/freq vector fields per doc]
- FAILED
- WARNING: fixIndex() would remove reference to this segment; full exception:
- java.lang.RuntimeException: Stored Field test failed
- at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:)
- at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:)
- WARNING: broken segments (containing documents) detected
- WARNING: documents will be lost
- java -cp lucene-core-3.6..jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/local/sas/escluster/data/cluster/nodes/0/indices/index/5/index/ -fix
- NOTE: will write new segments file in 5 seconds; this will remove docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
- ...
- ...
- ...
- ...
- ...
- Writing...
- OK
- Wrote new segments file "segments_2ch"
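Programmatically, the -fix path in main() boils down to handing the Status from a full check back to fixIndex(). A hedged sketch, reusing the imports from the sketch above and assuming the 4.8 signature fixIndex(Status) (it varies across releases; as the warning says, back up the index first):
- Directory dir = FSDirectory.open(new File("/path/to/index")); // placeholder path
- CheckIndex checker = new CheckIndex(dir);
- CheckIndex.Status status = checker.checkIndex(); // must be a full check, not a -segment subset
- if (!status.clean) {
-   // permanently drops every segment that failed a test, like the -fix option
-   checker.fixIndex(status);
- }
- dir.close();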
You can also check individual segments:
- userdeMacBook-Pro:lib rcf$ java -cp lucene-core-4.8-SNAPSHOT.jar -ea:org.apache.lucene.index... org.apache.lucene.index.CheckIndex ../../../../../../solr/Solr/test/data/index -segment _9
- Opening index @ ../../../../../../solr/Solr/test/data/index
- Segments file=segments_r numSegments=7 version=4.8 format= userData={commitTimeMSec=1411221019854}
- Checking only these segments: _9:
- No problems were detected with this index.
Passing -verbose prints even more detail, which I won't go into here.
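Both options map directly onto the API: the -segment list is the List&lt;String&gt; argument to checkIndex(), and -verbose corresponds (if I read the 4.8 code right) to the boolean in the two-argument setInfoStream(). A short sketch under the same assumptions as before:
- import java.util.Arrays;
- 
- CheckIndex checker = new CheckIndex(FSDirectory.open(new File("/path/to/index"))); // placeholder
- checker.setInfoStream(System.out, true); // true ~ -verbose
- // equivalent of "-segment _9"; a partial check sets Status.partial,
- // so its result cannot be passed to fixIndex()
- CheckIndex.Status status = checker.checkIndex(Arrays.asList("_9"));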
2. The CheckIndex Source Code
Now let's look at how the CheckIndex source implements the behavior above. The checking logic is concentrated in the checkIndex() method:
- public Status checkIndex(List<String> onlySegments) throws IOException {
- ...
- final int numSegments = sis.size(); // number of segments
- final String segmentsFileName = sis.getSegmentsFileName(); // name of the segments_N file
- // note: we only read the format byte (required preamble) here!
- IndexInput input = null;
- try {
- input = dir.openInput(segmentsFileName, IOContext.READONCE); // open the segments_N file
- } catch (Throwable t) {
- msg(infoStream, "ERROR: could not open segments file in directory");
- if (infoStream != null)
- t.printStackTrace(infoStream);
- result.cantOpenSegments = true;
- return result;
- }
- int format = 0;
- try {
- format = input.readInt(); // read the segments_N format int
- } catch (Throwable t) {
- msg(infoStream, "ERROR: could not read segment file version in directory");
- if (infoStream != null)
- t.printStackTrace(infoStream);
- result.missingSegmentVersion = true;
- return result;
- } finally {
- if (input != null)
- input.close();
- }
- String sFormat = "";
- boolean skip = false;
- result.segmentsFileName = segmentsFileName; // segments_N file name
- result.numSegments = numSegments; // number of segments
- result.userData = sis.getUserData(); // user data, e.g. userData={commitTimeMSec=1411221019854}
- String userDataString;
- if (sis.getUserData().size() > 0) {
- userDataString = " userData=" + sis.getUserData();
- } else {
- userDataString = "";
- }
- // version info, e.g. version=4.8
- String versionString = null;
- if (oldSegs != null) {
- if (foundNonNullVersion) {
- versionString = "versions=[" + oldSegs + " .. " + newest + "]";
- } else {
- versionString = "version=" + oldSegs;
- }
- } else {
- versionString = oldest.equals(newest) ? ( "version=" + oldest ) : ("versions=[" + oldest + " .. " + newest + "]");
- }
- msg(infoStream, "Segments file=" + segmentsFileName + " numSegments=" + numSegments
- + " " + versionString + " format=" + sFormat + userDataString);
- if (onlySegments != null) {
- result.partial = true;
- if (infoStream != null) {
- infoStream.print("\nChecking only these segments:");
- for (String s : onlySegments) {
- infoStream.print(" " + s);
- }
- }
- result.segmentsChecked.addAll(onlySegments);
- msg(infoStream, ":");
- }
- if (skip) {
- msg(infoStream, "\nERROR: this index appears to be created by a newer version of Lucene than this tool was compiled on; please re-compile this tool on the matching version of Lucene; exiting");
- result.toolOutOfDate = true;
- return result;
- }
- result.newSegments = sis.clone();
- result.newSegments.clear();
- result.maxSegmentName = -1;
- // iterate over the segments and check each one
- for (int i = 0; i < numSegments; i++) {
- final SegmentCommitInfo info = sis.info(i); // metadata for this segment
- int segmentName = Integer.parseInt(info.info.name.substring(1), Character.MAX_RADIX);
- if (segmentName > result.maxSegmentName) {
- result.maxSegmentName = segmentName;
- }
- if (onlySegments != null && !onlySegments.contains(info.info.name)) {
- continue;
- }
- Status.SegmentInfoStatus segInfoStat = new Status.SegmentInfoStatus();
- result.segmentInfos.add(segInfoStat);
- // print the segment ordinal, name and document count, e.g. "1 of 7: name=_k docCount=18001"
- msg(infoStream, " " + (1+i) + " of " + numSegments + ": name=" + info.info.name + " docCount=" + info.info.getDocCount());
- segInfoStat.name = info.info.name;
- segInfoStat.docCount = info.info.getDocCount();
- final String version = info.info.getVersion();
- if (info.info.getDocCount() <= 0 && version != null && versionComparator.compare(version, "4.5") >= 0) {
- throw new RuntimeException("illegal number of documents: maxDoc=" + info.info.getDocCount());
- }
- int toLoseDocCount = info.info.getDocCount();
- AtomicReader reader = null;
- try {
- final Codec codec = info.info.getCodec(); // codec info, e.g. codec=Lucene46
- msg(infoStream, " codec=" + codec);
- segInfoStat.codec = codec;
- msg(infoStream, " compound=" + info.info.getUseCompoundFile()); // compound-file flag, e.g. compound=false
- segInfoStat.compound = info.info.getUseCompoundFile();
- msg(infoStream, " numFiles=" + info.files().size());
- segInfoStat.numFiles = info.files().size(); // number of files in the segment, e.g. numFiles=10
- segInfoStat.sizeMB = info.sizeInBytes()/(1024.*1024.); // segment size, e.g. size (MB)=0.493
- if (info.info.getAttribute(Lucene3xSegmentInfoFormat.DS_OFFSET_KEY) == null) {
- // don't print size in bytes if its a 3.0 segment with shared docstores
- msg(infoStream, " size (MB)=" + nf.format(segInfoStat.sizeMB));
- }
- // diagnostics, e.g. diagnostics = {timestamp=1411221019346, os=Mac OS X, os.version=10.9.4, mergeFactor=10,
- // source=merge, lucene.version=4.8-SNAPSHOT Unversioned directory - rcf - 2014-09-20 21:11:36,
- // os.arch=x86_64, mergeMaxNumSegments=-1, java.version=1.7.0_60, java.vendor=Oracle Corporation}
- Map<String,String> diagnostics = info.info.getDiagnostics();
- segInfoStat.diagnostics = diagnostics;
- if (diagnostics.size() > 0) {
- msg(infoStream, " diagnostics = " + diagnostics);
- }
- // report whether documents were deleted, e.g. "no deletions" or "has deletions [delFileName=_59ct_1b.del]"
- if (!info.hasDeletions()) {
- msg(infoStream, " no deletions");
- segInfoStat.hasDeletions = false;
- }
- else{
- msg(infoStream, " has deletions [delGen=" + info.getDelGen() + "]");
- segInfoStat.hasDeletions = true;
- segInfoStat.deletionsGen = info.getDelGen();
- }
- // open a new SegmentReader to verify the segment is readable; an exception here means it cannot be opened, e.g. "test: open reader.........OK"
- if (infoStream != null)
- infoStream.print(" test: open reader.........");
- reader = new SegmentReader(info, DirectoryReader.DEFAULT_TERMS_INDEX_DIVISOR, IOContext.DEFAULT);
- msg(infoStream, "OK");
- segInfoStat.openReaderPassed = true;
- // checkIntegrity() verifies the segment files are intact, e.g. "test: check integrity.....OK";
- // it is implemented via CodecUtil.checksumEntireFile() on each file.
- if (infoStream != null)
- infoStream.print(" test: check integrity.....");
- reader.checkIntegrity();
- msg(infoStream, "OK");
- // verify the document counts: when there are deletions, check reader.numDocs()
- // == info.info.getDocCount() - info.getDelCount(), and that the live count in liveDocs matches reader.numDocs().
- // Note: when nothing is deleted, liveDocs is null,
- // and reader.maxDoc() == info.info.getDocCount() is checked instead.
- // Example output: "test: check live docs.....OK"
- // The Solr admin UI shows the live, deleted and total document counts, obtained exactly this way.
- if (infoStream != null)
- infoStream.print(" test: check live docs.....");
- final int numDocs = reader.numDocs();
- toLoseDocCount = numDocs;
- if (reader.hasDeletions()) {
- if (reader.numDocs() != info.info.getDocCount() - info.getDelCount()) {
- throw new RuntimeException("delete count mismatch: info=" + (info.info.getDocCount() - info.getDelCount()) + " vs reader=" + reader.numDocs());
- }
- if ((info.info.getDocCount()-reader.numDocs()) > reader.maxDoc()) {
- throw new RuntimeException("too many deleted docs: maxDoc()=" + reader.maxDoc() + " vs del count=" + (info.info.getDocCount()-reader.numDocs()));
- }
- if (info.info.getDocCount() - numDocs != info.getDelCount()) {
- throw new RuntimeException("delete count mismatch: info=" + info.getDelCount() + " vs reader=" + (info.info.getDocCount() - numDocs));
- }
- Bits liveDocs = reader.getLiveDocs();
- if (liveDocs == null) {
- throw new RuntimeException("segment should have deletions, but liveDocs is null");
- } else {
- int numLive = 0;
- for (int j = 0; j < liveDocs.length(); j++) {
- if (liveDocs.get(j)) {
- numLive++;
- }
- }
- if (numLive != numDocs) {
- throw new RuntimeException("liveDocs count mismatch: info=" + numDocs + ", vs bits=" + numLive);
- }
- }
- segInfoStat.numDeleted = info.info.getDocCount() - numDocs;
- msg(infoStream, "OK [" + (segInfoStat.numDeleted) + " deleted docs]");
- } else {
- if (info.getDelCount() != 0) {
- throw new RuntimeException("delete count mismatch: info=" + info.getDelCount() + " vs reader=" + (info.info.getDocCount() - numDocs));
- }
- Bits liveDocs = reader.getLiveDocs();
- if (liveDocs != null) {
- // its ok for it to be non-null here, as long as none are set right?
- // This looks slightly off to me: when no documents are deleted, liveDocs should be null.
- for (int j = 0; j < liveDocs.length(); j++) {
- if (!liveDocs.get(j)) {
- throw new RuntimeException("liveDocs mismatch: info says no deletions but doc " + j + " is deleted.");
- }
- }
- }
- msg(infoStream, "OK");
- }
- if (reader.maxDoc() != info.info.getDocCount()) {
- throw new RuntimeException("SegmentReader.maxDoc() " + reader.maxDoc() + " != SegmentInfos.docCount " + info.info.getDocCount());
- }
- // Test getFieldInfos()
- // field status and count, e.g. "test: fields..............OK [3 fields]"
- if (infoStream != null) {
- infoStream.print(" test: fields..............");
- }
- FieldInfos fieldInfos = reader.getFieldInfos();
- msg(infoStream, "OK [" + fieldInfos.size() + " fields]");
- segInfoStat.numFields = fieldInfos.size();
- // Test Field Norms
- // field-norms status and count, e.g. "test: field norms.........OK [1 fields]"
- segInfoStat.fieldNormStatus = testFieldNorms(reader, infoStream);
- // Test the Term Index
- // term-index status and counts, e.g. "test: terms, freq, prox...OK [36091 terms; 54003 terms/docs pairs; 18001 tokens]"
- segInfoStat.termIndexStatus = testPostings(reader, infoStream, verbose);
- // Test Stored Fields
- // stored-fields status, e.g. "test: stored fields.......OK [54003 total field count; avg 3 fields per doc]"
- segInfoStat.storedFieldStatus = testStoredFields(reader, infoStream);
- // Test Term Vectors
- // term-vectors status, e.g. "test: term vectors........OK"
- segInfoStat.termVectorStatus = testTermVectors(reader, infoStream, verbose, crossCheckTermVectors);
- // doc-values status, e.g. "test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_SET]"
- segInfoStat.docValuesStatus = testDocValues(reader, infoStream);
- // Rethrow the first exception we encountered
- // This will cause stats for failed segments to be incremented properly
- if (segInfoStat.fieldNormStatus.error != null) {
- throw new RuntimeException("Field Norm test failed");
- } else if (segInfoStat.termIndexStatus.error != null) {
- throw new RuntimeException("Term Index test failed");
- } else if (segInfoStat.storedFieldStatus.error != null) {
- throw new RuntimeException("Stored Field test failed");
- } else if (segInfoStat.termVectorStatus.error != null) {
- throw new RuntimeException("Term Vector test failed");
- } else if (segInfoStat.docValuesStatus.error != null) {
- throw new RuntimeException("DocValues test failed");
- }
- msg(infoStream, "");
- } catch (Throwable t) {
- msg(infoStream, "FAILED");
- String comment;
- comment = "fixIndex() would remove reference to this segment";
- msg(infoStream, " WARNING: " + comment + "; full exception:");
- if (infoStream != null)
- t.printStackTrace(infoStream);
- msg(infoStream, "");
- result.totLoseDocCount += toLoseDocCount;
- result.numBadSegments++;
- continue;
- } finally {
- if (reader != null)
- reader.close();
- }
- // Keeper
- result.newSegments.add(info.clone());
- }
- if (0 == result.numBadSegments) {
- result.clean = true;
- } else
- msg(infoStream, "WARNING: " + result.numBadSegments + " broken segments (containing " + result.totLoseDocCount + " documents) detected");
- if ( ! (result.validCounter = (result.maxSegmentName < sis.counter))) {
- result.clean = false;
- result.newSegments.counter = result.maxSegmentName + 1;
- msg(infoStream, "ERROR: Next segment name counter " + sis.counter + " is not greater than max segment name " + result.maxSegmentName);
- }
- if (result.clean) {
- msg(infoStream, "No problems were detected with this index.\n");
- }
- return result;
- }
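As the comments above note, the integrity test delegates to CodecUtil.checksumEntireFile(), which in 4.8 reads a file to its end and verifies the CRC32 stored in the codec footer. A rough per-file sketch (verifyChecksum is a hypothetical helper name; the real call sites live inside the codec readers):
- import java.io.IOException;
- import org.apache.lucene.codecs.CodecUtil;
- import org.apache.lucene.store.Directory;
- import org.apache.lucene.store.IOContext;
- import org.apache.lucene.store.IndexInput;
- 
- // hypothetical helper mirroring what reader.checkIntegrity() does per file:
- // read the whole file and verify its checksum footer, throwing on corruption
- static void verifyChecksum(Directory dir, String fileName) throws IOException {
-   IndexInput in = dir.openInput(fileName, IOContext.READONCE);
-   try {
-     CodecUtil.checksumEntireFile(in); // throws CorruptIndexException on mismatch
-   } finally {
-     in.close();
-   }
- }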
I'll continue studying the source of testFieldNorms and the other per-test methods tomorrow.