Enterprise Search Engine Development: the Connector (Part 27)

The ChangeQueue class implements the ChangeSource interface, which declares a method for pulling the next Change object.

/**

 * A source of {@link Change} objects.

 *

 * @since 2.8

 */

public interface ChangeSource {

  /**

   * @return the next change, or {@code null} if there is no change available

   */

  public Change getNextChange();

}

A ChangeQueue instance initializes the blocking queue private final BlockingQueue<Change> pendingChanges, which serves as the container that holds Change objects.

/**

   * Initializes the blocking queue pendingChanges.

   * @param size

   * @param sleepInterval

   * @param introduceDelayAfterEachScan

   * @param activityLogger

   */

  private ChangeQueue(int size, long sleepInterval,

      boolean introduceDelayAfterEachScan, CrawlActivityLogger activityLogger) {

    pendingChanges = new ArrayBlockingQueue<Change>(size);

    this.sleepInterval = sleepInterval;

    this.activityLogger = activityLogger;

    this.introduceDelayAfterEveryScan = introduceDelayAfterEachScan;

  }

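The introduceDelayAfterEachScan parameter (stored in the field introduceDelayAfterEveryScan) controls whether a delay is introduced after each pass over the data completes.

As mentioned in the previous article, the inner class CallBack adds the submitted data to the blocking queue BlockingQueue<Change> pendingChanges.

ChangeQueue's implementation of the ChangeSource interface method, in turn, retrieves Change objects from that blocking queue.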

/**

   * Retrieves an element from the blocking queue pendingChanges.

   * Gets the next available change from the ChangeQueue.  Will wait up to

   * 1/4 second for a change to appear if none is immediately available.

   *

   * @return the next available change, or {@code null} if no changes are

   *         available

   */

  public Change getNextChange() {

    try {

      return pendingChanges.poll(250L, TimeUnit.MILLISECONDS);

    } catch (InterruptedException ie) {

      return null;

    }

  }
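The timed poll above can be tried in isolation. The following is a minimal sketch, not the connector's code: the class name TimedPollSketch is hypothetical, String stands in for Change, the non-blocking offer on the producer side is an assumption made for the demo, and restoring the interrupt flag is an addition that the original method does not make.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for ChangeQueue; String replaces Change. */
class TimedPollSketch {
  // Bounded queue: at most two pending changes in this demo.
  private final BlockingQueue<String> pending =
      new ArrayBlockingQueue<String>(2);

  /** Producer side: returns false instead of blocking when the queue is full. */
  boolean offer(String change) {
    return pending.offer(change);
  }

  /** Consumer side: waits up to 250 ms, then gives up and returns null. */
  String getNextChange() {
    try {
      return pending.poll(250L, TimeUnit.MILLISECONDS);
    } catch (InterruptedException ie) {
      // Assumption of this sketch: keep the interrupt visible to callers.
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args) {
    TimedPollSketch queue = new TimedPollSketch();
    queue.offer("change-1");
    System.out.println(queue.getNextChange()); // the queued element
    System.out.println(queue.getNextChange()); // null after roughly 250 ms
  }
}
```

Returning null on a timeout lets the caller treat "nothing available yet" as a normal result and simply try again on its next pass.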

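The ChangeQueue object acts as a buffer that holds Change objects. As analyzed in the previous article, Change objects are added to it by the thread method of the monitor object DocumentSnapshotRepositoryMonitor once that monitor is started.

So which object calls ChangeQueue's getNextChange() method to take the Change objects out?

Tracing the calls shows that the loadUpFromChangeSource method of the CheckpointAndChangeQueue class calls getNextChange(). That method wraps each retrieved Change object in a CheckpointAndChange object and adds it to the member field List<CheckpointAndChange> checkpointAndChangeList.

First, let's get familiar with the relevant member fields and the constructor.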

 private final AtomicInteger maximumQueueSize =

      new AtomicInteger(DEFAULT_MAXIMUM_QUEUE_SIZE);

  private final List<CheckpointAndChange> checkpointAndChangeList;

  private final ChangeSource changeSource;

  private final DocumentHandleFactory internalDocumentHandleFactory;

  private final DocumentHandleFactory clientDocumentHandleFactory;

  private volatile DiffingConnectorCheckpoint lastCheckpoint;

  private final File persistDir;  // place to persist enqueued values

  private MonitorRestartState monitorPoints = new MonitorRestartState();

public CheckpointAndChangeQueue(ChangeSource changeSource, File persistDir,

      DocumentHandleFactory internalDocumentHandleFactory,

      DocumentHandleFactory clientDocumentHandleFactory) {

    this.changeSource = changeSource;

    this.checkpointAndChangeList

        = Collections.synchronizedList(

            new ArrayList<CheckpointAndChange>(maximumQueueSize.get()));

    this.persistDir = persistDir;

    this.internalDocumentHandleFactory = internalDocumentHandleFactory;

    this.clientDocumentHandleFactory = clientDocumentHandleFactory;

    ensurePersistDirExists();

  }

The constructor initializes the ChangeSource object changeSource (that is, the ChangeQueue object) and the List container List<CheckpointAndChange> checkpointAndChangeList.

Now let's revisit the loadUpFromChangeSource method.

 /**

   * Pulls Changes from the ChangeSource and adds them to checkpointAndChangeList.

   */

  private void loadUpFromChangeSource() {

    int max = maximumQueueSize.get();

    if (checkpointAndChangeList.size() < max) {

      lastCheckpoint = lastCheckpoint.nextMajor();

    }

    while (checkpointAndChangeList.size() < max) {

      Change newChange = changeSource.getNextChange();

      if (newChange == null) {

        break;

      }

      lastCheckpoint = lastCheckpoint.next();

      checkpointAndChangeList.add(new CheckpointAndChange(

          lastCheckpoint, newChange));

    }

  }

The method's main job is to take Change objects from the changeSource object, wrap each one in a CheckpointAndChange object, and add it to the container List<CheckpointAndChange> checkpointAndChangeList.
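The shape of that loop (fill a list up to a cap, stop early when the source runs dry, and stamp each entry with the next checkpoint) can be sketched on its own. Everything here is hypothetical: LoadUpSketch and loadUp are illustrative names, Strings stand in for Change and CheckpointAndChange, and a plain integer counter replaces DiffingConnectorCheckpoint, so the major/minor distinction of nextMajor() and next() is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch; Strings stand in for Change and CheckpointAndChange. */
class LoadUpSketch {
  /**
   * Moves changes from source into target until target holds max entries
   * or the source has nothing available. Returns the last sequence number used.
   */
  static int loadUp(Queue<String> source, List<String> target,
      int max, int lastSeq) {
    while (target.size() < max) {
      String change = source.poll(); // null means "nothing available right now"
      if (change == null) {
        break;
      }
      lastSeq++; // stands in for lastCheckpoint.next()
      target.add(lastSeq + ":" + change);
    }
    return lastSeq;
  }

  public static void main(String[] args) {
    Queue<String> source =
        new ArrayDeque<String>(Arrays.asList("a", "b", "c", "d"));
    List<String> target = new ArrayList<String>();
    int last = loadUp(source, target, 3, 0);
    System.out.println(target + " last=" + last + " left=" + source);
  }
}
```

With a cap of 3 and four queued changes, the first call takes three and leaves one behind for the next call, which is how the cap bounds the size of each batch.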

loadUpFromChangeSource is called from the resume method (resume itself is called in the constructor of the DiffingConnectorDocumentList class).

/**

   * Gets the List<CheckpointAndChange> queue.

   * Returns an {@link Iterator} for currently available

   * {@link CheckpointAndChange} objects that occur after the passed in

   * checkpoint. The {@link String} form of a {@link DiffingConnectorCheckpoint}

   * passed in is produced by calling

   * {@link DiffingConnectorCheckpoint#toString()}. As a side effect, Objects

   * up to and including the object with the passed in checkpoint are removed

   * from this queue.

   *

   * @param checkpointString null means return all {@link CheckpointAndChange}

   *        objects and a non null value means to return

   *        {@link CheckpointAndChange} objects with checkpoints after the

   *        passed in value.

   * @throws IOException if error occurs while manipulating recovery state

   */

  synchronized List<CheckpointAndChange> resume(String checkpointString)

      throws IOException {

    // Remove the changes that have already been completed.

    removeCompletedChanges(checkpointString);

    // Pull Changes from the ChangeSource and add them to checkpointAndChangeList.

    loadUpFromChangeSource();

    // Update monitorPoints.

    monitorPoints.updateOnGuaranteed(checkpointAndChangeList);

    try {

      // Persist checkpointAndChangeList to a queue (recovery) file.

      // Each call to resume produces one file.

      writeRecoveryState();

    } finally {

      // TODO: Enhance with mechanism that remembers

      // information about recovery files to avoid re-reading.

      // Remove redundant queue files (those already consumed).

      removeExcessRecoveryState();

    }

    return getList();

  }

Once the List<CheckpointAndChange> checkpointAndChangeList container has been filled, its contents are persisted to a queue file in JSON format.

/**

   * Persists the queue as JSON.

   * @throws IOException

   */

  private void writeRecoveryState() throws IOException {

    // TODO(pjo): Move this method into RecoveryFile.

    File recoveryFile = new RecoveryFile(persistDir);

    FileOutputStream outStream = new FileOutputStream(recoveryFile);

    Writer writer = new OutputStreamWriter(outStream, Charsets.UTF_8);

    try {

      try {

        writeJson(writer);

      } catch (JSONException e) {

        throw IOExceptionHelper.newIOException("Failed writing recovery file.", e);

      }

      writer.flush();

      outStream.getFD().sync();

    } finally {

      writer.close();

    }

  }
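The flush-then-sync sequence in writeRecoveryState is what makes the file durable before the method returns. A minimal sketch of the same sequence with try-with-resources follows. DurableWriteSketch and writeDurably are hypothetical names, the payload is an arbitrary string rather than the connector's JSON, and StandardCharsets from the JDK is used in place of Guava's Charsets.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical sketch of writing a file and forcing it to disk. */
class DurableWriteSketch {
  static void writeDurably(File file, String payload) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
      writer.write(payload);
      writer.flush();     // push buffered characters down to the stream
      out.getFD().sync(); // ask the OS to write the bytes to the device
    } // both resources are closed here, even if an exception was thrown
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("recovery-sketch", ".json");
    file.deleteOnExit();
    writeDurably(file, "{\"Q\":[]}");
    byte[] bytes = Files.readAllBytes(file.toPath());
    System.out.println(new String(bytes, StandardCharsets.UTF_8));
  }
}
```

The flush has to come before the sync: sync only covers bytes that have already reached the file descriptor, not characters still sitting in the writer's buffer.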

The queue file's name includes the current system time, which is used to tell which of two files was created earlier.

/**

   * A queue file whose name can be used to compare creation times.

   * A File that has some of the recovery logic.

   *  Original recovery files' names contained a single nanosecond timestamp,

   *  eg.  recovery.10220010065599398 .  These turned out to be flawed

   *  because nanosecond times can go "back in time" between JVM restarts.

   *  Updated recovery files' names contain a wall clock millis timestamp

   *  followed by an underscore followed by a nanotimestamp, eg.

   *  recovery.702522216012_10220010065599398 .

   */

  static class RecoveryFile extends File {

    final static long NO_TIME_AVAIL = -1;

    long milliTimestamp = NO_TIME_AVAIL;

    long nanoTimestamp;

    long parseTime(String s) throws IOException {

      try {

        return Long.parseLong(s);

      } catch(NumberFormatException e) {

        throw new LoggingIoException("Invalid recovery filename: "

            + getAbsolutePath());

      }

    }

    /**

     * Parses the timestamps contained in the file name.

     * @throws IOException

     */

    void parseOutTimes() throws IOException {

      try {

        String basename = getName();

        if (!basename.startsWith(RECOVERY_FILE_PREFIX)) {

          throw new LoggingIoException("Invalid recovery filename: "

              + getAbsolutePath());

        } else {

          String extension = basename.substring(RECOVERY_FILE_PREFIX.length());

          if (!extension.contains("_")) {  // Original name format.

            nanoTimestamp = parseTime(extension);

          } else {  // Updated name format.

            String timeParts[] = extension.split("_");

            if (2 != timeParts.length) {

              throw new LoggingIoException("Invalid recovery filename: "

                  + getAbsolutePath());

            }

            milliTimestamp = parseTime(timeParts[0]);

            nanoTimestamp = parseTime(timeParts[1]);

          }

        }

      } catch(IndexOutOfBoundsException e) {

        throw new LoggingIoException("Invalid recovery filename: "

            + getAbsolutePath());

      }

    }

    RecoveryFile(File persistanceDir) throws IOException {

      super(persistanceDir, RECOVERY_FILE_PREFIX + System.currentTimeMillis()

          + "_" + System.nanoTime());

      parseOutTimes();

    }

    /**

     * Constructor for the case where the file's absolute path is already known.

     * @param absolutePath

     * @throws IOException

     */

    RecoveryFile(String absolutePath) throws IOException {

      super(absolutePath);

      parseOutTimes();

    }

    boolean isOlder(RecoveryFile other) {

      boolean weHaveMillis = milliTimestamp != NO_TIME_AVAIL;

      boolean otherHasMillis = other.milliTimestamp != NO_TIME_AVAIL;

      boolean bothHaveMillis = weHaveMillis && otherHasMillis;

      boolean neitherHasMillis = (!weHaveMillis) && (!otherHasMillis);

      if (bothHaveMillis) {

        if (this.milliTimestamp < other.milliTimestamp) {

          return true;

        } else if (this.milliTimestamp > other.milliTimestamp) {

          return false;

        } else {

          return this.nanoTimestamp < other.nanoTimestamp;

        }

      } else if (neitherHasMillis) {

        return this.nanoTimestamp < other.nanoTimestamp;

      } else if (weHaveMillis) {  // and other doesn't; we are newer.

        return false;

      } else {  // other has millis; other is newer.

        return true;

      }

    }

    /** Deletes the file; a delete method that logs failures. */

    public void logOnFailDelete() {

      boolean deleted = super.delete();

      if (!deleted) {

        LOG.severe("Failed to delete: " + getAbsolutePath());

      }

    }

    // TODO(pjo): Move more recovery logic into this class.

  }
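The naming scheme and the ordering rule of RecoveryFile can be exercised without touching the file system. The sketch below works on plain name strings. RecoveryNameSketch is a hypothetical class, the prefix value "recovery." is an assumption inferred from the example names in the Javadoc (the actual value of RECOVERY_FILE_PREFIX is not shown here), and IllegalArgumentException replaces LoggingIoException.

```java
/** Hypothetical sketch of RecoveryFile's name parsing and age comparison. */
class RecoveryNameSketch {
  static final String PREFIX = "recovery."; // assumed value of RECOVERY_FILE_PREFIX
  static final long NO_TIME_AVAIL = -1;

  /** Returns {millis, nanos}; millis is NO_TIME_AVAIL for the original format. */
  static long[] parseTimes(String name) {
    if (!name.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Invalid recovery filename: " + name);
    }
    String extension = name.substring(PREFIX.length());
    if (!extension.contains("_")) { // original format: nanos only
      return new long[] {NO_TIME_AVAIL, Long.parseLong(extension)};
    }
    String[] parts = extension.split("_"); // updated format: millis_nanos
    if (parts.length != 2) {
      throw new IllegalArgumentException("Invalid recovery filename: " + name);
    }
    return new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])};
  }

  /** True if the file named a is older than the file named b. */
  static boolean isOlder(String a, String b) {
    long[] x = parseTimes(a);
    long[] y = parseTimes(b);
    boolean xHasMillis = x[0] != NO_TIME_AVAIL;
    boolean yHasMillis = y[0] != NO_TIME_AVAIL;
    if (xHasMillis != yHasMillis) {
      return !xHasMillis; // a name without millis uses the older format
    }
    if (xHasMillis && x[0] != y[0]) {
      return x[0] < y[0]; // wall-clock millis decide first
    }
    return x[1] < y[1]; // tie, or both in the original format: compare nanos
  }

  public static void main(String[] args) {
    System.out.println(isOlder("recovery.10220010065599398",
        "recovery.702522216012_10220010065599398"));
  }
}
```

As in isOlder above, a name in the original nanosecond-only format always counts as older than one in the updated format, so mixed directories still sort sensibly after an upgrade.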

Next, let's look at what its startup method (start) does.

 /**

   * Initialize to start processing from after the passed in checkpoint

   * or from the beginning if the passed in checkpoint is null.  Part of

   * making DocumentSnapshotRepositoryMonitorManager go from "cold" to "warm".

   */

  public synchronized void start(String checkpointString) throws IOException {

    LOG.info("Starting CheckpointAndChangeQueue from " + checkpointString);

    // Create the queue directory.

    ensurePersistDirExists();

    checkpointAndChangeList.clear();

    lastCheckpoint = constructLastCheckpoint(checkpointString);

    if (null == checkpointString) {

      // Delete the queue files.

      removeAllRecoveryState();

    } else {

      RecoveryFile current = removeExcessRecoveryState();

      // Load monitorPoints and the checkpointAndChangeList queue.

      loadUpFromRecoveryState(current);

      //this.monitorPoints.points.entrySet();

    }

  }

In short, it loads the list of CheckpointAndChange objects from the previously saved queue file into the List<CheckpointAndChange> checkpointAndChangeList container (along with the MonitorCheckpoint objects).

/**

   * Loads the queue.

   * @param file

   * @throws IOException

   */

  private void loadUpFromRecoveryState(RecoveryFile file) throws IOException {

    // TODO(pjo): Move this method into RecoveryFile.

    new LoadingQueueReader().readJson(file);

  }

CheckpointAndChangeQueue defines inner classes that load the list of CheckpointAndChange objects from a JSON file into the List<CheckpointAndChange> checkpointAndChangeList container.

The abstract queue reader class is AbstractQueueReader.

/**

   * Abstract class for loading the queue from a JSON file.

   * Reads JSON recovery files. Uses the Template Method pattern to

   * delegate what to do with the parsed objects to subclasses.

   *

   * Note: This class uses gson for streaming support.

   */

  private abstract class AbstractQueueReader {

    public void readJson(File file) throws IOException {

      readJson(new BufferedReader(new InputStreamReader(

                  new FileInputStream(file), Charsets.UTF_8)));

    }

    /**

     * Reads and parses the stream, calling the abstract methods to

     * take whatever action is required. The given stream will be

     * closed automatically.

     *

     * @param reader the stream to parse

     */

    @VisibleForTesting

    void readJson(Reader reader) throws IOException {

      JsonReader jsonReader = new JsonReader(reader);

      try {

        readJson(jsonReader);

      } finally {

        jsonReader.close();

      }

    }

    /**

     * Reads and parses the stream, calling the abstract methods to

     * take whatever action is required.

     */

    private void readJson(JsonReader reader) throws IOException {

      JsonParser parser = new JsonParser();

      reader.beginObject();

      while (reader.hasNext()) {

        String name = reader.nextName();

        if (name.equals(MONITOR_STATE_JSON_TAG)) {

          readMonitorPoints(parser.parse(reader));

        } else if (name.equals(QUEUE_JSON_TAG)) {

          reader.beginArray();

          while (reader.hasNext()) {

            readCheckpointAndChange(parser.parse(reader));

          }

          reader.endArray();

        } else {

          throw new IOException("Read invalid recovery file.");

        }

      }

      reader.endObject();

      reader.setLenient(true);

      String name = reader.nextString();

      if (!name.equals(SENTINAL)) {

        throw new IOException("Read invalid recovery file.");

      }

    }

    protected abstract void readMonitorPoints(JsonElement gson)

        throws IOException;

    protected abstract void readCheckpointAndChange(JsonElement gson)

        throws IOException;

  }

The abstract methods are implemented by subclasses.

/**

   * Checks that a queue file is valid.

   * Verifies that a JSON recovery file is valid JSON with a

   * trailing sentinel.

   */

  private class ValidatingQueueReader extends AbstractQueueReader {

    protected void readMonitorPoints(JsonElement gson) throws IOException {

    }

    protected void readCheckpointAndChange(JsonElement gson)

        throws IOException {

    }

  }

  /** Loads the queue from a JSON recovery file (the concrete loading implementation). */

  /*

   * TODO(jlacey): Change everything downstream to gson. For now, we

   * reserialize the individual gson objects and deserialize them

   * using org.json.

   */

  @VisibleForTesting

  class LoadingQueueReader extends AbstractQueueReader {

    /**

     * Loads the MonitorRestartState checkpoints (HashMap<String, MonitorCheckpoint> points).

     */

    protected void readMonitorPoints(JsonElement gson) throws IOException {

      try {

        JSONObject json = gsonToJson(gson);

        monitorPoints = new MonitorRestartState(json);

        //monitorPoints.updateOnGuaranteed(checkpointAndChangeList)

      } catch (JSONException e) {

        throw IOExceptionHelper.newIOException(

            "Failed reading persisted JSON queue.", e);

      }

    }

    /**

     * Loads checkpointAndChangeList.

     */

    protected void readCheckpointAndChange(JsonElement gson)

        throws IOException {

      try {

        JSONObject json = gsonToJson(gson);

        checkpointAndChangeList.add(new CheckpointAndChange(json,

            internalDocumentHandleFactory, clientDocumentHandleFactory));

      } catch (JSONException e) {

        throw IOExceptionHelper.newIOException(

            "Failed reading persisted JSON queue.", e);

      }

    }

    // TODO(jlacey): This could be much more efficient, especially

    // with LOBs, if we directly transformed the objects with a little

    // recursive parser. This code is only used when recovering failed

    // batches, so I don't know if that's worth the effort.

    private JSONObject gsonToJson(JsonElement gson) throws JSONException {

      return new JSONObject(gson.toString());

    }

  }
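The division of labor between AbstractQueueReader and its two subclasses is the Template Method pattern: the base class owns the parsing and the sentinel check, and subclasses only decide what to do with each parsed item. The sketch below shows that structure with a deliberately simplified format. Everything in it is hypothetical: the line-based "monitor=" / "change=" format and the "END" sentinel are inventions for illustration and are not the connector's JSON layout, and no gson or org.json types are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Template Method sketch over an invented line format. */
class TemplateReaderSketch {
  static final String SENTINEL = "END"; // invented trailing marker

  abstract static class AbstractReader {
    /** The template: parses every line, then requires the sentinel. */
    final void read(Reader in) throws IOException {
      try (BufferedReader reader = new BufferedReader(in)) {
        String line;
        while ((line = reader.readLine()) != null) {
          if (line.equals(SENTINEL)) {
            return; // complete file
          } else if (line.startsWith("monitor=")) {
            onMonitor(line.substring("monitor=".length()));
          } else if (line.startsWith("change=")) {
            onChange(line.substring("change=".length()));
          } else {
            throw new IOException("Read invalid recovery file.");
          }
        }
        throw new IOException("Read invalid recovery file."); // no sentinel
      }
    }

    protected abstract void onMonitor(String value);

    protected abstract void onChange(String value);
  }

  /** Only checks that the input parses and ends with the sentinel. */
  static class ValidatingReader extends AbstractReader {
    protected void onMonitor(String value) {}

    protected void onChange(String value) {}
  }

  /** Collects what the template hands over. */
  static class LoadingReader extends AbstractReader {
    String monitor;
    final List<String> changes = new ArrayList<String>();

    protected void onMonitor(String value) {
      monitor = value;
    }

    protected void onChange(String value) {
      changes.add(value);
    }
  }

  public static void main(String[] args) throws IOException {
    LoadingReader loader = new LoadingReader();
    loader.read(new StringReader("monitor=m1\nchange=a\nchange=b\nEND\n"));
    System.out.println(loader.monitor + " " + loader.changes);
  }
}
```

Because the validating subclass has empty hooks, a file can be checked for completeness first and only then loaded, without duplicating any parsing code.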

---------------------------------------------------------------------------

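This series, Enterprise Search Engine Development: the Connector, is my own original work.

When reposting, please credit the source: cnblogs (博客园), 刺猬的温驯.

Email: chenying998179@163#com (replace # with .)

Link to this article: http://www.cnblogs.com/chenying99/p/3789560.html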
