The connector fetches data iteratively through the monitor object DocumentSnapshotRepositoryMonitor, which pulls from the repository object SnapshotRepository introduced earlier (DBSnapshotRepository in the database case).
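
For reference, all the monitor needs from the repository is the ability to iterate DocumentSnapshot objects in a stable order; a rough sketch of the abstraction (my own illustration, not quoted from the connector framework) looks like this:

// Rough sketch of the repository abstraction (illustrative, not the verbatim
// interface): the monitor only relies on being able to iterate snapshots in
// an order consistent with COMPARATOR.
public interface SnapshotRepository<T extends DocumentSnapshot>
    extends Iterable<T> {
  // A DB-backed implementation such as DBSnapshotRepository would return an
  // Iterator<T> that walks the query results; a file-system implementation
  // would walk the directory tree. (Details omitted/assumed here.)
}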

The monitor class DocumentSnapshotRepositoryMonitor initializes its member variables in the constructor; these members are the objects involved in fetching and processing the data.

/** This connector instance's current traversal schedule. */
private volatile TraversalSchedule traversalSchedule;

/** Directory that contains snapshots. */
private final SnapshotStore snapshotStore;

/** The root of the repository to monitor. */
private final SnapshotRepository<? extends DocumentSnapshot> query;

/** Reader for the current snapshot. */
private SnapshotReader snapshotReader;

/** Callback to invoke when a change is detected. */
private final Callback callback;

/** Current record from the snapshot. */
private DocumentSnapshot current;

/** The snapshot we are currently writing. */
private OrderedSnapshotWriter snapshotWriter;

private final String name;
private final DocumentSnapshotFactory documentSnapshotFactory;
private final DocumentSink documentSink;

/* Contains a checkpoint confirmation from CM. */
private MonitorCheckpoint guaranteeCheckpoint;

/* The monitor should exit voluntarily if set to false. */
private volatile boolean isRunning = true;

/**
 * Creates a DocumentSnapshotRepositoryMonitor that monitors the
 * Repository rooted at {@code root}.
 *
 * @param name the name of this monitor (a hash of the start path)
 * @param query query for files
 * @param snapshotStore where snapshots are stored
 * @param callback client callback
 * @param documentSink destination for filtered out file info
 * @param initialCp checkpoint when system initiated, could be {@code null}
 * @param documentSnapshotFactory for un-serializing
 *        {@link DocumentSnapshot} objects.
 */
public DocumentSnapshotRepositoryMonitor(String name,
    SnapshotRepository<? extends DocumentSnapshot> query,
    SnapshotStore snapshotStore, Callback callback,
    DocumentSink documentSink, MonitorCheckpoint initialCp,
    DocumentSnapshotFactory documentSnapshotFactory) {
  this.name = name;
  this.query = query;
  this.snapshotStore = snapshotStore;
  this.callback = callback;
  this.documentSnapshotFactory = documentSnapshotFactory;
  this.documentSink = documentSink;
  guaranteeCheckpoint = initialCp;
}

The monitor also implements the Runnable interface, so it is executed on its own thread; the data-processing loop lives in the overridden run method.

@Override
public void run() {
  // Call NDC.push() via reflection, if possible.
  invoke(ndcPush, "Monitor " + name);
  try {
    while (true) {
      tryToRunForever();
      // TODO: Remove items from this monitor that are in queues.
      // Watch out for race conditions. The queues are potentially
      // giving docs to CM as bad things happen in monitor.
      // This TODO would be mitigated by a reconciliation with GSA.
      performExceptionRecovery();
    }
  } catch (InterruptedException ie) {
    LOG.info("Repository Monitor " + name + " received stop signal. " + this);
  } finally {
    // Call NDC.remove() via reflection, if possible.
    invoke(ndcRemove);
  }
}
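
Since run() only exits when it sees an InterruptedException (or when isRunning is flipped to false), the monitor is normally started and stopped by whoever owns its thread; in this connector family that is the monitor manager. A minimal sketch with hypothetical names (my own illustration, not the manager's real code):

// Hypothetical helper class, for illustration only; the real thread
// management lives in DocumentSnapshotRepositoryMonitorManager.
class MonitorThreadSketch {
  static Thread start(DocumentSnapshotRepositoryMonitor monitor, String name) {
    Thread t = new Thread(monitor, "monitor-" + name);
    t.start();
    return t;
  }

  static void stop(Thread t) throws InterruptedException {
    t.interrupt();  // run() turns the interrupt into a clean exit
    t.join(5000);   // wait briefly (arbitrary timeout) for the thread to finish
  }
}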

run() in turn calls the tryToRunForever() method:

private void tryToRunForever() throws InterruptedException {
  try {
    while (true) {
      if (traversalSchedule == null || traversalSchedule.shouldRun()) {
        // Start traversal
        doOnePass();
      } else {
        LOG.finest("Currently out of traversal window. "
            + "Sleeping for 15 minutes.");
        // TODO(nashi): Calculate when it should wake up while
        // handling TraversalScheduleAware events properly.
        // Outside the traversal window: pause for a while.
        callback.passPausing(15 * 60 * 1000);
      }
    }
  } catch (SnapshotWriterException e) {
    String msg = "Failed to write to snapshot file: " + snapshotWriter.getPath();
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotReaderException e) {
    String msg = "Failed to read snapshot file: " + snapshotReader.getPath();
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotStoreException e) {
    String msg = "Problem with snapshot store.";
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotRepositoryRuntimeException e) {
    String msg = "Failed reading repository.";
    LOG.log(Level.SEVERE, msg, e);
  }
}

The doOnePass() method fetches data from the SnapshotRepository, persists the document snapshots to a snapshot file, and performs the diff logic, deciding whether each document is new, deleted, or updated. The resulting changes are handed to the Callback interface, which appends them to the blocking queue inside the ChangeQueue object.

/**
 * Makes one pass through the repository, notifying {@code visitor} of any
 * changes. A fresh snapshot reader and writer are created for each pass.
 *
 * @throws InterruptedException
 */
private void doOnePass() throws SnapshotStoreException,
    InterruptedException {
  callback.passBegin();
  try {
    // Snapshot reader:
    // Open the most recent snapshot and read the first record.
    this.snapshotReader = snapshotStore.openMostRecentSnapshot();
    current = snapshotReader.read();

    // Snapshot writer:
    // Create a snapshot writer for this pass.
    this.snapshotWriter =
        new OrderedSnapshotWriter(snapshotStore.openNewSnapshotWriter());

    // Iterate over the documents supplied by the repository.
    for (DocumentSnapshot ss : query) {
      // Check whether the monitor has been asked to stop.
      if (false == isRunning) {
        LOG.log(Level.INFO, "Exiting the monitor thread " + name
            + " " + this);
        throw new InterruptedException();
      }
      if (Thread.currentThread().isInterrupted()) {
        throw new InterruptedException();
      }
      processDeletes(ss);
      safelyProcessDocumentSnapshot(ss);
    }
    // After the iteration, any snapshot records remaining beyond the last
    // repository document correspond to documents deleted from the source.
    // Take care of any trailing paths in the snapshot.
    processDeletes(null);
  } finally {
    try {
      snapshotStore.close(snapshotReader, snapshotWriter);
    } catch (IOException e) {
      LOG.log(Level.WARNING, "Failed closing snapshot reader and writer.", e);
      // Try to proceed anyway. Weird they are not closing.
    }
  }
  if (current != null) {
    throw new IllegalStateException(
        "Should not finish pass until entire read snapshot is consumed.");
  }
  // The pass is complete.
  callback.passComplete(getCheckpoint(-1));
  snapshotStore.deleteOldSnapshots();
  if (!callback.hasEnqueuedAtLeastOneChangeThisPass()) {
    // No monitor checkpoints from this pass went to queue because
    // there were no changes, so we can delete the snapshot we just wrote.
    new java.io.File(snapshotWriter.getPath()).delete();
    // TODO: Check return value; log trouble.
  }
  snapshotWriter = null;
  snapshotReader = null;
}

The processDeletes method handles the deletion logic:

/**
 * Process snapshot entries as deletes until {@code current} catches up with
 * {@code documentSnapshot}. Or, if {@code documentSnapshot} is {@code null},
 * process all remaining snapshot entries as deletes.
 *
 * @param documentSnapshot where to stop
 * @throws SnapshotReaderException
 * @throws InterruptedException
 */
private void processDeletes(DocumentSnapshot documentSnapshot)
    throws SnapshotReaderException, InterruptedException {
  // While documentSnapshot sorts after the current snapshot record, the
  // current record no longer exists in the repository: emit a delete and
  // advance to the next record in the snapshot.
  while (current != null
      && (documentSnapshot == null
          || COMPARATOR.compare(documentSnapshot, current) > 0)) {
    callback.deletedDocument(
        new DeleteDocumentHandle(current.getDocumentId()), getCheckpoint());
    current = snapshotReader.read();
  }
}

Next, let us follow the safelyProcessDocumentSnapshot method:

private void safelyProcessDocumentSnapshot(DocumentSnapshot snapshot)
    throws InterruptedException, SnapshotReaderException,
    SnapshotWriterException {
  try {
    processDocument(snapshot);
  } catch (RepositoryException re) {
    // TODO: Log the exception or its message? In document sink perhaps.
    // The snapshot that caused the exception is routed to the document sink.
    documentSink.add(snapshot.getDocumentId(), FilterReason.IO_EXCEPTION);
  }
}

It in turn calls the processDocument method, which handles both updated and newly added documents:

/**
 * Processes a document found in the document repository.
 *
 * @param documentSnapshot
 * @throws RepositoryException
 * @throws InterruptedException
 * @throws SnapshotReaderException
 * @throws SnapshotWriterException
 */
private void processDocument(DocumentSnapshot documentSnapshot)
    throws InterruptedException, RepositoryException, SnapshotReaderException,
    SnapshotWriterException {
  // At this point 'current' >= 'file', or possibly current == null if
  // we've processed the previous snapshot entirely.
  if (current != null
      && COMPARATOR.compare(documentSnapshot, current) == 0) {
    // The document also appeared in the previous scan: check whether it
    // changed, and advance 'current'.
    processPossibleChange(documentSnapshot);
  } else {
    // This file didn't exist during the previous scan: it is a new document.
    DocumentHandle documentHandle = documentSnapshot.getUpdate(null);
    snapshotWriter.write(documentSnapshot);
    // documentHandle is null if the document was filtered, e.g. due to mime type.
    if (documentHandle != null) {
      callback.newDocument(documentHandle, getCheckpoint(-1));
    }
  }
}

Handling the update case:

/**
 * Processes a document found in the document repository that also appeared
 * in the previous scan. Determines whether the document has changed,
 * propagates changes to the client and writes the snapshot record.
 *
 * @param documentSnapshot
 * @throws RepositoryException
 * @throws InterruptedException
 * @throws SnapshotWriterException
 * @throws SnapshotReaderException
 */
private void processPossibleChange(DocumentSnapshot documentSnapshot)
    throws RepositoryException, InterruptedException, SnapshotWriterException,
    SnapshotReaderException {
  // Compare against the previous snapshot record (typically a checksum/hash
  // comparison) to obtain an update handle; null means nothing changed.
  DocumentHandle documentHandle = documentSnapshot.getUpdate(current);
  // Write the new snapshot record to the snapshot file.
  snapshotWriter.write(documentSnapshot);
  if (documentHandle == null) {
    // No change - nothing is sent.
  } else {
    // Normal change - send the gsa an update.
    callback.changedDocument(documentHandle, getCheckpoint());
  }
  current = snapshotReader.read();
}
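
Taken together, processDeletes, processDocument, and processPossibleChange implement a merge of two sorted sequences: the previous snapshot and the current repository listing, both ordered by the same comparator. Below is a standalone sketch of that ordered-merge idea (my own illustration, using plain strings as document IDs), not code from the connector:

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Standalone illustration of the ordered-merge diff performed by the monitor.
// Both inputs must be sorted by the same key; strings stand in for documents.
public class SnapshotDiffSketch {

  static void diff(List<String> previousSnapshot, List<String> repository) {
    Iterator<String> snapIt = previousSnapshot.iterator();
    String current = snapIt.hasNext() ? snapIt.next() : null;
    for (String doc : repository) {
      // Old snapshot entries that sort before 'doc' no longer exist in the
      // repository -> deletes (cf. processDeletes).
      while (current != null && current.compareTo(doc) < 0) {
        System.out.println("DELETE " + current);
        current = snapIt.hasNext() ? snapIt.next() : null;
      }
      if (current != null && current.compareTo(doc) == 0) {
        // Present in both scans -> possibly changed (cf. processPossibleChange).
        System.out.println("MAYBE-CHANGED " + doc);
        current = snapIt.hasNext() ? snapIt.next() : null;
      } else {
        // Absent from the old snapshot -> new document (cf. processDocument).
        System.out.println("NEW " + doc);
      }
    }
    // Trailing old-snapshot entries are deletes (cf. processDeletes(null)).
    while (current != null) {
      System.out.println("DELETE " + current);
      current = snapIt.hasNext() ? snapIt.next() : null;
    }
  }

  public static void main(String[] args) {
    diff(Arrays.asList("a", "b", "d"), Arrays.asList("a", "c", "d"));
    // Prints: MAYBE-CHANGED a, DELETE b, NEW c, MAYBE-CHANGED d
  }
}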

Snapshot records for both updated and newly added documents are first persisted to the newest snapshot file.

The changes themselves are submitted through the methods of the callback member, and ultimately end up in the ChangeQueue object.

The Callback interface defines the data-handling methods:

/**
 * The client provides an implementation of this interface to receive
 * notification of changes to the repository.
 */
public static interface Callback {
  public void passBegin() throws InterruptedException;

  public void newDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void deletedDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void changedDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void passComplete(MonitorCheckpoint mcp) throws InterruptedException;

  public boolean hasEnqueuedAtLeastOneChangeThisPass();

  public void passPausing(int sleepms) throws InterruptedException;
}

The ChangeQueue class defines an inner class Callback that implements this interface; in its methods the submitted data is added to the blocking queue held by ChangeQueue:

/**
 * Adds {@link Change Changes} to this queue, i.e. puts Change elements into
 * the blocking queue pendingChanges.
 */
private class Callback implements DocumentSnapshotRepositoryMonitor.Callback {
  private int changeCount = 0;

  public void passBegin() {
    changeCount = 0;
    activityLogger.scanBeginAt(new Timestamp(System.currentTimeMillis()));
  }

  /* @Override */
  public void changedDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.CLIENT, dh, mcp));
    activityLogger.gotChangedDocument(dh.getDocumentId());
  }

  /* @Override */
  public void deletedDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.INTERNAL, dh, mcp));
    activityLogger.gotDeletedDocument(dh.getDocumentId());
  }

  /* @Override */
  public void newDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.CLIENT, dh, mcp));
    activityLogger.gotNewDocument(dh.getDocumentId());
  }

  /* @Override */
  public void passComplete(MonitorCheckpoint mcp) throws InterruptedException {
    activityLogger.scanEndAt(new Timestamp(System.currentTimeMillis()));
    if (introduceDelayAfterEveryScan || changeCount == 0) {
      Thread.sleep(sleepInterval);
    }
  }

  public boolean hasEnqueuedAtLeastOneChangeThisPass() {
    return changeCount > 0;
  }

  /* @Override */
  public void passPausing(int sleepms) throws InterruptedException {
    Thread.sleep(sleepms);
  }
}
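
On the consuming side, the traversal code eventually drains this same blocking queue (that part of the connector is not shown here). A minimal, self-contained sketch of the producer/consumer handoff, with hypothetical names and plain strings in place of Change objects:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical, simplified handoff (my own sketch, not the connector's
// ChangeQueue API): the monitor thread puts changes, the traversal side
// takes them; put() blocks when the queue is full, take() blocks when empty.
public class HandoffSketch {
  public static void main(String[] args) throws InterruptedException {
    final BlockingQueue<String> pendingChanges = new LinkedBlockingQueue<String>(100);

    Thread producer = new Thread(new Runnable() {
      public void run() {
        try {
          pendingChanges.put("doc-1 changed");
          pendingChanges.put("doc-2 deleted");
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    });
    producer.start();

    // Consumer side: blocks until a change is available.
    System.out.println(pendingChanges.take());
    System.out.println(pendingChanges.take());
    producer.join();
  }
}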

---------------------------------------------------------------------------

This series on enterprise search engine connector development is my own original work.

Please credit the source when reposting: 博客园 刺猬的温驯.

My email: chenying998179@163#com (replace # with .)

Link to this post: http://www.cnblogs.com/chenying99/p/3789505.html
