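Enterprise Search Engine Development: The Connector (Part 26)
The connector uses a monitor object, DocumentSnapshotRepositoryMonitor, to iterate over the data in the repository object SnapshotRepository introduced in the previous post (for a database repository this is DBSnapshotRepository).
The DocumentSnapshotRepositoryMonitor class initializes its member variables in the constructor. These members are all objects involved in fetching the data and in the processing logic.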
/** This connector instance's current traversal schedule. */
private volatile TraversalSchedule traversalSchedule;

/** Directory that contains snapshots. */
private final SnapshotStore snapshotStore;

/** The root of the repository to monitor */
private final SnapshotRepository<? extends DocumentSnapshot> query;

/** Reader for the current snapshot. */
private SnapshotReader snapshotReader;

/** Callback to invoke when a change is detected. */
private final Callback callback;

/** Current record from the snapshot. */
private DocumentSnapshot current;

/** The snapshot we are currently writing */
private OrderedSnapshotWriter snapshotWriter;

private final String name;

private final DocumentSnapshotFactory documentSnapshotFactory;

private final DocumentSink documentSink;

/* Contains a checkpoint confirmation from CM. */
private MonitorCheckpoint guaranteeCheckpoint;

/* The monitor should exit voluntarily if set to false */
private volatile boolean isRunning = true;

/**
 * Creates a DocumentSnapshotRepositoryMonitor that monitors the
 * Repository rooted at {@code root}.
 *
 * @param name the name of this monitor (a hash of the start path)
 * @param query query for files
 * @param snapshotStore where snapshots are stored
 * @param callback client callback
 * @param documentSink destination for filtered out file info
 * @param initialCp checkpoint when system initiated, could be {@code null}
 * @param documentSnapshotFactory for un-serializing
 *        {@link DocumentSnapshot} objects.
 */
public DocumentSnapshotRepositoryMonitor(String name,
    SnapshotRepository<? extends DocumentSnapshot> query,
    SnapshotStore snapshotStore, Callback callback,
    DocumentSink documentSink, MonitorCheckpoint initialCp,
    DocumentSnapshotFactory documentSnapshotFactory) {
  this.name = name;
  this.query = query;
  this.snapshotStore = snapshotStore;
  this.callback = callback;
  this.documentSnapshotFactory = documentSnapshotFactory;
  this.documentSink = documentSink;
  guaranteeCheckpoint = initialCp;
}
The class also implements the Runnable interface; the overridden run method contains the data processing logic.
@Override
public void run() {
  // Call NDC.push() via reflection, if possible.
  invoke(ndcPush, "Monitor " + name);
  try {
    while (true) {
      tryToRunForever();
      // TODO: Remove items from this monitor that are in queues.
      // Watch out for race conditions. The queues are potentially
      // giving docs to CM as bad things happen in monitor.
      // This TODO would be mitigated by a reconciliation with GSA.
      performExceptionRecovery();
    }
  } catch (InterruptedException ie) {
    LOG.info("Repository Monitor " + name + " received stop signal. " + this);
  } finally {
    // Call NDC.remove() via reflection, if possible.
    invoke(ndcRemove);
  }
}
run() in turn calls the tryToRunForever() method.
private void tryToRunForever() throws InterruptedException {
  try {
    while (true) {
      if (traversalSchedule == null || traversalSchedule.shouldRun()) {
        // Start traversal
        doOnePass();
      }
      else {
        LOG.finest("Currently out of traversal window. "
            + "Sleeping for 15 minutes.");
        // TODO(nashi): Calculate when it should wake up while
        // handling TraversalScheduleAware events properly.
        // Outside the traversal window, so sleep.
        callback.passPausing(15*60*1000);
      }
    }
  } catch (SnapshotWriterException e) {
    String msg = "Failed to write to snapshot file: " + snapshotWriter.getPath();
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotReaderException e) {
    String msg = "Failed to read snapshot file: " + snapshotReader.getPath();
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotStoreException e) {
    String msg = "Problem with snapshot store.";
    LOG.log(Level.SEVERE, msg, e);
  } catch (SnapshotRepositoryRuntimeException e) {
    String msg = "Failed reading repository.";
    LOG.log(Level.SEVERE, msg, e);
  }
}
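The interplay of run() and tryToRunForever() shown above can be sketched in isolation. This is only an illustration, not the connector's code: RetryingWorker, Pass and the recoveries counter are hypothetical names, and the counter stands in for performExceptionRecovery(). The sketch shows an inner loop that returns when a pass fails, an outer loop that recovers and restarts it, and interruption as the only way out.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the monitor's run()/tryToRunForever() pair. */
final class RetryingWorker implements Runnable {
  /** One unit of work that may fail or be interrupted. */
  interface Pass {
    void run() throws Exception;
  }

  private final Pass pass;
  /** Counts how often the outer loop had to recover from a failed pass. */
  final AtomicInteger recoveries = new AtomicInteger();

  RetryingWorker(Pass pass) {
    this.pass = pass;
  }

  @Override
  public void run() {
    try {
      while (true) {
        tryToRunForever();
        // Reached only when a pass failed; recover, then start over.
        recoveries.incrementAndGet();
      }
    } catch (InterruptedException ie) {
      // Stop signal: leave the loop and let the thread end.
    }
  }

  private void tryToRunForever() throws InterruptedException {
    try {
      while (true) {
        if (Thread.currentThread().isInterrupted()) {
          throw new InterruptedException();
        }
        pass.run();
      }
    } catch (InterruptedException ie) {
      // Interruption is not a failure; let it reach run().
      throw ie;
    } catch (Exception e) {
      // A failed pass ends this loop; the caller recovers and retries.
    }
  }
}
```

The point of the two levels is that a failure in one pass never terminates the thread; only an interrupt does.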
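The doOnePass() method fetches the data from the SnapshotRepository object, persists the document snapshots to a snapshot file, and decides how each document must be handled (added, deleted, or updated).
The resulting changes are finally added, through the Callback interface, to the blocking queue inside the ChangeQueue object.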
/**
 * doOnePass() creates its own snapshot reader and writer.
 * Makes one pass through the repository, notifying {@code visitor} of any
 * changes.
 *
 * @throws InterruptedException
 */
private void doOnePass() throws SnapshotStoreException,
    InterruptedException {
  callback.passBegin();
  try {
    // Snapshot reader
    // Open the most recent snapshot and read the first record.
    this.snapshotReader = snapshotStore.openMostRecentSnapshot();
    current = snapshotReader.read();
    // Snapshot writer
    // Create a snapshot writer for this pass.
    this.snapshotWriter =
        new OrderedSnapshotWriter(snapshotStore.openNewSnapshotWriter());
    // Fetch the data from the repository.
    for(DocumentSnapshot ss : query) {
      // Check whether the monitor has been asked to stop.
      if (false == isRunning) {
        LOG.log(Level.INFO, "Exiting the monitor thread " + name
            + " " + this);
        throw new InterruptedException();
      }
      if (Thread.currentThread().isInterrupted()) {
        throw new InterruptedException();
      }
      processDeletes(ss);
      safelyProcessDocumentSnapshot(ss);
    }
    // After the iteration, treat the entries left over in the snapshot
    // reader as deletes (the data source may have removed trailing documents).
    // Take care of any trailing paths in the snapshot.
    processDeletes(null);
  } finally {
    try {
      snapshotStore.close(snapshotReader, snapshotWriter);
    } catch (IOException e) {
      LOG.log(Level.WARNING, "Failed closing snapshot reader and writer.", e);
      // Try to proceed anyway. Weird they are not closing.
    }
  }
  if (current != null) {
    throw new IllegalStateException(
        "Should not finish pass until entire read snapshot is consumed.");
  }
  // Pass finished; the callback may sleep here.
  callback.passComplete(getCheckpoint(-1));
  snapshotStore.deleteOldSnapshots();
  if (!callback.hasEnqueuedAtLeastOneChangeThisPass()) {
    // No monitor checkpoints from this pass went to queue because
    // there were no changes, so we can delete the snapshot we just wrote.
    new java.io.File(snapshotWriter.getPath()).delete();
    // TODO: Check return value; log trouble.
  }
  snapshotWriter = null;
  snapshotReader = null;
}
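The merge that doOnePass() performs between the previous snapshot and the repository iteration can be illustrated with a small, self-contained sketch. It is not the connector's implementation: SnapshotDiff is a hypothetical class, each snapshot is reduced to a sorted map from document id to a content value, and the events are plain strings. Like the original, it assumes both sides are ordered by the same comparator; here that is the natural String order of the TreeMap keys.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a one-pass diff between two id-sorted snapshots. */
final class SnapshotDiff {
  /** Returns events such as "ADD c" by merging previous and latest. */
  static List<String> diff(TreeMap<String, String> previous,
      TreeMap<String, String> latest) {
    List<String> events = new ArrayList<>();
    Iterator<Map.Entry<String, String>> reader = previous.entrySet().iterator();
    Map.Entry<String, String> current = reader.hasNext() ? reader.next() : null;

    for (Map.Entry<String, String> doc : latest.entrySet()) {
      // Old records that sort before doc are gone from the repository.
      while (current != null && doc.getKey().compareTo(current.getKey()) > 0) {
        events.add("DELETE " + current.getKey());
        current = reader.hasNext() ? reader.next() : null;
      }
      if (current != null && doc.getKey().equals(current.getKey())) {
        // Present in both snapshots: report only if the content differs.
        if (!doc.getValue().equals(current.getValue())) {
          events.add("CHANGE " + doc.getKey());
        }
        current = reader.hasNext() ? reader.next() : null;
      } else {
        // Not in the previous snapshot: a new document.
        events.add("ADD " + doc.getKey());
      }
    }
    // Whatever is left in the previous snapshot was deleted at the tail.
    while (current != null) {
      events.add("DELETE " + current.getKey());
      current = reader.hasNext() ? reader.next() : null;
    }
    return events;
  }
}
```

Because both inputs are consumed in order, the diff needs a single pass and never holds more than one old record in memory, which is why the monitor can stream the snapshot file instead of loading it.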
The processDeletes method implements the deletion logic.
/**
 * Process snapshot entries as deletes until {@code current} catches up with
 * {@code documentSnapshot}. Or, if {@code documentSnapshot} is {@code null},
 * process all remaining snapshot entries as deletes.
 *
 * @param documentSnapshot where to stop
 * @throws SnapshotReaderException
 * @throws InterruptedException
 */
private void processDeletes(DocumentSnapshot documentSnapshot)
    throws SnapshotReaderException, InterruptedException {
  // While the documentSnapshot argument sorts after current, report current
  // as deleted, then advance to the next record in the snapshot.
  while (current != null
      && (documentSnapshot == null
          || COMPARATOR.compare(documentSnapshot, current) > 0)) {
    callback.deletedDocument(
        new DeleteDocumentHandle(current.getDocumentId()), getCheckpoint());
    current = snapshotReader.read();
  }
}
Next, trace the safelyProcessDocumentSnapshot method.
private void safelyProcessDocumentSnapshot(DocumentSnapshot snapshot)
    throws InterruptedException, SnapshotReaderException,
    SnapshotWriterException {
  try {
    processDocument(snapshot);
  } catch (RepositoryException re) {
    //TODO Log the exception or its message? in document sink perhaps.
    // Record the snapshot that failed.
    documentSink.add(snapshot.getDocumentId(), FilterReason.IO_EXCEPTION);
  }
}
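The idea behind safelyProcessDocumentSnapshot, that one unreadable document is recorded and skipped rather than aborting the whole pass, can be sketched as follows. All names here are hypothetical: SafeProcessing and Processor are invented for the example, DocumentException stands in for RepositoryException, and a map plays the role of the DocumentSink.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of per-document failure isolation. */
final class SafeProcessing {
  /** Stand-in for RepositoryException. */
  static final class DocumentException extends Exception {
    DocumentException(String message) {
      super(message);
    }
  }

  /** Stand-in for processDocument. */
  interface Processor {
    void process(String documentId) throws DocumentException;
  }

  /**
   * Processes every id in order. A failing document is recorded in the
   * returned sink (id to reason) and the loop moves on to the next one.
   */
  static Map<String, String> processAll(List<String> ids, Processor processor) {
    Map<String, String> sink = new LinkedHashMap<>();
    for (String id : ids) {
      try {
        processor.process(id);
      } catch (DocumentException e) {
        sink.put(id, "IO_EXCEPTION: " + e.getMessage());
      }
    }
    return sink;
  }
}
```

Only the document-level exception is caught; anything else, such as an interrupt in the original method, still propagates and stops the pass.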
This in turn calls processDocument, which contains the logic for updated and newly added documents.
/**
 * Processes a document found in the document repository.
 *
 * @param documentSnapshot
 * @throws RepositoryException
 * @throws InterruptedException
 * @throws SnapshotReaderException
 * @throws SnapshotWriterException
 */
private void processDocument(DocumentSnapshot documentSnapshot)
    throws InterruptedException, RepositoryException, SnapshotReaderException,
    SnapshotWriterException {
  // At this point 'current' >= 'file', or possibly current == null if
  // we've processed the previous snapshot entirely.
  if (current != null
      && COMPARATOR.compare(documentSnapshot, current) == 0) {
    // Handle a possibly changed documentSnapshot and advance current.
    processPossibleChange(documentSnapshot);
  } else {
    // This file didn't exist during the previous scan.
    // The documentSnapshot is not in the previous snapshot.
    DocumentHandle documentHandle = documentSnapshot.getUpdate(null);
    snapshotWriter.write(documentSnapshot);

    // Null if filtered due to mime-type.
    if (documentHandle != null) {
      callback.newDocument(documentHandle, getCheckpoint(-1));
    }
  }
}
Handling the update case:
/**
 * Processes a document found in the document repository that also appeared
 * in the previous scan. Determines whether the document has changed,
 * propagates changes to the client and writes the snapshot record.
 *
 * @param documentSnapshot
 * @throws RepositoryException
 * @throws InterruptedException
 * @throws SnapshotWriterException
 * @throws SnapshotReaderException
 */
private void processPossibleChange(DocumentSnapshot documentSnapshot)
    throws RepositoryException, InterruptedException, SnapshotWriterException,
    SnapshotReaderException {
  // Presumably compares hash values.
  DocumentHandle documentHandle = documentSnapshot.getUpdate(current);
  // Write to the snapshot file.
  snapshotWriter.write(documentSnapshot);
  if (documentHandle == null) {
    // No change.
    // Nothing changed, so nothing is sent.
  } else {
    // Normal change - send the gsa an update.
    callback.changedDocument(documentHandle, getCheckpoint());
  }
  current = snapshotReader.read();
}
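How getUpdate decides that nothing changed depends on the DocumentSnapshot implementation, which is not shown here; the comment in the code above only guesses that hashes are compared. Under that assumption, a snapshot could look like the following sketch. HashedSnapshot is a hypothetical class, the returned String stands in for a DocumentHandle, and SHA-256 is an arbitrary choice of digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical snapshot that detects changes by comparing content digests. */
final class HashedSnapshot {
  final String id;
  final String digest;

  HashedSnapshot(String id, String content) {
    this.id = id;
    this.digest = sha256(content);
  }

  /**
   * Returns the id of the document to send, or null when the previous
   * snapshot has the same digest and nothing needs to be sent. Passing
   * null means the document was not seen before, so it is always sent.
   */
  String getUpdate(HashedSnapshot previous) {
    if (previous != null && previous.digest.equals(digest)) {
      return null;
    }
    return id;
  }

  private static String sha256(String content) {
    try {
      byte[] hash = MessageDigest.getInstance("SHA-256")
          .digest(content.getBytes(StandardCharsets.UTF_8));
      return Base64.getEncoder().encodeToString(hash);
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256.
      throw new IllegalStateException(e);
    }
  }
}
```

Storing only the digest keeps the snapshot file small, since the previous content itself is never needed to decide whether to resend a document.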
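Snapshots of updated and newly added documents are first persisted to the newest snapshot file.
The data is submitted through the methods of the callback member and ends up in the ChangeQueue object.
The Callback interface defines the methods involved in this processing.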
/**
 * Callback interface.
 * The client provides an implementation of this interface to receive
 * notification of changes to the repository.
 */
public static interface Callback {
  public void passBegin() throws InterruptedException;

  public void newDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void deletedDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void changedDocument(DocumentHandle documentHandle,
      MonitorCheckpoint mcp) throws InterruptedException;

  public void passComplete(MonitorCheckpoint mcp) throws InterruptedException;

  public boolean hasEnqueuedAtLeastOneChangeThisPass();

  public void passPausing(int sleepms) throws InterruptedException;
}
The ChangeQueue class defines an inner class, Callback, that implements this interface. Its methods add the submitted data to the blocking queue held as a member of ChangeQueue.
/**
 * Callback implementation: adds Change elements to the blocking queue
 * pendingChanges.
 * Adds {@link Change Changes} to this queue.
 */
private class Callback implements DocumentSnapshotRepositoryMonitor.Callback {
  private int changeCount = 0;

  public void passBegin() {
    changeCount = 0;
    activityLogger.scanBeginAt(new Timestamp(System.currentTimeMillis()));
  }

  /* @Override */
  public void changedDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.CLIENT, dh, mcp));
    activityLogger.gotChangedDocument(dh.getDocumentId());
  }

  /* @Override */
  public void deletedDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.INTERNAL, dh, mcp));
    activityLogger.gotDeletedDocument(dh.getDocumentId());
  }

  /* @Override */
  public void newDocument(DocumentHandle dh, MonitorCheckpoint mcp)
      throws InterruptedException {
    ++changeCount;
    pendingChanges.put(new Change(Change.FactoryType.CLIENT, dh, mcp));
    activityLogger.gotNewDocument(dh.getDocumentId());
  }

  /* @Override */
  public void passComplete(MonitorCheckpoint mcp) throws InterruptedException {
    activityLogger.scanEndAt(new Timestamp(System.currentTimeMillis()));
    if (introduceDelayAfterEveryScan || changeCount == 0) {
      Thread.sleep(sleepInterval);
    }
  }

  public boolean hasEnqueuedAtLeastOneChangeThisPass() {
    return changeCount > 0;
  }

  /* @Override */
  public void passPausing(int sleepms) throws InterruptedException {
    Thread.sleep(sleepms);
  }
}
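The pattern used by this Callback, counting the changes of a pass while handing them to a blocking queue, can be tried out with the JDK alone. The sketch below is a simplified stand-in, not ChangeQueue itself: QueueingCallback is a hypothetical class, changes are plain strings instead of Change objects, and the checkpoint and activity logger are left out.

```java
import java.util.concurrent.BlockingQueue;

/** Hypothetical, reduced version of a queue-backed callback. */
final class QueueingCallback {
  private final BlockingQueue<String> pendingChanges;
  private int changeCount = 0;

  QueueingCallback(BlockingQueue<String> pendingChanges) {
    this.pendingChanges = pendingChanges;
  }

  /** Resets the per-pass counter. */
  void passBegin() {
    changeCount = 0;
  }

  void newDocument(String id) throws InterruptedException {
    enqueue("NEW " + id);
  }

  void changedDocument(String id) throws InterruptedException {
    enqueue("CHANGED " + id);
  }

  void deletedDocument(String id) throws InterruptedException {
    enqueue("DELETED " + id);
  }

  boolean hasEnqueuedAtLeastOneChangeThisPass() {
    return changeCount > 0;
  }

  private void enqueue(String change) throws InterruptedException {
    ++changeCount;
    // put() waits while a bounded queue is full, so a slow consumer
    // slows the producer down instead of losing changes.
    pendingChanges.put(change);
  }
}
```

A consumer on another thread would call take() or poll() on the same queue, for example a LinkedBlockingQueue created with a fixed capacity; the capacity then bounds how far the monitor can run ahead of the consumer.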
---------------------------------------------------------------------------
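This series, Enterprise Search Engine Development: The Connector, is my original work.
Please credit the source when reposting: 博客园 (cnblogs), 刺猬的温驯
Email: chenying998179@163#com (replace # with .)
Link to this post: http://www.cnblogs.com/chenying99/p/3789505.html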