In the connector, the object that links the data source to the push target is a QueryTraverser instance. The class implements the Traverser interface:

/**
 * Interface presented by a Traverser. Used by the Scheduler.
 */
public interface Traverser {
  /**
   * Interval to wait after a transient error before retrying a traversal.
   */
  public static final int ERROR_WAIT_MILLIS = 15 * 60 * 1000;

  /**
   * Runs a batch of documents. The Traversal method may be hard (impossible?)
   * to interrupt while it is executing runBatch(). It is expected that a
   * thread loop running a traversal method would call runBatch(), then check
   * for InterruptedException, then decide whether it wants to stop of itself,
   * for scheduling reasons, or for a clean shutdown. It could then re-adjust
   * the batch hint if desired, then repeat.
   *
   * @param batchSize A {@link BatchSize} instructs the traversal method to
   *        process approximately {@code batchSize.getHint()}, but no more
   *        than {@code batchSize.getMaximum()} number of documents in this
   *        batch.
   * @return A {@link BatchResult} containing the actual number of documents
   *         from this batch given to the feed and a possible policy to delay
   *         before requesting another batch.
   */
  public BatchResult runBatch(BatchSize batchSize);

  /**
   * Cancel the Batch in progress. Discard the batch. This might be called
   * when the workItem times out, connector deletion or reconfiguration, or
   * during shutdown.
   */
  public void cancelBatch();
}

The key method is BatchResult runBatch(BatchSize batchSize) above; its BatchSize batchSize parameter specifies the batch size.
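The Javadoc describes a thread loop that calls runBatch(), checks for interruption, and then repeats. The following is a minimal sketch of such a loop, not the connector manager's scheduler. The names `SimpleTraverser`, `QueueTraverser`, and `runLoop` are hypothetical, and the batch size is reduced to two plain ints (hint and maximum) instead of the real BatchSize and BatchResult classes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class BatchLoopSketch {
  /** Hypothetical stand-in for Traverser: returns the number of documents fed. */
  interface SimpleTraverser {
    int runBatch(int hint, int maximum);
  }

  /** Toy traverser that hands out queued document ids, never more than the maximum. */
  static final class QueueTraverser implements SimpleTraverser {
    private final Deque<String> pending;

    QueueTraverser(String... docIds) {
      this.pending = new ArrayDeque<>(Arrays.asList(docIds));
    }

    @Override
    public int runBatch(int hint, int maximum) {
      int target = Math.min(hint, maximum);
      int fed = 0;
      while (fed < target && !pending.isEmpty()) {
        pending.poll();
        fed++;
      }
      return fed;
    }
  }

  /** Calls runBatch until a batch returns no documents or the thread is interrupted. */
  static int runLoop(SimpleTraverser traverser, int hint, int maximum) {
    int total = 0;
    while (!Thread.currentThread().isInterrupted()) {
      int fed = traverser.runBatch(hint, maximum);
      if (fed == 0) {
        break; // Nothing new; a real scheduler would wait before polling again.
      }
      total += fed;
    }
    return total;
  }

  public static void main(String[] args) {
    SimpleTraverser traverser = new QueueTraverser("a", "b", "c", "d", "e");
    System.out.println(runLoop(traverser, 2, 3)); // prints 5
  }
}
```

With a hint of 2, the five queued documents are fed in batches of 2, 2, and 1, and the fourth, empty batch ends the loop.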

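A QueryTraverser object obtains data from the data source through its reference to a TraversalManager instance, queryTraversalManager. It also holds a PusherFactory instance, pusherFactory, which it uses to create the docPusher instance that sends document objects. The member variable TraversalStateStore stateStore reads and saves the traversal state, which allows sending to resume from a checkpoint.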
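To make the role of the state store concrete, here is a small sketch of checkpoint-based resumption, assuming a toy data source whose documents are the integers 1..total and whose checkpoint is the last id returned. `MemoryStateStore`, `CountingSource`, and `nextBatch` are hypothetical names; only the method names getTraversalState, storeTraversalState, startTraversal, and resumeTraversal echo the code in this article, and their signatures here are simplified.

```java
import java.util.ArrayList;
import java.util.List;

final class ResumeSketch {
  /** Hypothetical in-memory stand-in for TraversalStateStore. */
  static final class MemoryStateStore {
    private String state; // null means no checkpoint has been saved yet

    String getTraversalState() {
      return state;
    }

    void storeTraversalState(String newState) {
      state = newState;
    }
  }

  /** Toy data source: documents are the integers 1..total. */
  static final class CountingSource {
    private final int total;

    CountingSource(int total) {
      this.total = total;
    }

    List<Integer> startTraversal(int batch) {
      return range(1, batch);
    }

    List<Integer> resumeTraversal(String checkpoint, int batch) {
      return range(Integer.parseInt(checkpoint) + 1, batch);
    }

    private List<Integer> range(int first, int batch) {
      List<Integer> docs = new ArrayList<>();
      for (int id = first; id <= total && docs.size() < batch; id++) {
        docs.add(id);
      }
      return docs;
    }
  }

  /** Starts or resumes depending on the stored state, then saves a new checkpoint. */
  static List<Integer> nextBatch(CountingSource source, MemoryStateStore store, int batch) {
    String state = store.getTraversalState();
    List<Integer> docs = (state == null)
        ? source.startTraversal(batch)
        : source.resumeTraversal(state, batch);
    if (!docs.isEmpty()) {
      // The checkpoint is the id of the last document handed out.
      store.storeTraversalState(String.valueOf(docs.get(docs.size() - 1)));
    }
    return docs;
  }

  public static void main(String[] args) {
    CountingSource source = new CountingSource(5);
    MemoryStateStore store = new MemoryStateStore();
    System.out.println(nextBatch(source, store, 3)); // [1, 2, 3]
    System.out.println(nextBatch(source, store, 3)); // [4, 5]
    System.out.println(nextBatch(source, store, 3)); // []
  }
}
```

The first call finds no stored state and starts from the beginning; later calls resume after the saved checkpoint.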

@Override
public BatchResult runBatch(BatchSize batchSize) {
  // Start time of this batch.
  final long startTime = clock.getTimeMillis();
  // Time at which this batch times out.
  final long timeoutTime = startTime
      + traversalContext.traversalTimeLimitSeconds() * 1000;
  // The traverser has already been cancelled.
  if (isCancelled()) {
    LOGGER.warning("Attempting to run a cancelled QueryTraverser");
    return new BatchResult(TraversalDelayPolicy.ERROR);
  }
  try {
    // Pass the batch size hint to the traversal manager.
    queryTraversalManager.setBatchHint(batchSize.getHint());
  } catch (RepositoryException e) {
    LOGGER.log(Level.WARNING, "Unable to set batch hint", e);
  }

  String connectorState;
  try {
    if (stateStore != null) {
      // Read the saved checkpoint state.
      connectorState = stateStore.getTraversalState();
    } else {
      throw new IllegalStateException("null TraversalStateStore");
    }
  } catch (IllegalStateException ise) {
    // We get here if the store for the connector is disabled.
    // That happens if the connector was deleted while we were asleep.
    // Our connector seems to have been deleted. Don't process a batch.
    LOGGER.fine("Halting traversal for connector " + connectorName
        + ": " + ise.getMessage());
    return new BatchResult(TraversalDelayPolicy.ERROR);
  }

  DocumentList resultSet = null;
  if (connectorState == null) {
    try {
      LOGGER.fine("START TRAVERSAL: Starting traversal for connector "
          + connectorName);
      resultSet = queryTraversalManager.startTraversal();
    } catch (Exception e) {
      LOGGER.log(Level.WARNING, "startTraversal threw exception: ", e);
      return new BatchResult(TraversalDelayPolicy.ERROR);
    }
  } else {
    try {
      LOGGER.fine("RESUME TRAVERSAL: Resuming traversal for connector "
          + connectorName + " from checkpoint " + connectorState);
      resultSet = queryTraversalManager.resumeTraversal(connectorState);
    } catch (Exception e) {
      LOGGER.log(Level.WARNING, "resumeTraversal threw exception: ", e);
      return new BatchResult(TraversalDelayPolicy.ERROR);
    }
  }

  // If the traversal returns null, that means that the repository has
  // no new content to traverse.
  if (resultSet == null) {
    LOGGER.fine("Result set from connector " + connectorName
        + " is NULL, no documents returned for traversal.");
    return new BatchResult(TraversalDelayPolicy.POLL, 0);
  }

  Pusher pusher = null;
  // Result reported back to the caller.
  BatchResult result = null;
  int counter = 0;
  try {
    // A single Pusher instance is used for the whole batch.
    // Get a Pusher for feeding the returned Documents.
    pusher = pusherFactory.newPusher(connectorName);

    while (true) {
      if (Thread.currentThread().isInterrupted() || isCancelled()) {
        LOGGER.fine("Traversal for connector " + connectorName
            + " has been interrupted; breaking out of batch run.");
        break;
      }
      if (clock.getTimeMillis() >= timeoutTime) {
        LOGGER.fine("Traversal batch for connector " + connectorName
            + " is completing due to time limit.");
        break;
      }

      String docid = null;
      try {
        LOGGER.finer("Pulling next document from connector " + connectorName);
        Document nextDocument = resultSet.nextDocument();
        // Every document in this result set has been sent.
        if (nextDocument == null) {
          LOGGER.finer("Traversal batch for connector " + connectorName
              + " at end after processing " + counter + " documents.");
          break;
        } else {
          //System.out.println("resultSet.getClass().getName():"+resultSet.getClass().getName());
          //System.out.println("nextDocument.getClass().getName():"+nextDocument.getClass().getName());
          // Since there are a couple of places below that could throw
          // exceptions but not exit the while loop, the counter should be
          // incremented here to insure it represents documents returned from
          // the list. Note the call to nextDocument() could also throw a
          // RepositoryDocumentException signaling a skipped document in which
          // case the call will not be counted against the batch maximum.
          counter++;
          // Fetch DocId to use in messages.
          try {
            docid = Value.getSingleValueString(nextDocument,
                SpiConstants.PROPNAME_DOCID);
          } catch (IllegalArgumentException e1) {
            LOGGER.finer("Unable to get document id for document ("
                + nextDocument + "): " + e1.getMessage());
          } catch (RepositoryException e1) {
            LOGGER.finer("Unable to get document id for document ("
                + nextDocument + "): " + e1.getMessage());
          }
        }
        LOGGER.finer("Sending document (" + docid + ") from connector "
            + connectorName + " to Pusher");
        // Hand the document to the Pusher.
        if (pusher.take(nextDocument) != PusherStatus.OK) {
          LOGGER.fine("Traversal batch for connector " + connectorName
              + " is completing at the request of the Pusher,"
              + " after processing " + counter + " documents.");
          break;
        }
      } catch (SkippedDocumentException e) {
        /* TODO (bmj): This is a temporary solution and should be replaced.
         * It uses Exceptions for non-exceptional cases.
         */
        // Skip this document. Proceed on to the next one.
        logSkippedDocument(docid, e);
      } catch (RepositoryDocumentException e) {
        // Skip individual documents that fail. Proceed on to the next one.
        logSkippedDocument(docid, e);
      } catch (RuntimeException e) {
        // Skip individual documents that fail. Proceed on to the next one.
        logSkippedDocument(docid, e);
      }
    }
    // No more documents. Wrap up any accumulated feed data and send it off.
    if (!isCancelled()) {
      pusher.flush();
    }
  } catch (OutOfMemoryError e) {
    pusher.cancel();
    System.runFinalization();
    System.gc();
    result = new BatchResult(TraversalDelayPolicy.ERROR);
    try {
      LOGGER.severe("Out of JVM Heap Space. Will retry later.");
      LOGGER.log(Level.FINEST, e.getMessage(), e);
    } catch (Throwable t) {
      // OutOfMemory state may prevent us from logging the error.
      // Don't make matters worse by rethrowing something meaningless.
    }
  } catch (RepositoryException e) {
    // Drop the entire batch on the floor. Do not call checkpoint
    // (as there is a discrepancy between what the Connector thinks
    // it has fed, and what actually has been pushed).
    LOGGER.log(Level.SEVERE, "Repository Exception during traversal.", e);
    result = new BatchResult(TraversalDelayPolicy.ERROR);
  } catch (PushException e) {
    LOGGER.log(Level.SEVERE, "Push Exception during traversal.", e);
    // Drop the entire batch on the floor. Do not call checkpoint
    // (as there is a discrepancy between what the Connector thinks
    // it has fed, and what actually has been pushed).
    result = new BatchResult(TraversalDelayPolicy.ERROR);
  } catch (FeedException e) {
    LOGGER.log(Level.SEVERE, "Feed Exception during traversal.", e);
    // Drop the entire batch on the floor. Do not call checkpoint
    // (as there is a discrepancy between what the Connector thinks
    // it has fed, and what actually has been pushed).
    result = new BatchResult(TraversalDelayPolicy.ERROR);
  } catch (Throwable t) {
    LOGGER.log(Level.SEVERE, "Uncaught Exception during traversal.", t);
    // Drop the entire batch on the floor. Do not call checkpoint
    // (as there is a discrepancy between what the Connector thinks
    // it has fed, and what actually has been pushed).
    result = new BatchResult(TraversalDelayPolicy.ERROR);
  } finally {
    // If we have cancelled the work, abandon the batch.
    if (isCancelled()) {
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    }
    // Update the checkpoint state.
    // Checkpoint completed work as well as skip past troublesome documents
    // (e.g. documents that are too large and will always fail).
    if ((result == null) && (checkpointAndSave(resultSet) == null)) {
      // Unable to get a checkpoint, so wait a while, then retry batch.
      result = new BatchResult(TraversalDelayPolicy.ERROR);
    }
  }
  if (result == null) {
    result = new BatchResult(TraversalDelayPolicy.IMMEDIATE, counter,
        startTime, clock.getTimeMillis());
  } else if (pusher != null) {
    // We are returning an error from this batch. Cancel any feed that
    // might be in progress.
    pusher.cancel();
  }
  return result;
}

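I have commented the key code. The method iterates over the batch's result set and submits each document to the docPusher object. Once the iteration finishes, it updates the checkpoint state, so that the next batch fetches data from the data source starting at that point.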
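The two levels of error handling in runBatch can be condensed into a short sketch: a per-document failure only skips that document, while a batch-level failure drops the batch and produces no checkpoint. This is an illustration under simplifying assumptions, not the connector manager's code. `DocPusher`, `SkipException`, and `Outcome` are hypothetical names, documents are plain strings, and the checkpoint is simply the count of documents pulled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PushLoopSketch {
  /** Hypothetical per-document failure: skip this document, keep the batch. */
  static final class SkipException extends RuntimeException {
    SkipException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for Pusher; any other exception fails the batch. */
  interface DocPusher {
    void take(String doc) throws Exception;
  }

  /** Documents pulled in this batch, plus the checkpoint (null if the batch failed). */
  static final class Outcome {
    final int pulled;
    final String checkpoint;

    Outcome(int pulled, String checkpoint) {
      this.pulled = pulled;
      this.checkpoint = checkpoint;
    }
  }

  static Outcome runBatch(List<String> docs, DocPusher pusher) {
    int counter = 0;
    try {
      for (String doc : docs) {
        counter++; // Count the document as soon as it is pulled.
        try {
          pusher.take(doc);
        } catch (SkipException e) {
          // Skip this document and proceed to the next one.
        }
      }
    } catch (Exception e) {
      // Batch-level failure: drop the batch and do not checkpoint.
      return new Outcome(counter, null);
    }
    // Checkpoint past everything pulled, including skipped documents.
    return new Outcome(counter, String.valueOf(counter));
  }

  public static void main(String[] args) {
    List<String> sent = new ArrayList<>();
    DocPusher pusher = doc -> {
      if (doc.isEmpty()) {
        throw new SkipException("empty document");
      }
      sent.add(doc);
    };
    Outcome ok = runBatch(Arrays.asList("a", "", "c"), pusher);
    System.out.println(ok.pulled + " " + ok.checkpoint + " " + sent); // 3 3 [a, c]
  }
}
```

Checkpointing past skipped documents mirrors the comment in the finally block: a document that always fails should not be retried forever.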

/**
 * Saves the checkpoint state.
 *
 * @param pm the DocumentList to checkpoint
 * @return the checkpoint string, or null if no checkpoint was obtained or
 *         it could not be stored
 */
private String checkpointAndSave(DocumentList pm) {
  String connectorState = null;
  LOGGER.fine("CHECKPOINT: Generating checkpoint for connector "
      + connectorName);
  try {
    connectorState = pm.checkpoint();
  } catch (RepositoryException re) {
    // If checkpoint() throws RepositoryException, it means there is no
    // new checkpoint.
    LOGGER.log(Level.FINE, "Failed to obtain checkpoint for connector "
        + connectorName, re);
    return null;
  } catch (Exception e) {
    LOGGER.log(Level.INFO, "Failed to obtain checkpoint for connector "
        + connectorName, e);
    return null;
  }
  try {
    if (connectorState != null) {
      if (stateStore != null) {
        stateStore.storeTraversalState(connectorState);
      } else {
        throw new IllegalStateException("null TraversalStateStore");
      }
      LOGGER.fine("CHECKPOINT: " + connectorState);
    }
    return connectorState;
  } catch (IllegalStateException ise) {
    // We get here if the store for the connector is disabled.
    // That happens if the connector was deleted while we were working.
    // Our connector seems to have been deleted. Don't save a checkpoint.
    LOGGER.fine("Checkpoint discarded: " + connectorState);
  }
  return null;
}

The cancel method works by setting a boolean flag; note that access to the flag needs to be synchronized.

/**
 * Cancels the batch in progress.
 */
@Override
public void cancelBatch() {
  synchronized(cancelLock) {
    cancelWork = true;
  }
  LOGGER.fine("Cancelling traversal for connector " + connectorName);
}
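The article shows only the writing side of the flag. As a sketch of why the synchronization matters, the example below pairs cancelBatch() with a reader that takes the same lock, so a worker thread is guaranteed to see the update. The body of `isCancelled()` here is an assumption (the article calls it but does not list it), and `CancelFlagSketch` and `cancelStopsWorker` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

final class CancelFlagSketch {
  private final Object cancelLock = new Object();
  private boolean cancelWork = false; // guarded by cancelLock

  void cancelBatch() {
    synchronized (cancelLock) {
      cancelWork = true;
    }
  }

  // Assumed reader: takes the same lock so the write above is visible.
  boolean isCancelled() {
    synchronized (cancelLock) {
      return cancelWork;
    }
  }

  /** Starts a worker that loops until cancelled; true if it stops after cancelBatch(). */
  static boolean cancelStopsWorker() {
    CancelFlagSketch flag = new CancelFlagSketch();
    CountDownLatch started = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      started.countDown();
      while (!flag.isCancelled()) {
        Thread.yield(); // Stand-in for processing one document.
      }
    });
    worker.start();
    try {
      started.await();    // Make sure the worker is running first.
      flag.cancelBatch(); // Request cancellation from this thread.
      worker.join(5000);  // Give the worker time to notice and exit.
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !worker.isAlive();
  }

  public static void main(String[] args) {
    System.out.println(cancelStopsWorker());
  }
}
```

Without a shared lock (or a volatile field), the Java memory model does not guarantee that the worker thread ever observes the change to the flag.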

---------------------------------------------------------------------------

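This series, Enterprise Search Engine Development: Connector, is my original work.

Please credit the source when reposting: 博客园 (cnblogs), 刺猬的温驯

My email: chenying998179@163#com (replace # with .)

Article link: http://www.cnblogs.com/chenying99/p/3775534.html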
