The connector pushes data over HTTP (push mode) to the data-receiving server. The data is in xmlfeed format (XML), and the interface that sends it is named Pusher.

The Pusher interface defines the methods related to sending data:

public interface Pusher {

  /**
* Status indicating the readiness of the Pusher.
*/
public static enum PusherStatus {
OK, LOW_MEMORY, LOCAL_FEED_BACKLOG, GSA_FEED_BACKLOG, DISABLED;
}

/**
* Takes an spi Document and pushes it along, presumably to the GSA Feed.
*
* @param document A Document
* @return PusherStatus. If OK, Pusher may accept more documents.
* @throws RepositoryException if transient error accessing the Repository
* @throws RepositoryDocumentException if fatal error accessing the Document
* @throws FeedException if a transient Feed error occurs in the Pusher
* @throws PushException if a transient error occurs in the Pusher
*/
public PusherStatus take(Document document)
throws PushException, FeedException, RepositoryException;

/**
* Finishes processing a document feed. If the caller anticipates no
* further calls to {@link #take(Document)} will be
* made, this method should be called, so that the Pusher may send a cached,
* accumulated Feed to the feed processor.
*
* @throws RepositoryException if transient error accessing the Repository
* @throws RepositoryDocumentException if fatal error accessing the Document
* @throws FeedException if a transient Feed error occurs in the Pusher
* @throws PushException if a transient error occurs in the Pusher
*/
public void flush()
throws PushException, FeedException, RepositoryException;

/**
* Cancels a feed. Discard any accumulated feed data.
*/
public void cancel();

/**
* Gets the current pusher status.
*
* @return the current PusherStatus
* @throws RepositoryException if transient error accessing the Repository
* @throws FeedException if a transient Feed error occurs in the Pusher
* @throws PushException if a transient error occurs in the Pusher
*/
public PusherStatus getPusherStatus()
throws PushException, FeedException, RepositoryException;
}
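
To make the contract of take() and flush() concrete, here is a minimal sketch of how a caller might drive a pusher. It is not the connector manager's own traversal code: SimplePusher, CountingPusher and pushAll are hypothetical names, documents are reduced to plain strings, and the checked exceptions of the real interface are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PusherLoopSketch {
  // Same status values as the PusherStatus enum shown above.
  enum PusherStatus { OK, LOW_MEMORY, LOCAL_FEED_BACKLOG, GSA_FEED_BACKLOG, DISABLED }

  /** Hypothetical, simplified stand-in for Pusher; documents are plain strings. */
  interface SimplePusher {
    PusherStatus take(String document);
    void flush();
  }

  /** Hypothetical pusher that reports a backlog once it has taken `limit` documents. */
  static class CountingPusher implements SimplePusher {
    final List<String> taken = new ArrayList<>();
    final int limit;
    boolean flushed = false;

    CountingPusher(int limit) {
      this.limit = limit;
    }

    @Override
    public PusherStatus take(String document) {
      taken.add(document);
      return taken.size() >= limit ? PusherStatus.GSA_FEED_BACKLOG : PusherStatus.OK;
    }

    @Override
    public void flush() {
      flushed = true;
    }
  }

  /** Feeds documents until the pusher stops returning OK, then flushes. Returns the number taken. */
  static int pushAll(SimplePusher pusher, List<String> documents) {
    int pushed = 0;
    for (String doc : documents) {
      pushed++;
      if (pusher.take(doc) != PusherStatus.OK) {
        break; // the pusher asked the caller to stop sending for now
      }
    }
    pusher.flush(); // no more take() calls expected: send whatever has accumulated
    return pushed;
  }

  public static void main(String[] args) {
    CountingPusher pusher = new CountingPusher(2);
    int pushed = pushAll(pusher, Arrays.asList("a", "b", "c"));
    System.out.println(pushed + " documents pushed, flushed=" + pusher.flushed);
  }
}
```

The point of the sketch is only that a non-OK status is a signal to pause, and that flush() is what releases a partially filled feed.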

The interface sends data through the PusherStatus take(Document document) method. The implementation class DocPusher shows how that method works:

/**
* Takes a Document and sends the feed to the GSA.
*
* @param document Document corresponding to the document.
* @return PusherStatus. If OK, Pusher may accept more documents.
* @throws PushException if Pusher problem
* @throws FeedException if transient Feed problem
* @throws RepositoryDocumentException if fatal Document problem
* @throws RepositoryException if transient Repository problem
*/
@Override
public PusherStatus take(Document document)
throws PushException, FeedException, RepositoryException {
if (feedSender.isShutdown()) {
return PusherStatus.DISABLED;
}
checkSubmissions();

// Apply any configured Document filters to the document.
document = documentFilterFactory.newDocumentFilter(document);

FeedType feedType;
try {
feedType = DocUtils.getFeedType(document);
} catch (RuntimeException e) {
LOGGER.log(Level.WARNING,
"Rethrowing RuntimeException as RepositoryDocumentException", e);
throw new RepositoryDocumentException(e);
}

// All feeds in a feed file must be of the same type.
// If the feed would change type, send the feed off to the GSA
// and start a new one.
// TODO: Fix this check to allow ACLs in any type feed.
if (xmlFeed != null && !feedType.isCompatible(xmlFeed.getFeedType())) {
if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("A new feedType, " + feedType + ", requires a new feed for "
+ connectorName + ". Closing feed and sending to GSA.");
}
submitFeed();
}

if (xmlFeed == null) {
if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("Creating new " + feedType + " feed for " + connectorName);
}
try {
startNewFeed(feedType);
} catch (OutOfMemoryError me) {
throw new PushException("Unable to allocate feed buffer. Try reducing"
+ " the maxFeedSize setting, reducing the number of connector"
+ " intances, or adjusting the JVM heap size parameters.", me);
}
}

boolean isThrowing = false;
int resetPoint = xmlFeed.size();
int resetCount = xmlFeed.getRecordCount();
try {
if (LOGGER.isLoggable(Level.FINER)) {
LOGGER.log(Level.FINER, "DOCUMENT: Adding document with docid={0} and "
+ "searchurl={1} from connector {2} to feed.", new Object[] {
DocUtils.getOptionalString(document, SpiConstants.PROPNAME_DOCID),
DocUtils.getOptionalString(document,
SpiConstants.PROPNAME_SEARCHURL),
connectorName});
}

// Add this document to the feed.
xmlFeed.addRecord(document);

// If the feed is full, send it off to the GSA.
if (xmlFeed.isFull() || lowMemory()) {
if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("Feed for " + connectorName + " has grown to "
+ xmlFeed.size() + " bytes. Closing feed and sending to GSA.");
}
submitFeed();
return getPusherStatus();
}

// Indicate that this Pusher may accept more documents.
return PusherStatus.OK;

} catch (OutOfMemoryError me) {
resetFeed(resetPoint, resetCount);
throw new PushException("Out of memory building feed, retrying.", me);
} catch (RuntimeException e) {
resetFeed(resetPoint, resetCount);
LOGGER.log(Level.WARNING,
"Rethrowing RuntimeException as RepositoryDocumentException", e);
throw new RepositoryDocumentException(e);
} catch (RepositoryDocumentException rde) {
// Skipping this document, remove it from the feed.
resetFeed(resetPoint, resetCount);
throw rde;
} catch (IOException ioe) {
LOGGER.log(Level.SEVERE, "IOException while reading: skipping", ioe);
resetFeed(resetPoint, resetCount);
Throwable t = ioe.getCause();
isThrowing = true;
if (t != null && (t instanceof RepositoryException)) {
throw (RepositoryException) t;
} else {
throw new RepositoryDocumentException("I/O error reading data", ioe);
}
}
}

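In the method above, the Document document argument is first wrapped (for example, Base64-encoded) and added to the xmlFeed collection. Only when xmlFeed satisfies a condition (the feed is full, memory is low, or the feed type changes) is it sent to the data server. In other words, each request to the data server carries a batch of document objects rather than a single document.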
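
The accumulate-then-send idea can be illustrated in isolation. The following is a minimal sketch, not DocPusher itself: BatchingSketch, maxBatchBytes and sentBatches are hypothetical names, records are plain strings, and "sending" just means moving the batch into a list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
  private final int maxBatchBytes;                            // hypothetical size limit
  private List<String> batch = new ArrayList<>();             // records waiting to be sent
  private int batchBytes = 0;
  final List<List<String>> sentBatches = new ArrayList<>();   // stands in for the server

  BatchingSketch(int maxBatchBytes) {
    this.maxBatchBytes = maxBatchBytes;
  }

  /** Adds one record and submits the batch once it reaches the size limit. */
  void take(String record) {
    batch.add(record);
    batchBytes += record.getBytes(StandardCharsets.UTF_8).length;
    if (batchBytes >= maxBatchBytes) {
      submit();
    }
  }

  /** "Sends" the accumulated batch and starts a fresh one; does nothing if the batch is empty. */
  void submit() {
    if (batch.isEmpty()) {
      return;
    }
    sentBatches.add(batch);
    batch = new ArrayList<>();
    batchBytes = 0;
  }

  public static void main(String[] args) {
    BatchingSketch pusher = new BatchingSketch(10);
    for (String record : new String[] {"aaaa", "bbbb", "cccc", "d"}) {
      pusher.take(record);
    }
    pusher.submit(); // flush the remainder
    System.out.println("Batches sent: " + pusher.sentBatches);
  }
}
```

Batching like this trades a little latency for far fewer HTTP requests, which is presumably why the feed is only submitted once it is full.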

Once the xmlFeed object meets the required condition, the submitFeed() method is called to submit it:

/**
* Takes the accumulated XmlFeed and sends the feed to the GSA.
*
* @throws PushException
* if Pusher problem
* @throws FeedException
* if transient Feed problem
* @throws RepositoryException
*/
private void submitFeed() throws PushException, FeedException,
RepositoryException {
if (xmlFeed == null) {
return;
}

final XmlFeed feed = xmlFeed;
xmlFeed = null;
final String logMessage;
if (feedLog != null) {
logMessage = feedLog.toString();
feedLog = null;
} else {
logMessage = null;
}

try {
feed.close();
} catch (IOException ioe) {
throw new PushException("Error closing feed", ioe);
}

try {
// Send the feed to the GSA in a separate thread.
FutureTask<String> future = new FutureTask<String>(
new Callable<String>() {
public String call() throws PushException,
FeedException, RepositoryException {
try {
NDC.push("Feed " + feed.getDataSource());
return submitFeed(feed, logMessage);
} finally {
NDC.remove();
}
}
});
feedSender.execute(future);
// Add the future to list of outstanding submissions.
synchronized (submissions) {
submissions.add(future);
}
} catch (RejectedExecutionException ree) {
throw new FeedException("Asynchronous feed was rejected. ", ree);
}
}

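This method first wraps the sending logic in the call() method of a Callable that is passed to a FutureTask, then executes the task in a thread pool, and finally adds the future handle to the LinkedList<FutureTask<String>> submissions collection.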
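
The same pattern can be tried out on its own. This is a minimal sketch under stated assumptions: AsyncSubmitSketch, submit and awaitAll are hypothetical names, the "feed" is just a string, and the task returns a made-up response instead of posting anything. The real class checks its outstanding submissions in checkSubmissions(), whose source is not shown here, so awaitAll() is only one possible way to consume the futures.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class AsyncSubmitSketch {
  private final ExecutorService feedSender = Executors.newSingleThreadExecutor();
  private final LinkedList<FutureTask<String>> submissions = new LinkedList<>();

  /** Wraps the "send" in a FutureTask, runs it on the pool, and remembers the future. */
  void submit(final String feedName) {
    FutureTask<String> future = new FutureTask<>(new Callable<String>() {
      @Override
      public String call() {
        return "Success: " + feedName; // placeholder for the real HTTP send
      }
    });
    feedSender.execute(future);
    synchronized (submissions) {
      submissions.add(future);
    }
  }

  /** Blocks until every outstanding submission has finished and returns the responses in order. */
  List<String> awaitAll() {
    List<String> responses = new ArrayList<>();
    synchronized (submissions) {
      for (FutureTask<String> future : submissions) {
        try {
          responses.add(future.get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException(e);
        } catch (ExecutionException e) {
          throw new IllegalStateException(e.getCause());
        }
      }
      submissions.clear();
    }
    return responses;
  }

  void shutdown() {
    feedSender.shutdown();
  }

  public static void main(String[] args) {
    AsyncSubmitSketch sketch = new AsyncSubmitSketch();
    sketch.submit("feed-1");
    sketch.submit("feed-2");
    System.out.println(sketch.awaitAll());
    sketch.shutdown();
  }
}
```

Keeping the futures in a list lets the caller return immediately from submit() and still learn later whether a send failed, since future.get() rethrows the task's exception wrapped in an ExecutionException.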

The following method calls the feedConnection object to send the data:

/**
* Takes the supplied XmlFeed and sends that feed to the GSA.
*
* @param feed
* an XmlFeed
* @param logMessage
* a Feed Log message
* @return response String from GSA
* @throws PushException
* if Pusher problem
* @throws FeedException
* if transient Feed problem
* @throws RepositoryException
*/
private String submitFeed(XmlFeed feed, String logMessage)
throws PushException, FeedException, RepositoryException {

if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("Submitting " + feed.getFeedType() + " feed for "
+ feed.getDataSource() + " to the GSA. "
+ feed.getRecordCount() + " records totaling "
+ feed.size() + " bytes.");
}

// Write the generated feedLog message to the feed logger.
if (logMessage != null && FEED_LOGGER.isLoggable(FEED_LOG_LEVEL)) {
FEED_LOGGER.log(FEED_LOG_LEVEL, logMessage);
}
// Write the xmlfeed to a temporary file
// Write the Feed to the TeedFeedFile, if one was specified.
String teedFeedFilename = Context.getInstance().getTeedFeedFile();
// String teedFeedFilename = "D:/files/google2.txt";
if (teedFeedFilename != null) {
boolean isThrowing = false;
OutputStream os = null;
try {
os = new FileOutputStream(teedFeedFilename, true);
feed.writeTo(os);
} catch (IOException e) {
isThrowing = true;
throw new FeedException("Cannot write to file: "
+ teedFeedFilename, e);
} finally {
if (os != null) {
try {
os.close();
} catch (IOException e) {
if (!isThrowing) {
throw new FeedException("Cannot write to file: "
+ teedFeedFilename, e);
}
}
}
}
}

String gsaResponse = feedConnection.sendData(feed);
if (!gsaResponse.equals(GsaFeedConnection.SUCCESS_RESPONSE)) {
String eMessage = gsaResponse;
if (GsaFeedConnection.UNAUTHORIZED_RESPONSE.equals(gsaResponse)) {
eMessage += ": Client is not authorized to send feeds. Make "
+ "sure the GSA is configured to trust feeds from your host.";
}
if (GsaFeedConnection.INTERNAL_ERROR_RESPONSE.equals(gsaResponse)) {
eMessage += ": Check GSA status or feed format.";
}
throw new PushException(eMessage);
}
return gsaResponse;
}

If the teedFeedFilename property is configured, the xmlfeed object is first written to that file, and only then is the feedConnection instance called to send the xmlfeed data.
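
The "write a copy to a file, then send" step can be sketched more compactly with try-with-resources, which closes the stream without the manual isThrowing bookkeeping. This is an illustration only: TeeFeedSketch, teeAndSend and demo are hypothetical names, the feed is a raw byte array, and the destination is any OutputStream rather than a real feed connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TeeFeedSketch {
  /** Appends the feed bytes to teeFile (when not null), then writes them to the destination. */
  static void teeAndSend(byte[] feed, String teeFile, OutputStream destination) {
    try {
      if (teeFile != null) {
        // The second constructor argument `true` opens the file in append mode.
        try (OutputStream os = new FileOutputStream(teeFile, true)) {
          os.write(feed);
        }
      }
      destination.write(feed);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Tees two small feeds into a temporary file and returns the file size in bytes. */
  static long demo() {
    try {
      Path tee = Files.createTempFile("teed-feed", ".xml");
      ByteArrayOutputStream sent = new ByteArrayOutputStream();
      teeAndSend("<feed>1</feed>".getBytes(StandardCharsets.UTF_8), tee.toString(), sent);
      teeAndSend("<feed>2</feed>".getBytes(StandardCharsets.UTF_8), tee.toString(), sent);
      long size = Files.size(tee);
      Files.delete(tee);
      return size;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Teed file size in bytes: " + demo());
  }
}
```

Because the file is opened in append mode, every submitted feed ends up in the same file, which is handy for inspecting exactly what was sent.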

The feedConnection object sends data over HTTP. Here is its sending method:

/* @Override */
public String sendData(FeedData feedData) throws FeedException {
try {
String response = sendFeedData((XmlFeed) feedData);
gotFeedError = !response.equalsIgnoreCase(SUCCESS_RESPONSE);
return response;
} catch (FeedException fe) {
gotFeedError = true;
throw fe;
}
}

It in turn calls the sendFeedData(XmlFeed feed) method to do the actual sending:

private String sendFeedData(XmlFeed feed) throws FeedException {
String feedType = feed.getFeedType().toLegacyString();
String dataSource = feed.getDataSource();
OutputStream outputStream;
HttpURLConnection uc;
StringBuilder buf = new StringBuilder();
byte[] prefix;
byte[] suffix;
try {
// Build prefix.
controlHeader(buf, "datasource", ServletUtil.MIMETYPE_TEXT_PLAIN);
buf.append(dataSource).append(CRLF);
controlHeader(buf, "feedtype", ServletUtil.MIMETYPE_TEXT_PLAIN);
buf.append(feedType).append(CRLF);
controlHeader(buf, "data", ServletUtil.MIMETYPE_XML);
prefix = buf.toString().getBytes("UTF-8");

// Build suffix.
buf.setLength(0);
buf.append(CRLF).append("--").append(BOUNDARY).append("--")
.append(CRLF);
suffix = buf.toString().getBytes("UTF-8");

LOGGER.finest("Opening feed connection to " + feedUrl);

synchronized (this) {
uc = (HttpURLConnection) feedUrl.openConnection();
}
if (uc instanceof HttpsURLConnection && !validateCertificate) {
SslUtil.setTrustingHttpsOptions((HttpsURLConnection) uc);
}

uc.setRequestProperty("Charsert", "UTF-8");
uc.setDoInput(true);
uc.setDoOutput(true);
uc.setFixedLengthStreamingMode(prefix.length + feed.size()
+ suffix.length);
uc.setRequestProperty("Content-Type",
"multipart/form-data; boundary=" + BOUNDARY);

outputStream = uc.getOutputStream();
} catch (IOException ioe) {
throw new FeedException(feedUrl.toString(), ioe);
} catch (GeneralSecurityException e) {
throw new FeedException(feedUrl.toString(), e);
}

boolean isThrowing = false;
buf.setLength(0);
try {
LOGGER.finest("Writing feed data to feed connection.");
// If there is an exception during this read/write, we do our
// best to close the url connection and read the result.
try {
outputStream.write(prefix);
feed.writeTo(outputStream);
outputStream.write(suffix);
outputStream.flush();
} catch (IOException e) {
LOGGER.log(Level.SEVERE,
"IOException while posting: will retry later", e);
isThrowing = true;
throw new FeedException(e);
} catch (RuntimeException e) {
isThrowing = true;
throw e;
} catch (Error e) {
isThrowing = true;
throw e;
} finally {
try {
outputStream.close();
} catch (IOException e) {
LOGGER.log(
Level.SEVERE,
"IOException while closing after post: will retry later",
e);
if (!isThrowing) {
isThrowing = true;
throw new FeedException(e);
}
}
}
} finally {
BufferedReader br = null;
try {
LOGGER.finest("Waiting for response from feed connection.");
InputStream inputStream = uc.getInputStream();
br = new BufferedReader(new InputStreamReader(inputStream,
"UTF8"));
String line;
while ((line = br.readLine()) != null) {
buf.append(line);
}
} catch (IOException ioe) {
if (!isThrowing) {
throw new FeedException(ioe);
}
} finally {
try {
if (br != null) {
br.close();
}
} catch (IOException e) {
LOGGER.log(Level.SEVERE,
"IOException while closing after post: continuing",
e);
}
if (uc != null) {
uc.disconnect();
}
if (LOGGER.isLoggable(Level.FINEST)) {
LOGGER.finest("Received response from feed connection: "
+ buf.toString());
}
}
}
return buf.toString();
}

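The Content-Type here is set to multipart/form-data, similar to the form attribute we set when uploading a file through an HTML form. The method writes the binary data of the xmlfeed to the output stream and then reads the response from the input stream. I already covered the exact format of the data being sent in part 18 of this series (企业搜索引擎开发之连接器connector(十八)), so I won't repeat it here.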
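
For readers who want to see the shape of the request body without capturing traffic, here is a minimal sketch that assembles the three parts (datasource, feedtype, data) in the same order as sendFeedData. Several things are assumptions: the BOUNDARY value is made up, controlHeader is a guess at the helper whose source is not shown, and the MIME strings "text/plain" and "text/xml" stand in for the ServletUtil constants.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class MultipartBodySketch {
  static final String CRLF = "\r\n";
  // Hypothetical boundary; the real BOUNDARY constant is not shown in the article.
  static final String BOUNDARY = "sketch-boundary-1234";

  /** Appends the header of one form-data part (assumed layout of controlHeader). */
  static void controlHeader(StringBuilder buf, String name, String mimeType) {
    buf.append("--").append(BOUNDARY).append(CRLF);
    buf.append("Content-Disposition: form-data; name=\"").append(name).append("\"").append(CRLF);
    buf.append("Content-Type: ").append(mimeType).append(CRLF);
    buf.append(CRLF);
  }

  /** Builds prefix + feed XML + suffix, in the same order as sendFeedData writes them. */
  static byte[] buildBody(String dataSource, String feedType, String feedXml) {
    StringBuilder buf = new StringBuilder();
    controlHeader(buf, "datasource", "text/plain");
    buf.append(dataSource).append(CRLF);
    controlHeader(buf, "feedtype", "text/plain");
    buf.append(feedType).append(CRLF);
    controlHeader(buf, "data", "text/xml");

    byte[] prefix = buf.toString().getBytes(StandardCharsets.UTF_8);
    byte[] data = feedXml.getBytes(StandardCharsets.UTF_8);
    // The closing delimiter carries two extra dashes after the boundary.
    byte[] suffix = (CRLF + "--" + BOUNDARY + "--" + CRLF).getBytes(StandardCharsets.UTF_8);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(prefix, 0, prefix.length);
    out.write(data, 0, data.length);
    out.write(suffix, 0, suffix.length);
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[] body = buildBody("sample-connector", "incremental", "<gsafeed/>");
    System.out.println(new String(body, StandardCharsets.UTF_8));
  }
}
```

Knowing the prefix and suffix lengths up front is what allows the original method to call setFixedLengthStreamingMode with prefix.length + feed.size() + suffix.length, so the feed can be streamed without buffering the whole request in memory.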

---------------------------------------------------------------------------

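This series, 企业搜索引擎开发之连接器connector (Enterprise Search Engine Development: Connector), is my original work.

Please credit the source when reposting: 博客园 (cnblogs), 刺猬的温驯

My email: chenying998179@163#com (replace # with .)

Article link: http://www.cnblogs.com/chenying99/p/3775504.html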
