The connector pushes data to the data-receiving server over HTTP in push mode. The data is in xmlfeed format (XML), and the interface that sends it is named Pusher.

The Pusher interface defines the methods related to sending data:

public interface Pusher {

  /**
* Status indicating the readiness of the Pusher.
*/
public static enum PusherStatus {
OK, LOW_MEMORY, LOCAL_FEED_BACKLOG, GSA_FEED_BACKLOG, DISABLED;
}

/**
* Takes an spi Document and pushes it along, presumably to the GSA Feed.
*
* @param document A Document
* @return PusherStatus. If OK, Pusher may accept more documents.
* @throws RepositoryException if transient error accessing the Repository
* @throws RepositoryDocumentException if fatal error accessing the Document
* @throws FeedException if a transient Feed error occurs in the Pusher
* @throws PushException if a transient error occurs in the Pusher
*/
public PusherStatus take(Document document)
throws PushException, FeedException, RepositoryException;

/**
* Finishes processing a document feed. If the caller anticipates no
* further calls to {@link #take(Document)} will be
* made, this method should be called, so that the Pusher may send a cached,
* accumulated Feed to the feed processor.
*
* @throws RepositoryException if transient error accessing the Repository
* @throws RepositoryDocumentException if fatal error accessing the Document
* @throws FeedException if a transient Feed error occurs in the Pusher
* @throws PushException if a transient error occurs in the Pusher
*/
public void flush()
throws PushException, FeedException, RepositoryException;

/**
* Cancels a feed. Discard any accumulated feed data.
*/
public void cancel();

/**
* Gets the current pusher status.
*
* @return the current PusherStatus
* @throws RepositoryException if transient error accessing the Repository
* @throws FeedException if a transient Feed error occurs in the Pusher
* @throws PushException if a transient error occurs in the Pusher
*/
public PusherStatus getPusherStatus()
throws PushException, FeedException, RepositoryException;
}
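The interface only fixes the contract. A caller calls take() while the status is OK, then calls flush() when no more documents are coming. The minimal sketch below shows roughly how a caller might drive it. PusherDriverSketch, SimplePusher, pushAll and RecordingPusher are hypothetical names. Documents are plain strings, and checked exceptions are omitted. The real caller in the connector manager may react to non-OK statuses differently.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins: documents are Strings and take()/flush()
// throw no checked exceptions, unlike the real spi Pusher.
class PusherDriverSketch {
    enum PusherStatus { OK, LOW_MEMORY, LOCAL_FEED_BACKLOG, GSA_FEED_BACKLOG, DISABLED }

    interface SimplePusher {
        PusherStatus take(String document);
        void flush();
        void cancel();
    }

    /**
     * Hands documents to the pusher while it reports OK, then calls flush()
     * because no further take() calls are planned for this batch.
     * Returns the last status so the caller can decide whether to back off.
     */
    static PusherStatus pushAll(Iterator<String> docs, SimplePusher pusher) {
        PusherStatus status = PusherStatus.OK;
        while (status == PusherStatus.OK && docs.hasNext()) {
            status = pusher.take(docs.next());
        }
        pusher.flush();
        return status;
    }

    /** Test double: buffers documents and reports LOW_MEMORY once `limit` are held. */
    static class RecordingPusher implements SimplePusher {
        final List<String> taken = new ArrayList<>();
        final int limit;
        boolean flushed = false;

        RecordingPusher(int limit) { this.limit = limit; }

        public PusherStatus take(String document) {
            taken.add(document);
            return taken.size() >= limit ? PusherStatus.LOW_MEMORY : PusherStatus.OK;
        }

        public void flush() { flushed = true; }

        public void cancel() { taken.clear(); }
    }
}
```

With a limit of 3 and five documents, pushAll stops after the third take() and still flushes the partially built batch.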

Data is sent through the interface's PusherStatus take(Document document) method. Here is that method's source in the implementing class DocPusher:

/**
* Takes a Document and sends the feed to the GSA.
*
* @param document Document corresponding to the document.
* @return PusherStatus. If OK, Pusher may accept more documents.
* @throws PushException if Pusher problem
* @throws FeedException if transient Feed problem
* @throws RepositoryDocumentException if fatal Document problem
* @throws RepositoryException if transient Repository problem
*/
@Override
public PusherStatus take(Document document)
throws PushException, FeedException, RepositoryException {
if (feedSender.isShutdown()) {
return PusherStatus.DISABLED;
}
checkSubmissions();

// Apply any configured Document filters to the document.
document = documentFilterFactory.newDocumentFilter(document);

FeedType feedType;
try {
feedType = DocUtils.getFeedType(document);
} catch (RuntimeException e) {
LOGGER.log(Level.WARNING,
"Rethrowing RuntimeException as RepositoryDocumentException", e);
throw new RepositoryDocumentException(e);
}

// All feeds in a feed file must be of the same type.
// If the feed would change type, send the feed off to the GSA
// and start a new one.
// TODO: Fix this check to allow ACLs in any type feed.
if (xmlFeed != null && !feedType.isCompatible(xmlFeed.getFeedType())) {
if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("A new feedType, " + feedType + ", requires a new feed for "
+ connectorName + ". Closing feed and sending to GSA.");
}
submitFeed();
}

if (xmlFeed == null) {
if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("Creating new " + feedType + " feed for " + connectorName);
}
try {
startNewFeed(feedType);
} catch (OutOfMemoryError me) {
throw new PushException("Unable to allocate feed buffer. Try reducing"
+ " the maxFeedSize setting, reducing the number of connector"
+ " instances, or adjusting the JVM heap size parameters.", me);
}
}

boolean isThrowing = false;
int resetPoint = xmlFeed.size();
int resetCount = xmlFeed.getRecordCount();
try {
if (LOGGER.isLoggable(Level.FINER)) {
LOGGER.log(Level.FINER, "DOCUMENT: Adding document with docid={0} and "
+ "searchurl={1} from connector {2} to feed.", new Object[] {
DocUtils.getOptionalString(document, SpiConstants.PROPNAME_DOCID),
DocUtils.getOptionalString(document,
SpiConstants.PROPNAME_SEARCHURL),
connectorName});
}
// Add this document to the feed.
xmlFeed.addRecord(document);

// If the feed is full, send it off to the GSA.
if (xmlFeed.isFull() || lowMemory()) {
if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("Feed for " + connectorName + " has grown to "
+ xmlFeed.size() + " bytes. Closing feed and sending to GSA.");
}
submitFeed();
return getPusherStatus();
}
// Indicate that this Pusher may accept more documents.
return PusherStatus.OK;
} catch (OutOfMemoryError me) {
resetFeed(resetPoint, resetCount);
throw new PushException("Out of memory building feed, retrying.", me);
} catch (RuntimeException e) {
resetFeed(resetPoint, resetCount);
LOGGER.log(Level.WARNING,
"Rethrowing RuntimeException as RepositoryDocumentException", e);
throw new RepositoryDocumentException(e);
} catch (RepositoryDocumentException rde) {
// Skipping this document, remove it from the feed.
resetFeed(resetPoint, resetCount);
throw rde;
} catch (IOException ioe) {
LOGGER.log(Level.SEVERE, "IOException while reading: skipping", ioe);
resetFeed(resetPoint, resetCount);
Throwable t = ioe.getCause();
isThrowing = true;
if (t != null && (t instanceof RepositoryException)) {
throw (RepositoryException) t;
} else {
throw new RepositoryDocumentException("I/O error reading data", ioe);
}
}
}

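In the method above, the Document argument is first wrapped (for example, its content is Base64-encoded) and added to the xmlFeed. The feed is sent to the data server only when one of three things happens: it is full (xmlFeed.isFull()), memory is low (lowMemory()), or the next document requires a different feed type. Each request to the server therefore carries a batch of documents rather than a single document.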
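The batching rule can be isolated in a few lines. The sketch below accumulates records and "submits" the batch in two cases: when the size threshold is reached, or when a record of a different feed type arrives. flush() submits whatever remains. FeedBatcherSketch and its methods are hypothetical. It compares feed types with equals() instead of FeedType.isCompatible(), and it counts characters as a stand-in for the byte size that XmlFeed tracks. The low-memory check is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of DocPusher's batching rule; records are plain strings
// and "submitting" a feed just moves it into the submitted list.
class FeedBatcherSketch {
    private final int maxFeedSize;                       // threshold, like maxFeedSize
    private final List<List<String>> submitted = new ArrayList<>();
    private List<String> currentFeed;                    // plays the role of xmlFeed
    private String currentType;                          // plays the role of xmlFeed.getFeedType()
    private int currentSize;                             // characters, standing in for bytes

    FeedBatcherSketch(int maxFeedSize) { this.maxFeedSize = maxFeedSize; }

    void take(String feedType, String record) {
        // All records in one feed must share a type: a type change closes the feed.
        if (currentFeed != null && !feedType.equals(currentType)) {
            submitFeed();
        }
        if (currentFeed == null) {
            currentFeed = new ArrayList<>();
            currentType = feedType;
            currentSize = 0;
        }
        currentFeed.add(record);
        currentSize += record.length();
        // Send the batch once it has grown to the threshold.
        if (currentSize >= maxFeedSize) {
            submitFeed();
        }
    }

    /** Counterpart of Pusher.flush(): send whatever is still buffered. */
    void flush() { submitFeed(); }

    private void submitFeed() {
        if (currentFeed == null) {
            return;
        }
        submitted.add(currentFeed);
        currentFeed = null;
    }

    List<List<String>> submitted() { return submitted; }
}
```

With a threshold of 10, the records "aaaa", "bbbb" and "cc" of one type leave as a single batch. A following record of another type opens a new feed, and a later switch back closes that feed again.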

When the xmlFeed meets one of these conditions, submitFeed() is called to submit it:

/**
* Takes the accumulated XmlFeed and sends the feed to the GSA.
*
* @throws PushException
* if Pusher problem
* @throws FeedException
* if transient Feed problem
* @throws RepositoryException
*/
private void submitFeed() throws PushException, FeedException,
RepositoryException {
if (xmlFeed == null) {
return;
}

final XmlFeed feed = xmlFeed;
xmlFeed = null;
final String logMessage;
if (feedLog != null) {
logMessage = feedLog.toString();
feedLog = null;
} else {
logMessage = null;
}

try {
feed.close();
} catch (IOException ioe) {
throw new PushException("Error closing feed", ioe);
}

try {
// Send the feed to the GSA in a separate thread.
FutureTask<String> future = new FutureTask<String>(
new Callable<String>() {
public String call() throws PushException,
FeedException, RepositoryException {
try {
NDC.push("Feed " + feed.getDataSource());
return submitFeed(feed, logMessage);
} finally {
NDC.remove();
}
}
});
feedSender.execute(future);
// Add the future to list of outstanding submissions.
synchronized (submissions) {
submissions.add(future);
}
} catch (RejectedExecutionException ree) {
throw new FeedException("Asynchronous feed was rejected. ", ree);
}
}

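This method first wraps the call that actually sends the data in the call() method of a FutureTask and runs it on the feedSender thread pool. It then adds the future handle to the submissions collection, a LinkedList<FutureTask<String>>. If the pool rejects the task, the RejectedExecutionException is rethrown as a FeedException.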
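The asynchronous hand-off can be reproduced with the standard java.util.concurrent classes. The sketch below submits FutureTasks to an executor and keeps their handles in a synchronized LinkedList. It also reaps finished tasks, which is roughly the job that the checkSubmissions() call at the top of take() appears to do (its body is not shown in this post). AsyncFeedSubmitterSketch, reapCompleted and shutdownAndWait are hypothetical names, and "sending" a feed just returns a string.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: "sending" a feed just returns a string.
class AsyncFeedSubmitterSketch {
    private final ExecutorService feedSender = Executors.newSingleThreadExecutor();
    private final LinkedList<FutureTask<String>> submissions = new LinkedList<>();

    /** Wraps the send step in a FutureTask, runs it on the pool, and records the handle. */
    void submit(final String feed) {
        FutureTask<String> future = new FutureTask<>(new Callable<String>() {
            @Override
            public String call() {
                return "sent:" + feed;  // stand-in for the real network send
            }
        });
        // Throws RejectedExecutionException once the pool has been shut down.
        feedSender.execute(future);
        synchronized (submissions) {
            submissions.add(future);
        }
    }

    /** Drops finished tasks from the list; a task that threw is reported here. */
    int reapCompleted() {
        int reaped = 0;
        synchronized (submissions) {
            Iterator<FutureTask<String>> it = submissions.iterator();
            while (it.hasNext()) {
                FutureTask<String> f = it.next();
                if (!f.isDone()) {
                    continue;
                }
                it.remove();
                reaped++;
                try {
                    f.get();  // does not block: the task is already done
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Feed submission failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return reaped;
    }

    /** Stops accepting new feeds and waits briefly for queued ones to finish. */
    void shutdownAndWait() {
        feedSender.shutdown();
        try {
            feedSender.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Keeping the FutureTask handles means that a failure inside call(), which runs on another thread, is not lost. It surfaces the next time the list is inspected. After shutdown, execute() rejects new work, mirroring the RejectedExecutionException branch above.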

The following method calls the feedConnection object to send the data:

/**
* Takes the supplied XmlFeed and sends that feed to the GSA.
*
* @param feed
* an XmlFeed
* @param logMessage
* a Feed Log message
* @return response String from GSA
* @throws PushException
* if Pusher problem
* @throws FeedException
* if transient Feed problem
* @throws RepositoryException
*/
private String submitFeed(XmlFeed feed, String logMessage)
throws PushException, FeedException, RepositoryException {

if (LOGGER.isLoggable(Level.FINE)) {
LOGGER.fine("Submitting " + feed.getFeedType() + " feed for "
+ feed.getDataSource() + " to the GSA. "
+ feed.getRecordCount() + " records totaling "
+ feed.size() + " bytes.");
}

// Write the generated feedLog message to the feed logger.
if (logMessage != null && FEED_LOGGER.isLoggable(FEED_LOG_LEVEL)) {
FEED_LOGGER.log(FEED_LOG_LEVEL, logMessage);
}
// Write the xmlfeed to a temporary file
// Write the Feed to the TeedFeedFile, if one was specified.
String teedFeedFilename = Context.getInstance().getTeedFeedFile();
// String teedFeedFilename = "D:/files/google2.txt";
if (teedFeedFilename != null) {
boolean isThrowing = false;
OutputStream os = null;
try {
os = new FileOutputStream(teedFeedFilename, true);
feed.writeTo(os);
} catch (IOException e) {
isThrowing = true;
throw new FeedException("Cannot write to file: "
+ teedFeedFilename, e);
} finally {
if (os != null) {
try {
os.close();
} catch (IOException e) {
if (!isThrowing) {
throw new FeedException("Cannot write to file: "
+ teedFeedFilename, e);
}
}
}
}
}

String gsaResponse = feedConnection.sendData(feed);
if (!gsaResponse.equals(GsaFeedConnection.SUCCESS_RESPONSE)) {
String eMessage = gsaResponse;
if (GsaFeedConnection.UNAUTHORIZED_RESPONSE.equals(gsaResponse)) {
eMessage += ": Client is not authorized to send feeds. Make "
+ "sure the GSA is configured to trust feeds from your host.";
}
if (GsaFeedConnection.INTERNAL_ERROR_RESPONSE.equals(gsaResponse)) {
eMessage += ": Check GSA status or feed format.";
}
throw new PushException(eMessage);
}
return gsaResponse;
}

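If the teedFeedFilename property is configured, the xmlfeed is first appended to that file. Only then is the feedConnection instance called to send the xmlfeed data. Any response other than GsaFeedConnection.SUCCESS_RESPONSE is turned into a PushException.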
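On Java 7 and later, the tee-file logic above can be written more compactly with try-with-resources. The sketch below shows this together with the response check. TeeAndCheckSketch, teeFeed and checkResponse are hypothetical names. UncheckedIOException and IllegalStateException stand in for the connector's FeedException and PushException. The success token is passed in as a parameter because the value of SUCCESS_RESPONSE is not shown in this post.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helpers; UncheckedIOException / IllegalStateException stand in
// for the connector's FeedException / PushException.
class TeeAndCheckSketch {
    /** Appends the feed to the tee file if one is configured. */
    static void teeFeed(String teedFeedFilename, byte[] feedBytes) {
        if (teedFeedFilename == null) {
            return;  // teeing is optional
        }
        // try-with-resources always closes the stream; if close() also fails after
        // a write failure, that second exception is attached as "suppressed",
        // which replaces the manual isThrowing bookkeeping.
        try (OutputStream os = new FileOutputStream(teedFeedFilename, true)) {  // append mode
            os.write(feedBytes);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot write to file: " + teedFeedFilename, e);
        }
    }

    /** Rejects any response other than the expected success token. */
    static String checkResponse(String response, String successResponse) {
        if (!successResponse.equals(response)) {
            throw new IllegalStateException("Feed not accepted: " + response);
        }
        return response;
    }
}
```

Because the file is opened in append mode, successive feeds accumulate in the same tee file, as they do with new FileOutputStream(teedFeedFilename, true) in the original.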

feedConnection sends the data over HTTP. Here is its data-sending method:

/* @Override */
public String sendData(FeedData feedData) throws FeedException {
try {
String response = sendFeedData((XmlFeed) feedData);
gotFeedError = !response.equalsIgnoreCase(SUCCESS_RESPONSE);
return response;
} catch (FeedException fe) {
gotFeedError = true;
throw fe;
}
}

It in turn calls sendFeedData(XmlFeed feed) to do the actual sending:

private String sendFeedData(XmlFeed feed) throws FeedException {
String feedType = feed.getFeedType().toLegacyString();
String dataSource = feed.getDataSource();
OutputStream outputStream;
HttpURLConnection uc;
StringBuilder buf = new StringBuilder();
byte[] prefix;
byte[] suffix;
try {
// Build prefix.
controlHeader(buf, "datasource", ServletUtil.MIMETYPE_TEXT_PLAIN);
buf.append(dataSource).append(CRLF);
controlHeader(buf, "feedtype", ServletUtil.MIMETYPE_TEXT_PLAIN);
buf.append(feedType).append(CRLF);
controlHeader(buf, "data", ServletUtil.MIMETYPE_XML);
prefix = buf.toString().getBytes("UTF-8");

// Build suffix.
buf.setLength(0);
buf.append(CRLF).append("--").append(BOUNDARY).append("--")
.append(CRLF);
suffix = buf.toString().getBytes("UTF-8");

LOGGER.finest("Opening feed connection to " + feedUrl);
synchronized (this) {
uc = (HttpURLConnection) feedUrl.openConnection();
}
if (uc instanceof HttpsURLConnection && !validateCertificate) {
SslUtil.setTrustingHttpsOptions((HttpsURLConnection) uc);
}
// Non-standard header name (probably a typo for "Charset"); it does not set the request charset.
uc.setRequestProperty("Charsert", "UTF-8");
uc.setDoInput(true);
uc.setDoOutput(true);
uc.setFixedLengthStreamingMode(prefix.length + feed.size()
+ suffix.length);
uc.setRequestProperty("Content-Type",
"multipart/form-data; boundary=" + BOUNDARY);
outputStream = uc.getOutputStream();
} catch (IOException ioe) {
throw new FeedException(feedUrl.toString(), ioe);
} catch (GeneralSecurityException e) {
throw new FeedException(feedUrl.toString(), e);
}

boolean isThrowing = false;
buf.setLength(0);
try {
LOGGER.finest("Writing feed data to feed connection.");
// If there is an exception during this read/write, we do our
// best to close the url connection and read the result.
try {
outputStream.write(prefix);
feed.writeTo(outputStream);
outputStream.write(suffix);
outputStream.flush();
} catch (IOException e) {
LOGGER.log(Level.SEVERE,
"IOException while posting: will retry later", e);
isThrowing = true;
throw new FeedException(e);
} catch (RuntimeException e) {
isThrowing = true;
throw e;
} catch (Error e) {
isThrowing = true;
throw e;
} finally {
try {
outputStream.close();
} catch (IOException e) {
LOGGER.log(
Level.SEVERE,
"IOException while closing after post: will retry later",
e);
if (!isThrowing) {
isThrowing = true;
throw new FeedException(e);
}
}
}
} finally {
BufferedReader br = null;
try {
LOGGER.finest("Waiting for response from feed connection.");
InputStream inputStream = uc.getInputStream();
br = new BufferedReader(new InputStreamReader(inputStream,
"UTF8"));
String line;
while ((line = br.readLine()) != null) {
buf.append(line);
}
} catch (IOException ioe) {
if (!isThrowing) {
throw new FeedException(ioe);
}
} finally {
try {
if (br != null) {
br.close();
}
} catch (IOException e) {
LOGGER.log(Level.SEVERE,
"IOException while closing after post: continuing",
e);
}
if (uc != null) {
uc.disconnect();
}
if (LOGGER.isLoggable(Level.FINEST)) {
LOGGER.finest("Received response from feed connection: "
+ buf.toString());
}
}
}
return buf.toString();
}

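Here the Content-Type is set to multipart/form-data, the same encoding an HTML form uses when uploading a file. The method writes the xmlfeed's binary data to the output stream and then reads the server's response from the input stream. setFixedLengthStreamingMode() is given prefix.length + feed.size() + suffix.length, so the body is streamed with a known Content-Length instead of being buffered in memory first. I described the exact format of the data sent in an earlier post in this series, Enterprise Search Engine Development: Connector (18), so I won't repeat it here.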
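The multipart body can be assembled on its own to see what goes over the wire. The sketch below builds the same three pieces as sendFeedData(): the prefix (datasource, feedtype and the data part header), the XML payload and the closing suffix. The total length is exactly the value passed to setFixedLengthStreamingMode(). The part-header layout inside controlHeader() is an assumption, because that method's body is not shown in this post. It is modeled on standard multipart/form-data. The BOUNDARY value and "text/xml" for ServletUtil.MIMETYPE_XML are also assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the request body built by sendFeedData().
class MultipartFeedSketch {
    static final String CRLF = "\r\n";
    static final String BOUNDARY = "<<feed-boundary>>";  // assumed boundary string

    /** One form-data part header; layout assumed, not taken from the original source. */
    static void controlHeader(StringBuilder buf, String name, String mimeType) {
        buf.append("--").append(BOUNDARY).append(CRLF)
           .append("Content-Disposition: form-data; name=\"").append(name).append('"').append(CRLF)
           .append("Content-Type: ").append(mimeType).append(CRLF)
           .append(CRLF);
    }

    /** Concatenates prefix + xml + suffix, the three writes made to the output stream. */
    static byte[] buildBody(String dataSource, String feedType, byte[] xml) {
        StringBuilder buf = new StringBuilder();
        controlHeader(buf, "datasource", "text/plain");
        buf.append(dataSource).append(CRLF);
        controlHeader(buf, "feedtype", "text/plain");
        buf.append(feedType).append(CRLF);
        controlHeader(buf, "data", "text/xml");
        byte[] prefix = buf.toString().getBytes(StandardCharsets.UTF_8);

        // Closing delimiter: CRLF, "--", boundary, "--", CRLF.
        byte[] suffix = (CRLF + "--" + BOUNDARY + "--" + CRLF).getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out =
            new ByteArrayOutputStream(prefix.length + xml.length + suffix.length);
        out.write(prefix, 0, prefix.length);
        out.write(xml, 0, xml.length);
        out.write(suffix, 0, suffix.length);
        return out.toByteArray();
    }
}
```

With a real HttpURLConnection, one would call uc.setFixedLengthStreamingMode(body.length) and write body to uc.getOutputStream(). The original code avoids materializing the whole body and streams feed.writeTo() between the prefix and suffix, which saves a second in-memory copy of a large feed.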

---------------------------------------------------------------------------

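This series, Enterprise Search Engine Development: Connector, is my original work.

When reposting, please credit the source: cnblogs (博客园), 刺猬的温驯

My email: chenying998179@163#com (replace # with .)

Link to this post: http://www.cnblogs.com/chenying99/p/3775504.html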
