Enterprise Search Engine Development: The Connector (Part 27)
The ChangeQueue class implements the ChangeSource interface, which declares a single method for pulling the next Change object:
/**
 * A source of {@link Change} objects.
 *
 * @since 2.8
 */
public interface ChangeSource {
  /**
   * @return the next change, or {@code null} if there is no change available
   */
  public Change getNextChange();
}
The ChangeQueue constructor initializes the blocking queue private final BlockingQueue<Change> pendingChanges, which serves as the container for Change objects:
/**
 * Initializes the blocking queue pendingChanges.
 *
 * @param size
 * @param sleepInterval
 * @param introduceDelayAfterEachScan
 * @param activityLogger
 */
private ChangeQueue(int size, long sleepInterval,
    boolean introduceDelayAfterEachScan, CrawlActivityLogger activityLogger) {
  pendingChanges = new ArrayBlockingQueue<Change>(size);
  this.sleepInterval = sleepInterval;
  this.activityLogger = activityLogger;
  this.introduceDelayAfterEveryScan = introduceDelayAfterEachScan;
}
The introduceDelayAfterEveryScan parameter controls whether a delay is introduced once a full pass over the data has finished.
As mentioned in the previous article, the inner class CallBack adds the submitted data to the blocking queue BlockingQueue<Change> pendingChanges.
The method that ChangeQueue implements from the ChangeSource interface then takes Change objects back out of that queue:
/**
 * Gets an element from the blocking queue pendingChanges.
 * Gets the next available change from the ChangeQueue. Will wait up to
 * 1/4 second for a change to appear if none is immediately available.
 *
 * @return the next available change, or {@code null} if no changes are
 *         available
 */
public Change getNextChange() {
  try {
    return pendingChanges.poll(250L, TimeUnit.MILLISECONDS);
  } catch (InterruptedException ie) {
    return null;
  }
}
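The timed poll is what keeps a consumer from blocking forever on an empty queue. A minimal, self-contained sketch of that behaviour follows, assuming a hypothetical PollDemo class in which String stands in for Change; the class and its offer method are illustrative and not part of the connector:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for ChangeQueue; String plays the role of Change. */
final class PollDemo {
  // Bounded queue, like pendingChanges; the capacity of 2 is arbitrary.
  private final BlockingQueue<String> pending = new ArrayBlockingQueue<>(2);

  /** Producer side: returns false instead of blocking when the queue is full. */
  boolean offer(String change) {
    return pending.offer(change);
  }

  /** Consumer side: waits at most 250 ms, then gives up and returns null. */
  String next() {
    try {
      return pending.poll(250L, TimeUnit.MILLISECONDS);
    } catch (InterruptedException ie) {
      // Unlike the original, this sketch restores the interrupt flag.
      Thread.currentThread().interrupt();
      return null;
    }
  }

  public static void main(String[] args) {
    PollDemo queue = new PollDemo();
    queue.offer("change-1");
    System.out.println(queue.next()); // prints change-1 immediately
    System.out.println(queue.next()); // prints null after roughly 250 ms
  }
}
```

A null return therefore means only "nothing arrived within the timeout", so a caller that sees null can simply try again on its next pass.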
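The ChangeQueue object thus acts as a buffer for Change objects. As analyzed in the previous article, Change objects are added to it by the thread method of the monitor object DocumentSnapshotRepositoryMonitor.
So which object calls getNextChange() on the ChangeQueue to take the Change objects out again?
Tracing the callers shows that the loadUpFromChangeSource method of the CheckpointAndChangeQueue class calls getNextChange(). That method wraps each Change object it obtains in a CheckpointAndChange object and adds it to the member List<CheckpointAndChange> checkpointAndChangeList.
First, a look at the relevant fields and the constructor: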
private final AtomicInteger maximumQueueSize =
    new AtomicInteger(DEFAULT_MAXIMUM_QUEUE_SIZE);
private final List<CheckpointAndChange> checkpointAndChangeList;
private final ChangeSource changeSource;
private final DocumentHandleFactory internalDocumentHandleFactory;
private final DocumentHandleFactory clientDocumentHandleFactory;
private volatile DiffingConnectorCheckpoint lastCheckpoint;
private final File persistDir; // place to persist enqueued values
private MonitorRestartState monitorPoints = new MonitorRestartState();

public CheckpointAndChangeQueue(ChangeSource changeSource, File persistDir,
    DocumentHandleFactory internalDocumentHandleFactory,
    DocumentHandleFactory clientDocumentHandleFactory) {
  this.changeSource = changeSource;
  this.checkpointAndChangeList
      = Collections.synchronizedList(
          new ArrayList<CheckpointAndChange>(maximumQueueSize.get()));
  this.persistDir = persistDir;
  this.internalDocumentHandleFactory = internalDocumentHandleFactory;
  this.clientDocumentHandleFactory = clientDocumentHandleFactory;
  ensurePersistDirExists();
}
The constructor stores the ChangeSource object changeSource (in practice, the ChangeQueue object) and creates the list container List<CheckpointAndChange> checkpointAndChangeList.
Now back to the loadUpFromChangeSource method:
/**
 * Pulls Change objects from the ChangeSource and adds them to
 * checkpointAndChangeList.
 */
private void loadUpFromChangeSource() {
  int max = maximumQueueSize.get();
  if (checkpointAndChangeList.size() < max) {
    lastCheckpoint = lastCheckpoint.nextMajor();
  }
  while (checkpointAndChangeList.size() < max) {
    Change newChange = changeSource.getNextChange();
    if (newChange == null) {
      break;
    }
    lastCheckpoint = lastCheckpoint.next();
    checkpointAndChangeList.add(new CheckpointAndChange(
        lastCheckpoint, newChange));
  }
}
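The method takes Change objects from changeSource, wraps each one in a CheckpointAndChange object, and adds it to the list List<CheckpointAndChange> checkpointAndChangeList. It stops when the list reaches maximumQueueSize or when the source has no change available.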
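The fill loop can be sketched in isolation as follows. This is an illustration under stated assumptions, not the connector's code: LoadUpSketch and its Checkpoint class are hypothetical, String stands in for Change, and the sketch assumes that nextMajor() starts a new batch and resets the minor counter while next() advances within the batch, which the excerpt above does not show.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

final class LoadUpSketch {
  /** Hypothetical two-part checkpoint: a batch number and a position in the batch. */
  static final class Checkpoint {
    final long major;
    final long minor;

    Checkpoint(long major, long minor) {
      this.major = major;
      this.minor = minor;
    }

    Checkpoint nextMajor() { return new Checkpoint(major + 1, 0); }

    Checkpoint next() { return new Checkpoint(major, minor + 1); }

    @Override public String toString() { return major + "." + minor; }
  }

  final List<String> list = new ArrayList<>();
  Checkpoint last = new Checkpoint(0, 0);

  /** Fills list up to max entries, stamping each change with the next checkpoint. */
  void loadUp(Queue<String> source, int max) {
    if (list.size() < max) {
      last = last.nextMajor();        // there is room, so a new batch starts
    }
    while (list.size() < max) {
      String change = source.poll();  // null when nothing is available
      if (change == null) {
        break;
      }
      last = last.next();
      list.add(last + ":" + change);
    }
  }

  public static void main(String[] args) {
    LoadUpSketch queue = new LoadUpSketch();
    Queue<String> source = new ArrayDeque<>(Arrays.asList("a", "b", "c", "d", "e"));
    queue.loadUp(source, 3);
    System.out.println(queue.list); // [1.1:a, 1.2:b, 1.3:c]
    System.out.println(source);     // [d, e]
  }
}
```

The bound matters: changes beyond the maximum stay in the source until earlier entries have been removed from the list, so the consumer never holds more than one batch in memory.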
loadUpFromChangeSource is called from the resume method, which in turn is called in the constructor of the DiffingConnectorDocumentList class:
/**
 * Gets the {@code List<CheckpointAndChange>} queue.
 * Returns an {@link Iterator} for currently available
 * {@link CheckpointAndChange} objects that occur after the passed in
 * checkpoint. The {@link String} form of a {@link DiffingConnectorCheckpoint}
 * passed in is produced by calling
 * {@link DiffingConnectorCheckpoint#toString()}. As a side effect, Objects
 * up to and including the object with the passed in checkpoint are removed
 * from this queue.
 *
 * @param checkpointString null means return all {@link CheckpointAndChange}
 *        objects and a non null value means to return
 *        {@link CheckpointAndChange} objects with checkpoints after the
 *        passed in value.
 * @throws IOException if error occurs while manipulating recovery state
 */
synchronized List<CheckpointAndChange> resume(String checkpointString)
    throws IOException {
  // Remove the changes that have already been completed.
  removeCompletedChanges(checkpointString);
  // Pull Changes from the ChangeSource into checkpointAndChangeList.
  loadUpFromChangeSource();
  // Update monitorPoints.
  monitorPoints.updateOnGuaranteed(checkpointAndChangeList);
  try {
    // Persist checkpointAndChangeList to a queue file.
    // Each call to resume writes one new file.
    writeRecoveryState();
  } finally {
    // TODO: Enhance with mechanism that remembers
    // information about recovery files to avoid re-reading.
    // Remove redundant queue files (those already fully consumed).
    removeExcessRecoveryState();
  }
  return getList();
}
After the List<CheckpointAndChange> checkpointAndChangeList container has been filled, its contents are persisted to a queue file in JSON format:
/**
 * Persists the queue as JSON.
 *
 * @throws IOException
 */
private void writeRecoveryState() throws IOException {
  // TODO(pjo): Move this method into RecoveryFile.
  File recoveryFile = new RecoveryFile(persistDir);
  FileOutputStream outStream = new FileOutputStream(recoveryFile);
  Writer writer = new OutputStreamWriter(outStream, Charsets.UTF_8);
  try {
    try {
      writeJson(writer);
    } catch (JSONException e) {
      throw IOExceptionHelper.newIOException("Failed writing recovery file.", e);
    }
    writer.flush();
    outStream.getFD().sync();
  } finally {
    writer.close();
  }
}
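The flush followed by getFD().sync() is what pushes the file contents to the storage device before the method returns. The same sequence can be sketched with try-with-resources. DurableWriteSketch, writeDurably and roundTrip are hypothetical names, and the payload is an arbitrary string rather than the connector's real JSON:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class DurableWriteSketch {
  /** Writes payload as UTF-8 and forces it to the storage device before closing. */
  static void writeDurably(File file, String payload) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file);
         Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
      writer.write(payload);
      writer.flush();      // move buffered characters into the stream
      out.getFD().sync();  // ask the OS to write its own buffers to the device
    }
  }

  /** Writes to a temporary file, reads it back, and deletes the file. */
  static String roundTrip(String payload) {
    try {
      File file = File.createTempFile("recovery.", ".tmp");
      try {
        writeDurably(file, payload);
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
      } finally {
        file.delete();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("{\"Q\":[]}")); // prints {"Q":[]}
  }
}
```

Flushing the writer alone only hands the bytes to the operating system; the explicit sync is the step that guards against losing the file in a crash shortly after the write.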
The name of a queue file contains the current system time, which is used to tell which of two files was created earlier:
/**
 * A queue file whose name can be compared by time.
 * A File that has some of the recovery logic.
 * Original recovery files' names contained a single nanosecond timestamp,
 * eg. recovery.10220010065599398 . These turned out to be flawed
 * because nanosecond times can go "back in time" between JVM restarts.
 * Updated recovery files' names contain a wall clock millis timestamp
 * followed by an underscore followed by a nanotimestamp, eg.
 * recovery.702522216012_10220010065599398 .
 */
static class RecoveryFile extends File {
  final static long NO_TIME_AVAIL = -1;
  long milliTimestamp = NO_TIME_AVAIL;
  long nanoTimestamp;

  long parseTime(String s) throws IOException {
    try {
      return Long.parseLong(s);
    } catch(NumberFormatException e) {
      throw new LoggingIoException("Invalid recovery filename: "
          + getAbsolutePath());
    }
  }

  /**
   * Parses the timestamps contained in the file name.
   *
   * @throws IOException
   */
  void parseOutTimes() throws IOException {
    try {
      String basename = getName();
      if (!basename.startsWith(RECOVERY_FILE_PREFIX)) {
        throw new LoggingIoException("Invalid recovery filename: "
            + getAbsolutePath());
      } else {
        String extension = basename.substring(RECOVERY_FILE_PREFIX.length());
        if (!extension.contains("_")) { // Original name format.
          nanoTimestamp = parseTime(extension);
        } else { // Updated name format.
          String timeParts[] = extension.split("_");
          if (2 != timeParts.length) {
            throw new LoggingIoException("Invalid recovery filename: "
                + getAbsolutePath());
          }
          milliTimestamp = parseTime(timeParts[0]);
          nanoTimestamp = parseTime(timeParts[1]);
        }
      }
    } catch(IndexOutOfBoundsException e) {
      throw new LoggingIoException("Invalid recovery filename: "
          + getAbsolutePath());
    }
  }

  RecoveryFile(File persistanceDir) throws IOException {
    super(persistanceDir, RECOVERY_FILE_PREFIX + System.currentTimeMillis()
        + "_" + System.nanoTime());
    parseOutTimes();
  }

  /**
   * Constructor for an existing recovery file identified by its
   * absolute path.
   *
   * @param absolutePath
   * @throws IOException
   */
  RecoveryFile(String absolutePath) throws IOException {
    super(absolutePath);
    parseOutTimes();
  }

  boolean isOlder(RecoveryFile other) {
    boolean weHaveMillis = milliTimestamp != NO_TIME_AVAIL;
    boolean otherHasMillis = other.milliTimestamp != NO_TIME_AVAIL;
    boolean bothHaveMillis = weHaveMillis && otherHasMillis;
    boolean neitherHasMillis = (!weHaveMillis) && (!otherHasMillis);
    if (bothHaveMillis) {
      if (this.milliTimestamp < other.milliTimestamp) {
        return true;
      } else if (this.milliTimestamp > other.milliTimestamp) {
        return false;
      } else {
        return this.nanoTimestamp < other.nanoTimestamp;
      }
    } else if (neitherHasMillis) {
      return this.nanoTimestamp < other.nanoTimestamp;
    } else if (weHaveMillis) { // and other doesn't; we are newer.
      return false;
    } else { // other has millis; other is newer.
      return true;
    }
  }

  /** Deletes the file; a delete method that logs failures. */
  public void logOnFailDelete() {
    boolean deleted = super.delete();
    if (!deleted) {
      LOG.severe("Failed to delete: " + getAbsolutePath());
    }
  }
  // TODO(pjo): Move more recovery logic into this class.
}
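The ordering rule can be tried out without touching the file system. The sketch below is a condensed restatement of parseOutTimes and isOlder, not the original class: RecoveryNameSketch and parse are hypothetical names, the prefix "recovery." is inferred from the example names in the Javadoc above, and malformed names raise IllegalArgumentException instead of LoggingIoException.

```java
final class RecoveryNameSketch {
  static final String PREFIX = "recovery."; // inferred from the Javadoc examples
  static final long NO_MILLIS = -1;

  final long millis; // NO_MILLIS for names in the original, nanosecond-only format
  final long nanos;

  private RecoveryNameSketch(long millis, long nanos) {
    this.millis = millis;
    this.nanos = nanos;
  }

  /** Parses "recovery.<nanos>" or "recovery.<millis>_<nanos>". */
  static RecoveryNameSketch parse(String name) {
    if (!name.startsWith(PREFIX)) {
      throw new IllegalArgumentException("Invalid recovery filename: " + name);
    }
    // The limit of -1 keeps trailing empty parts, so "recovery.5_" is rejected.
    String[] parts = name.substring(PREFIX.length()).split("_", -1);
    try {
      if (parts.length == 1) {
        return new RecoveryNameSketch(NO_MILLIS, Long.parseLong(parts[0]));
      }
      if (parts.length == 2) {
        return new RecoveryNameSketch(Long.parseLong(parts[0]), Long.parseLong(parts[1]));
      }
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Invalid recovery filename: " + name, e);
    }
    throw new IllegalArgumentException("Invalid recovery filename: " + name);
  }

  /** Same ordering as isOlder: names without millis always count as older. */
  boolean isOlder(RecoveryNameSketch other) {
    boolean weHaveMillis = millis != NO_MILLIS;
    boolean otherHasMillis = other.millis != NO_MILLIS;
    if (weHaveMillis != otherHasMillis) {
      return otherHasMillis;
    }
    if (weHaveMillis && millis != other.millis) {
      return millis < other.millis;
    }
    return nanos < other.nanos;
  }

  public static void main(String[] args) {
    RecoveryNameSketch oldFormat = parse("recovery.10220010065599398");
    RecoveryNameSketch newFormat = parse("recovery.702522216012_10220010065599398");
    System.out.println(oldFormat.isOlder(newFormat)); // true
    System.out.println(newFormat.isOlder(oldFormat)); // false
  }
}
```

The wall-clock millis are compared first because, as the Javadoc notes, nanosecond readings are not comparable across JVM restarts; the nanosecond part only breaks ties between files written within the same millisecond.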
Next, a look at what its startup method (the start method) does:
/**
 * Initialize to start processing from after the passed in checkpoint
 * or from the beginning if the passed in checkpoint is null. Part of
 * making DocumentSnapshotRepositoryMonitorManager go from "cold" to "warm".
 */
public synchronized void start(String checkpointString) throws IOException {
  LOG.info("Starting CheckpointAndChangeQueue from " + checkpointString);
  // Create the queue directory.
  ensurePersistDirExists();
  checkpointAndChangeList.clear();
  lastCheckpoint = constructLastCheckpoint(checkpointString);
  if (null == checkpointString) {
    // Delete the queue files.
    removeAllRecoveryState();
  } else {
    RecoveryFile current = removeExcessRecoveryState();
    // Load monitorPoints and the checkpointAndChangeList queue.
    loadUpFromRecoveryState(current);
    //this.monitorPoints.points.entrySet();
  }
}
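With a null checkpoint the method deletes all queue files. Otherwise it simply loads the list of CheckpointAndChange objects from the previously saved queue file into the List<CheckpointAndChange> checkpointAndChangeList container (along with the MonitorCheckpoint objects):
/**
 * Loads the queue.
 *
 * @param file
 * @throws IOException
 */
private void loadUpFromRecoveryState(RecoveryFile file) throws IOException {
  // TODO(pjo): Move this method into RecoveryFile.
  new LoadingQueueReader().readJson(file);
}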
The CheckpointAndChangeQueue class defines inner classes that load the list of CheckpointAndChange objects from the JSON file into the List<CheckpointAndChange> checkpointAndChangeList container.
The abstract base class for reading the queue is AbstractQueueReader:
/**
 * Abstract class that loads the queue from a JSON file.
 * Reads JSON recovery files. Uses the Template Method pattern to
 * delegate what to do with the parsed objects to subclasses.
 *
 * Note: This class uses gson for streaming support.
 */
private abstract class AbstractQueueReader {
  public void readJson(File file) throws IOException {
    readJson(new BufferedReader(new InputStreamReader(
        new FileInputStream(file), Charsets.UTF_8)));
  }

  /**
   * Reads and parses the stream, calling the abstract methods to
   * take whatever action is required. The given stream will be
   * closed automatically.
   *
   * @param reader the stream to parse
   */
  @VisibleForTesting
  void readJson(Reader reader) throws IOException {
    JsonReader jsonReader = new JsonReader(reader);
    try {
      readJson(jsonReader);
    } finally {
      jsonReader.close();
    }
  }

  /**
   * Reads and parses the stream, calling the abstract methods to
   * take whatever action is required.
   */
  private void readJson(JsonReader reader) throws IOException {
    JsonParser parser = new JsonParser();
    reader.beginObject();
    while (reader.hasNext()) {
      String name = reader.nextName();
      if (name.equals(MONITOR_STATE_JSON_TAG)) {
        readMonitorPoints(parser.parse(reader));
      } else if (name.equals(QUEUE_JSON_TAG)) {
        reader.beginArray();
        while (reader.hasNext()) {
          readCheckpointAndChange(parser.parse(reader));
        }
        reader.endArray();
      } else {
        throw new IOException("Read invalid recovery file.");
      }
    }
    reader.endObject();

    reader.setLenient(true);
    String name = reader.nextString();
    if (!name.equals(SENTINAL)) {
      throw new IOException("Read invalid recovery file.");
    }
  }

  protected abstract void readMonitorPoints(JsonElement gson)
      throws IOException;

  protected abstract void readCheckpointAndChange(JsonElement gson)
      throws IOException;
}
The abstract methods are implemented by the subclasses:
/**
 * Checks that a queue file is valid.
 * Verifies that a JSON recovery file is valid JSON with a
 * trailing sentinel.
 */
private class ValidatingQueueReader extends AbstractQueueReader {
  protected void readMonitorPoints(JsonElement gson) throws IOException {
  }

  protected void readCheckpointAndChange(JsonElement gson)
      throws IOException {
  }
}

/** Loads the queue from a JSON recovery file. */
/*
 * TODO(jlacey): Change everything downstream to gson. For now, we
 * reserialize the individual gson objects and deserialize them
 * using org.json.
 */
@VisibleForTesting
class LoadingQueueReader extends AbstractQueueReader {
  /**
   * Loads the MonitorRestartState checkpoints
   * (HashMap<String, MonitorCheckpoint> points).
   */
  protected void readMonitorPoints(JsonElement gson) throws IOException {
    try {
      JSONObject json = gsonToJson(gson);
      monitorPoints = new MonitorRestartState(json);
      //monitorPoints.updateOnGuaranteed(checkpointAndChangeList)
    } catch (JSONException e) {
      throw IOExceptionHelper.newIOException(
          "Failed reading persisted JSON queue.", e);
    }
  }

  /**
   * Loads checkpointAndChangeList.
   */
  protected void readCheckpointAndChange(JsonElement gson)
      throws IOException {
    try {
      JSONObject json = gsonToJson(gson);
      checkpointAndChangeList.add(new CheckpointAndChange(json,
          internalDocumentHandleFactory, clientDocumentHandleFactory));
    } catch (JSONException e) {
      throw IOExceptionHelper.newIOException(
          "Failed reading persisted JSON queue.", e);
    }
  }

  // TODO(jlacey): This could be much more efficient, especially
  // with LOBs, if we directly transformed the objects with a little
  // recursive parser. This code is only used when recovering failed
  // batches, so I don't know if that's worth the effort.
  private JSONObject gsonToJson(JsonElement gson) throws JSONException {
    return new JSONObject(gson.toString());
  }
}
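The division of labour between the three reader classes is the Template Method pattern combined with a trailing sentinel that exposes truncated files. A stripped-down sketch of both ideas follows. It is not the connector's code: ReaderSketch and its nested classes are hypothetical, records are plain strings in a list instead of streamed JSON, the sentinel value "END" is invented because the excerpt does not show the value of SENTINAL, and the sentinel is checked before the records are processed rather than after.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReaderSketch {
  static final String SENTINEL = "END"; // hypothetical marker value

  /** Template method: the reading loop is fixed, the per-record action is not. */
  abstract static class AbstractReader {
    final void read(List<String> lines) {
      // A file that was cut off mid-write will not end with the sentinel.
      if (lines.isEmpty() || !SENTINEL.equals(lines.get(lines.size() - 1))) {
        throw new IllegalStateException("Read invalid recovery file.");
      }
      for (String record : lines.subList(0, lines.size() - 1)) {
        onRecord(record);
      }
    }

    abstract void onRecord(String record);
  }

  /** Walks the whole input but keeps nothing; only the structure is checked. */
  static final class ValidatingReader extends AbstractReader {
    @Override void onRecord(String record) { }
  }

  /** Collects every record, as LoadingQueueReader fills checkpointAndChangeList. */
  static final class LoadingReader extends AbstractReader {
    final List<String> loaded = new ArrayList<>();

    @Override void onRecord(String record) {
      loaded.add(record);
    }
  }

  static boolean isValid(List<String> lines) {
    try {
      new ValidatingReader().read(lines);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    List<String> complete = Arrays.asList("r1", "r2", SENTINEL);
    List<String> truncated = Arrays.asList("r1");
    System.out.println(isValid(complete));  // true
    System.out.println(isValid(truncated)); // false
    LoadingReader loader = new LoadingReader();
    loader.read(complete);
    System.out.println(loader.loaded);      // [r1, r2]
  }
}
```

Because validation and loading share one parsing routine, a file that passes the validating reader is guaranteed to be readable by the loading reader as well.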
---------------------------------------------------------------------------
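This series, Enterprise Search Engine Development: The Connector, is my original work.
When reposting, please credit the source: 博客园 (cnblogs), 刺猬的温驯
Author's email: chenying998179@163#com (replace # with .)
Link to this article: http://www.cnblogs.com/chenying99/p/3789560.html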