企业搜索引擎开发之连接器connector(二十五)
下面开始具体分析连接器是怎么与连接器实例交互的,这里主要是分析连接器怎么从连接器实例获取数据的(前面文章有涉及基于http协议与连接器的xml格式的交互,连接器对连接器实例的设置都是通过配置文件操作的,具体文件操作尚未详细分析(com.google.enterprise.connector.persist.FileStore类))
本文以数据库连接器实例为例来分析,数据库类型连接器是通过调用mybatis(sqlmap映射框架)组件与数据库进行操作的,我们通过前端提交的数据库连接器实例表单信息最终存储在配置文件里面(默认采用文件方式,也可以采用数据库方式存储),连接器启动时通过加载该配置文件映射到数据库连接实例的上下文对象(类似反序列化的概念)
数据库连接实例的上下文对象类属性记录了配置信息及数据操作客户端对象类,同时在其初始化方法将上下文对象设置为数据操作客户端对象的属性
private final MimeTypeDetector mimeTypeDetector = new MimeTypeDetector(); private DBClient client;
private String connectionUrl;
private String connectorName;
private String login;
private String password;
private String sqlQuery;
private String authZQuery;
private String googleConnectorWorkDir;
private String primaryKeys;
private String xslt;
private String driverClassName;
private String documentURLField;
private String documentIdField;
private String baseURL;
private String lobField;
private String fetchURLField;
private String lastModifiedDate;
private String extMetadataType;
private int numberOfRows = 500;
private Integer minValue = -1;
private boolean publicFeed = true;
private boolean parameterizedQueryFlag = false;
private Boolean nullsSortLow = null;
private Collator collator; public DBContext() {
} public void init() throws DBException {
client.setDBContext(this); // If the NULL value sort behaviour has not been explicitly overriden
// in the configuration, fetch it from the DatabaseMetadata.
if (nullsSortLow == null) {
nullsSortLow = client.nullsAreSortedLow();
if (nullsSortLow == null) {
throw new DBException("nullsSortLowFlag must be set in configuration.");
}
}
}
DBClient对象封装了数据读取方法,数据读取调用了mybatis组件的相关API,加载配置信息,生成数据操作sqlsession对象
private boolean hasCustomCollationQuery = false;
protected DBContext dbContext;
protected SqlSessionFactory sqlSessionFactory;
protected DatabaseType databaseType; static {
org.apache.ibatis.logging.LogFactory.useJdkLogging();
} public DBClient() {
} public void setDBContext(DBContext dbContext) throws DBException {
this.dbContext = dbContext;
generateSqlMap();
this.sqlSessionFactory = getSqlSessionFactory(generateMyBatisConfig());
LOG.info("DBClient for database " + getDatabaseInfo() + " is instantiated");
this.databaseType = getDatabaseType();
} /**
* Constructor used for testing purpose. DBCLient initialized with sqlMap
* having crawl query without CDATA section.
*/
@VisibleForTesting
DBClient(DBContext dbContext) throws DBException {
this.dbContext = dbContext;
this.sqlSessionFactory = getSqlSessionFactory(generateMyBatisConfig());
this.databaseType = getDatabaseType();
} private SqlSessionFactory getSqlSessionFactory(String config) {
try {
SqlSessionFactoryBuilder builder = new SqlSessionFactoryBuilder();
return builder.build(new StringReader(config));
} catch (RuntimeException e) {
throw new RuntimeException("XML is not well formed", e);
}
} /**
* @return a SqlSession
*/
@VisibleForTesting
SqlSession getSqlSession()
throws SnapshotRepositoryRuntimeException {
try {
return sqlSessionFactory.openSession();
} catch (RuntimeException e) {
Throwable cause = (e.getCause() != null &&
e.getCause() instanceof SQLException) ? e.getCause() : e;
LOG.log(Level.WARNING, "Unable to connect to the database.", cause);
throw new SnapshotRepositoryRuntimeException(
"Unable to connect to the database.", cause);
}
}
具体的数据读取方法如下:
/**
* @param skipRows number of rows to skip in the database.
* @param maxRows max number of rows to return.
* @return rows - subset of the result of executing the SQL query. E.g.,
* result table with columns id and lastName and two rows will be
* returned as
*
* <pre>
* [{id=1, lastName=last_01}, {id=2, lastName=last_02}]
* </pre>
* @throws DBException
*/
public List<Map<String, Object>> executePartialQuery(int skipRows, int maxRows)
throws SnapshotRepositoryRuntimeException {
// TODO(meghna): Think about a better way to scroll through the result set.
List<Map<String, Object>> rows;
LOG.info("Executing partial query with skipRows = " + skipRows + " and "
+ "maxRows = " + maxRows);
SqlSession session = getSqlSession();
try {
rows = session.selectList("IbatisDBClient.getAll", null,
new RowBounds(skipRows, maxRows));
LOG.info("Sucessfully executed partial query with skipRows = "
+ skipRows + " and maxRows = " + maxRows);
} catch (RuntimeException e) {
checkDBConnection(session, e);
rows = new ArrayList<Map<String, Object>>();
} finally {
session.close();
}
LOG.info("Number of rows returned " + rows.size());
return rows;
}
RepositoryHandler类里面通过对DBContext dbContext和DBClient dbClient的引用来读取数据信息
里面还包装了内部类PartialQueryStrategy实现对数据偏移的控制
/**
* 实际调用的默认是这个实现类
* @author Administrator
*
*/
private class PartialQueryStrategy implements QueryStrategy {
private int skipRows = 0; @Override
public List<Map<String, Object>> executeQuery() {
return dbClient.executePartialQuery(skipRows,
dbContext.getNumberOfRows());
} @Override
public void resetCursor() {
skipRows = 0;
} @Override
public void updateCursor(List<Map<String, Object>> rows) {
skipRows += rows.size();
} @Override
public void logComplete() {
LOG.info("Total " + skipRows
+ " records are crawled during this crawl cycle");
}
}
然后在executeQueryAndAddDocs()方法里面调用该内部类实例对象
/**
* 重启后都是重新开始获取数据,不记录批次信息
* Function for fetching database rows and providing a collection of
* snapshots.
*/
public List<DocumentSnapshot> executeQueryAndAddDocs()
throws SnapshotRepositoryRuntimeException {
List<Map<String, Object>> rows = null; try {
rows = queryStrategy.executeQuery();
} catch (SnapshotRepositoryRuntimeException e) {
LOG.info("Repository Unreachable. Resetting DB cursor to "
+ "start traversal from begining after recovery.");
queryStrategy.resetCursor();
LOG.warning("Unable to connect to the database\n" + e.toString());
throw new SnapshotRepositoryRuntimeException(
"Unable to connect to the database.", e);
}
if (rows.size() == 0) {
queryStrategy.logComplete();
LOG.info("Crawl cycle of database is complete. Resetting DB cursor to "
+ "start traversal from begining");
queryStrategy.resetCursor();
} else {
queryStrategy.updateCursor(rows);
} if (traversalContext == null) {
LOG.info("Setting Traversal Context");
traversalContext = traversalContextManager.getTraversalContext();
JsonDocument.setTraversalContext(traversalContext);
} return getDocList(rows);
}
getDocList(rows)方法实现将数据记录包装为List<DocumentSnapshot>对象
/**
* 将数据包装为List<DocumentSnapshot>
* @param rows
* @return
*/
private List<DocumentSnapshot> getDocList(List<Map<String, Object>> rows) {
LOG.log(Level.FINE, "Building document snapshots for {0} rows.",
rows.size());
List<DocumentSnapshot> docList = Lists.newArrayList();
for (Map<String, Object> row : rows) {
try {
DocumentSnapshot snapshot = docBuilder.getDocumentSnapshot(row);
if (snapshot != null) {
if (LOG.isLoggable(Level.FINER)) {
LOG.finer("DBSnapshotRepository returns document with docID "
+ snapshot.getDocumentId());
}
docList.add(snapshot);
}
} catch (DBException e) {
// See the similar log message in DBSnapshot.getDocumentHandle.
LOG.log(Level.WARNING, "Cannot convert database record to snapshot "
+ "for record " + row, e);
}
}
LOG.info(docList.size() + " document(s) to be fed to GSA");
return docList;
}
RepositoryHandlerIterator类进一步对repositoryHandler的封装,实现数据的迭代器
/**
* Iterates over the collections of {@link DocumentSnapshot} objects
* produced by a {@code RepositoryHandler}.
*/
public class RepositoryHandlerIterator
extends AbstractIterator<DocumentSnapshot> {
private final RepositoryHandler repositoryHandler;
private Iterator<DocumentSnapshot> current; /**
* @param repositoryHandler RepositoryHandler object for fetching DB rows in
* DocumentSnapshot form.
*/
public RepositoryHandlerIterator(RepositoryHandler repositoryHandler) {
this.repositoryHandler = repositoryHandler;
this.current = Iterators.emptyIterator();
} @Override
protected DocumentSnapshot computeNext() {
if (current.hasNext()) {
return current.next();
} else {
current = repositoryHandler.executeQueryAndAddDocs().iterator();
if (current.hasNext()) {
return current.next();
} else {
return endOfData();
}
}
}
}
最后将迭代器交给了DBSnapshotRepository仓库(继承自连接器的SnapshotRepository仓库类,实现了与连接器的接口对接(适配器模式))
/**
* An iterable over the database rows. The main building block for
* interacting with the diffing package.
*/
public class DBSnapshotRepository
implements SnapshotRepository<DocumentSnapshot> {
private final RepositoryHandler repositoryHandler; public DBSnapshotRepository(RepositoryHandler repositoryHandler) {
this.repositoryHandler = repositoryHandler;
} @Override
public Iterator<DocumentSnapshot> iterator()
throws SnapshotRepositoryRuntimeException {
return new RepositoryHandlerIterator(repositoryHandler);
} @Override
public String getName() {
return DBSnapshotRepository.class.getName();
}
}
---------------------------------------------------------------------------
本系列企业搜索引擎开发之连接器connector系本人原创
转载请注明出处 博客园 刺猬的温驯
本人邮箱: chenying998179@163#com (#改为.)
本文链接 http://www.cnblogs.com/chenying99/p/3789054.html
企业搜索引擎开发之连接器connector(二十五)的更多相关文章
- 企业搜索引擎开发之连接器connector(十九)
连接器是基于http协议通过推模式(push)向数据接收服务端推送数据,即xmlfeed格式数据(xml格式),其发送数据接口命名为Pusher Pusher接口定义了与发送数据相关的方法 publi ...
- 企业搜索引擎开发之连接器connector(十八)
创建并启动连接器实例之后,连接器就会基于Http协议向指定的数据接收服务器发送xmlfeed格式数据,我们可以通过配置http代理服务器抓取当前基于http协议格式的数据(或者也可以通过其他网络抓包工 ...
- 企业搜索引擎开发之连接器connector(十六)
本人有一段时间没有接触企业搜索引擎之连接器的开发了,连接器是涉及企业搜索引擎一个重要的组件,在数据源与企业搜索引擎中间起一个桥梁的作用,类似于数据库之JDBC,通过连接器将不同数据源的数据适配到企业搜 ...
- 企业搜索引擎开发之连接器connector(二十九)
在哪里调用监控器管理对象snapshotRepositoryMonitorManager的start方法及stop方法,然后又在哪里调用CheckpointAndChangeQueue对象的resum ...
- 企业搜索引擎开发之连接器connector(二十八)
通常一个SnapshotRepository仓库对象对应一个DocumentSnapshotRepositoryMonitor监视器对象,同时也对应一个快照存储器对象,它们的关联是通过监视器管理对象D ...
- 企业搜索引擎开发之连接器connector(二十六)
连接器通过监视器对象DocumentSnapshotRepositoryMonitor从上文提到的仓库对象SnapshotRepository(数据库仓库为DBSnapshotRepository)中 ...
- 企业搜索引擎开发之连接器connector(二十四)
本人在上文中提到,连接器实现了两种事件依赖的机制 ,其一是我们手动操作连接器实例时:其二是由连接器的自动更新机制 上文中分析了连接器的自动更新机制,即定时器执行定时任务 那么,如果我们手动操作连接器实 ...
- 企业搜索引擎开发之连接器connector(二十二)
下面来分析线程执行类,线程池ThreadPool类 对该类的理解需要对java的线程池比较熟悉 该类引用了一个内部类 /** * The lazily constructed LazyThreadPo ...
- 企业搜索引擎开发之连接器connector(二十)
连接器里面衔接数据源与数据推送对象的是QueryTraverser类对象,该类实现了Traverser接口 /** * Interface presented by a Traverser. Used ...
随机推荐
- bzoj1215 24点游戏
Description 为了培养小孩的计算能力,大人们经常给小孩玩这样的游戏:从1付扑克牌中任意抽出4张扑克,要小孩用“+”.“-”.“×”.“÷”和括号组成一个合法的表达式,并使表达式的值为24点. ...
- jQuery的删除的三种方法remove(),detach(),empty()
remove()方法是从DOM中删除所有匹配的元素,包括匹配元素的子元素.但是他会有一个返回值, 返回值是一个指向已被删除的节点的引用,所以说,remove删除的元素,还可以再回收利用. var $l ...
- linux学习(别人指出来的), 回头有针对性的学下!
应该是 会linux 基本操作吧linux 安装 lamp lnmp php拓展这些基本都得会把知道subversion 和 github 这俩吧windows的代码同步到linux上无需ftp 会跟 ...
- Redhat 7.2 编译安装PostgreSQL 10
1.环境说明 CentOS7.2 postgresql10.4 2.下载 postgresql的官方地址 https://www.postgresql.org/ftp/source/ 在下载列表中根据 ...
- 问题记录,StartCoroutine(“str")问题
StartCoroutine参数为函数字符串名,运行时出错,错误是:无法启动协程函数. 调用格式如下: gameManager.StartCoroutine(LuaOnLevelwasloaded() ...
- 一步一步学习Android开发
一步步踏入Android的阵营. 疑惑篇: gravity和layout_gravity的区别
- dsm 黑 离线转码 备忘
6.2以后不行 我用的是 DS3617_6.17-15284 进入下载安装文件和工具 1安装 .套件来源增加 packages.synocommunity.comb.设置信任级别为任何发行者 c.找到 ...
- LevelDB 写入与删除记录
[LevelDB 写入与删除记录] levelDb的记录更新操作,即插入一条KV记录或者删除一条KV记录.levelDb的更新操作速度是非常快的,源于其内部机制决定了这种更新操作的简单性. 图6.1是 ...
- 30. Substring with Concatenation of All Words (String; HashTable)
You are given a string, s, and a list of words, words, that are all of the same length. Find all sta ...
- spring定时任务执行两次的原因与解决方法
spring定时任务,本地执行一次,放到服务器上后,每次执行时会执行两次,原因及解决办法. http://blog.csdn.net/yaobengen/article/details/7031266 ...