深入剖析SolrCloud（四）

出处：http://phinecos.cnblogs.com/　　　　

本博客遵从Creative Commons
Attribution 3.0 License，若用于非商业目的，您可以自由转载，但请保留原作者信息和文章链接URL。

在上一篇中介绍了连接Zookeeper集群的方法，这一篇将围绕一个有趣的话题---来展开，这就是Replication（索引复制），关于Solr Replication的详细介绍，可以参考http://wiki.apache.org/solr/SolrReplication。

在开始这个话题之前，先从我最近在应用中引入solr的master/slave架构时，遇到的一个让我困扰的实际问题。

应用场景简单描述如下：

1）首先master节点下载索引分片，然后创建配置文件，加入master节点的replication配置片段，再对索引分片进行合并(关于mergeIndex，可以参考http://wiki.apache.org/solr/MergingSolrIndexes)，然后利用上述配置文件和索引数据去创建一个solr核。

2）slave节点创建配置文件，加入slave节点的replication配置片段，创建一个空的solr核，等待从master节点进行索引数据同步

出现的问题：slave节点没有从master节点同步到数据。

问题分析：

1）首先检查master节点，获取最新的可复制索引的版本号，

http://master_host:port/solr/replication?command=indexversion

发现返回的索引版本号是0，这说明mater节点根本没有触发replication动作，

2）为了确认上述判断，在slave节点上进一步查看replication的详细信息

http://slave_host:port/solr/replication?command=details

发现确实如此，尽管master节点的索引版本号和slave节点的索引版本号不一致，但索引却没有同步过来，再分别查看master节点和slave节点的日志，发现索引复制动作确实没有开始。

综上所述，确实是master节点没有触发索引复制动作，那究竟是为何呢？先将原因摆出来，后面会通过源码的分析来加以说明。

原因：solr合并索引时，不管你是通过mergeindexes的http命令，还是调用底层lucene的IndexWriter，记得最后一定要提交一个commit，否则，不仅索引不仅不会对查询可见，更是对于master/slave架构的solr集群来说，master节点的replication动作不会触发，因为indexversion没有感知到变化。

好了，下面开始对Solr的Replication的分析。

Solr容器在加载solr核的时候，会对已经注册的各个实现SolrCoreAware接口的Handler进行回调，调用其inform方法。

对于ReplicationHandler来说，就是在这里对自己是属于master节点还是slave节点进行判断，若是slave节点，则创建一个SnapPuller对象，定时负责从master节点主动拉索引数据下来；若是master节点，则只设置相应的参数。

复制代码

public void inform(SolrCore core) {

this.core = core;

registerFileStreamResponseWriter();

registerCloseHook();

NamedList slave = (NamedList) initArgs.get("slave");

boolean enableSlave = isEnabled( slave );

if (enableSlave) {

tempSnapPuller = snapPuller = new SnapPuller(slave, this, core);

isSlave = true;

}

NamedList master = (NamedList) initArgs.get("master");

boolean enableMaster = isEnabled( master );



if (!enableSlave && !enableMaster) {

enableMaster = true;

master = new NamedList<Object>();

}



if (enableMaster) {

includeConfFiles = (String) master.get(CONF_FILES);

if (includeConfFiles != null && includeConfFiles.trim().length() > 0) {

List<String> files = Arrays.asList(includeConfFiles.split(","));

for (String file : files) {

if (file.trim().length() == 0) continue;

String[] strs = file.split(":");

// if there is an alias add it or it is null

confFileNameAlias.add(strs[0], strs.length > 1 ? strs[1] : null);

}

LOG.info("Replication enabled for following config files: " + includeConfFiles);

}

List backup = master.getAll("backupAfter");

boolean backupOnCommit = backup.contains("commit");

boolean backupOnOptimize = !backupOnCommit && backup.contains("optimize");

List replicateAfter = master.getAll(REPLICATE_AFTER);

replicateOnCommit = replicateAfter.contains("commit");

replicateOnOptimize = !replicateOnCommit && replicateAfter.contains("optimize");

if (!replicateOnCommit && ! replicateOnOptimize) {

replicateOnCommit = true;

}



// if we only want to replicate on optimize, we need the deletion policy to

// save the last optimized commit point.

if (replicateOnOptimize) {

IndexDeletionPolicyWrapper wrapper = core.getDeletionPolicy();

IndexDeletionPolicy policy = wrapper == null ? null : wrapper.getWrappedDeletionPolicy();

if (policy instanceof SolrDeletionPolicy) {

SolrDeletionPolicy solrPolicy = (SolrDeletionPolicy)policy;

if (solrPolicy.getMaxOptimizedCommitsToKeep() < 1) {

solrPolicy.setMaxOptimizedCommitsToKeep(1);

}

} else {

LOG.warn("Replication can't call setMaxOptimizedCommitsToKeep on " + policy);

}

}

if (replicateOnOptimize || backupOnOptimize) {

core.getUpdateHandler().registerOptimizeCallback(getEventListener(backupOnOptimize, replicateOnOptimize));

}

if (replicateOnCommit || backupOnCommit) {

replicateOnCommit = true;

core.getUpdateHandler().registerCommitCallback(getEventListener(backupOnCommit, replicateOnCommit));

}

if (replicateAfter.contains("startup")) {

replicateOnStart = true;

RefCounted<SolrIndexSearcher> s = core.getNewestSearcher(false);

try {

DirectoryReader reader = s==null ? null : s.get().getIndexReader();

if (reader!=null && reader.getIndexCommit() != null && reader.getIndexCommit().getGeneration() != 1L) {

try {

if(replicateOnOptimize){

Collection<IndexCommit> commits = DirectoryReader.listCommits(reader.directory());

for (IndexCommit ic : commits) {

if(ic.getSegmentCount() == 1){

if(indexCommitPoint == null || indexCommitPoint.getGeneration() < ic.getGeneration()) indexCommitPoint = ic;

}

}

} else{

indexCommitPoint = reader.getIndexCommit();

}

} finally {

// We don't need to save commit points for replication, the SolrDeletionPolicy

// always saves the last commit point (and the last optimized commit point, if needed)

/***

if(indexCommitPoint != null){

core.getDeletionPolicy().saveCommitPoint(indexCommitPoint.getGeneration());

}

***/

}

}

// reboot the writer on the new index

core.getUpdateHandler().newIndexWriter();

} catch (IOException e) {

LOG.warn("Unable to get IndexCommit on startup", e);

} finally {

if (s!=null) s.decref();

}

}

String reserve = (String) master.get(RESERVE);

if (reserve != null && !reserve.trim().equals("")) {

reserveCommitDuration = SnapPuller.readInterval(reserve);

}

LOG.info("Commits will be reserved for " + reserveCommitDuration);

isMaster = true;

}

}

复制代码

ReplicationHandler可以响应多种命令：

1) indexversion。

这里需要了解的第一个概念是索引提交点(IndexCommit)，这是底层lucene的东西，可以自行查阅资料。首先获取最新的索引提交点，然后从其中获取索引版本号和索引所属代。

IndexCommit commitPoint = indexCommitPoint; // make a copy so it won't change

if (commitPoint != null && replicationEnabled.get()) {

core.getDeletionPolicy().setReserveDuration(commitPoint.getVersion(), reserveCommitDuration);

rsp.add(CMD_INDEX_VERSION, commitPoint.getVersion());

rsp.add(GENERATION, commitPoint.getGeneration());

2）backup。这个命令用来对索引做快照。首先获取最新的索引提交点，然后创建做一个SnapShooter，具体的快照动作由这个对象完成，

复制代码

private void doSnapShoot(SolrParams params, SolrQueryResponse rsp, SolrQueryRequest req) {

try {

int numberToKeep = params.getInt(NUMBER_BACKUPS_TO_KEEP, Integer.MAX_VALUE);

IndexDeletionPolicyWrapper delPolicy = core.getDeletionPolicy();

IndexCommit indexCommit = delPolicy.getLatestCommit();



if(indexCommit == null) {

indexCommit = req.getSearcher().getReader().getIndexCommit();

}



// small race here before the commit point is saved

new SnapShooter(core, params.get("location")).createSnapAsync(indexCommit, numberToKeep, this);



} catch (Exception e) {

LOG.warn("Exception during creating a snapshot", e);

rsp.add("exception", e);

}

}

快照对象会启动一个线程去异步地做一个索引备份。

void createSnapAsync(final IndexCommit indexCommit, final int numberToKeep, final ReplicationHandler replicationHandler) {

replicationHandler.core.getDeletionPolicy().saveCommitPoint(indexCommit.getVersion());

new Thread() {

@Override

public void run() {

createSnapshot(indexCommit, numberToKeep, replicationHandler);

}

}.start();

}

void createSnapshot(final IndexCommit indexCommit, int numberToKeep, ReplicationHandler replicationHandler) {

NamedList details = new NamedList();

details.add("startTime", new Date().toString());

File snapShotDir = null;

String directoryName = null;

Lock lock = null;

try {

if(numberToKeep<Integer.MAX_VALUE) {

deleteOldBackups(numberToKeep);

}

SimpleDateFormat fmt = new SimpleDateFormat(DATE_FMT, Locale.US);

directoryName = "snapshot." + fmt.format(new Date());

lock = lockFactory.makeLock(directoryName + ".lock");

if (lock.isLocked()) return;

snapShotDir = new File(snapDir, directoryName);

if (!snapShotDir.mkdir()) {

LOG.warn("Unable to create snapshot directory: " + snapShotDir.getAbsolutePath());

return;

}

Collection<String> files = indexCommit.getFileNames();

FileCopier fileCopier = new FileCopier(solrCore.getDeletionPolicy(), indexCommit);

fileCopier.copyFiles(files, snapShotDir);

details.add("fileCount", files.size());

details.add("status", "success");

details.add("snapshotCompletedAt", new Date().toString());

} catch (Exception e) {

SnapPuller.delTree(snapShotDir);

LOG.error("Exception while creating snapshot", e);

details.add("snapShootException", e.getMessage());

} finally {

replicationHandler.core.getDeletionPolicy().releaseCommitPoint(indexCommit.getVersion());

replicationHandler.snapShootDetails = details;

if (lock != null) {

try {

lock.release();

} catch (IOException e) {

LOG.error("Unable to release snapshoot lock: " + directoryName + ".lock");

}

}

}

}

3）fetchindex。响应来自slave节点的取索引文件的请求，会启动一个线程来实现索引文件的获取。

String masterUrl = solrParams.get(MASTER_URL);

if (!isSlave && masterUrl == null) {

rsp.add(STATUS,ERR_STATUS);

rsp.add("message","No slave configured or no 'masterUrl' Specified");

return;

}

final SolrParams paramsCopy = new ModifiableSolrParams(solrParams);

new Thread() {

@Override

public void run() {

doFetch(paramsCopy);

}

}.start();

rsp.add(STATUS, OK_STATUS);

具体的获取动作是通过SnapPuller对象来实现的，首先尝试获取pull对象锁，如果请求锁失败，则说明还有取索引数据动作未结束，如果请求锁成功，就调用SnapPuller对象的fetchLatestIndex方法来取最新的索引数据。

void doFetch(SolrParams solrParams) {

String masterUrl = solrParams == null ? null : solrParams.get(MASTER_URL);

if (!snapPullLock.tryLock())

return;

try {

tempSnapPuller = snapPuller;

if (masterUrl != null) {

NamedList<Object> nl = solrParams.toNamedList();

nl.remove(SnapPuller.POLL_INTERVAL);

tempSnapPuller = new SnapPuller(nl, this, core);

}

tempSnapPuller.fetchLatestIndex(core);

} catch (Exception e) {

LOG.error("SnapPull failed ", e);

} finally {

tempSnapPuller = snapPuller;

snapPullLock.unlock();

}

}

最后真正的取索引数据过程，首先，若mastet节点的indexversion为0，则说明master节点根本没有提供可供复制的索引数据，若master节点和slave节点的indexversion相同，则说明slave节点目前与master节点索引数据状态保持一致，无需同步。若两者的indexversion不同，则开始索引复制过程，首先从master节点上下载指定索引版本号的索引文件列表，然后创建一个索引文件同步服务线程来完成同并工作。

这里需要区分的是，如果master节点的年代比slave节点要老，那就说明两者已经不相容，此时slave节点需要新建一个索引目录，再从master节点做一次全量索引复制。还需要注意的一点是，索引同步也是可以同步配置文件的，若配置文件发生变化，则需要对solr核进行一次reload操作。最对了，还有，和文章开头一样， slave节点同步完数据后，别忘了做一次commit操作，以便刷新自己的索引提交点到最新的状态。最后，关闭并等待同步服务线程结束。此外，具体的取索引文件是通过FileFetcher对象来完成。

boolean fetchLatestIndex(SolrCore core) throws IOException {

replicationStartTime = System.currentTimeMillis();

try {

//get the current 'replicateable' index version in the master

NamedList response = null;

try {

response = getLatestVersion();

} catch (Exception e) {

LOG.error("Master at: " + masterUrl + " is not available. Index fetch failed. Exception: " + e.getMessage());

return false;

}

long latestVersion = (Long) response.get(CMD_INDEX_VERSION);

long latestGeneration = (Long) response.get(GENERATION);

if (latestVersion == 0L) {

//there is nothing to be replicated

return false;

}

IndexCommit commit;

RefCounted<SolrIndexSearcher> searcherRefCounted = null;

try {

searcherRefCounted = core.getNewestSearcher(false);

commit = searcherRefCounted.get().getReader().getIndexCommit();

} finally {

if (searcherRefCounted != null)

searcherRefCounted.decref();

}

if (commit.getVersion() == latestVersion && commit.getGeneration() == latestGeneration) {

//master and slave are alsready in sync just return

LOG.info("Slave in sync with master.");

return false;

}

LOG.info("Master's version: " + latestVersion + ", generation: " + latestGeneration);

LOG.info("Slave's version: " + commit.getVersion() + ", generation: " + commit.getGeneration());

LOG.info("Starting replication process");

// get the list of files first

fetchFileList(latestVersion);

// this can happen if the commit point is deleted before we fetch the file list.

if(filesToDownload.isEmpty()) return false;

LOG.info("Number of files in latest index in master: " + filesToDownload.size());

// Create the sync service

fsyncService = Executors.newSingleThreadExecutor();

// use a synchronized list because the list is read by other threads (to show details)

filesDownloaded = Collections.synchronizedList(new ArrayList<Map<String, Object>>());

// if the generateion of master is older than that of the slave , it means they are not compatible to be copied

// then a new index direcory to be created and all the files need to be copied

boolean isFullCopyNeeded = commit.getGeneration() >= latestGeneration;

File tmpIndexDir = createTempindexDir(core);

if (isIndexStale())

isFullCopyNeeded = true;

successfulInstall = false;

boolean deleteTmpIdxDir = true;

File indexDir = null ;

try {

indexDir = new File(core.getIndexDir());

downloadIndexFiles(isFullCopyNeeded, tmpIndexDir, latestVersion);

LOG.info("Total time taken for download : " + ((System.currentTimeMillis() - replicationStartTime) / 1000) + " secs");

Collection<Map<String, Object>> modifiedConfFiles = getModifiedConfFiles(confFilesToDownload);

if (!modifiedConfFiles.isEmpty()) {

downloadConfFiles(confFilesToDownload, latestVersion);

if (isFullCopyNeeded) {

successfulInstall = modifyIndexProps(tmpIndexDir.getName());

deleteTmpIdxDir = false;

} else {

successfulInstall = copyIndexFiles(tmpIndexDir, indexDir);

}

if (successfulInstall) {

LOG.info("Configuration files are modified, core will be reloaded");

logReplicationTimeAndConfFiles(modifiedConfFiles, successfulInstall);//write to a file time of replication and conf files.

reloadCore();

}

} else {

terminateAndWaitFsyncService();

if (isFullCopyNeeded) {

successfulInstall = modifyIndexProps(tmpIndexDir.getName());

deleteTmpIdxDir = false;

} else {

successfulInstall = copyIndexFiles(tmpIndexDir, indexDir);

}

if (successfulInstall) {

logReplicationTimeAndConfFiles(modifiedConfFiles, successfulInstall);

doCommit();

}

}

replicationStartTime = 0;

return successfulInstall;

} catch (ReplicationHandlerException e) {

LOG.error("User aborted Replication");

} catch (SolrException e) {

throw e;

} catch (Exception e) {

throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Index fetch failed : ", e);

} finally {

if (deleteTmpIdxDir) delTree(tmpIndexDir);

else delTree(indexDir);

}

return successfulInstall;

} finally {

if (!successfulInstall) {

logReplicationTimeAndConfFiles(null, successfulInstall);

}

filesToDownload = filesDownloaded = confFilesDownloaded = confFilesToDownload = null;

replicationStartTime = 0;

fileFetcher = null;

if (fsyncService != null && !fsyncService.isShutdown()) fsyncService.shutdownNow();

fsyncService = null;

stop = false;

fsyncException = null;

}

}

深入剖析SolrCloud（四）的更多相关文章

Tomcat剖析（四）：Tomcat默认连接器(2)
Tomcat剖析(四):Tomcat默认连接器(2) 1. Tomcat剖析(一):一个简单的Web服务器 2. Tomcat剖析(二):一个简单的Servlet服务器 3. Tomcat剖析(三): ...
Tomcat剖析（四）：Tomcat默认连接器(1)
Tomcat剖析(四):Tomcat默认连接器(1) 1. Tomcat剖析(一):一个简单的Web服务器 2. Tomcat剖析(二):一个简单的Servlet服务器 3. Tomcat剖析(三): ...
深入剖析SolrCloud（一）
作者:洞庭散人出处:http://phinecos.cnblogs.com/ 本博客遵从Creative Commons Attribution 3.0 License,若用于非商业目的,您可以自由 ...
深入剖析SolrCloud（二）
作者:洞庭散人出处:http://phinecos.cnblogs.com/ 本博客遵从Creative Commons Attribution 3.0 License,若用于非商业目的,您可以自由 ...
RapidJSON 代码剖析（四）：优化 Grisu
我曾经在知乎的一个答案里谈及到 V8 引擎里实现了 Grisu 算法,我先引用该文的内容简单介绍 Grisu.然后,再谈及 RapidJSON 对它做了的几个底层优化. (配图中的<Grisù& ...
SpringMVC源码剖析（四）- DispatcherServlet请求转发的实现
SpringMVC完成初始化流程之后,就进入Servlet标准生命周期的第二个阶段,即“service”阶段.在“service”阶段中,每一次Http请求到来,容器都会启动一个请求线程,通过serv ...
Java反射机制剖析（四）-深度剖析动态代理原理及总结
动态代理类原理(示例代码参见java反射机制剖析(三)) a) 理解上面的动态代理示例流程 a) 理解上面的动态代理示例流程 b) 代理接口实现类源代码剖析咱们一起来剖析一下代理实现类($Pr ...
Linux内核剖析（四）为arm内核构建源码树
前面说到要做linux底层开发或者编写Linux的驱动,必须建立内核源码树,之前我们提到过在本机上构建源码树—-Linux内核剖析(三),其建立的源码树是针对i686平台的,但是我么嵌入式系统用的是a ...
Python 源码剖析（四）【LIST对象】
四.LIST对象 1.PyListObject对象 2.PyListObject的创建与维护 3.PyListObject 对象缓冲池 4.Hack PyListObject 1.PyListObje ...

随机推荐

兼容数组 api map代码
if(!("map" in Array.prototype)) Array.prototype.map=function(fun){ for(var i=0,arr=[]; i&l ...
Oracle临时表和SQL Server临时表的不同点对比
文章来源:http://www.codesky.net/article/201109/141401.html 1.简介 Oracle数据库除了可以保存永久表外,还可以建立临时表temporary ta ...
洛谷P1306 斐波那契公约数
题目描述对于Fibonacci数列:1,1,2,3,5,8,13......大家应该很熟悉吧~~~但是现在有一个很“简单”问题:第n项和第m项的最大公约数是多少? 输入输出格式输入格式: 两个正整 ...
设置ubantu的软件源地址
查看所用的源 $ sudo vim /etc/apt/sources.list 由于安装的Ubuntu Server 16.04.1 LTS是英文版的,软件源就默认都是 us.archive.ubun ...
Des加解密(Java端和Js端配套)解析
一.什么是DES加密 des对称加密,对称加密,是一种比较传统的加密方式,其加密运算.解密运算使用的是同样的密钥,信息的发送者和信息的接收者在进行信息的传输与处理时,必须共同持有该密码( ...
笔记：Why don't you pull up a chair and give this lifestyle a try?
Why don't you pull up a chair and give this lifestyle a try? Why don't you pull up a chair and give ...
利用 Excel 写 C51 的宏定义
利用 Excel 写 C51 的宏定义填好占空比,自动生成宏. #define LIGHT_LEVEL_00 0xFF #define LIGHT_LEVEL_10 0xE5 #define LIG ...
HTTP协议-简介
1.什么是http协议? 百度百科上的解释:超文本传输协议(HTTP,HyperText Transfer Protocol)是互联网上应用最为广泛的一种网络协议.所有的WWW文件都必须遵守这个标准. ...
生产环境连接数据库失败:Cannot create PoolableConnectionFactory❨Got mins one from a read call❩
生产环境发现有接口调不通,而且集中在两个节点,其他节点都没问题.抓取日志发现报错如下: Context initialization failed. org.springframework. bean ...
selenium自动化浏览器后台运行headless模式
通过selenium做WEB自动化的时候,必须要启动浏览器, 浏览器的启动与关闭会影响执行效率. 当我们在自己电脑运行代码时,还会影响做别的事情. 鉴于这种情况,Google针对Chrome浏览器新增 ...

深入剖析SolrCloud（四）

深入剖析SolrCloud（四）的更多相关文章

随机推荐

热门专题