[ES version]

5.5.0

[Analysis]

A recovery passes through six stages, defined by the Stage enum in RecoveryState:

public class RecoveryState implements ToXContent, Streamable {

    public enum Stage {
        // initial stage
        INIT((byte) 0),

        /**
         * recovery of lucene files, either reusing local ones or copying new ones
         */
        INDEX((byte) 1),

        /**
         * potentially running check index
         */
        VERIFY_INDEX((byte) 2),

        /**
         * starting up the engine, replaying the translog
         */
        TRANSLOG((byte) 3),

        /**
         * performing final task after all translog ops have been done
         */
        FINALIZE((byte) 4),

        // recovery has completed
        DONE((byte) 5);
        ...
    }
    ...
}
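To make the byte ids concrete, here is a minimal, self-contained sketch of an enum shaped like Stage. It is not the Elasticsearch source: the name RecoveryStageSketch is made up for this post, and canAdvanceTo is a hypothetical helper that assumes a recovery moves forward exactly one stage at a time.

```java
// Simplified stand-in for RecoveryState.Stage; names are illustrative only.
enum RecoveryStageSketch {
    INIT((byte) 0),
    INDEX((byte) 1),
    VERIFY_INDEX((byte) 2),
    TRANSLOG((byte) 3),
    FINALIZE((byte) 4),
    DONE((byte) 5);

    // Lookup table indexed by id, filled once after the constants exist.
    private static final RecoveryStageSketch[] BY_ID = new RecoveryStageSketch[values().length];

    static {
        for (RecoveryStageSketch stage : values()) {
            BY_ID[stage.id] = stage;
        }
    }

    private final byte id;

    RecoveryStageSketch(byte id) {
        this.id = id;
    }

    byte id() {
        return id;
    }

    // Resolves a stage from the compact id that would travel over the wire.
    static RecoveryStageSketch fromId(byte id) {
        if (id < 0 || id >= BY_ID.length) {
            throw new IllegalArgumentException("No mapping for id [" + id + "]");
        }
        return BY_ID[id];
    }

    // Hypothetical helper: assumes stages advance strictly one step at a time.
    boolean canAdvanceTo(RecoveryStageSketch next) {
        return next.id == this.id + 1;
    }
}
```

Storing a byte instead of the enum name keeps the serialized form small and independent of renames, which is presumably why each constant carries an explicit id.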

A shard has four routing states, defined by ShardRoutingState:

public enum ShardRoutingState {
    /**
     * The shard is not assigned to any node.
     */
    UNASSIGNED((byte) 1),

    /**
     * The shard is initializing (probably recovering from either a peer shard
     * or gateway).
     */
    INITIALIZING((byte) 2),

    /**
     * The shard is started.
     */
    STARTED((byte) 3),

    /**
     * The shard is in the process being relocated.
     */
    RELOCATING((byte) 4);
    ...
}
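The recover() method analysed later in this post never compares these constants directly; it asks the routing entry questions such as relocating() and initializing(). The sketch below shows how such predicates can sit on top of the enum. RoutingStateSketch and RoutingEntrySketch are names invented for this post, and the active() rule (started or relocating) is an assumption about how the states are grouped, not a quote from the source.

```java
// Simplified stand-in for ShardRoutingState; note that the values start at 1.
enum RoutingStateSketch {
    UNASSIGNED((byte) 1),
    INITIALIZING((byte) 2),
    STARTED((byte) 3),
    RELOCATING((byte) 4);

    private final byte value;

    RoutingStateSketch(byte value) {
        this.value = value;
    }

    byte value() {
        return value;
    }

    // Linear scan is fine for four constants.
    static RoutingStateSketch fromValue(byte value) {
        for (RoutingStateSketch state : values()) {
            if (state.value == value) {
                return state;
            }
        }
        throw new IllegalStateException("No routing state mapped for [" + value + "]");
    }
}

// Hypothetical, stripped-down routing entry exposing state predicates.
final class RoutingEntrySketch {
    private final RoutingStateSketch state;

    RoutingEntrySketch(RoutingStateSketch state) {
        this.state = state;
    }

    boolean unassigned() {
        return state == RoutingStateSketch.UNASSIGNED;
    }

    boolean initializing() {
        return state == RoutingStateSketch.INITIALIZING;
    }

    boolean started() {
        return state == RoutingStateSketch.STARTED;
    }

    boolean relocating() {
        return state == RoutingStateSketch.RELOCATING;
    }

    // Assumption: a shard can serve requests while started or relocating.
    boolean active() {
        return started() || relocating();
    }
}
```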

One place where these states are checked is PeerRecoverySourceService:

/**
 * The source recovery accepts recovery requests from other peer shards and starts the recovery process from this
 * source shard to the target shard.
 */
public class PeerRecoverySourceService extends AbstractComponent implements IndexEventListener {

    public static class Actions {
        public static final String START_RECOVERY = "internal:index/shard/recovery/start_recovery";
    }

    private final TransportService transportService;
    private final IndicesService indicesService;
    private final RecoverySettings recoverySettings;

    private final ClusterService clusterService;

    private final OngoingRecoveries ongoingRecoveries = new OngoingRecoveries();

    @Inject
    public PeerRecoverySourceService(Settings settings, TransportService transportService, IndicesService indicesService,
                                     RecoverySettings recoverySettings, ClusterService clusterService) {
        super(settings);
        this.transportService = transportService;
        this.indicesService = indicesService;
        this.clusterService = clusterService;
        this.recoverySettings = recoverySettings;
        transportService.registerRequestHandler(Actions.START_RECOVERY, StartRecoveryRequest::new, ThreadPool.Names.GENERIC, new StartRecoveryTransportRequestHandler());
    }

    // before a shard is closed, cancel every recovery that is still running from it
    @Override
    public void beforeIndexShardClosed(ShardId shardId, @Nullable IndexShard indexShard,
                                       Settings indexSettings) {
        if (indexShard != null) {
            ongoingRecoveries.cancel(indexShard, "shard is closed");
        }
    }

    private RecoveryResponse recover(StartRecoveryRequest request) throws IOException {
        final IndexService indexService = indicesService.indexServiceSafe(request.shardId().getIndex());
        final IndexShard shard = indexService.getShard(request.shardId().id());

        // starting recovery from that our (the source) shard state is marking the shard to be in recovery mode as well, otherwise
        // the index operations will not be routed to it properly
        // first check that the target node is present in this node's cluster state
        RoutingNode node = clusterService.state().getRoutingNodes().node(request.targetNode().getId());
        // if the target node is not known yet, delay the recovery
        if (node == null) {
            logger.debug("delaying recovery of {} as source node {} is unknown", request.shardId(), request.targetNode());
            throw new DelayRecoveryException("source node does not have the node [" + request.targetNode() + "] in its state yet..");
        }

        ShardRouting routingEntry = shard.routingEntry();
        // for a primary relocation request, delay if the source shard is not yet marked as relocating,
        // or if it is relocating to a node other than the one that sent the request
        if (request.isPrimaryRelocation() && (routingEntry.relocating() == false || routingEntry.relocatingNodeId().equals(request.targetNode().getId()) == false)) {
            logger.debug("delaying recovery of {} as source shard is not marked yet as relocating to {}", request.shardId(), request.targetNode());
            throw new DelayRecoveryException("source shard is not marked yet as relocating to [" + request.targetNode() + "]");
        }

        ShardRouting targetShardRouting = node.getByShardId(request.shardId());
        // the target node has no routing entry for this shard
        if (targetShardRouting == null) {
            logger.debug("delaying recovery of {} as it is not listed as assigned to target node {}", request.shardId(), request.targetNode());
            throw new DelayRecoveryException("source node does not have the shard listed in its state as allocated on the node");
        }
        // the target shard is not in the INITIALIZING state
        if (!targetShardRouting.initializing()) {
            logger.debug("delaying recovery of {} as it is not listed as initializing on the target node {}. known shards state is [{}]",
                request.shardId(), request.targetNode(), targetShardRouting.state());
            throw new DelayRecoveryException("source node has the state of the target shard to be [" + targetShardRouting.state() + "], expecting to be [initializing]");
        }

        // the request carries no target allocation id, so rebuild it with the id from the routing entry
        if (request.targetAllocationId() == null) {
            // ES versions < 5.4.0 do not send targetAllocationId as part of recovery request, just assume that we have the correct id
            request = new StartRecoveryRequest(request.shardId(), targetShardRouting.allocationId().getId(), request.sourceNode(),
                request.targetNode(), request.metadataSnapshot(), request.isPrimaryRelocation(), request.recoveryId());
        }

        // the allocation id in the request does not match the allocation id of the target shard's routing entry
        if (request.targetAllocationId().equals(targetShardRouting.allocationId().getId()) == false) {
            logger.debug("delaying recovery of {} due to target allocation id mismatch (expected: [{}], but was: [{}])",
                request.shardId(), request.targetAllocationId(), targetShardRouting.allocationId().getId());
            throw new DelayRecoveryException("source node has the state of the target shard to have allocation id [" +
                targetShardRouting.allocationId().getId() + "], expecting to be [" + request.targetAllocationId() + "]");
        }

        // register a new recovery in the list of ongoing recoveries
        RecoverySourceHandler handler = ongoingRecoveries.addNewRecovery(request, shard);
        logger.trace("[{}][{}] starting recovery to {}", request.shardId().getIndex().getName(), request.shardId().id(), request.targetNode());
        try {
            return handler.recoverToTarget();
        } finally {
            ongoingRecoveries.remove(shard, handler);
        }
    }
    ...
}
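The chain of checks in recover() is easier to follow once it is pulled out of the surrounding plumbing. The following is a rough model only: RecoveryRequestView flattens the request and cluster-state lookups into plain fields, DelayRecoverySketchException stands in for DelayRecoveryException, and every name here is hypothetical. The order of the checks follows the method above.

```java
// Hypothetical stand-in for DelayRecoveryException.
class DelayRecoverySketchException extends RuntimeException {
    DelayRecoverySketchException(String message) {
        super(message);
    }
}

// Hypothetical, flattened view of what recover() reads from the request and the cluster state.
final class RecoveryRequestView {
    boolean targetNodeKnown;        // is the target node in the source's cluster state?
    boolean primaryRelocation;      // corresponds to request.isPrimaryRelocation()
    String sourceRelocatingNodeId;  // null when the source shard is not relocating
    String targetNodeId;
    String targetShardState;        // null when the shard is not listed on the target node
    String requestAllocationId;     // null when the sender predates 5.4.0
    String routingAllocationId;     // allocation id taken from the target's routing entry
}

final class RecoveryPreconditions {
    private RecoveryPreconditions() {
    }

    // Runs the checks in the same order as recover() and returns the allocation id to use.
    static String check(RecoveryRequestView v) {
        if (!v.targetNodeKnown) {
            throw new DelayRecoverySketchException("target node is not in the cluster state yet");
        }
        // equals(null) is false, so a non-relocating source is rejected here as well
        if (v.primaryRelocation && !v.targetNodeId.equals(v.sourceRelocatingNodeId)) {
            throw new DelayRecoverySketchException("source shard is not marked as relocating to the target");
        }
        if (v.targetShardState == null) {
            throw new DelayRecoverySketchException("shard is not listed on the target node");
        }
        if (!"INITIALIZING".equals(v.targetShardState)) {
            throw new DelayRecoverySketchException("target shard is [" + v.targetShardState + "], expected INITIALIZING");
        }
        // Older senders omit the id, in which case the routing entry's id is trusted.
        String allocationId = v.requestAllocationId != null ? v.requestAllocationId : v.routingAllocationId;
        if (!allocationId.equals(v.routingAllocationId)) {
            throw new DelayRecoverySketchException("target allocation id mismatch");
        }
        return allocationId;
    }
}
```

Every failure is reported with the same exception type, which matches the source: each branch there throws DelayRecoveryException rather than a hard failure, suggesting the conditions are expected to clear once the cluster state catches up.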

The messageReceived method of StartRecoveryTransportRequestHandler receives the start-recovery request from the transport channel and runs recover().
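The handler body is not shown in this post, so the following is only a guess at its shape, assuming it simply calls recover() and sends the result back on the channel. ResponseChannel, OngoingRecoveriesSketch and StartRecoveryHandlerSketch are hypothetical names, and strings stand in for the real request and response types. The sketch also reproduces the register, run, always-deregister pattern from the end of recover().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal channel; the real transport channel has a richer API.
interface ResponseChannel<R> {
    void sendResponse(R response);
}

// Hypothetical registry modelled loosely on the role of OngoingRecoveries.
final class OngoingRecoveriesSketch {
    private final List<String> active = new ArrayList<>();

    synchronized void add(String recoveryId) {
        active.add(recoveryId);
    }

    synchronized void remove(String recoveryId) {
        active.remove(recoveryId);
    }

    synchronized int size() {
        return active.size();
    }
}

final class StartRecoveryHandlerSketch {
    private final OngoingRecoveriesSketch ongoing = new OngoingRecoveriesSketch();

    // Mirrors the tail of recover(): register, run, and deregister even on failure.
    String recover(String recoveryId) {
        ongoing.add(recoveryId);
        try {
            if (recoveryId.isEmpty()) {
                throw new IllegalArgumentException("empty recovery id");
            }
            return "recovered:" + recoveryId;
        } finally {
            ongoing.remove(recoveryId);
        }
    }

    // Assumed shape of messageReceived: run the recovery, then answer on the channel.
    void messageReceived(String recoveryId, ResponseChannel<String> channel) {
        channel.sendResponse(recover(recoveryId));
    }

    int ongoingCount() {
        return ongoing.size();
    }
}
```

The finally block is the important part: whether the recovery succeeds or throws, the entry is removed, so a later beforeIndexShardClosed only has to cancel recoveries that are really still running.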

<To be continued>
