ES6.3.2 副本失败处理

副本的失败处理对理解ES的数据副本模型很有帮助。在ES6.3.2 index操作源码流程的总结中提到:ES的写操作会先写主分片,然后主分片再将操作同步到副本分片。本文给出ES中的源码片断,分析副本执行操作失败时,ES是如何处理的。

副本执行源码:replicasProxy.performOn实现了副本操作,执行正常结束回调onResponse(),异常回调onFailure()

replicasProxy.performOn(shard, replicaRequest, globalCheckpoint, new ActionListener<ReplicaResponse>() {
@Override
public void onResponse(ReplicaResponse response) {
successfulShards.incrementAndGet();
try {
primary.updateLocalCheckpointForShard(shard.allocationId().getId(), response.localCheckpoint());//执行成功回调更新检查点
primary.updateGlobalCheckpointForShard(shard.allocationId().getId(), response.globalCheckpoint());
} catch (final AlreadyClosedException e) {
// okay, the index was deleted or this shard was never activated after a relocation; fall through and finish normally
} catch (final Exception e) {
// fail the primary but fall through and let the rest of operation processing complete
final String message = String.format(Locale.ROOT, "primary failed updating local checkpoint for replica %s", shard);
primary.failShard(message, e);
}
decPendingAndFinishIfNeeded();//不管是正常的onResponse还是异常的onFailure,都会调用这个方法,代表已经完成了一个操作,pendingActions减1
} @Override
public void onFailure(Exception replicaException) {
logger.trace(() -> new ParameterizedMessage(
"[{}] failure while performing [{}] on replica {}, request [{}]",
shard.shardId(), opType, shard, replicaRequest), replicaException);
// Only report "critical" exceptions - TODO: Reach out to the master node to get the latest shard state then report.
if (TransportActions.isShardNotAvailableException(replicaException) == false) {
RestStatus restStatus = ExceptionsHelper.status(replicaException);
shardReplicaFailures.add(new ReplicationResponse.ShardInfo.Failure(
shard.shardId(), shard.currentNodeId(), replicaException, restStatus, false));
}
String message = String.format(Locale.ROOT, "failed to perform %s on replica %s", opType, shard);
//---> failShardIfNeeded 具体执行何种操作要看 replicasProxy的真正实现类:如果是WriteActionReplicasProxy则会报告shard错误
replicasProxy.failShardIfNeeded(shard, message,
replicaException, ReplicationOperation.this::decPendingAndFinishIfNeeded,
ReplicationOperation.this::onPrimaryDemoted, throwable -> decPendingAndFinishIfNeeded());
}
});
}

执行正常结束回调onResponse()

successfulShards.incrementAndGet();,在返回的结果里面,_shards 字段里面就能看到 successful 数值。

更新 local checkpoint 和 global checkpoint:如果检查点更新失败,触发:replica shard engine 关闭。

/**
* Fails the shard and marks the shard store as corrupted if
* <code>e</code> is caused by index corruption
*
* org.elasticsearch.index.shard.IndexShard#failShard
*/
public void failShard(String reason, @Nullable Exception e) {
// fail the engine. This will cause this shard to also be removed from the node's index service.
getEngine().failEngine(reason, e);
}
fail engine due to some error. the engine will also be closed.
The underlying store is marked corrupted iff failure is caused by index corruption

关于检查点,可参考这篇文章:elasticsearch-sequence-ids-6-0

异常结束回调 onFailure()

replicasProxy.failShardIfNeeded(shard, message,
replicaException, ReplicationOperation.this::decPendingAndFinishIfNeeded,
ReplicationOperation.this::onPrimaryDemoted, throwable -> decPendingAndFinishIfNeeded());

failShardIfNeeded 可以做2件事情,具体是如何执行得看failShardIfNeeded的实现类。

  1. onPrimaryDemoted

    通知master primary stale(过时)了。index操作首先在primary shard执行成功了,然后同步给replica,但是replica发现此primary shard 的 primary term 比它知道的该索引的primary term 还小,于是replica就认为此primary shard是一个已经过时了的primary shard,因此就回调onFailure()拒绝执行,并执行onPrimaryDemoted通知master节点。

    private void onPrimaryDemoted(Exception demotionFailure) {
    String primaryFail = String.format(Locale.ROOT,
    "primary shard [%s] was demoted while failing replica shard",
    primary.routingEntry());
    // we are no longer the primary, fail ourselves and start over
    primary.failShard(primaryFail, demotionFailure);
    finishAsFailed(new RetryOnPrimaryException(primary.routingEntry().shardId(), primaryFail, demotionFailure));
    }

  2. decPendingAndFinishIfNeeded

    计数。一个请求会由ReplicationGroup中的 多个分片执行,这些分片是否都已经执行完成了?就由pendingActions计数。不管是执行正常结束onResponse还是异常结束onFailure都会调用这个方法。

    private void decPendingAndFinishIfNeeded() {
    assert pendingActions.get() > 0 : "pending action count goes below 0 for request [" + request + "]";
    if (pendingActions.decrementAndGet() == 0) {//当所有的shard都处理完这个请求,client收到ACK(里面允许一些replica执行失败), 或者是收到一个请求超时的响应
    finish();
    }
    }

    对于发起index操作的Client而言,该 index 操作会由primary shard 执行,也会由若干个replica执行。因此,pendingActions统计到底有多少个分片(既包括主分片也包括副本分片)执行完成(在某些副本分片上执行失败也算执行完成)了。正是由于不管是 onResponse() 还是 onFailure(),都会执行decPendingAndFinishIfNeeded()方法,每执行一次,意味着有一个分片返回了响应,这时if (pendingActions.decrementAndGet() == 0)就减1,直到减为0时,调用finish()方法给Client返回ACK响应。

    private void finish() {
if (finished.compareAndSet(false, true)) {
final ReplicationResponse.ShardInfo.Failure[] failuresArray;
if (shardReplicaFailures.isEmpty()) {
failuresArray = ReplicationResponse.EMPTY;
} else {
failuresArray = new ReplicationResponse.ShardInfo.Failure[shardReplicaFailures.size()];
shardReplicaFailures.toArray(failuresArray);
}
primaryResult.setShardInfo(new ReplicationResponse.ShardInfo(
totalShards.get(),
successfulShards.get(),
failuresArray
)
);
resultListener.onResponse(primaryResult);
}
}

Client要么收到一个执行成功的ACK(默认情况下,只要primary shard执行成功,若存在 replica执行失败,Client也会收到一个执行成功的ACK,只不过 返回的ACK里面 _shards参数下的 failed 不为0而已),如下:

{

"_index": "user",

"_type": "profile",

"_id": "10",

"_version": 1,

"result": "created",

"_shards": {

​ "total": 3,

​ "successful": 1,

​ "failed": 0

},

"_seq_no": 0,

"_primary_term": 1

}

另外,ES6.3.2 index操作源码流程 的总结部分,详细解释了Client收到执行成功的ACK的原因。

要么收到一个超时ACK,如下:(这篇文章提到了如何产生一个超时的ACK)

{

"statusCode": 504,

"error": "Gateway Time-out",

"message": "Client request timeout"

}

failShardIfNeeded方法一共有2个具体实现,看类图:

TransportReplicationAction.ReplicasProxy#failShardIfNeeded (默认实现)

@Override
public void failShardIfNeeded(ShardRouting replica, String message, Exception exception,Runnable onSuccess, Consumer<Exception> onPrimaryDemoted, Consumer<Exception> onIgnoredFailure) {
// This does not need to fail the shard. The idea is that this
// is a non-write operation (something like a refresh or a global
// checkpoint sync) and therefore the replica should still be
// "alive" if it were to fail.
onSuccess.run();
}

TransportResyncReplicationAction.ResyncActionReplicasProxy#failShardIfNeeded(副本resync操作的实现)

/**
* A proxy for primary-replica resync operations which are performed on replicas when a new primary is promoted.
* Replica shards fail to execute resync operations will be failed but won't be marked as stale.
* This avoids marking shards as stale during cluster restart but enforces primary-replica resync mandatory.
*/
class ResyncActionReplicasProxy extends ReplicasProxy {
@Override
public void failShardIfNeeded(ShardRouting replica, String message, Exception exception, Runnable onSuccess,
Consumer<Exception> onPrimaryDemoted, Consumer<Exception> onIgnoredFailure) {
shardStateAction.remoteShardFailed(replica.shardId(), replica.allocationId().getId(), primaryTerm, false, message, exception,
createShardActionListener(onSuccess, onPrimaryDemoted, onIgnoredFailure));
}
}

TransportWriteAction.WriteActionReplicasProxy#failShardIfNeeded(index 写操作的实现)

/**
* A proxy for <b>write</b> operations that need to be performed on the
* replicas, where a failure to execute the operation should fail
* the replica shard and/or mark the replica as stale.
*
* This extends {@code TransportReplicationAction.ReplicasProxy} to do the
* failing and stale-ing.
*/
class WriteActionReplicasProxy extends ReplicasProxy { @Override
public void failShardIfNeeded(ShardRouting replica, String message, Exception exception,Runnable onSuccess, Consumer<Exception> onPrimaryDemoted, Consumer<Exception> onIgnoredFailure) {
if (TransportActions.isShardNotAvailableException(exception) == false) {
logger.warn(new ParameterizedMessage("[{}] {}", replica.shardId(), message), exception);}
shardStateAction.remoteShardFailed(replica.shardId(), replica.allocationId().getId(), primaryTerm, true, message, exception,
createShardActionListener(onSuccess, onPrimaryDemoted, onIgnoredFailure));
}

总结

从上面代码中可看出:副本resync操作、副本上 index 写操作失败都会导致 调用 onPrimaryDemoted() 方法,通知master节点判断当前primary shard 是否已经过时(stale)。这可以说是:replica 检验 primary shard是否stale的方式。

另外,primary shard 和 各个replica之间也会通过 租约机制 进行故障检测,以判断对方是否stale,不过这不是本文要讨论的内容了。

参考文章:

原文:https://www.cnblogs.com/hapjin/p/10585555.html

ES6.3.2 副本失败处理的更多相关文章

  1. 分布式入门之2:Quorum机制

    1.  全写读1(write all, read one) 全写读1是最直观的副本控制规则.写时,只有全部副本写成功,才算是写成功.这样,读取时只需要从其中一个副本上读数据,就能保证正确性. 这种规则 ...

  2. HDFS原理讲解

    简介 本文是笔者在学习HDFS的时候的学习笔记整理, 将HDFS的核心功能的原理都整理在这里了. [广告] 如果你喜欢本博客,请点此查看本博客所有文章:http://www.cnblogs.com/x ...

  3. SQL Server Alwayson概念总结

    一.alwayson概念 “可用性组” 针对一组离散的用户数据库(称为“可用性数据库” ,它们共同实现故障转移)支持故障转移环境. 一个可用性组支持一组主数据库以及一至八组对应的辅助数据库(包括一个主 ...

  4. elasticsearch系列七:ES Java客户端-Elasticsearch Java client(ES Client 简介、Java REST Client、Java Client、Spring Data Elasticsearch)

    一.ES Client 简介 1. ES是一个服务,采用C/S结构 2. 回顾 ES的架构 3. ES支持的客户端连接方式 3.1 REST API ,端口 9200 这种连接方式对应于架构图中的RE ...

  5. GFS浅析

    1 . 简介 GFS, Big Table, Map Reduce称为Google的三驾马车,是许多基础服务的基石 GFS于2003年提出,是一个分布式的文件系统,与此前的很多分布式系统的前提假设存在 ...

  6. ES系列十五、ES常用Java Client API

    一.简介 1.先看ES的架构图 二.ES支持的客户端连接方式 1.REST API http请求,例如,浏览器请求get方法:利用Postman等工具发起REST请求:java 发起httpClien ...

  7. 面试小结之Elasticsearch篇(转)

    最近面试一些公司,被问到的关于Elasticsearch和搜索引擎相关的问题,以及自己总结的回答. Elasticsearch是如何实现Master选举的? Elasticsearch的选主是ZenD ...

  8. Data Partitioning Guidance

    在很多大规模的解决方案中,数据都是分成单独的分区,可以分别进行管理和访问的.而分割数据的策略必须仔细的斟酌才能够最大限度的提高效益,同时最大限度的减少不利影响.数据的分区可以极大的提升可扩展性,降低争 ...

  9. Elasticsearch基础知识要点QA

    前言:本文为学习整理实践他人成果的记录型博客.在此统一感谢各原作者,如果你对基础知识不甚了解,可以通过查看Elasticsearch权威指南中文版, 此处注意你的elasticsearch版本,版本不 ...

随机推荐

  1. go语言框架gin之集成swagger

    1.安装swag 在goLand中直接使用go get -u github.com/swaggo/swag/cmd/swag命令安装会报错 翻了很多博客,都没找到太合适的办法,根据博客中所写的操作还是 ...

  2. django xadmin 1不在可用的选项中

    报错:1不在可用的选项中 解决办法: 对CharField的choices的选项, gender = models.CharField(max_length=, choices=((, , " ...

  3. c#实现用SQL池(多线程),定时批量执行SQL语句 【转】

    在实际项目开发中,业务逻辑层的处理速度往往很快,特别是在开发Socket通信服务的时候,网络传输很快,但是一旦加上数据库操作,性能一落千丈,数据库操作的效率往往成为一个系统整体性能的瓶颈.面对这问题, ...

  4. Delphi中常用字符串处理函数

    .copy(str,pos,num) 从str字符串的pos处开始,截取num个字符的串返回. 假设str为,)=,)='def' .concat(str1,str2{,strn}) 把各自变量连接起 ...

  5. 深入理解内存映射mmap

    内存映射mmap是Linux内核的一个重要机制,它和虚拟内存管理以及文件IO都有直接的关系,这篇细说一下mmap的一些要点. 修改(2015-11-12):Linux的虚拟内存管理是基于mmap来实现 ...

  6. Python Learning: 03

    An inch is worth a pound of gold, an inch of gold is hard to buy an inch of time. Slice When the sca ...

  7. RocketMQ4.3.X关于设置useEpollNativeSelector = true报错问题

    前一阵子刚整理完RocketMQ4.3.x版本的相关配置的工作,接下来就来测试一下改变参数会带来什么好的结果 首先我就选中了useEpollNativeSelector 这个参数 默认这个参数是 fa ...

  8. 项目中遇到angular时间插件datetinepicker汉化问题

    问题描述: 测试需要中文的时间插件: 参考资料: angularjs封装bootstrap官网的时间插件datetimepicker https://www.cnblogs.com/cynthia-w ...

  9. animation动画案例

    最近一直苦恼做一个banner的进度条,原先用js改变width值,但明显卡顿.后来用了animation,超级好用. <!DOCTYPE html> <html lang=&quo ...

  10. 在Winform开发框架中对附件文件进行集中归档处理

    在我们Winform开发中,往往需要涉及到附件的统一管理,因此我倾向于把它们独立出来作为一个附件管理模块,这样各个模块都可以使用这个附件管理模块,更好的实现模块重用的目的.在涉及附件管理的场景中,一个 ...