Copycat - State

Member.Status

Changes to a member's status are driven by heartbeats.

A heartbeat is an append request carrying empty entries.

/**

   * Triggers a heartbeat to a majority of the cluster.

   * <p>

   * For followers to which no AppendRequest is currently being sent, a new empty AppendRequest will be

   * created and sent. For followers to which an AppendRequest is already being sent, the appendEntries()

   * call will piggyback on the *next* AppendRequest. Thus, multiple calls to this method will only ever

   * result in a single AppendRequest to each follower at any given time, and the returned future will be

   * shared by all concurrent calls.

   *

   * @return A completable future to be completed the next time a heartbeat is received by a majority of the cluster.

   */

  public CompletableFuture<Long> appendEntries() {

    // If there are no other active members in the cluster, simply complete the append operation.

    if (context.getClusterState().getRemoteMemberStates().isEmpty())

      return CompletableFuture.completedFuture(null);

    // If no heartbeat future already exists, that indicates there's no heartbeat currently under way.

    // Create a new heartbeat future and commit to all members in the cluster.

    if (heartbeatFuture == null) {

      CompletableFuture<Long> newHeartbeatFuture = new CompletableFuture<>();

      heartbeatFuture = newHeartbeatFuture;

      heartbeatTime = System.currentTimeMillis();

      for (MemberState member : context.getClusterState().getRemoteMemberStates()) {

        appendEntries(member); // send appendEntries to every remote member

      }

      return newHeartbeatFuture;

    }

The heartbeat logic sends a heartbeat to every member returned by getRemoteMemberStates.
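The future-sharing behaviour described in the Javadoc can be sketched as follows. This is a simplified illustration, not Copycat code: HeartbeatSketch, requestsSent, and onMajorityAck are hypothetical names, and the piggybacking on the *next* AppendRequest is left out. It only shows that concurrent callers get the same pending future until the heartbeat completes.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration, not Copycat code: callers share one pending heartbeat future.
class HeartbeatSketch {
  private CompletableFuture<Long> heartbeatFuture;
  int requestsSent = 0; // counts how many heartbeat rounds were actually started

  CompletableFuture<Long> appendEntries() {
    if (heartbeatFuture == null) {
      heartbeatFuture = new CompletableFuture<>();
      requestsSent++; // a real leader would send an empty AppendRequest to each follower here
    }
    return heartbeatFuture;
  }

  // Assumed hook: called once a majority has acknowledged; completes and clears the shared future.
  void onMajorityAck(long time) {
    CompletableFuture<Long> future = heartbeatFuture;
    heartbeatFuture = null;
    if (future != null) {
      future.complete(time);
    }
  }
}
```

Two calls made before the acknowledgement return the same future and start only one round; a call made afterwards starts a new one.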

AVAILABLE

On initialization, every ServerMember defaults to Status.AVAILABLE.

public final class ServerMember implements Member, CatalystSerializable, AutoCloseable {

  private Member.Type type;

  private Status status = Status.AVAILABLE;

LeaderAppender

@Override

  protected void succeedAttempt(MemberState member) {

    super.succeedAttempt(member);

    // If the member is currently marked as UNAVAILABLE, change its status to AVAILABLE and update the configuration.

    if (member.getMember().status() == ServerMember.Status.UNAVAILABLE && !leader.configuring()) {

      member.getMember().update(ServerMember.Status.AVAILABLE, Instant.now());

      leader.configure(context.getCluster().members());

    }

  }

succeedAttempt switches an UNAVAILABLE member back to AVAILABLE; super.succeedAttempt resets the failure count.

It is called whenever an AppendResponse with status OK is received:

protected void handleAppendResponseOk(MemberState member, AppendRequest request, AppendResponse response) {

    // Reset the member failure count and update the member's availability status if necessary.

    succeedAttempt(member);

The leader's heartbeat is implemented as an empty AppendRequest, so receiving an OK response means the member is available.

UNAVAILABLE

A member is marked UNAVAILABLE in failAttempt:

@Override

  protected void failAttempt(MemberState member, Throwable error) {

    super.failAttempt(member, error);

    // Verify that the leader has contacted a majority of the cluster within the last two election timeouts.

    // If the leader is not able to contact a majority of the cluster within two election timeouts, assume

    // that a partition occurred and transition back to the FOLLOWER state.

    if (System.currentTimeMillis() - Math.max(heartbeatTime(), leaderTime) > context.getElectionTimeout().toMillis() * 2) {

      LOGGER.warn("{} - Suspected network partition. Stepping down", context.getCluster().member().address());

      context.setLeader(0);

      context.transition(CopycatServer.State.FOLLOWER);

    }

    // If the number of failures has increased above 3 and the member hasn't been marked as UNAVAILABLE, do so.

    else if (member.getFailureCount() >= 3) {

      // If the member is currently marked as AVAILABLE, change its status to UNAVAILABLE and update the configuration.

      if (member.getMember().status() == ServerMember.Status.AVAILABLE && !leader.configuring()) {

        member.getMember().update(ServerMember.Status.UNAVAILABLE, Instant.now());

        leader.configure(context.getCluster().members());

      }

    }

  }

super.failAttempt resets the connection and increments the failure count:

member.incrementFailureCount();

The first check uses Math.max(heartbeatTime(), leaderTime).

heartbeatTime

/**

   * Returns the last time a majority of the cluster was contacted.

   * <p>

   * This is calculated by sorting the list of active members and getting the last time the majority of

   * the cluster was contacted based on the index of a majority of the members. So, in a list of 3 ACTIVE

   * members, index 1 (the second member) will be used to determine the commit time in a sorted members list.

   */

  private long heartbeatTime() {

    int quorumIndex = quorumIndex();

    if (quorumIndex >= 0) {

      return context.getClusterState().getActiveMemberStates((m1, m2)-> Long.compare(m2.getHeartbeatTime(), m1.getHeartbeatTime())).get(quorumIndex).getHeartbeatTime();

    }

    return System.currentTimeMillis();

  }

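This sorts the active members by heartbeat time, most recent first, and takes the heartbeat time at quorumIndex, i.e. the oldest heartbeat among the most recently contacted majority.
If the leader cannot get valid heartbeats from a majority, a network partition (split-brain) is assumed.

In that case the leader steps down to follower.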
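A small worked example may make the sort-and-index step easier to follow. This is a sketch under assumptions: QuorumTimeSketch is a hypothetical class, the heartbeat times are plain longs, and quorumIndex is passed in as a parameter because quorumIndex() is not shown in this article. The extra upper-bound check is only a guard for the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration: pick the heartbeat time found at a given quorum index.
class QuorumTimeSketch {
  static long heartbeatTime(List<Long> heartbeatTimes, int quorumIndex, long now) {
    // Fall back to the current time when there is no usable quorum index.
    if (quorumIndex < 0 || quorumIndex >= heartbeatTimes.size()) {
      return now;
    }
    List<Long> sorted = new ArrayList<>(heartbeatTimes);
    sorted.sort(Collections.reverseOrder()); // most recent first, like the comparator above
    return sorted.get(quorumIndex);
  }
}
```

With heartbeat times 100, 300, and 200 and a quorum index of 1, the sorted list is 300, 200, 100, so the method returns 200: the two most recently contacted members were both heard from at time 200 or later.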

The second check: once a member's failure count reaches 3 (getFailureCount() >= 3), it is marked UNAVAILABLE.

failAttempt is called from the various failure handlers in AbstractAppender:

handleAppendRequestFailure,

handleAppendResponseFailure,

handleConfigureRequestFailure,

handleInstallRequestFailure
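Taken together, succeedAttempt and failAttempt behave like a small failure counter with a threshold. The sketch below shows only that bookkeeping. MemberHealthSketch is a hypothetical class, and it deliberately leaves out the !leader.configuring() guard, the configuration update, and the leader's step-down check.

```java
// Hypothetical illustration of the AVAILABLE/UNAVAILABLE bookkeeping, not Copycat code.
class MemberHealthSketch {
  enum Status { AVAILABLE, UNAVAILABLE }

  private static final int FAILURE_THRESHOLD = 3; // mirrors getFailureCount() >= 3 above
  private int failureCount = 0;
  private Status status = Status.AVAILABLE;

  // A failed request: bump the counter and mark the member UNAVAILABLE at the threshold.
  void failAttempt() {
    failureCount++;
    if (failureCount >= FAILURE_THRESHOLD && status == Status.AVAILABLE) {
      status = Status.UNAVAILABLE;
    }
  }

  // A successful response: clear the counter and restore AVAILABLE if needed.
  void succeedAttempt() {
    failureCount = 0;
    if (status == Status.UNAVAILABLE) {
      status = Status.AVAILABLE;
    }
  }

  Status status() {
    return status;
  }
}
```

Two failures leave the member AVAILABLE, the third flips it to UNAVAILABLE, and a single success flips it back.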

CopycatServer.State

public enum State {

    /**

     * Represents the state of an inactive server.

     * <p>

     * All servers start in this state and return to this state when {@link #leave() stopped}.

     */

    INACTIVE,

    /**

     * Represents the state of a server that is a reserve member of the cluster.

     * <p>

     * Reserve servers only receive notification of leader, term, and configuration changes.

     */

    RESERVE,

    /**

     * Represents the state of a server in the process of catching up its log.

     * <p>

     * Upon successfully joining an existing cluster, the server will transition to the passive state and remain there

     * until the leader determines that the server has caught up enough to be promoted to a full member.

     */

    PASSIVE,

    /**

     * Represents the state of a server participating in normal log replication.

     * <p>

     * The follower state is a standard Raft state in which the server receives replicated log entries from the leader.

     */

    FOLLOWER,

    /**

     * Represents the state of a server attempting to become the leader.

     * <p>

     * When a server in the follower state fails to receive communication from a valid leader for some time period,

     * the follower will transition to the candidate state. During this period, the candidate requests votes from

     * each of the other servers in the cluster. If the candidate wins the election by receiving votes from a majority

     * of the cluster, it will transition to the leader state.

     */

    CANDIDATE,

    /**

     * Represents the state of a server which is actively coordinating and replicating logs with other servers.

     * <p>

     * Leaders are responsible for handling and replicating writes from clients. Note that more than one leader can

     * exist at any given time, but Raft guarantees that no two leaders will exist for the same {@link Cluster#term()}.

     */

    LEADER

  }

When ServerContext is initialized, the state is Inactive:

public class ServerContext implements AutoCloseable {

  //......

  protected ServerState state = new InactiveState(this);

The tricky part is that Member has its own Type enum:

enum Type {

    /**

     * Represents an inactive member.

     * <p>

     * The {@code INACTIVE} member type represents a member which does not participate in any communication

     * and is not an active member of the cluster. This is typically the state of a member prior to joining

     * or after leaving a cluster.

     */

    INACTIVE,

    /**

     * Represents a member which does not participate in replication.

     * <p>

     * The {@code RESERVE} member type is representative of a member that does not participate in any

     * replication of state but only maintains contact with the cluster leader and is an active member

     * of the {@link Cluster}. Typically, reserve members act as standby nodes which can be

     * {@link #promote() promoted} to a {@link #PASSIVE} or {@link #ACTIVE} role when needed.

     */

    RESERVE,

    /**

     * Represents a member which participates in asynchronous replication but does not vote in elections

     * or otherwise participate in the Raft consensus algorithm.

     * <p>

     * The {@code PASSIVE} member type is representative of a member that receives state changes from

     * follower nodes asynchronously. As state changes are committed via the {@link #ACTIVE} Raft nodes,

     * committed state changes are asynchronously replicated by followers to passive members. This allows

     * passive members to maintain nearly up-to-date state with minimal impact on the performance of the

     * Raft algorithm itself, and allows passive members to be quickly promoted to {@link #ACTIVE} voting

     * members if necessary.

     */

    PASSIVE,

    /**

     * Represents a full voting member of the Raft cluster which participates fully in leader election

     * and replication algorithms.

     * <p>

     * The {@code ACTIVE} member type represents a full voting member of the Raft cluster. Active members

     * participate in the Raft leader election and replication algorithms and can themselves be elected

     * leaders.

     */

    ACTIVE,

  }

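Note the difference: Type has ACTIVE, which State does not.

Apart from that, State covers every value in Type.

In other words, a member can be inactive, reserve, passive, or active.

When the member is inactive, reserve, or passive, the server's state is the corresponding one.

When the member is active, the server's state is one of follower, candidate, or leader.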
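The relationship between the two enums can be written down as a small lookup. This is only an illustration of the mapping described above: TypeStateSketch and statesFor are hypothetical names, and the enums are local copies rather than the Copycat types.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration: which server states are possible for each member type.
class TypeStateSketch {
  enum Type { INACTIVE, RESERVE, PASSIVE, ACTIVE }
  enum State { INACTIVE, RESERVE, PASSIVE, FOLLOWER, CANDIDATE, LEADER }

  static Set<State> statesFor(Type type) {
    switch (type) {
      case ACTIVE:
        // Only ACTIVE members take part in the Raft roles.
        return EnumSet.of(State.FOLLOWER, State.CANDIDATE, State.LEADER);
      case PASSIVE:
        return EnumSet.of(State.PASSIVE);
      case RESERVE:
        return EnumSet.of(State.RESERVE);
      default:
        return EnumSet.of(State.INACTIVE);
    }
  }
}
```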

In CopycatServer.Builder, the default type is ACTIVE:

public static class Builder implements io.atomix.catalyst.util.Builder<CopycatServer> {

  //......

  private Member.Type type = Member.Type.ACTIVE;

Note that transition picks the target state based on Member.Type:

/**

   * Transitions the server to the base state for the given member type.

   */

  protected void transition(Member.Type type) {

    switch (type) {

      case ACTIVE:

        if (!(state instanceof ActiveState)) {

          transition(CopycatServer.State.FOLLOWER);

        }

        break;

      case PASSIVE:

        if (this.state.type() != CopycatServer.State.PASSIVE) {

          transition(CopycatServer.State.PASSIVE);

        }

        break;

      case RESERVE:

        if (this.state.type() != CopycatServer.State.RESERVE) {

          transition(CopycatServer.State.RESERVE);

        }

        break;

      default:

        if (this.state.type() != CopycatServer.State.INACTIVE) {

          transition(CopycatServer.State.INACTIVE);

        }

        break;

    }

  }

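Note how ACTIVE is handled.

When Member.Type is ACTIVE and the current state is not an ActiveState, the server transitions to follower; candidate and leader clearly cannot be entered directly.

As shown above, ServerContext starts out in the inactive state.
So when does it become active?

When a server bootstraps or joins a cluster, ClusterState.join is called, and that is where the state transition happens: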

@Override

  public CompletableFuture<Void> bootstrap(Collection<Address> cluster) {

    if (configuration == null) {

      if (member.type() != Member.Type.ACTIVE) {

        return Futures.exceptionalFuture(new IllegalStateException("only ACTIVE members can bootstrap the cluster"));

      } else {

        // Create a set of active members.

        Set<Member> activeMembers = cluster.stream()

          .filter(m -> !m.equals(member.serverAddress()))

          .map(m -> new ServerMember(Member.Type.ACTIVE, m, null, member.updated()))

          .collect(Collectors.toSet());

        // Add the local member to the set of active members.

        activeMembers.add(member);

        // Create a new configuration and store it on disk to ensure the cluster can fall back to the configuration.

        configure(new Configuration(0, 0, member.updated().toEpochMilli(), activeMembers));

      }

    }

    return join();

  }

@Override

  public synchronized CompletableFuture<Void> join(Collection<Address> cluster) {

    // If no configuration was loaded from disk, create a new configuration.

    if (configuration == null) {

      // Create a set of cluster members, excluding the local member which is joining a cluster.

      Set<Member> activeMembers = cluster.stream()

        .filter(m -> !m.equals(member.serverAddress()))

        .map(m -> new ServerMember(Member.Type.ACTIVE, m, null, member.updated()))

        .collect(Collectors.toSet());

      // Create a new configuration and configure the cluster. Once the cluster is configured, the configuration

      // will be stored on disk to ensure the cluster can fall back to the provided configuration if necessary.

      configure(new Configuration(0, 0, member.updated().toEpochMilli(), activeMembers)); // update the configuration

    }

    return join();

  }

  /**

   * Starts the join to the cluster.

   */

  private synchronized CompletableFuture<Void> join() {

    joinFuture = new CompletableFuture<>();

    context.getThreadContext().executor().execute(() -> {

      // Transition the server to the appropriate state for the local member type.

      context.transition(member.type()); //transition state

      // Attempt to join the cluster. If the local member is ACTIVE then failing to join the cluster

      // will result in the member attempting to get elected. This allows initial clusters to form.

      List<MemberState> activeMembers = getActiveMemberStates();

      if (!activeMembers.isEmpty()) {

        join(getActiveMemberStates().iterator());

      } else {

        joinFuture.complete(null);

      }

    });

Now let's look at the conditions for transitions between leader, candidate, and follower.

Leader

Only when a candidate requests votes and a majority grants them does it call

context.transition(CopycatServer.State.LEADER)

/**

   * Resets the election timer.

   */

  private void sendVoteRequests() {

    //.........

    // Send vote requests to all nodes. The vote request that is sent

    // to this node will be automatically successful.

    // First check if the quorum is null. If the quorum isn't null then that

    // indicates that another vote is already going on.

    final Quorum quorum = new Quorum(context.getClusterState().getQuorum(), (elected) -> {

      complete.set(true);

      if (elected) {

        context.transition(CopycatServer.State.LEADER); // invoked from checkComplete()

      } else {

        context.transition(CopycatServer.State.FOLLOWER);

      }

    });

    // Once we got the last log term, iterate through each current member

    // of the cluster and vote each member for a vote.

    for (ServerMember member : votingMembers) {

      LOGGER.debug("{} - Requesting vote from {} for term {}", context.getCluster().member().address(), member, context.getTerm());

      VoteRequest request = VoteRequest.builder()

        .withTerm(context.getTerm())

        .withCandidate(context.getCluster().member().id())

        .withLogIndex(lastIndex)

        .withLogTerm(lastTerm)

        .build();

      context.getConnections().getConnection(member.serverAddress()).thenAccept(connection -> {

        connection.<VoteRequest, VoteResponse>send(request).whenCompleteAsync((response, error) -> {

          context.checkThread();

          if (isOpen() && !complete.get()) {

            if (error != null) {

              LOGGER.warn(error.getMessage());

              quorum.fail();

            } else {

                //........

              } else {

                LOGGER.debug("{} - Received successful vote from {}", context.getCluster().member().address(), member);

                quorum.succeed(); // the member granted its vote: succeeded++; checkComplete();

              }

            }

          }

        }, context.getThreadContext().executor());

      });

Candidate

Only after a follower sends poll requests and a majority agrees does it become a candidate:

  /**

   * Polls all members of the cluster to determine whether this member should transition to the CANDIDATE state.

   */

  private void sendPollRequests() {

   final Quorum quorum = new Quorum(context.getClusterState().getQuorum(), (elected) -> {

      // If a majority of the cluster indicated they would vote for us then transition to candidate.

      complete.set(true);

      if (elected) {

        context.transition(CopycatServer.State.CANDIDATE);

      } else {

        resetHeartbeatTimeout();

      }

    });

    //......
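Both the vote and the poll rely on a Quorum object that counts successes and failures and fires its callback once. The sketch below shows the general idea only. QuorumSketch is a hypothetical class written for this article, and the assumption that the callback fires with false once the failures also reach the quorum size is mine, not something the snippets above confirm.

```java
import java.util.function.Consumer;

// Hypothetical illustration of quorum counting, not the Copycat Quorum class.
class QuorumSketch {
  private final int quorum;
  private final Consumer<Boolean> callback;
  private int succeeded;
  private int failed;
  private boolean complete;

  QuorumSketch(int quorum, Consumer<Boolean> callback) {
    this.quorum = quorum;
    this.callback = callback;
  }

  void succeed() {
    succeeded++;
    checkComplete();
  }

  void fail() {
    failed++;
    checkComplete();
  }

  // Fire the callback exactly once, as soon as either side reaches the quorum size.
  private void checkComplete() {
    if (complete) {
      return;
    }
    if (succeeded >= quorum) {
      complete = true;
      callback.accept(true);
    } else if (failed >= quorum) {
      complete = true;
      callback.accept(false);
    }
  }
}
```

With a quorum of 2, one success followed by one failure decides nothing; the second success fires the callback with true, and later responses are ignored.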

Follower

Leader –> Follower

Triggered by heartbeats, in LeaderAppender:

/**

   * Handles a {@link Response.Status#OK} response.

   */

  protected void handleAppendResponseOk(MemberState member, AppendRequest request, AppendResponse response) {

    //......

    // If we've received a greater term, update the term and transition back to follower.

    else if (response.term() > context.getTerm()) {

      context.setTerm(response.term()).setLeader(0);

      context.transition(CopycatServer.State.FOLLOWER);

    }

If the response is OK but its term is greater than ours, we are no longer the leader,
so we step down to follower.

/**

   * Handles a {@link Response.Status#ERROR} response.

   */

  protected void handleAppendResponseError(MemberState member, AppendRequest request, AppendResponse response) {

    // If we've received a greater term, update the term and transition back to follower.

    if (response.term() > context.getTerm()) {

      context.setTerm(response.term()).setLeader(0);

      context.transition(CopycatServer.State.FOLLOWER);

The same applies to an error response.

@Override

  protected void failAttempt(MemberState member, Throwable error) {

    super.failAttempt(member, error);

    // Verify that the leader has contacted a majority of the cluster within the last two election timeouts.

    // If the leader is not able to contact a majority of the cluster within two election timeouts, assume

    // that a partition occurred and transition back to the FOLLOWER state.

    if (System.currentTimeMillis() - Math.max(heartbeatTime(), leaderTime) > context.getElectionTimeout().toMillis() * 2) {

      LOGGER.warn("{} - Suspected network partition. Stepping down", context.getCluster().member().address());

      context.setLeader(0);

      context.transition(CopycatServer.State.FOLLOWER);

    }

In failAttempt, if no heartbeat from a majority has been received within two election timeouts (getElectionTimeout), a partition is assumed
and the leader steps down to follower.
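The timing condition is easy to check with concrete numbers. The helper below is a sketch: StepDownSketch and shouldStepDown are hypothetical names, and all times are plain millisecond values supplied by the caller. leaderTime is presumably the time at which this node became leader, which keeps a freshly elected leader from stepping down before it has had a chance to contact anyone.

```java
import java.time.Duration;

// Hypothetical illustration of the partition check in failAttempt.
class StepDownSketch {
  static boolean shouldStepDown(long now, long heartbeatTime, long leaderTime, Duration electionTimeout) {
    // Step down only if the majority was last contacted more than two election timeouts ago.
    return now - Math.max(heartbeatTime, leaderTime) > electionTimeout.toMillis() * 2;
  }
}
```

With an election timeout of 500 ms, now = 2000, heartbeatTime = 900, and leaderTime = 500, the gap is 1100 ms, which exceeds 1000 ms, so the leader steps down. With heartbeatTime = 1000 the gap is exactly 1000 ms and the leader stays.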

In LeaderState,

when the leader's initialization fails:

/**

   * Commits a no-op entry to the log, ensuring any entries from a previous term are committed.

   */

  private CompletableFuture<Void> commitInitialEntries() {

    // The Raft protocol dictates that leaders cannot commit entries from previous terms until

    // at least one entry from their current term has been stored on a majority of servers. Thus,

    // we force entries to be appended up to the leader's no-op entry. The LeaderAppender will ensure

    // that the commitIndex is not increased until the no-op entry (appender.index()) is committed.

    CompletableFuture<Void> future = new CompletableFuture<>();

    appender.appendEntries(appender.index()).whenComplete((resultIndex, error) -> {

      context.checkThread();

      if (isOpen()) {

        if (error == null) {

          context.getStateMachine().apply(resultIndex);

          future.complete(null);

        } else {

          context.setLeader(0);

          context.transition(CopycatServer.State.FOLLOWER);

        }

      }

    });

    return future;

  }

the leader also steps down to follower.

Candidate –> Follower

If the vote fails, the candidate steps back down to follower:

/**

   * Resets the election timer.

   */

  private void sendVoteRequests() {

    //......

    // Send vote requests to all nodes. The vote request that is sent

    // to this node will be automatically successful.

    // First check if the quorum is null. If the quorum isn't null then that

    // indicates that another vote is already going on.

    final Quorum quorum = new Quorum(context.getClusterState().getQuorum(), (elected) -> {

      complete.set(true);

      if (elected) {

        context.transition(CopycatServer.State.LEADER);

      } else {

        context.transition(CopycatServer.State.FOLLOWER); // not elected

      }

    });

ActiveState –> Follower

This covers LeaderState and CandidateState; when handling vote and append requests, both run the following logic:

    // If the request indicates a term that is greater than the current term then

    // assign that term and leader to the current context and transition to follower.

    boolean transition = updateTermAndLeader(request.term(), request.leader());

    // If a transition is required then transition back to the follower state.

    // If the node is already a follower then the transition will be ignored.

    if (transition) {

      context.transition(CopycatServer.State.FOLLOWER);

    }

/**

   * Updates the term and leader.

   */

  protected boolean updateTermAndLeader(long term, int leader) {

    // If the request indicates a term that is greater than the current term or no leader has been

    // set for the current term, update leader and term.

    if (term > context.getTerm() || (term == context.getTerm() && context.getLeader() == null && leader != 0)) {

      context.setTerm(term);

      context.setLeader(leader);

      // Reset the current cluster configuration to the last committed configuration when a leader change occurs.

      context.getClusterState().reset();

      return true;

    }

    return false;

  }
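The term comparison in updateTermAndLeader can be exercised in isolation. The sketch below is a simplified stand-in: TermSketch is a hypothetical class, it uses 0 to mean "no known leader" (the original checks context.getLeader() == null), and it omits the cluster configuration reset.

```java
// Hypothetical illustration of the term/leader update rule, not Copycat code.
class TermSketch {
  private long term;
  private int leader; // assumption for this sketch: 0 means no leader is known

  TermSketch(long term, int leader) {
    this.term = term;
    this.leader = leader;
  }

  // Returns true when the caller should transition to follower.
  boolean updateTermAndLeader(long requestTerm, int requestLeader) {
    boolean newerTerm = requestTerm > term;
    boolean learnedLeader = requestTerm == term && leader == 0 && requestLeader != 0;
    if (newerTerm || learnedLeader) {
      term = requestTerm;
      leader = requestLeader;
      return true;
    }
    return false;
  }

  long term() {
    return term;
  }

  int leader() {
    return leader;
  }
}
```

A request from an older term, or from the current term when a leader is already known, changes nothing. A request from a newer term, or one that names a leader for the current term when none is known, updates the state and signals a transition to follower.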
