MIT 6.824 Lab 2D: Raft Log Compaction

This post continues from the previous one, Raft Part C | MIT 6.824 Lab2C Persistence.

Lab Setup

Lab code: git://g.csail.mit.edu/6.824-golabs-2021/src/raft
How to test: go test -run 2D -race
Related paper: Raft Extended, Section 7
Lab guide: 6.824 Lab 2: Raft (mit.edu)

Lab Goals

Implement Snapshot, CondInstallSnapshot, and the InstallSnapshot RPC, and modify the code from the earlier parts to support them.

Some Hints

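Do not use the offset mechanism from the paper to split a snapshot into chunks. Send the entire snapshot in a single RPC instead.
Drop every reference to the discarded log entries so that the GC can reclaim them.
Because saving a snapshot discards part of the log, the length of the in-memory log can no longer be used directly as a log index.
Consider whether lastIncludeTerm and lastIncludeIndex need to be persisted.
Use rf.persister.SaveStateAndSnapshot() to persist the snapshot.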
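As a rough sketch of the persistence hint: with the Log type introduced later in this post, lastIncludeIndex is Log.Base and lastIncludeTerm is Log.Entries[0].Term, so persisting the Log value carries both along. The example below uses the standard encoding/gob package as a stand-in for the lab's labgob. The helper names encodeState and decodeState, and the concrete int Command field, are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// LogEntry is simplified here: Command is an int purely for illustration.
type LogEntry struct {
	Term    int
	Command int
}

// Log mirrors the type used in this post.
type Log struct {
	Entries []LogEntry // Entries[0].Term holds lastIncludeTerm
	Base    int        // lastIncludeIndex
}

// encodeState (hypothetical helper) serializes the persistent Raft state.
func encodeState(currentTerm, votedFor int, log Log) ([]byte, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	if err := enc.Encode(currentTerm); err != nil {
		return nil, err
	}
	if err := enc.Encode(votedFor); err != nil {
		return nil, err
	}
	if err := enc.Encode(log); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeState (hypothetical helper) reads the values back in the same order.
func decodeState(data []byte) (currentTerm, votedFor int, log Log, err error) {
	dec := gob.NewDecoder(bytes.NewReader(data))
	if err = dec.Decode(&currentTerm); err != nil {
		return
	}
	if err = dec.Decode(&votedFor); err != nil {
		return
	}
	err = dec.Decode(&log)
	return
}

func main() {
	in := Log{Base: 4, Entries: []LogEntry{{Term: 2}, {Term: 2, Command: 7}}}
	data, err := encodeState(3, -1, in)
	if err != nil {
		panic(err)
	}
	term, voted, out, err := decodeState(data)
	if err != nil {
		panic(err)
	}
	// Base and Entries[0].Term survive the round trip with the rest of the log.
	fmt.Println(term, voted, out.Base, out.Entries[0].Term)
}
```

In the lab itself, the resulting bytes would be passed to rf.persister.SaveStateAndSnapshot() together with the snapshot.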

Log Compaction

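The log keeps growing and cannot be held in memory forever. Entries that have already been applied to the state machine no longer need to be kept in Raft.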

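However, some followers may lag far behind the leader, so the leader cannot simply forget what those entries contained. It keeps the snapshot that covers them, and when the entries a follower needs during log replication have already been folded into the snapshot, the leader sends the snapshot to that follower instead.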

lastIncludeTerm & lastIncludeIndex

After compaction, Raft has to record two extra values: lastIncludeIndex and lastIncludeTerm, the index and term of the last log entry covered by the snapshot.

The new log type is designed as follows.

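type Log struct {
    Entries []LogEntry // Entries[0] is a placeholder whose Term is lastIncludeTerm
    Base    int        // logical index of Entries[0], i.e. lastIncludeIndex
}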

Note that real entries in Log.Entries start at position 1, so Log.Entries[0].Term is used to store lastIncludeTerm. Log.Base is the logical index of Log.Entries[0], which is also the value of lastIncludeIndex.

In this example, lastIncludeIndex = 4, lastIncludeTerm = 2, and snapshot = [1,1,1,2].

Add the following methods to Log.

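// size returns the logical length of the log, counting compacted entries.
func (l *Log) size() int {
    return l.Base + len(l.Entries)
}

// get returns the entry at logical index i (i must be >= l.Base).
func (l *Log) get(i int) LogEntry {
    return l.Entries[i-l.Base]
}

// set overwrites the entry at logical index i.
func (l *Log) set(i int, e LogEntry) {
    l.Entries[i-l.Base] = e
}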
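To make the index translation concrete, here is a small self-contained sketch of the state from the example above (lastIncludeIndex = 4, lastIncludeTerm = 2). The simplified LogEntry, the exampleLog helper, and the two entries after the snapshot are assumptions made for illustration.

```go
package main

import "fmt"

// LogEntry is a simplified stand-in for the entry type from the earlier posts.
type LogEntry struct {
	Term    int
	Command interface{}
}

// Log stores only the part of the log that follows the snapshot.
type Log struct {
	Entries []LogEntry // Entries[0] is a placeholder holding lastIncludeTerm
	Base    int        // logical index of Entries[0], i.e. lastIncludeIndex
}

func (l *Log) size() int          { return l.Base + len(l.Entries) }
func (l *Log) get(i int) LogEntry { return l.Entries[i-l.Base] }

// exampleLog (hypothetical helper) models a log whose entries 1..4 have been
// compacted into a snapshot, with two made-up entries after it.
func exampleLog() *Log {
	return &Log{
		Base: 4,
		Entries: []LogEntry{
			{Term: 2},               // placeholder: lastIncludeTerm
			{Term: 2, Command: "x"}, // logical index 5
			{Term: 3, Command: "y"}, // logical index 6
		},
	}
}

func main() {
	l := exampleLog()
	fmt.Println(l.size())           // 7: logical length, so the last index is 6
	fmt.Println(l.get(5).Term)      // 2
	fmt.Println(l.get(6).Term)      // 3
	fmt.Println(l.get(l.Base).Term) // 2: lastIncludeTerm
}
```

Callers keep using logical indices everywhere; only get and set know about the offset.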

Snapshot()

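Snapshot(index int, snapshot []byte) is called by the state machine. The index passed in becomes lastIncludeIndex. The snapshot is produced by the state machine, and Raft must keep it so that it can be sent to followers later.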

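func (rf *Raft) Snapshot(index int, snapshot []byte) {
    if index <= rf.log.Base {
        return // stale snapshot
    }
    // Copy the retained suffix so the old backing array can be garbage collected.
    // The entry at index becomes Entries[0], so its Term is lastIncludeTerm.
    rf.log.Entries = append([]LogEntry(nil), rf.log.Entries[index-rf.log.Base:]...)
    rf.log.Base = index
    rf.snapshot = snapshot
    rf.saveStateAndSnapshot()
}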

index <= rf.log.Base means the snapshot passed in is an old one.
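The truncation step can be tried in isolation. The sketch below applies it to a standalone Log, starting from an uncompacted log whose terms are [1,1,1,2,2,3] and snapshotting through index 4. The compact method name and the extra upper-bound check are assumptions added for illustration.

```go
package main

import "fmt"

type LogEntry struct {
	Term    int
	Command interface{}
}

type Log struct {
	Entries []LogEntry
	Base    int
}

func (l *Log) size() int { return l.Base + len(l.Entries) }

// compact (hypothetical name) discards everything up to and including index.
// It reports false for a stale index, and also for an index beyond the log,
// which is an extra guard not present in the post's Snapshot().
func (l *Log) compact(index int) bool {
	if index <= l.Base || index >= l.size() {
		return false
	}
	// Copy the suffix so the old backing array is no longer referenced.
	kept := append([]LogEntry(nil), l.Entries[index-l.Base:]...)
	kept[0].Command = nil // Entries[0] only needs to carry the term
	l.Entries = kept
	l.Base = index
	return true
}

func main() {
	l := &Log{Entries: []LogEntry{
		{Term: 0}, // placeholder at logical index 0
		{Term: 1}, {Term: 1}, {Term: 1}, {Term: 2}, // logical indices 1..4
		{Term: 2}, {Term: 3}, // logical indices 5..6
	}}
	fmt.Println(l.compact(4), l.Base, l.Entries[0].Term, len(l.Entries)) // true 4 2 3
	fmt.Println(l.compact(3))                                            // false: stale
}
```

The logical size stays at 7 before and after compaction, which is why the rest of the code can keep working with logical indices.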

InstallSnapshot RPC

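First, heartbeat() needs the following extra logic: when the entries the leader should send to a follower are already inside the snapshot, it sends the snapshot to that follower.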

if next <= rf.log.Base {
    // The entries this follower needs have been compacted, so send the snapshot.
    go rf.sendSnapshot(i, peer, InstallSnapshotArgs{
        Term:              rf.currentTerm,
        LastIncludedIndex: rf.log.Base,
        LastIncludedTerm:  rf.log.Entries[0].Term,
        Data:              rf.snapshot,
    })
}

sendSnapshot() works much like sending log entries.

func (rf *Raft) sendSnapshot(id int, peer *labrpc.ClientEnd, args InstallSnapshotArgs) {
	reply := InstallSnapshotReply{}
	ok := peer.Call("Raft.InstallSnapshot", &args, &reply)
	if !ok {
		return // the RPC did not get through
	}
	if reply.Term > rf.currentTerm {
		// A newer term exists, so step down.
		rf.toFollower(reply.Term)
		return
	}
	// Treat the follower as caught up through the snapshot's last entry.
	rf.nextIndex[id] = args.LastIncludedIndex + 1
	rf.matchIndex[id] = args.LastIncludedIndex
}

InstallSnapshot() is similar to AppendEntries(). The check args.LastIncludedIndex <= rf.log.Base has the same meaning as before: the snapshot is an old one.

func (rf *Raft) InstallSnapshot(args *InstallSnapshotArgs, reply *InstallSnapshotReply) {
	rf.lastRecv = time.Now()
	if args.Term > rf.currentTerm {
		rf.toFollower(args.Term)
	}
	reply.Term = rf.currentTerm
	// Ignore requests from an old term and snapshots we already cover.
	if args.Term < rf.currentTerm || args.LastIncludedIndex <= rf.log.Base {
		return
	}
	// Hand the snapshot to the state machine. Raft does not install it yet;
	// that happens when the state machine calls CondInstallSnapshot().
	rf.applyCh <- ApplyMsg{
		SnapshotValid: true,
		Snapshot:      args.Data,
		SnapshotTerm:  args.LastIncludedTerm,
		SnapshotIndex: args.LastIncludedIndex,
	}
}

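Note: a snapshot is a state-machine concept and has to be loaded by the state machine, so it must be delivered to the state machine over applyCh. After sending it, Raft does not store the snapshot right away. It waits for the state machine to call CondInstallSnapshot(). If no new log entries were delivered to the state machine between the InstallSnapshot() RPC and the CondInstallSnapshot() call, Raft returns true and both Raft and the state machine install the snapshot. Otherwise Raft returns false and neither of them installs it.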

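This makes installing the snapshot in Raft and in the state machine a single atomic operation. An alternative is to have InstallSnapshot() store the snapshot in Raft right after sending it to the state machine and let CondInstallSnapshot() always return true, which also preserves atomicity. That approach has to wait until the send to the state machine completes, though, and rf.applyCh <- ApplyMsg may block. Since InstallSnapshot() holds the global mutex, this could leave the whole node unable to do any work.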

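Why does it have to be atomic? The goroutine that delivers committed entries to the state machine does not handle snapshots, so the snapshot must be installed first and log entries applied only afterwards.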

To keep things readable, the code in this series leaves out synchronization. In practice, applyCh <- ApplyMsg should be done in a separate goroutine.
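One possible way to move the channel send out of the critical section is sketched below. The node type, its fields, and the handleSnapshot method are hypothetical stand-ins for the Raft struct and the tail of InstallSnapshot(); the post does not show its actual locking code.

```go
package main

import (
	"fmt"
	"sync"
)

// ApplyMsg carries only the snapshot fields used in this post.
type ApplyMsg struct {
	SnapshotValid bool
	Snapshot      []byte
	SnapshotTerm  int
	SnapshotIndex int
}

// node is a hypothetical, stripped-down stand-in for the Raft struct.
type node struct {
	mu      sync.Mutex
	applyCh chan ApplyMsg
}

// handleSnapshot builds the message while holding the lock, then hands the
// potentially blocking send to its own goroutine.
func (n *node) handleSnapshot(data []byte, term, index int) {
	n.mu.Lock()
	defer n.mu.Unlock()
	msg := ApplyMsg{
		SnapshotValid: true,
		Snapshot:      data,
		SnapshotTerm:  term,
		SnapshotIndex: index,
	}
	go func() { n.applyCh <- msg }() // does not hold n.mu while waiting
}

func main() {
	n := &node{applyCh: make(chan ApplyMsg)} // unbuffered: the send blocks until received
	n.handleSnapshot([]byte("state"), 2, 4)

	// The lock is available even though nobody has received the message yet.
	n.mu.Lock()
	n.mu.Unlock()

	msg := <-n.applyCh
	fmt.Println(msg.SnapshotIndex, msg.SnapshotTerm) // 4 2
}
```

One caveat with this pattern: goroutines started for separate sends are not guaranteed to deliver in the order they were started, so a real implementation has to keep snapshot and log messages ordered some other way.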

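How do we tell that no new entries were delivered to the state machine between InstallSnapshot() and CondInstallSnapshot()? This implementation uses commitIndex: lastIncludeIndex <= commitIndex means the entries covered by the snapshot, which were missing before, have been filled in during that interval. Committed does not necessarily mean applied, but using commitIndex as the criterion is the safer choice.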

func (rf *Raft) CondInstallSnapshot(lastIncludedTerm int, lastIncludedIndex int, snapshot []byte) bool {
	// The log has already caught up past the snapshot, so refuse it.
	if lastIncludedIndex <= rf.commitIndex {
		return false
	}
	if lastIncludedIndex <= rf.log.size()-1 && rf.log.get(lastIncludedIndex).Term == lastIncludedTerm {
		// We hold a matching entry: keep the entries that follow the snapshot.
		rf.log.Entries = append([]LogEntry(nil), rf.log.Entries[lastIncludedIndex-rf.log.Base:]...)
	} else {
		// No matching entry: drop the whole log and keep only the placeholder.
		rf.log.Entries = append([]LogEntry(nil), LogEntry{Term: lastIncludedTerm})
	}
	rf.log.Base = lastIncludedIndex
	rf.snapshot = snapshot
	rf.commitIndex = lastIncludedIndex
	rf.lastApplied = lastIncludedIndex
	rf.saveStateAndSnapshot()
	return true
}
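The three outcomes of this function can be worked through on a stripped-down model. In the sketch below, the node type and the newNode and condInstall helpers are hypothetical; persistence, locking, and the snapshot bytes are left out so that only the index logic remains.

```go
package main

import "fmt"

type LogEntry struct{ Term int }

type Log struct {
	Entries []LogEntry
	Base    int
}

func (l *Log) size() int          { return l.Base + len(l.Entries) }
func (l *Log) get(i int) LogEntry { return l.Entries[i-l.Base] }

// node (hypothetical) keeps only the fields the decision depends on.
type node struct {
	log         Log
	commitIndex int
	lastApplied int
}

// condInstall follows the same branches as CondInstallSnapshot().
func (n *node) condInstall(term, index int) bool {
	if index <= n.commitIndex {
		return false
	}
	if index <= n.log.size()-1 && n.log.get(index).Term == term {
		n.log.Entries = append([]LogEntry(nil), n.log.Entries[index-n.log.Base:]...)
	} else {
		n.log.Entries = []LogEntry{{Term: term}}
	}
	n.log.Base = index
	n.commitIndex = index
	n.lastApplied = index
	return true
}

// newNode builds an uncompacted log with the given terms at indices 1..len(terms).
func newNode(terms []int, commit int) *node {
	entries := []LogEntry{{Term: 0}} // placeholder at index 0
	for _, t := range terms {
		entries = append(entries, LogEntry{Term: t})
	}
	return &node{log: Log{Entries: entries}, commitIndex: commit, lastApplied: commit}
}

func main() {
	// Case 1: already committed through index 4, so the snapshot is refused.
	a := newNode([]int{1, 1, 1, 2, 2}, 4)
	fmt.Println(a.condInstall(2, 4), a.log.Base) // false 0

	// Case 2: matching entry at index 4, so the entry at index 5 is kept.
	b := newNode([]int{1, 1, 1, 2, 2}, 2)
	fmt.Println(b.condInstall(2, 4), b.log.Base, len(b.log.Entries)) // true 4 2

	// Case 3: term mismatch at index 4, so the whole log is replaced.
	c := newNode([]int{1, 1, 1, 1}, 2)
	fmt.Println(c.condInstall(2, 4), c.log.Base, len(c.log.Entries)) // true 4 1
}
```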

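Pay attention to how rf.log.Entries is truncated here. Writing s = s[i:] keeps a reference to the whole underlying array, including the discarded elements, so the GC cannot reclaim them.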
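The difference between the two truncation styles can be observed directly, because a re-sliced log still shares memory with the old one. The helper names truncateShared and truncateCopy below are made up for this sketch.

```go
package main

import "fmt"

type LogEntry struct{ Term int }

// truncateShared drops the first i entries but keeps the old backing array alive.
func truncateShared(s []LogEntry, i int) []LogEntry {
	return s[i:]
}

// truncateCopy drops the first i entries and moves the rest into a new array,
// leaving the old array unreferenced once the caller lets go of it.
func truncateCopy(s []LogEntry, i int) []LogEntry {
	return append([]LogEntry(nil), s[i:]...)
}

func main() {
	old := []LogEntry{{Term: 1}, {Term: 1}, {Term: 1}, {Term: 2}, {Term: 2}, {Term: 3}}
	shared := truncateShared(old, 4)
	copied := truncateCopy(old, 4)

	// Writing through the old slice is visible in the shared one only,
	// which shows that it still points into the old array.
	old[4].Term = 9
	fmt.Println(shared[0].Term, copied[0].Term) // 9 2
}
```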

One more thing: do not forget to read the persisted snapshot in Make() and to initialize lastApplied.

Finally, to show that I am not just making this up, here are my test results.