转发请注明原创地址http://www.cnblogs.com/dongxiao-yang/p/8029356.html

checkpoint是Flink Fault Tolerance机制的重要构成部分,flink checkpoint的核心类名为org.apache.flink.runtime.checkpoint.CheckpointCoordinator。

定期产生的checkpoint事件

flink的checkpoint是由CheckpointCoordinator内部的一个timer线程池定时产生的,具体代码由ScheduledTrigger这个Runnable类启动。

    private final class ScheduledTrigger implements Runnable {

        @Override
public void run() {
try {
triggerCheckpoint(System.currentTimeMillis(), true);
}
catch (Exception e) {
LOG.error("Exception while triggering checkpoint.", e);
}
}
}

整个triggerCheckpoint方法大致分为三个部分:

1 环境前置检查

    // Sanity check
if (props.externalizeCheckpoint() && targetDirectory == null) {
throw new IllegalStateException("No target directory specified to persist checkpoint to.");
} // make some eager pre-checks
synchronized (lock) {
// abort if the coordinator has been shutdown in the meantime
if (shutdown) {
return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);
} // Don't allow periodic checkpoint if scheduling has been disabled
if (isPeriodic && !periodicScheduling) {
return new CheckpointTriggerResult(CheckpointDeclineReason.PERIODIC_SCHEDULER_SHUTDOWN);
} // validate whether the checkpoint can be triggered, with respect to the limit of
// concurrent checkpoints, and the minimum time between checkpoints.
// these checks are not relevant for savepoints
if (!props.forceCheckpoint()) {
// sanity check: there should never be more than one trigger request queued
if (triggerRequestQueued) {
LOG.warn("Trying to trigger another checkpoint while one was queued already");
return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);
} // if too many checkpoints are currently in progress, we need to mark that a request is queued
if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {
triggerRequestQueued = true;
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
}
return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);
} // make sure the minimum interval between checkpoints has passed
final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;
final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000; if (durationTillNextMillis > 0) {
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
}
// Reassign the new trigger to the currentPeriodicTrigger
currentPeriodicTrigger = timer.scheduleAtFixedRate(
new ScheduledTrigger(),
durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS); return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);
}
}
} // check if all tasks that we need to trigger are running.
// if not, abort the checkpoint
Execution[] executions = new Execution[tasksToTrigger.length];
for (int i = 0; i < tasksToTrigger.length; i++) {
Execution ee = tasksToTrigger[i].getCurrentExecutionAttempt();
if (ee != null && ee.getState() == ExecutionState.RUNNING) {
executions[i] = ee;
} else {
LOG.info("Checkpoint triggering task {} is not being executed at the moment. Aborting checkpoint.",
tasksToTrigger[i].getTaskNameWithSubtaskIndex());
return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
}
} // next, check if all tasks that need to acknowledge the checkpoint are running.
// if not, abort the checkpoint
Map<ExecutionAttemptID, ExecutionVertex> ackTasks = new HashMap<>(tasksToWaitFor.length); for (ExecutionVertex ev : tasksToWaitFor) {
Execution ee = ev.getCurrentExecutionAttempt();
if (ee != null) {
ackTasks.put(ee.getAttemptId(), ev);
} else {
LOG.info("Checkpoint acknowledging task {} is not being executed at the moment. Aborting checkpoint.",
ev.getTaskNameWithSubtaskIndex());
return new CheckpointTriggerResult(CheckpointDeclineReason.NOT_ALL_REQUIRED_TASKS_RUNNING);
}
}

上面的代码主要在生成一个chepoint之前进行了一些pre-checks,包括checkpoint的targetDirectory、正在进行中的pendingCheckpoint数量上限、前后两次checkpoint间隔是否过小、以及下游与checkpoint相关tasks是否存活等检测,任意一个条件不满足的都不会执行真正的checkpoint动作。

2 生成pendingcheckpoint

        final long checkpointID;
try {
// this must happen outside the coordinator-wide lock, because it communicates
// with external services (in HA mode) and may block for a while.
checkpointID = checkpointIdCounter.getAndIncrement();
}
catch (Throwable t) {
int numUnsuccessful = numUnsuccessfulCheckpointsTriggers.incrementAndGet();
LOG.warn("Failed to trigger checkpoint (" + numUnsuccessful + " consecutive failed attempts so far)", t);
return new CheckpointTriggerResult(CheckpointDeclineReason.EXCEPTION);
} final PendingCheckpoint checkpoint = new PendingCheckpoint(
job,
checkpointID,
timestamp,
ackTasks,
props,
targetDirectory,
executor); if (statsTracker != null) {
PendingCheckpointStats callback = statsTracker.reportPendingCheckpoint(
checkpointID,
timestamp,
props); checkpoint.setStatsCallback(callback);
} // schedule the timer that will clean up the expired checkpoints
final Runnable canceller = new Runnable() {
@Override
public void run() {
synchronized (lock) {
// only do the work if the checkpoint is not discarded anyways
// note that checkpoint completion discards the pending checkpoint object
if (!checkpoint.isDiscarded()) {
LOG.info("Checkpoint " + checkpointID + " expired before completing."); checkpoint.abortExpired();
pendingCheckpoints.remove(checkpointID);
rememberRecentCheckpointId(checkpointID); triggerQueuedRequests();
}
}
}
};

pendingcheckpoint表示一个待处理的检查点,每个pendingcheckpoint标有一个全局唯一的递增checkpointID,并声明了一个canceller用于后续超时情况下的checkpoint清理用于释放资源。

    // re-acquire the coordinator-wide lock
synchronized (lock) {
// since we released the lock in the meantime, we need to re-check
// that the conditions still hold.
if (shutdown) {
return new CheckpointTriggerResult(CheckpointDeclineReason.COORDINATOR_SHUTDOWN);
}
else if (!props.forceCheckpoint()) {
if (triggerRequestQueued) {
LOG.warn("Trying to trigger another checkpoint while one was queued already");
return new CheckpointTriggerResult(CheckpointDeclineReason.ALREADY_QUEUED);
} if (pendingCheckpoints.size() >= maxConcurrentCheckpointAttempts) {
triggerRequestQueued = true;
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
}
return new CheckpointTriggerResult(CheckpointDeclineReason.TOO_MANY_CONCURRENT_CHECKPOINTS);
} // make sure the minimum interval between checkpoints has passed
final long earliestNext = lastCheckpointCompletionNanos + minPauseBetweenCheckpointsNanos;
final long durationTillNextMillis = (earliestNext - System.nanoTime()) / 1_000_000; if (durationTillNextMillis > 0) {
if (currentPeriodicTrigger != null) {
currentPeriodicTrigger.cancel(false);
currentPeriodicTrigger = null;
} // Reassign the new trigger to the currentPeriodicTrigger
currentPeriodicTrigger = timer.scheduleAtFixedRate(
new ScheduledTrigger(),
durationTillNextMillis, baseInterval, TimeUnit.MILLISECONDS); return new CheckpointTriggerResult(CheckpointDeclineReason.MINIMUM_TIME_BETWEEN_CHECKPOINTS);
}
} LOG.info("Triggering checkpoint " + checkpointID + " @ " + timestamp); pendingCheckpoints.put(checkpointID, checkpoint); ScheduledFuture<?> cancellerHandle = timer.schedule(
canceller,
checkpointTimeout, TimeUnit.MILLISECONDS); if (!checkpoint.setCancellerHandle(cancellerHandle)) {
// checkpoint is already disposed!
cancellerHandle.cancel(false);
}

pendingcheckpoint在正式执行前还会再执行一遍前置检查,主要等待完成的检查点数量是否过多以及前后两次完成的检查点间隔是否过短等问题,这些检查都通过后,会把之前定义好的cancller注册到timer线程池,如果等待时间过长会主动回收checkpoint的资源。

3 启动checkpoint执行

发送这个checkpoint的checkpointID和timestamp到各个对应的executor,也就是给各个TaskManger发一个TriggerCheckpoint类型的消息。

                CheckpointOptions checkpointOptions;
if (!props.isSavepoint()) {
checkpointOptions = CheckpointOptions.forCheckpoint();
} else {
checkpointOptions = CheckpointOptions.forSavepoint(targetDirectory);
} // send the messages to the tasks that trigger their checkpoint
for (Execution execution: executions) {
execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
} numUnsuccessfulCheckpointsTriggers.set(0);
return new CheckpointTriggerResult(checkpoint);
    public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
final SimpleSlot slot = assignedResource; if (slot != null) {
final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway(); taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
} else {
LOG.debug("The execution has no slot assigned. This indicates that the execution is " +
"no longer running.");
}
}
    @Override
public void triggerCheckpoint(
ExecutionAttemptID executionAttemptID,
JobID jobId,
long checkpointId,
long timestamp,
CheckpointOptions checkpointOptions) { Preconditions.checkNotNull(executionAttemptID);
Preconditions.checkNotNull(jobId); actorGateway.tell(new TriggerCheckpoint(jobId, executionAttemptID, checkpointId, timestamp, checkpointOptions));
}

其中,for (Execution execution: executions) 这里面的executions里面是所有的输入节点,也就是flink source节点,所以checkpoint这些barrier 时间首先从jobmanager发送给了所有的source task

JobCheckpointingSettings settings = new JobCheckpointingSettings(
triggerVertices,
ackVertices,
commitVertices,
new CheckpointCoordinatorConfiguration(
interval,
cfg.getCheckpointTimeout(),
cfg.getMinPauseBetweenCheckpoints(),
cfg.getMaxConcurrentCheckpoints(),
retentionAfterTermination,
isExactlyOnce),
serializedStateBackend,
serializedHooks); jobGraph for (JobVertex vertex : jobVertices.values()) {
if (vertex.isInputVertex()) {
triggerVertices.add(vertex.getID());
}
commitVertices.add(vertex.getID());
ackVertices.add(vertex.getID());
}

flink checkpoint 源码分析 (一)的更多相关文章

  1. flink checkpoint 源码分析 (二)

    转发请注明原创地址http://www.cnblogs.com/dongxiao-yang/p/8260370.html flink checkpoint 源码分析 (一)一文主要讲述了在JobMan ...

  2. Flink源码阅读(二)——checkpoint源码分析

    前言 在Flink原理——容错机制一文中,已对checkpoint的机制有了较为基础的介绍,本文着重从源码方面去分析checkpoint的过程.当然本文只是分析做checkpoint的调度过程,只是尽 ...

  3. flink-connector-kafka consumer checkpoint源码分析

    转发请注明原创地址:http://www.cnblogs.com/dongxiao-yang/p/7700600.html <flink-connector-kafka consumer的top ...

  4. flink1.7 checkpoint源码分析

    初始化state类 //org.apache.flink.streaming.runtime.tasks.StreamTask#initializeState initializeState(); p ...

  5. Flink的Job启动TaskManager端(源码分析)

    前面说到了  Flink的JobManager启动(源码分析)  启动了TaskManager 然后  Flink的Job启动JobManager端(源码分析)  说到JobManager会将转化得到 ...

  6. [源码分析] 从源码入手看 Flink Watermark 之传播过程

    [源码分析] 从源码入手看 Flink Watermark 之传播过程 0x00 摘要 本文将通过源码分析,带领大家熟悉Flink Watermark 之传播过程,顺便也可以对Flink整体逻辑有一个 ...

  7. [源码分析] 从实例和源码入手看 Flink 之广播 Broadcast

    [源码分析] 从实例和源码入手看 Flink 之广播 Broadcast 0x00 摘要 本文将通过源码分析和实例讲解,带领大家熟悉Flink的广播变量机制. 0x01 业务需求 1. 场景需求 对黑 ...

  8. Flink源码分析 - 源码构建

    原文地址:https://mp.weixin.qq.com/s?__biz=MzU2Njg5Nzk0NQ==&mid=2247483692&idx=1&sn=18cddc1ee ...

  9. 从flink-example分析flink组件(3)WordCount 流式实战及源码分析

    前面介绍了批量处理的WorkCount是如何执行的 <从flink-example分析flink组件(1)WordCount batch实战及源码分析> <从flink-exampl ...

随机推荐

  1. python aiohttp sancio 框架性能测试

    开头先啰嗦两句: 由于本人有开发一个博客的打算,所以近期开始选型python的web框架重头学习,选了两款非常火的 aio web框架 aiohttp 和 sancio 进行性能测试以及开发喜好的调研 ...

  2. [ZOJ3522]Hide and seek

    题意:给一棵带边权的树,多次询问$(x,y,l)$表示如果加一条连接$x$和$y$的长为$l$的边,所有点到$x$和到$y$的最短路减少了多少 先把题目中的图放上来(雾 考虑用lct维护,先把路径提出 ...

  3. 【指数型母函数】hdu1521 排列组合

    #include<cstdio> #include<cstring> using namespace std; int n,m,jiecheng[11]; double a[1 ...

  4. [HDU6271]Master of Connected Component

    [HDU6271]Master of Connected Component 题目大意: 给出两棵\(n(n\le10000)\)个结点的以\(1\)为根的树\(T_a,T_b\),和一个拥有\(m( ...

  5. Scala零基础教学【1-20】

    基于王家林老师的Spark教程——共计111讲的<Scala零基础教学> 计划在9月24日内完成(中秋节假期之内) 目前18号初步学习到25讲,平均每天大约完成15讲,望各位监督. 初步计 ...

  6. linux+iptables搭建网关服务器

    公司购买的一批云服务器只带内网,配置了一个负载均衡器(lb),这批服务器通过lb可以对外提供服务,但是这批服务器不能主动连接外网,例如使用wget下载文件,或者curl访问ttlsa.com站点. 额 ...

  7. 通过Roslyn动态生成程序集

    之前写过篇文章如何通过Roslyn构建自己的C#脚本,今天本来打算测试一下这部分API在新的版本中的变化,结果发现它的脚本引擎找不到了,翻了一下官方文档,貌似说暂时性的移除了.便看了一下它动态生成程序 ...

  8. Layui 使用问题汇总

    1. Layui数据表格中checkbox位置不居中 使用方法渲染的方式生成数据表格,添加了checkbox,但发现checkbox位置不居中,如下图所示 解决办法 通过layui官方社区,找到如下代 ...

  9. Devexpress汉化修改 已经汉化过后生成的*.resources.dll文件 z

    Devexpress 是很好的第三方控件.但是需要汉化,在从网上下载的zh-CN文件夹里都是*.resources.dll文件,由于汉化的工作量很大,难免有些地方汉化 错误或者不合适(如下图).花了些 ...

  10. 关于group by后为每个分组编号