I. Where to Start

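For state to be useful, it has to be persisted to reliable storage. In Flink, the act of persisting state is called checkpointing, so we start with the checkpoint logic in StreamTask, the base class of the tasks executed in a TaskManager (TM).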

1.StreamTask

 StreamTask

 protected OperatorChain<OUT, OP> operatorChain;

 CheckpointStreamFactory createCheckpointStreamFactory(StreamOperator<?> operator)

 <K> AbstractKeyedStateBackend<K> createKeyedStateBackend(

       TypeSerializer<K> keySerializer,

       int numberOfKeyGroups,

       KeyGroupRange keyGroupRange)

 OperatorStateBackend createOperatorStateBackend(

       StreamOperator<?> op, Collection<OperatorStateHandle> restoreStateHandles)

 CheckpointStreamFactory createSavepointStreamFactory(StreamOperator<?> operator, String targetLocation)

 StateBackend createStateBackend()

 boolean triggerCheckpoint(CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions)

 void triggerCheckpointOnBarrier(

       CheckpointMetaData checkpointMetaData,

       CheckpointOptions checkpointOptions,

       CheckpointMetrics checkpointMetrics)

 boolean performCheckpoint(

       CheckpointMetaData checkpointMetaData,

       CheckpointOptions checkpointOptions,

       CheckpointMetrics checkpointMetrics)

 void checkpointState(

       CheckpointMetaData checkpointMetaData,

       CheckpointOptions checkpointOptions,

       CheckpointMetrics checkpointMetrics)

The call chain is triggerCheckpoint -> performCheckpoint -> checkpointState, which finally arrives at CheckpointingOperation.

2.CheckpointingOperation

 CheckpointingOperation

 void executeCheckpointing(){

 ……

 for (StreamOperator<?> op : allOperators) {

    checkpointStreamOperator(op);

 }

 ……

 }

 void checkpointStreamOperator(StreamOperator<?> op)

        ……

        op.snapshotState(

        checkpointMetaData.getCheckpointId(),

        checkpointMetaData.getTimestamp(),

        checkpointOptions)

        ……

This class simply calls snapshotState on every operator passed in from the StreamTask.
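
The loop described above can be sketched in isolation. This is a minimal sketch, not Flink code: SketchOperator and CheckpointingSketch are hypothetical stand-ins for StreamOperator and CheckpointingOperation, and the "snapshot" is just a label string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for StreamOperator; this is not the Flink interface. */
interface SketchOperator {
    String name();

    /** Returns a label describing what was snapshotted for this checkpoint. */
    String snapshotState(long checkpointId, long timestamp);
}

/** Hypothetical stand-in for CheckpointingOperation. */
final class CheckpointingSketch {
    private final List<SketchOperator> allOperators;

    CheckpointingSketch(List<SketchOperator> allOperators) {
        this.allOperators = new ArrayList<>(allOperators);
    }

    /** Calls snapshotState on every operator of the chain, in chain order. */
    Map<String, String> executeCheckpointing(long checkpointId, long timestamp) {
        Map<String, String> results = new LinkedHashMap<>();
        for (SketchOperator op : allOperators) {
            results.put(op.name(), op.snapshotState(checkpointId, timestamp));
        }
        return results;
    }

    /** Builds a toy operator whose snapshot is just a label. */
    static SketchOperator operator(final String opName) {
        return new SketchOperator() {
            @Override
            public String name() {
                return opName;
            }

            @Override
            public String snapshotState(long checkpointId, long timestamp) {
                return opName + "@chk-" + checkpointId;
            }
        };
    }

    public static void main(String[] args) {
        CheckpointingSketch checkpoint =
                new CheckpointingSketch(Arrays.asList(operator("map"), operator("sink")));
        System.out.println(checkpoint.executeCheckpointing(7L, System.currentTimeMillis()));
    }
}
```

The point of the sketch is only that the task does not know what each operator stores; it just drives the same method on every operator of the chain.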

So next we look at the base type of the operators.

3.StreamOperator

 StreamOperator

 OperatorSnapshotResult snapshotState(

    long checkpointId,

    long timestamp,

    CheckpointOptions checkpointOptions)

 void initializeState(OperatorSubtaskState stateHandles)

 void notifyOfCompletedCheckpoint(long checkpointId)

StreamOperator is an interface. It declares these three methods, which means every operator implementing it has to provide them.

4.AbstractStreamOperator

 AbstractStreamOperator

 final OperatorSnapshotResult snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)

        ……

       snapshotState(snapshotContext);

        ……

        if (null != operatorStateBackend) {

            snapshotInProgress.setOperatorStateManagedFuture(

            operatorStateBackend.snapshot(checkpointId, timestamp, factory,              checkpointOptions));

        }

        if (null != keyedStateBackend) {

            snapshotInProgress.setKeyedStateManagedFuture(

            keyedStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));

        }

       ……

 void notifyOfCompletedCheckpoint(long checkpointId)

       if (keyedStateBackend != null) {

            keyedStateBackend.notifyCheckpointComplete(checkpointId);

       }

 void snapshotState(StateSnapshotContext context)

 void initializeState(StateInitializationContext context)

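AbstractStreamOperator is the base implementation of StreamOperator. Its snapshotState method calls the snapshot methods of both the OperatorStateBackend and the KeyedStateBackend.
Pay special attention to the snapshotState(snapshotContext) call that precedes those two calls: it snapshots the raw state and also gives user-defined functions the chance to update their state.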

As for the last two methods, snapshotState and initializeState, each takes a context as its parameter. They are the hooks that subclasses override to checkpoint and restore their own state.

This class has one very important subclass, AbstractUdfStreamOperator, from which many operators inherit.

5.AbstractUdfStreamOperator

AbstractUdfStreamOperator

void initializeState(StateInitializationContext context) throws Exception {

   super.initializeState(context);

   StreamingFunctionUtils.restoreFunctionState(context, userFunction);

}

void snapshotState(StateSnapshotContext context) throws Exception {

   super.snapshotState(context);

   StreamingFunctionUtils.snapshotFunctionState(context, getOperatorStateBackend(), userFunction);

}

Here it is obvious that, while overriding the parent's methods, it adds something: the restore and snapshot of the userFunction.

The operators that actually get instantiated are subclasses of the classes above.

6.StreamingFunctionUtils

 StreamingFunctionUtils

 void snapshotFunctionState(

       StateSnapshotContext context,

       OperatorStateBackend backend,

       Function userFunction) throws Exception {

 ……

 while (true) {

       if (trySnapshotFunctionState(context, backend, userFunction)) {

          break;

       }

       // inspect if the user function is wrapped, then unwrap and try again if we can snapshot the inner function

       if (userFunction instanceof WrappingFunction) {

          userFunction = ((WrappingFunction<?>) userFunction).getWrappedFunction();

       } else {

          break;

       }

    }

 }

 boolean trySnapshotFunctionState(

       StateSnapshotContext context,

       OperatorStateBackend backend,

       Function userFunction) throws Exception {

    if (userFunction instanceof CheckpointedFunction) {

       ……

       return true;

    }

    if (userFunction instanceof ListCheckpointed) {

       ……

       return true;

    }

    return false;

 }

As shown above, the job of this utility class is to carry out restore and snapshot for user functions that implement CheckpointedFunction or ListCheckpointed.
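
The unwrap-and-retry loop is easier to follow when stripped of the Flink types. The following is a minimal sketch under stated assumptions: Fn, CheckpointableFn, and WrappingFn are hypothetical interfaces that only mimic the roles of Function, CheckpointedFunction, and WrappingFunction, and the snapshot is reduced to a string.

```java
/** Hypothetical marker types that only mimic the roles of the Flink interfaces. */
interface Fn {}

interface CheckpointableFn extends Fn {
    String snapshot();
}

interface WrappingFn extends Fn {
    Fn getWrappedFunction();
}

final class UnwrapSketch {
    /** Returns the snapshot of the outermost checkpointable function, or null if none exists. */
    static String snapshotFunctionState(Fn userFunction) {
        while (true) {
            if (userFunction instanceof CheckpointableFn) {
                return ((CheckpointableFn) userFunction).snapshot();
            }
            // Not checkpointable itself: unwrap one layer and retry, or give up.
            if (userFunction instanceof WrappingFn) {
                userFunction = ((WrappingFn) userFunction).getWrappedFunction();
            } else {
                return null;
            }
        }
    }

    public static void main(String[] args) {
        CheckpointableFn inner = () -> "buffered-elements";
        WrappingFn middle = () -> inner;
        WrappingFn outer = () -> middle;
        System.out.println(snapshotFunctionState(outer));
        System.out.println(snapshotFunctionState(new Fn() {}));
    }
}
```

As in the excerpt, the loop stops at the first function that can be snapshotted, so a checkpointable wrapper takes precedence over whatever it wraps.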

II. Factories

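The sections above described how state is saved at the task and operator level. Where is it saved to? That is decided by the following three factory classes.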

7.State backend

	MemoryStateBackend	FsStateBackend	RocksDBStateBackend
CheckpointStream	MemCheckpointStreamFactory	FsCheckpointStreamFactory	FsCheckpointStreamFactory
SavepointStream	MemCheckpointStreamFactory	FsSavepointStreamFactory	FsSavepointStreamFactory
KeyedStateBackend	HeapKeyedStateBackend	HeapKeyedStateBackend	RocksDBKeyedStateBackend
OperatorStateBackend	DefaultOperatorStateBackend	DefaultOperatorStateBackend	DefaultOperatorStateBackend

The RocksDBStateBackend constructor accepts an AbstractStateBackend; if none is given, it defaults to FsStateBackend.

From the OperatorState point of view, Flink currently has only one implementation, DefaultOperatorStateBackend, which keeps list-style state in memory.

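From the KeyedState point of view, there are currently two implementations: HeapKeyedStateBackend keeps state in memory, while RocksDBKeyedStateBackend keeps state in a RocksDB instance local to the TM. The former lives in memory, so access is fast and efficient, but the state size is limited and large state can cause memory problems in the JVM itself. The latter lives in local files, so access involves serialization and deserialization and is less efficient, but it can hold much larger state.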

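From the checkpoint and savepoint point of view, the Memory factory keeps everything in memory and is clearly unsuitable for production, while the Fs and RocksDB factories both write to a file system such as HDFS.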

III. The House

There are currently three classes that actually store state. We use DefaultOperatorStateBackend as the OperatorState example and HeapKeyedStateBackend as the KeyedState example.

8.DefaultOperatorStateBackend

DefaultOperatorStateBackend

Map<String, PartitionableListState<?>> registeredStates;

RunnableFuture<OperatorStateHandle> snapshot(

      final long checkpointId,

      final long timestamp,

      final CheckpointStreamFactory streamFactory,

      final CheckpointOptions checkpointOptions)

……

if (registeredStates.isEmpty()) {

   return DoneFuture.nullValue();

}

……

for (Map.Entry<String, PartitionableListState<?>> entry : this.registeredStates.entrySet())

……

ListState<S> getListState(ListStateDescriptor<S> stateDescriptor)

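Three members are excerpted here. The registeredStates field shows that the states are still stored in a map, keyed by state name. The snapshot method is the concrete implementation of the call made from snapshotState in AbstractStreamOperator. The getListState method is the interface that hands ListState instances to user code.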

 PartitionableListState<S>

 /**

  * The internal list the holds the elements of the state

  */

 private final ArrayList<S> internalList;

From this we can see that OperatorState is kept in memory and is, in essence, just an ArrayList.
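
To make the "map of named ArrayLists" picture concrete, here is a minimal sketch. It assumes nothing about Flink's real serialization or stream factories: OperatorStateSketch is a hypothetical class, state is addressed by a plain name instead of a ListStateDescriptor, and a snapshot is simply a copy of the lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy operator state backend; the code is hypothetical, only the shape echoes Flink. */
final class OperatorStateSketch {
    /** State name -> in-memory list, echoing the registeredStates map. */
    private final Map<String, List<Object>> registeredStates = new HashMap<>();

    /** Returns the list registered under the name, registering it on first access. */
    @SuppressWarnings("unchecked")
    <S> List<S> getListState(String name) {
        List<Object> state = registeredStates.computeIfAbsent(name, k -> new ArrayList<>());
        return (List<S>) (List<?>) state;
    }

    /** Copies every registered list; returns null when nothing is registered. */
    Map<String, List<Object>> snapshot() {
        if (registeredStates.isEmpty()) {
            return null;
        }
        Map<String, List<Object>> copy = new HashMap<>();
        for (Map.Entry<String, List<Object>> entry : registeredStates.entrySet()) {
            copy.put(entry.getKey(), new ArrayList<>(entry.getValue()));
        }
        return copy;
    }

    public static void main(String[] args) {
        OperatorStateSketch backend = new OperatorStateSketch();
        List<String> buffered = backend.getListState("buffered-elements");
        buffered.add("a");
        Map<String, List<Object>> snapshot = backend.snapshot();
        buffered.add("b"); // later changes do not affect the snapshot already taken
        System.out.println(snapshot + " vs live " + buffered);
    }
}
```

The early return for an empty map mirrors the registeredStates.isEmpty() check in the excerpt: an operator that never registered state contributes nothing to the checkpoint.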

9.HeapKeyedStateBackend

 * @param <K> The key by which state is keyed.

 HeapKeyedStateBackend<K>

 /**

  * Map of state tables that stores all state of key/value states. We store it centrally so

  * that we can easily checkpoint/restore it.

  *

  * <p>The actual parameters of StateTable are {@code StateTable<NamespaceT, Map<KeyT, StateT>>}

  * but we can't put them here because different key/value states with different types and

  * namespace types share this central list of tables.

  */

 private final HashMap<String, StateTable<K, ?, ?>> stateTables = new HashMap<>();

 <N, V> InternalValueState<N, V> createValueState(

       TypeSerializer<N> namespaceSerializer,

       ValueStateDescriptor<V> stateDesc){

    StateTable<K, N, V> stateTable = tryRegisterStateTable(namespaceSerializer, stateDesc);

    return new HeapValueState<>(stateDesc, stateTable, keySerializer, namespaceSerializer);

 }

 <N, T> InternalListState<N, T> createListState(

       TypeSerializer<N> namespaceSerializer,

       ListStateDescriptor<T> stateDesc)

       new HeapListState<>

 <N, T> InternalReducingState<N, T> createReducingState(

       TypeSerializer<N> namespaceSerializer,

       ReducingStateDescriptor<T> stateDesc)

       new HeapReducingState<>

 <N, T, ACC, R> InternalAggregatingState<N, T, R> createAggregatingState(

       TypeSerializer<N> namespaceSerializer,

       AggregatingStateDescriptor<T, ACC, R> stateDesc)

       new HeapAggregatingState<>

 <N, T, ACC> InternalFoldingState<N, T, ACC> createFoldingState(

       TypeSerializer<N> namespaceSerializer,

       FoldingStateDescriptor<T, ACC> stateDesc)

       new HeapFoldingState<>

 <N, UK, UV> InternalMapState<N, UK, UV> createMapState(TypeSerializer<N> namespaceSerializer,

       MapStateDescriptor<UK, UV> stateDesc)

       new HeapMapState<>

 RunnableFuture<KeyedStateHandle> snapshot(

       final long checkpointId,

       final long timestamp,

       final CheckpointStreamFactory streamFactory,

       CheckpointOptions checkpointOptions)

 ……

 if (!hasRegisteredState()) {

    return DoneFuture.nullValue();

 }

 ……

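It is similar here: the create methods are the entry points that user code ultimately reaches, each returning the corresponding type of State, and snapshot is again the concrete implementation of the call made from AbstractStreamOperator.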

IV. The Channel

The "channel" is how the State that user code works with gets connected to the DefaultOperatorStateBackend and HeapKeyedStateBackend described above. The first class user code encounters is StreamingRuntimeContext.

10.StreamingRuntimeContext

 StreamingRuntimeContext

 public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {

    KeyedStateStore keyedStateStore = checkPreconditionsAndGetKeyedStateStore(stateProperties);

    stateProperties.initializeSerializerUnlessSet(getExecutionConfig());

    return keyedStateStore.getState(stateProperties);

 }

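Only getState is excerpted here; the methods for the other State types are similar. The logic is simple: check whether a KeyedStateStore is available, then use it to create the State.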

11.PerWindowStateStore

 PerWindowStateStore

 @Override

 public <T> ListState<T> getListState(ListStateDescriptor<T> stateProperties) {

    try {

       return WindowOperator.this.getPartitionedState(window, windowSerializer, stateProperties);

    } catch (Exception e) {

       throw new RuntimeException("Could not retrieve state", e);

    }

 }

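PerWindowStateStore is a KeyedStateStore implementation (an inner class of WindowOperator, as the WindowOperator.this reference shows) that defines how the state is actually fetched. Its getPartitionedState call ends up in AbstractStreamOperator.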

12.AbstractStreamOperator

 AbstractStreamOperator

 protected <S extends State, N> S getPartitionedState(

       N namespace,

       TypeSerializer<N> namespaceSerializer,

       StateDescriptor<S, ?> stateDescriptor) throws Exception {

    /*

     TODO: NOTE: This method does a lot of work caching / retrieving states just to update the namespace.

     This method should be removed for the sake of namespaces being lazily fetched from the keyed

     state backend, or being set on the state directly.

     */

    if (keyedStateStore != null) {

       return keyedStateBackend.getPartitionedState(namespace, namespaceSerializer, stateDescriptor);

    } else {

       throw new RuntimeException("Cannot create partitioned state. The keyed state " +

          "backend has not been set. This indicates that the operator is not " +

          "partitioned/keyed.");

    }

 }

This method is just a relay: it calls straight back into the keyedStateBackend.

13.AbstractKeyedStateBackend

 AbstractKeyedStateBackend

 <N, S extends State> S getPartitionedState(

       final N namespace,

       final TypeSerializer<N> namespaceSerializer,

       final StateDescriptor<S, ?> stateDescriptor)

 <N, S extends State, V> S getOrCreateKeyedState(

       final TypeSerializer<N> namespaceSerializer,

       StateDescriptor<S, V> stateDescriptor)

 // create a new blank key/value state

 S state = stateDescriptor.bind(new StateBinder() {

    @Override

    public <T> ValueState<T> createValueState(ValueStateDescriptor<T> stateDesc) throws Exception {

       return AbstractKeyedStateBackend.this.createValueState(namespaceSerializer, stateDesc);

    }

    @Override

    public <T> ListState<T> createListState(ListStateDescriptor<T> stateDesc) throws Exception {

       return AbstractKeyedStateBackend.this.createListState(namespaceSerializer, stateDesc);

    }

    @Override

    public <T> ReducingState<T> createReducingState(ReducingStateDescriptor<T> stateDesc) throws Exception {

       return AbstractKeyedStateBackend.this.createReducingState(namespaceSerializer, stateDesc);

    }

    @Override

    public <T, ACC, R> AggregatingState<T, R> createAggregatingState(

          AggregatingStateDescriptor<T, ACC, R> stateDesc) throws Exception {

       return AbstractKeyedStateBackend.this.createAggregatingState(namespaceSerializer, stateDesc);

    }

    @Override

    public <T, ACC> FoldingState<T, ACC> createFoldingState(FoldingStateDescriptor<T, ACC> stateDesc) throws Exception {

       return AbstractKeyedStateBackend.this.createFoldingState(namespaceSerializer, stateDesc);

    }

    @Override

    public <UK, UV> MapState<UK, UV> createMapState(MapStateDescriptor<UK, UV> stateDesc) throws Exception {

       return AbstractKeyedStateBackend.this.createMapState(namespaceSerializer, stateDesc);

    }

 });

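This is where State creation is really implemented. The stateDescriptor.bind call hands control back to the binder: each descriptor invokes the create method that matches its own type, which is rather subtle. At runtime, the this in AbstractKeyedStateBackend.this is actually a HeapKeyedStateBackend or a RocksDBKeyedStateBackend, and those classes do the final creation.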
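
The bind call is a form of double dispatch, which can be shown without any Flink types. This is a minimal sketch, assuming hypothetical names throughout (SketchBinder, DescSketch, HeapBackendSketch); the created "state" is reduced to a descriptive string.

```java
/** Hypothetical binder with one create method per kind of state. */
interface SketchBinder {
    String createValueState(ValueDescSketch desc);

    String createListState(ListDescSketch desc);
}

/** Hypothetical descriptor base class. */
abstract class DescSketch {
    final String name;

    DescSketch(String name) {
        this.name = name;
    }

    /** Each descriptor calls back the binder method that matches its own kind. */
    abstract String bind(SketchBinder binder);
}

final class ValueDescSketch extends DescSketch {
    ValueDescSketch(String name) {
        super(name);
    }

    @Override
    String bind(SketchBinder binder) {
        return binder.createValueState(this);
    }
}

final class ListDescSketch extends DescSketch {
    ListDescSketch(String name) {
        super(name);
    }

    @Override
    String bind(SketchBinder binder) {
        return binder.createListState(this);
    }
}

/** Stand-in for a concrete backend; it decides what the created state really is. */
final class HeapBackendSketch implements SketchBinder {
    @Override
    public String createValueState(ValueDescSketch desc) {
        return "HeapValueState[" + desc.name + "]";
    }

    @Override
    public String createListState(ListDescSketch desc) {
        return "HeapListState[" + desc.name + "]";
    }
}

final class BindSketch {
    /** The caller never inspects the descriptor's type; bind performs the dispatch. */
    static String getOrCreateKeyedState(DescSketch descriptor, SketchBinder backend) {
        return descriptor.bind(backend);
    }

    public static void main(String[] args) {
        SketchBinder backend = new HeapBackendSketch();
        System.out.println(getOrCreateKeyedState(new ValueDescSketch("average"), backend));
        System.out.println(getOrCreateKeyedState(new ListDescSketch("buffer"), backend));
    }
}
```

The design choice this illustrates: the descriptor selects which kind of state to create, while the backend selects the implementation, so swapping a heap backend for another one requires no change in the calling code.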

14.StateInitializationContextImpl

 StateInitializationContextImpl

 public OperatorStateStore getOperatorStateStore() {

    return operatorStateStore;

 }

This is the OperatorState side. It eventually reaches the getListState method of DefaultOperatorStateBackend, which creates and registers the state.

V. State

Having covered what state is for, where it is stored, and how it is wired up, we now come to State itself. First, how to implement OperatorState.

15.CheckpointedFunction and ListCheckpointed

 interface CheckpointedFunction {

    void snapshotState(FunctionSnapshotContext context) throws Exception;

    void initializeState(FunctionInitializationContext context) throws Exception;

 }

 public interface ListCheckpointed<T extends Serializable> {

       List<T> snapshotState(long checkpointId, long timestamp) throws Exception;

       void restoreState(List<T> state) throws Exception;

 }

 public class BufferingSink

         implements SinkFunction<Tuple2<String, Integer>>,

         CheckpointedFunction {

     private final int threshold;

     // pay attention here: this is the definition of the state

     private transient ListState<Tuple2<String, Integer>> checkpointedState;

     private List<Tuple2<String, Integer>> bufferedElements;

     public BufferingSink(int threshold) {

         this.threshold = threshold;

         this.bufferedElements = new ArrayList<>();

     }

     @Override

     public void invoke(Tuple2<String, Integer> value) throws Exception {

         bufferedElements.add(value);

         if (bufferedElements.size() == threshold) {

             for (Tuple2<String, Integer> element: bufferedElements) {

                 // send it to the sink

             }

             bufferedElements.clear();

         }

     }

     @Override

     public void snapshotState(FunctionSnapshotContext context) throws Exception {

         checkpointedState.clear();

         for (Tuple2<String, Integer> element : bufferedElements) {

             checkpointedState.add(element);

         }

     }

     @Override

     public void initializeState(FunctionInitializationContext context) throws Exception {

         // create a descriptor

         ListStateDescriptor<Tuple2<String, Integer>> descriptor =

                 new ListStateDescriptor<>(

                         "buffered-elements",

                         TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

         // get the state from the OperatorStateStore

         checkpointedState = context.getOperatorStateStore().getListState(descriptor);

         // unlike keyed state, which Flink restores itself, operator state must be restored by the user

         if (context.isRestored()) {

             for (Tuple2<String, Integer> element : checkpointedState.get()) {

                 bufferedElements.add(element);

             }

         }

     }

 }

Create a ListStateDescriptor, then obtain the OperatorStateStore from the context (that is, the DefaultOperatorStateBackend discussed earlier) and let it create the state.

The key point is the isRestored check in initializeState: the user has to decide how to restore the State.

16.RichFunction

Any KeyedState must be obtained inside a subclass of RichFunction.

 public class CountWindowAverage extends RichFlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

     /**

      * The ValueState handle. The first field is the count, the second field a running sum.

      */

     private transient ValueState<Tuple2<Long, Long>> sum;//the Keyed State definition

     @Override

     public void flatMap(Tuple2<Long, Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {

         // access the state value

         Tuple2<Long, Long> currentSum = sum.value();

         // update the count

         currentSum.f0 += 1;

         // add the second field of the input value

         currentSum.f1 += input.f1;

         // make sure to update the state

         sum.update(currentSum);

         // if the count reaches 2, emit the average and clear the state

         if (currentSum.f0 >= 2) {

             out.collect(new Tuple2<>(input.f0, currentSum.f1 / currentSum.f0));

             sum.clear();

         }

     }

     @Override

     public void open(Configuration config) {

         // create a descriptor for the keyed state

         ValueStateDescriptor<Tuple2<Long, Long>> descriptor =

                 new ValueStateDescriptor<>(

                         "average", // the state name

                         TypeInformation.of(new TypeHint<Tuple2<Long, Long>>() {}), // type information

                         Tuple2.of(0L, 0L)); // default value of the state, if nothing was set

         //using the context to get the Keyed State

         sum = getRuntimeContext().getState(descriptor);

     }

 }

 // this can be used in a streaming program like this (assuming we have a StreamExecutionEnvironment env)

 env.fromElements(Tuple2.of(1L, 3L), Tuple2.of(1L, 5L), Tuple2.of(1L, 7L), Tuple2.of(1L, 4L), Tuple2.of(1L, 2L))

         .keyBy(0)

         .flatMap(new CountWindowAverage())

         .print();

 // the printed output will be (1,4) and (1,5)

The open method is similar: define a descriptor, then get the corresponding State directly from the runtime context.

17.State type

	Managed State	Raw State
Keyed State	RichFunction	1,2
OperatorState	CheckpointedFunction
OperatorState	ListCheckpointed

1. AbstractStreamOperator.initializeState(StateInitializationContext context)

2. AbstractStreamOperator.snapshotState(StateSnapshotContext context)

Keyed State:

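ValueState<T>: holds a single value that can be updated and retrieved (one value per key). Set it with update(T) and read it with T value().

ListState<T>: holds a list of values. Append with add(T) or addAll(List<T>), and retrieve with Iterable<T> get().

ReducingState<T>: holds a single value that is the aggregate of all values added to the state. The interface is similar to ListState, but added elements are combined by the given ReduceFunction.

AggregatingState<IN, OUT>: holds a single value that is the aggregate of all values added. Unlike ReducingState, the aggregate type may differ from the element type; an AggregateFunction performs the aggregation.

FoldingState<T, ACC>: similar to AggregatingState, except that aggregation uses a FoldFunction.

MapState<UK, UV>: holds a map. Put key-value pairs into the state with put(UK, UV) or putAll(Map<UK, UV>), and read them with get(UK).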

18.FlinkKafkaConsumerBase

 FlinkKafkaConsumerBase

 final void initializeState(FunctionInitializationContext context) throws Exception {

    OperatorStateStore stateStore = context.getOperatorStateStore();

    ListState<Tuple2<KafkaTopicPartition, Long>> oldRoundRobinListState =

     stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);

    this.unionOffsetStates = stateStore.getUnionListState(new ListStateDescriptor<>(

          OFFSETS_STATE_NAME,

          TypeInformation.of(new TypeHint<Tuple2<KafkaTopicPartition, Long>>() {})));

 ……

 final void snapshotState(FunctionSnapshotContext context){

       ……

       unionOffsetStates.add(Tuple2.of(subscribedPartition.getKey(), subscribedPartition.getValue()));

       ……

This serves as an example of a source that uses operator state.

19.ElasticsearchSinkBase

 abstract class ElasticsearchSinkBase

 @Override

 public void initializeState(FunctionInitializationContext context) throws Exception {

    // no initialization needed

 }

 @Override

 public void snapshotState(FunctionSnapshotContext context) throws Exception {

    checkErrorAndRethrow();

    if (flushOnCheckpoint) {

       do {

          bulkProcessor.flush();

          checkErrorAndRethrow();

       } while (numPendingRequests.get() != 0);

    }

 }

None of the subclasses of this class overrides these two methods.

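This serves as an example of a sink that hooks into operator-state checkpointing. Note that it stores no state; it only flushes pending requests when a snapshot is taken.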

VI. Restore

20.Restore

20.1 Introduction

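Rescaling a stateless job only requires redistributing the data. With state, the state must first be saved, then split, and then redistributed according to some strategy.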

20.2 OperatorState

At present Flink itself ships only the following redistribution scheme.

RoundRobinOperatorStateRepartitioner
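
The idea of redistributing list-style operator state can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of RoundRobinOperatorStateRepartitioner: the class RoundRobinSketch and its method are hypothetical, and elements are dealt out one by one, whereas the real class works on state handles and offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: pool all list elements, then deal them out in turn. */
final class RoundRobinSketch {
    /**
     * Collects the list state of every old subtask and assigns the elements
     * to the new subtasks one after another.
     */
    static <T> List<List<T>> repartition(List<List<T>> oldSubtaskStates, int newParallelism) {
        if (newParallelism <= 0) {
            throw new IllegalArgumentException("newParallelism must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<T>());
        }
        int next = 0;
        for (List<T> subtaskState : oldSubtaskStates) {
            for (T element : subtaskState) {
                result.get(next).add(element);
                next = (next + 1) % newParallelism;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> old = new ArrayList<>();
        old.add(Arrays.asList("a", "b", "c"));
        old.add(Arrays.asList("d", "e"));
        // Scale from 2 subtasks to 3: prints [[a, d], [b, e], [c]]
        System.out.println(repartition(old, 3));
    }
}
```

This only works because operator list state is a bag of independent elements: any element may end up on any subtask after rescaling, which is exactly why user code has to rebuild its own view of the state in initializeState.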

20.3 KeyedState

20.3.1 key distribution

hash(key) mod parallelism

For keyed state, the state simply follows the distribution of the keys. To make this efficient, however, the concept of the KeyGroup was introduced.

20.3.2 KeyGroup

20.3.2.1 Introduction to KeyGroup

Without KeyGroups, the keys of a subtask are written sequentially, which makes it hard to rescale when the parallelism changes. A KeyGroup covers a range of keys and is the unit that gets assigned to a subtask. When checkpointing, the keys within a KeyGroup are written together; when rescaling, the keyed state of the keys within the same KeyGroup is read sequentially. The number of KeyGroups is the upper limit for parallelism; it must be determined before the job is started and cannot be changed afterwards.

20.3.2.2 Determining the number of KeyGroups

The number is set with setMaxParallelism; the value must be greater than 0 and at most 32768.

The number of KeyGroups is equal to maxParallelism.
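
A small worked sketch shows why KeyGroups make rescaling cheap. The class KeyGroupSketch is hypothetical, and it uses the plain hashCode of the key; to the best of my knowledge Flink additionally applies a murmur hash to that value before taking the modulus, which is omitted here for brevity.

```java
/** Hypothetical sketch of key -> KeyGroup -> subtask assignment. */
final class KeyGroupSketch {
    /** The KeyGroup of a key depends only on maxParallelism, never on the current parallelism. */
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a KeyGroup to a subtask so that each subtask owns a contiguous range of KeyGroups. */
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int keyGroup = assignToKeyGroup(Integer.valueOf(192), maxParallelism); // 192 mod 128 = 64
        // The same KeyGroup is owned by subtask 2 at parallelism 4 and by subtask 4 at parallelism 8.
        System.out.println(keyGroup
                + " -> " + operatorIndexForKeyGroup(maxParallelism, 4, keyGroup)
                + " / " + operatorIndexForKeyGroup(maxParallelism, 8, keyGroup));
    }
}
```

Because a key never changes its KeyGroup, rescaling only changes which subtask reads which contiguous range of KeyGroups from the checkpoint; no individual key has to be rehashed or moved on its own.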

VII. Miscellaneous

21.Misc

1. Can a RichFunction placed directly after a non-keyBy operation use KeyedState?

While building the StreamGraph, Flink checks whether the current transformation has a keySelector; if so, it sets a keySerializer on the StreamNode.

Then, during operator initialization, it checks whether a keySerializer is present, and only then creates a KeyedStateBackend.

The KeyedStateBackend is subsequently used to create the corresponding KeyedState.

Without keyBy, for example when a RichMapFunction is applied directly, there is no KeyedStateBackend, and an exception is thrown at runtime.

2. Are the ListState in KeyedState and the ListState in OperatorState really the same thing?

First, what is ListState?

public interface ListState<T> extends MergingState<T, Iterable<T>> {}

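Evidently it is just an empty interface that adds a constraint through its name. Its inheritance chain is ListState -> MergingState -> AppendingState -> State.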

Both the root interface and the intermediate ones add constraints through naming: State only defines clear, AppendingState defines get and add, and MergingState plays a role similar to ListState.

Next, DefaultOperatorStateBackend defines the method that creates the state:

 <S> ListState<S> getListState(ListStateDescriptor<S> stateDescriptor)

Indeed, it returns a ListState. But remember that this is only an interface, so what is actually returned? A PartitionableListState<S>, so let us look at what it inherits from.

It implements the ListState interface. The code is fairly simple: internally it uses an ArrayList to store state elements of the generic type S.

Now back to the KeyedState side. Look at how the outermost interface, KeyedStateStore, declares the method:

 @PublicEvolving

 public interface KeyedStateStore {

 @PublicEvolving

 <T> ListState<T> getListState(ListStateDescriptor<T> stateProperties);

The declared return type is the same as for OperatorState. But we already know it is just an empty interface, so what happens in practice?

We have to go back to HeapKeyedStateBackend:

@Override

public <N, T> InternalListState<N, T> createListState(

      TypeSerializer<N> namespaceSerializer,

      ListStateDescriptor<T> stateDesc) throws Exception {

……

   return new HeapListState<>(stateDesc, stateTable, keySerializer, namespaceSerializer);

}

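Omitting the middle part, we see a change here: the declared return type is InternalListState, which can be regarded as a subtype of ListState, and what is finally returned is a HeapListState. Again, consider its inheritance hierarchy.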

HeapListState implements InternalListState and thereby indirectly ListState. Both of these interfaces are empty, though: they are pure declarations without any behavior or methods of their own.

So, back to the question: are the ListState of KeyedState and the ListState of OperatorState the same thing?

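It is still hard to answer. At the type level they are indeed the same thing, because it is literally the same interface; at runtime, as we have seen, they are two quite different classes.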
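
The "same interface, different classes" situation can be illustrated with a toy pair of implementations. All names here are hypothetical (SketchListState, OperatorListSketch, KeyedListSketch), and the keyed variant ignores namespaces, which the real HeapListState also takes into account.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical shared interface, playing the role of ListState. */
interface SketchListState<T> {
    void add(T value);

    Iterable<T> get();
}

/** Operator-style list state: one plain list per state. */
final class OperatorListSketch<T> implements SketchListState<T> {
    private final List<T> internalList = new ArrayList<>();

    @Override
    public void add(T value) {
        internalList.add(value);
    }

    @Override
    public Iterable<T> get() {
        return internalList;
    }
}

/** Keyed-style list state: one list per key, selected by the current key. */
final class KeyedListSketch<K, T> implements SketchListState<T> {
    private final Map<K, List<T>> stateTable = new HashMap<>();
    private K currentKey;

    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    @Override
    public void add(T value) {
        stateTable.computeIfAbsent(currentKey, k -> new ArrayList<>()).add(value);
    }

    @Override
    public Iterable<T> get() {
        List<T> list = stateTable.get(currentKey);
        return list == null ? Collections.<T>emptyList() : list;
    }
}

final class ListStateComparison {
    /** Counts the elements visible through the shared interface. */
    static int size(SketchListState<?> state) {
        int n = 0;
        for (Object ignored : state.get()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        OperatorListSketch<Integer> operatorState = new OperatorListSketch<>();
        KeyedListSketch<String, Integer> keyedState = new KeyedListSketch<>();
        for (int i = 0; i < 3; i++) {
            operatorState.add(i);
            keyedState.setCurrentKey(i == 0 ? "a" : "b");
            keyedState.add(i);
        }
        keyedState.setCurrentKey("a");
        System.out.println(size(operatorState) + " vs " + size(keyedState));
    }
}
```

Callers see identical add and get methods in both cases; the difference is what get returns, namely everything the operator stored versus only the elements stored under the key currently being processed.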

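Someone will surely ask what the difference between PartitionableListState and HeapListState is. If the answer is simply that one is used for OperatorState and the other for KeyedState, you probably won't be satisfied. As Confucius said: that's crazy.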
