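I. Where to Begin

For state to be useful, it has to be persisted to reliable storage. In Flink, that persistence step is checkpointing, so we start with the checkpoint logic of StreamTask, the base class of the tasks that run inside a TaskManager (TM).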

1. StreamTask

 StreamTask

 protected OperatorChain<OUT, OP> operatorChain;
CheckpointStreamFactory createCheckpointStreamFactory(StreamOperator<?> operator)
<K> AbstractKeyedStateBackend<K> createKeyedStateBackend( TypeSerializer<K> keySerializer, int numberOfKeyGroups, KeyGroupRange keyGroupRange)
OperatorStateBackend createOperatorStateBackend( StreamOperator<?> op, Collection<OperatorStateHandle> restoreStateHandles)
CheckpointStreamFactory createSavepointStreamFactory(StreamOperator<?> operator, String targetLocation)
StateBackend createStateBackend()
boolean triggerCheckpoint(CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions)
void triggerCheckpointOnBarrier( CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions, CheckpointMetrics checkpointMetrics)
boolean performCheckpoint( CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions, CheckpointMetrics checkpointMetrics)
void checkpointState( CheckpointMetaData checkpointMetaData, CheckpointOptions checkpointOptions, CheckpointMetrics checkpointMetrics)

The call chain is triggerCheckpoint -> performCheckpoint -> checkpointState, which finally ends up in CheckpointingOperation.

2. CheckpointingOperation

 CheckpointingOperation
void executeCheckpointing() {
    ……
    for (StreamOperator<?> op : allOperators) {
        checkpointStreamOperator(op);
    }
    ……
}
void checkpointStreamOperator(StreamOperator<?> op)
    ……
    op.snapshotState(
        checkpointMetaData.getCheckpointId(),
        checkpointMetaData.getTimestamp(),
        checkpointOptions)
    ……

This class simply calls snapshotState on every operator handed over by the StreamTask.

Next, we look at the base type of operators.

3. StreamOperator

 StreamOperator
OperatorSnapshotResult snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
void initializeState(OperatorSubtaskState stateHandles)
void notifyOfCompletedCheckpoint(long checkpointId)

StreamOperator is an interface that declares these three methods, so every operator implementing it has to provide them.

4. AbstractStreamOperator

 AbstractStreamOperator
final OperatorSnapshotResult snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions)
    ……
    snapshotState(snapshotContext);
    ……
    if (null != operatorStateBackend) {
        snapshotInProgress.setOperatorStateManagedFuture(
            operatorStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
    }
    if (null != keyedStateBackend) {
        snapshotInProgress.setKeyedStateManagedFuture(
            keyedStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
    }
    ……
void notifyOfCompletedCheckpoint(long checkpointId)
    if (keyedStateBackend != null) {
        keyedStateBackend.notifyCheckpointComplete(checkpointId);
    }
void snapshotState(StateSnapshotContext context)
void initializeState(StateInitializationContext context)

AbstractStreamOperator is the base implementation of StreamOperator. Its snapshotState method calls the snapshot methods of both the OperatorStateBackend and the KeyedStateBackend.
Note in particular the snapshotState(snapshotContext) call that precedes those two calls: it snapshots raw state and also brings the state of user-defined functions up to date.

The last two methods, snapshotState and initializeState, both take a context parameter. They are the hooks that can be overridden to checkpoint and restore custom state.

This class has one very important subclass, AbstractUdfStreamOperator, from which many operators inherit.

5. AbstractUdfStreamOperator

 AbstractUdfStreamOperator
void initializeState(StateInitializationContext context) throws Exception {
    super.initializeState(context);
    StreamingFunctionUtils.restoreFunctionState(context, userFunction);
}
void snapshotState(StateSnapshotContext context) throws Exception {
    super.snapshotState(context);
    StreamingFunctionUtils.snapshotFunctionState(context, getOperatorStateBackend(), userFunction);
}

It is easy to see that, while overriding the parent's methods, this class adds one thing: the restore and snapshot of the userFunction.

The concrete subclasses of the classes above are the operators that actually get instantiated.

6. StreamingFunctionUtils

 StreamingFunctionUtils
void snapshotFunctionState(
        StateSnapshotContext context,
        OperatorStateBackend backend,
        Function userFunction) {
    ……
    while (true) {
        if (trySnapshotFunctionState(context, backend, userFunction)) {
            break;
        }
        // inspect if the user function is wrapped, then unwrap and try again if we can snapshot the inner function
        if (userFunction instanceof WrappingFunction) {
            userFunction = ((WrappingFunction<?>) userFunction).getWrappedFunction();
        } else {
            break;
        }
    }
}
boolean trySnapshotFunctionState(
        StateSnapshotContext context,
        OperatorStateBackend backend,
        Function userFunction) throws Exception {
    if (userFunction instanceof CheckpointedFunction) {
        ……
        return true;
    }
    if (userFunction instanceof ListCheckpointed) {
        ……
        return true;
    }
    return false;
}

As shown above, the job of this utility is to restore and snapshot user functions that implement CheckpointedFunction or ListCheckpointed.
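The unwrap-and-retry loop can be tried out in isolation. The following is a minimal sketch in plain Java, not Flink code: Fn, CheckpointedFn and WrappingFn are hypothetical stand-ins for Function, CheckpointedFunction and WrappingFunction, and the snapshot is reduced to collecting a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Flink's Function, CheckpointedFunction and WrappingFunction.
interface Fn {}

interface CheckpointedFn extends Fn {
    String snapshot();
}

final class WrappingFn implements Fn {
    private final Fn wrapped;

    WrappingFn(Fn wrapped) {
        this.wrapped = wrapped;
    }

    Fn getWrapped() {
        return wrapped;
    }
}

final class UnwrapSketch {
    /** Returns the snapshots taken; the list is empty if no checkpointed function was found. */
    static List<String> snapshotFunctionState(Fn userFunction) {
        List<String> out = new ArrayList<>();
        while (true) {
            if (trySnapshot(userFunction, out)) {
                break;
            }
            // If the function is a wrapper, unwrap it and try again with the inner function.
            if (userFunction instanceof WrappingFn) {
                userFunction = ((WrappingFn) userFunction).getWrapped();
            } else {
                break;
            }
        }
        return out;
    }

    private static boolean trySnapshot(Fn fn, List<String> out) {
        if (fn instanceof CheckpointedFn) {
            out.add(((CheckpointedFn) fn).snapshot());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CheckpointedFn inner = () -> "inner-state";
        Fn wrappedTwice = new WrappingFn(new WrappingFn(inner));
        System.out.println(snapshotFunctionState(wrappedTwice)); // [inner-state]
    }
}
```

A function wrapped any number of times is still reached, while a function that is neither checkpointed nor a wrapper ends the loop without a snapshot.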

II. Factories

The previous part described how state is saved at the task and operator level. Where it is saved is decided by the three factory classes below.

7. State backend

                     | MemoryStateBackend          | FsStateBackend              | RocksDBStateBackend
CheckpointStream     | MemCheckpointStreamFactory  | FsCheckpointStreamFactory   | FsCheckpointStreamFactory
SavepointStream      | MemCheckpointStreamFactory  | FsSavepointStreamFactory    | FsSavepointStreamFactory
KeyedStateBackend    | HeapKeyedStateBackend       | HeapKeyedStateBackend       | RocksDBKeyedStateBackend
OperatorStateBackend | DefaultOperatorStateBackend | DefaultOperatorStateBackend | DefaultOperatorStateBackend

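The RocksDBStateBackend constructor accepts an AbstractStateBackend; if none is given, it falls back to FsStateBackend.

As far as operator state is concerned, Flink currently has a single implementation, DefaultOperatorStateBackend, which keeps list-style state in memory.

For keyed state there are two implementations. HeapKeyedStateBackend keeps state in memory, while RocksDBKeyedStateBackend keeps it in a RocksDB instance local to the TM. The former is faster because everything lives in memory, but it limits the size of the state and puts pressure on the JVM's own memory. The latter stores state in local files, so access involves serialization and deserialization and is slower, but it can hold much larger state.

From the checkpoint and savepoint point of view, the Memory factory methods keep everything in memory and are clearly unsuitable for production, whereas the Fs and RocksDB factory methods both write to a file system such as HDFS.

III. The House

Three classes currently do the actual storing of state. We look at DefaultOperatorStateBackend as the operator state example and HeapKeyedStateBackend as the keyed state example.

8. DefaultOperatorStateBackend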

DefaultOperatorStateBackend
Map<String, PartitionableListState<?>> registeredStates;
RunnableFuture<OperatorStateHandle> snapshot( final long checkpointId, final long timestamp, final CheckpointStreamFactory streamFactory, final CheckpointOptions checkpointOptions)
……
if (registeredStates.isEmpty()) { return DoneFuture.nullValue(); }
……
for (Map.Entry<String, PartitionableListState<?>> entry : this.registeredStates.entrySet())
……
ListState<S> getListState(ListStateDescriptor<S> stateDescriptor)

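Three members are excerpted here. The registeredStates field shows that the states are kept in a map. The snapshot method is the concrete implementation behind the operatorStateBackend.snapshot call made in AbstractStreamOperator.snapshotState. The getListState method is the entry point that hands ListState instances to user code.

 PartitionableListState<S>
/**
 * The internal list the holds the elements of the state
 */
private final ArrayList<S> internalList;

This shows that operator state is kept in memory and is, at its core, just an ArrayList.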

9. HeapKeyedStateBackend

 * @param <K> The key by which state is keyed.

 HeapKeyedStateBackend<K>

 /**
* Map of state tables that stores all state of key/value states. We store it centrally so
* that we can easily checkpoint/restore it.
*
* <p>The actual parameters of StateTable are {@code StateTable<NamespaceT, Map<KeyT, StateT>>}
* but we can't put them here because different key/value states with different types and
* namespace types share this central list of tables.
*/
private final HashMap<String, StateTable<K, ?, ?>> stateTables = new HashMap<>();

<N, V> InternalValueState<N, V> createValueState(
        TypeSerializer<N> namespaceSerializer,
        ValueStateDescriptor<V> stateDesc) {
    StateTable<K, N, V> stateTable = tryRegisterStateTable(namespaceSerializer, stateDesc);
    return new HeapValueState<>(stateDesc, stateTable, keySerializer, namespaceSerializer);
}

<N, T> InternalListState<N, T> createListState(
        TypeSerializer<N> namespaceSerializer,
        ListStateDescriptor<T> stateDesc)  // returns new HeapListState<>

<N, T> InternalReducingState<N, T> createReducingState(
        TypeSerializer<N> namespaceSerializer,
        ReducingStateDescriptor<T> stateDesc)  // returns new HeapReducingState<>

<N, T, ACC, R> InternalAggregatingState<N, T, R> createAggregatingState(
        TypeSerializer<N> namespaceSerializer,
        AggregatingStateDescriptor<T, ACC, R> stateDesc)  // returns new HeapAggregatingState<>

<N, T, ACC> InternalFoldingState<N, T, ACC> createFoldingState(
        TypeSerializer<N> namespaceSerializer,
        FoldingStateDescriptor<T, ACC> stateDesc)  // returns new HeapFoldingState<>

<N, UK, UV> InternalMapState<N, UK, UV> createMapState(
        TypeSerializer<N> namespaceSerializer,
        MapStateDescriptor<UK, UV> stateDesc)  // returns new HeapMapState<>

RunnableFuture<KeyedStateHandle> snapshot(
        final long checkpointId,
        final long timestamp,
        final CheckpointStreamFactory streamFactory,
        CheckpointOptions checkpointOptions)
    ……
    if (!hasRegisteredState()) {
        return DoneFuture.nullValue();
    }
    ……

The pattern is the same here: the create methods back the state-access calls available in user code, each returning the corresponding kind of state, and snapshot is the concrete implementation behind the keyedStateBackend.snapshot call made in AbstractStreamOperator.

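IV. The Channel

The "channel" is the path by which the state used in user code gets connected to the DefaultOperatorStateBackend and HeapKeyedStateBackend described above. The first class that user code encounters is StreamingRuntimeContext.

10. StreamingRuntimeContext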

 StreamingRuntimeContext

 public <T> ValueState<T> getState(ValueStateDescriptor<T> stateProperties) {
KeyedStateStore keyedStateStore = checkPreconditionsAndGetKeyedStateStore(stateProperties);
stateProperties.initializeSerializerUnlessSet(getExecutionConfig());
return keyedStateStore.getState(stateProperties);
}

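Only getState is excerpted here; the methods for the other kinds of state are similar. The logic is simple: check whether a KeyedStateStore is available, then use it to create the state.

11. PerWindowStateStore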

 PerWindowStateStore

 @Override
public <T> ListState<T> getListState(ListStateDescriptor<T> stateProperties) {
try {
return WindowOperator.this.getPartitionedState(window, windowSerializer, stateProperties);
} catch (Exception e) {
throw new RuntimeException("Could not retrieve state", e);
}
}

PerWindowStateStore is a subclass of KeyedStateStore that implements how the state is actually fetched. Its getPartitionedState call ends up in AbstractStreamOperator.

12. AbstractStreamOperator

 AbstractStreamOperator

 protected <S extends State, N> S getPartitionedState(
N namespace,
TypeSerializer<N> namespaceSerializer,
StateDescriptor<S, ?> stateDescriptor) throws Exception {

/*
TODO: NOTE: This method does a lot of work caching / retrieving states just to update the namespace.
This method should be removed for the sake of namespaces being lazily fetched from the keyed
state backend, or being set on the state directly.
*/

if (keyedStateStore != null) {
return keyedStateBackend.getPartitionedState(namespace, namespaceSerializer, stateDescriptor);
} else {
throw new RuntimeException("Cannot create partitioned state. The keyed state " +
"backend has not been set. This indicates that the operator is not " +
"partitioned/keyed.");
}
}

This method is only a relay: it hands the call back to keyedStateBackend.

13. AbstractKeyedStateBackend

 AbstractKeyedStateBackend

 <N, S extends State> S getPartitionedState(
        final N namespace,
        final TypeSerializer<N> namespaceSerializer,
        final StateDescriptor<S, ?> stateDescriptor)

<N, S extends State, V> S getOrCreateKeyedState(
        final TypeSerializer<N> namespaceSerializer,
        StateDescriptor<S, V> stateDescriptor)

    // create a new blank key/value state
    S state = stateDescriptor.bind(new StateBinder() {
        @Override
        public <T> ValueState<T> createValueState(ValueStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createValueState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T> ListState<T> createListState(ListStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createListState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T> ReducingState<T> createReducingState(ReducingStateDescriptor<T> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createReducingState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T, ACC, R> AggregatingState<T, R> createAggregatingState(
                AggregatingStateDescriptor<T, ACC, R> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createAggregatingState(namespaceSerializer, stateDesc);
        }

        @Override
        public <T, ACC> FoldingState<T, ACC> createFoldingState(FoldingStateDescriptor<T, ACC> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createFoldingState(namespaceSerializer, stateDesc);
        }

        @Override
        public <UK, UV> MapState<UK, UV> createMapState(MapStateDescriptor<UK, UV> stateDesc) throws Exception {
            return AbstractKeyedStateBackend.this.createMapState(namespaceSerializer, stateDesc);
        }
    });

This is where state creation is really implemented. The stateDescriptor.bind call binds the descriptor back to the backend's own create methods, which is a rather subtle touch. At runtime, AbstractKeyedStateBackend.this is a HeapKeyedStateBackend or a RocksDBKeyedStateBackend, and those classes are the ones that finally create the state.
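The bind call is a double dispatch: the descriptor knows which kind of state it describes, and the binder knows how to build it. Below is a minimal sketch of that pattern in plain Java. Every type in it (Binder, Descriptor, ValueDescriptor, ListDescriptor, HeapBackendSketch) is a hypothetical simplification rather than Flink API, and the backend returns strings instead of real state objects.

```java
// Hypothetical, simplified counterpart of StateBinder.
interface Binder {
    String createValueState(ValueDescriptor desc);

    String createListState(ListDescriptor desc);
}

// Hypothetical, simplified counterpart of StateDescriptor.
abstract class Descriptor {
    final String name;

    Descriptor(String name) {
        this.name = name;
    }

    // Each descriptor picks the matching create method on the binder.
    abstract String bind(Binder binder);
}

final class ValueDescriptor extends Descriptor {
    ValueDescriptor(String name) {
        super(name);
    }

    @Override
    String bind(Binder binder) {
        return binder.createValueState(this);
    }
}

final class ListDescriptor extends Descriptor {
    ListDescriptor(String name) {
        super(name);
    }

    @Override
    String bind(Binder binder) {
        return binder.createListState(this);
    }
}

// Stands in for a concrete backend such as a heap-based one.
final class HeapBackendSketch {
    String createValueState(ValueDescriptor d) {
        return "HeapValueState(" + d.name + ")";
    }

    String createListState(ListDescriptor d) {
        return "HeapListState(" + d.name + ")";
    }

    // Mirrors the shape of getOrCreateKeyedState: an anonymous binder forwards to this backend.
    String getOrCreateKeyedState(Descriptor descriptor) {
        return descriptor.bind(new Binder() {
            @Override
            public String createValueState(ValueDescriptor desc) {
                return HeapBackendSketch.this.createValueState(desc);
            }

            @Override
            public String createListState(ListDescriptor desc) {
                return HeapBackendSketch.this.createListState(desc);
            }
        });
    }

    public static void main(String[] args) {
        HeapBackendSketch backend = new HeapBackendSketch();
        System.out.println(backend.getOrCreateKeyedState(new ValueDescriptor("sum")));    // HeapValueState(sum)
        System.out.println(backend.getOrCreateKeyedState(new ListDescriptor("buffer")));  // HeapListState(buffer)
    }
}
```

The caller never switches on the descriptor type; swapping in a different backend class changes which state objects are built without touching the descriptors.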

14. StateInitializationContextImpl

 StateInitializationContextImpl
public OperatorStateStore getOperatorStateStore() { return operatorStateStore; }

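This is the operator state side. It also ends up in DefaultOperatorStateBackend.getListState, which creates and registers the state.

V. State

Having covered what state is used for, where it is stored and how it is wired up, we finally get to state itself. First, how to implement operator state.

15. CheckpointedFunction and ListCheckpointed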

 interface CheckpointedFunction {
     void snapshotState(FunctionSnapshotContext context) throws Exception;
     void initializeState(FunctionInitializationContext context) throws Exception;
 }

 public interface ListCheckpointed<T extends Serializable> {
     List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
     void restoreState(List<T> state) throws Exception;
 }

 public class BufferingSink
         implements SinkFunction<Tuple2<String, Integer>>,
                    CheckpointedFunction {

     private final int threshold;

     // pay attention here: this is the definition of the state
     private transient ListState<Tuple2<String, Integer>> checkpointedState;

     private List<Tuple2<String, Integer>> bufferedElements;

     public BufferingSink(int threshold) {
         this.threshold = threshold;
         this.bufferedElements = new ArrayList<>();
     }

     @Override
     public void invoke(Tuple2<String, Integer> value) throws Exception {
         bufferedElements.add(value);
         if (bufferedElements.size() == threshold) {
             for (Tuple2<String, Integer> element : bufferedElements) {
                 // send it to the sink
             }
             bufferedElements.clear();
         }
     }

     @Override
     public void snapshotState(FunctionSnapshotContext context) throws Exception {
         checkpointedState.clear();
         for (Tuple2<String, Integer> element : bufferedElements) {
             checkpointedState.add(element);
         }
     }

     @Override
     public void initializeState(FunctionInitializationContext context) throws Exception {
         // create a descriptor
         ListStateDescriptor<Tuple2<String, Integer>> descriptor =
                 new ListStateDescriptor<>(
                         "buffered-elements",
                         TypeInformation.of(new TypeHint<Tuple2<String, Integer>>() {}));

         // get the state from the OperatorStateStore
         checkpointedState = context.getOperatorStateStore().getListState(descriptor);

         // unlike keyed state, where Flink does the restore itself,
         // the user has to take care of restoring operator state
         if (context.isRestored()) {
             for (Tuple2<String, Integer> element : checkpointedState.get()) {
                 bufferedElements.add(element);
             }
         }
     }
 }

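The function creates a ListStateDescriptor, then obtains the OperatorStateStore from the context, which is the DefaultOperatorStateBackend described earlier, and uses it to create the state.

The key point is the isRestored check in initializeState: it is up to the user to decide how the state is restored.

16. RichFunction

Keyed state can only be obtained inside a subclass of RichFunction.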

 public class CountWindowAverage extends RichFlatMapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>> {

     /**
      * The ValueState handle. The first field is the count, the second field a running sum.
      */
     private transient ValueState<Tuple2<Long, Long>> sum; // the keyed state definition

     @Override
     public void flatMap(Tuple2<Long, Long> input, Collector<Tuple2<Long, Long>> out) throws Exception {
         // access the state value
         Tuple2<Long, Long> currentSum = sum.value();

         // update the count
         currentSum.f0 += 1;

         // add the second field of the input value
         currentSum.f1 += input.f1;

         // make sure to update the state
         sum.update(currentSum);

         // if the count reaches 2, emit the average and clear the state
         if (currentSum.f0 >= 2) {
             out.collect(new Tuple2<>(input.f0, currentSum.f1 / currentSum.f0));
             sum.clear();
         }
     }

     @Override
     public void open(Configuration config) {
         // create a descriptor that matches the keyed state
         ValueStateDescriptor<Tuple2<Long, Long>> descriptor =
                 new ValueStateDescriptor<>(
                         "average", // the state name
                         TypeInformation.of(new TypeHint<Tuple2<Long, Long>>() {}), // type information
                         Tuple2.of(0L, 0L)); // default value of the state, if nothing was set

         // use the runtime context to get the keyed state
         sum = getRuntimeContext().getState(descriptor);
     }
 }

 // this can be used in a streaming program like this (assuming we have a StreamExecutionEnvironment env)
 env.fromElements(Tuple2.of(1L, 3L), Tuple2.of(1L, 5L), Tuple2.of(1L, 7L), Tuple2.of(1L, 4L), Tuple2.of(1L, 2L))
         .keyBy(0)
         .flatMap(new CountWindowAverage())
         .print();

 // the printed output will be (1,4) and (1,5)

The open method works the same way: define a descriptor, then obtain the corresponding state directly from the runtime context.

17. State types

              | Managed State                          | Raw State
Keyed State   | RichFunction                           | 1, 2
OperatorState | CheckpointedFunction, ListCheckpointed |

1. AbstractStreamOperator.initializeState(StateInitializationContext context)

2. AbstractStreamOperator.snapshotState(StateSnapshotContext context)

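Keyed State:

ValueState<T>: holds a single value that can be updated and read (one value per key). Set it with update(T) and read it with T value().

ListState<T>: holds a list of values. Append with add(T) or addAll(List<T>) and read with Iterable<T> get().

ReducingState<T>: holds a single value that is the aggregate of all values added to the state. The interface is similar to ListState, but added elements are combined with the supplied ReduceFunction.

AggregatingState<IN, OUT>: holds a single value that is the aggregate of all values added. Unlike ReducingState, the aggregate type may differ from the element type; the aggregation is done by the supplied AggregateFunction.

FoldingState<T, ACC>: similar to AggregatingState, except that the aggregation uses a FoldFunction.

MapState<UK, UV>: holds a set of mappings. Add entries with put(UK, UV) or putAll(Map<UK, UV>) and read them with get(UK).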

18. FlinkKafkaConsumerBase

 FlinkKafkaConsumerBase

 final void initializeState(FunctionInitializationContext context) throws Exception {
    OperatorStateStore stateStore = context.getOperatorStateStore();
    ListState<Tuple2<KafkaTopicPartition, Long>> oldRoundRobinListState =
        stateStore.getSerializableListState(DefaultOperatorStateBackend.DEFAULT_OPERATOR_STATE_NAME);
    this.unionOffsetStates = stateStore.getUnionListState(new ListStateDescriptor<>(
        OFFSETS_STATE_NAME,
        TypeInformation.of(new TypeHint<Tuple2<KafkaTopicPartition, Long>>() {})));
    ……

final void snapshotState(FunctionSnapshotContext context) {
    ……
    unionOffsetStates.add(Tuple2.of(subscribedPartition.getKey(), subscribedPartition.getValue()));
    ……

This serves as an example of a source that uses operator state.

19. ElasticsearchSinkBase

 abstract class ElasticsearchSinkBase

 @Override
public void initializeState(FunctionInitializationContext context) throws Exception {
    // no initialization needed
}

@Override
public void snapshotState(FunctionSnapshotContext context) throws Exception {
    checkErrorAndRethrow();

    if (flushOnCheckpoint) {
        do {
            bulkProcessor.flush();
            checkErrorAndRethrow();
        } while (numPendingRequests.get() != 0);
    }
}

None of the subclasses of this class override these two methods.

This serves as an example of a sink that implements the operator state hooks: it needs no initialization and simply flushes pending requests when a snapshot is taken.

VI. Restore

20. Restore

20.1 Introduction

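Redistributing a stateless operator only requires redistributing the data. Once state is involved, the state first has to be saved, then split and redistributed according to some strategy.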

20.2 OperatorState

Flink currently ships only the following redistribution scheme.

RoundRobinOperatorStateRepartitioner
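To make the idea concrete, here is a minimal sketch of redistributing list-style operator state when the parallelism changes. It is plain Java and only illustrates the principle (collect the list elements of all old subtasks, then deal them out in turn to the new subtasks). The class and method names are hypothetical, and the sketch works on in-memory lists, which is a simplification and not how RoundRobinOperatorStateRepartitioner itself is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RoundRobinRedistributionSketch {

    /** Deals the elements of all old subtask states out to newParallelism new subtasks. */
    static <T> List<List<T>> redistribute(List<List<T>> oldSubtaskStates, int newParallelism) {
        if (newParallelism <= 0) {
            throw new IllegalArgumentException("newParallelism must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < newParallelism; i++) {
            result.add(new ArrayList<>());
        }
        int next = 0;
        for (List<T> subtaskState : oldSubtaskStates) {
            for (T element : subtaskState) {
                result.get(next).add(element);
                next = (next + 1) % newParallelism;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two old subtasks, rescaled to three new subtasks.
        List<List<String>> old = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));
        System.out.println(redistribute(old, 3)); // [[a, d], [b, e], [c]]
    }
}
```

Because list-style state has no key attached to its elements, any element may land on any new subtask, which is why a simple even split is enough.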

20.3 KeyedState

20.3.1 key distribution

hash(key) mod parallelism

For keyed state, the state simply follows the distribution of its keys. To make this efficient, the concept of the KeyGroup was introduced.

20.3.2 KeyGroup

20.3.2.1 Introduction to KeyGroup

Without KeyGroups, the keys of a subtask are written sequentially, which makes rescaling hard when the parallelism changes. A KeyGroup holds a range of keys and is the unit that gets assigned to a subtask. During checkpointing, the keys of a KeyGroup are written together; during rescaling, the keyed state of all keys in the same KeyGroup is read sequentially. The number of KeyGroups is the upper limit for the parallelism. It must be fixed before the job is started and cannot be changed afterwards.

20.3.2.2 Determining the number of KeyGroups

The number is set with setMaxParallelism. The value must be greater than 0 and at most 32768.

The number of KeyGroups is equal to maxParallelism.
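The relationship between keys, KeyGroups and subtasks can be sketched as two small functions. This is a simplified illustration with hypothetical method names: it uses the key's hashCode directly, whereas Flink applies an additional hash on top of it. What it shows is that the key-to-KeyGroup mapping depends only on maxParallelism, so changing the parallelism only moves whole KeyGroups between subtasks.

```java
final class KeyGroupSketch {

    /** Maps a key to a KeyGroup in [0, maxParallelism); independent of the current parallelism. */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a KeyGroup to a subtask index so that each subtask owns a contiguous range of KeyGroups. */
    static int subtaskFor(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        int keyGroup = keyGroupFor("user-42", maxParallelism);
        System.out.println("key group: " + keyGroup);
        // The KeyGroup stays the same; only its owning subtask changes when rescaling.
        System.out.println("subtask at parallelism 2: " + subtaskFor(keyGroup, 2, maxParallelism));
        System.out.println("subtask at parallelism 4: " + subtaskFor(keyGroup, 4, maxParallelism));
    }
}
```

With maxParallelism 128 and parallelism 2, KeyGroups 0 to 63 belong to subtask 0 and KeyGroups 64 to 127 to subtask 1; at parallelism 4 each subtask owns 32 consecutive KeyGroups.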

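VII. Miscellaneous

21. Misc

1. Can a RichFunction placed directly after a non-keyBy operation use keyed state?

While the StreamGraph is being built, each transformation is checked for a keySelector. If it has one, a keySerializer is set on the StreamNode.

Then, during operator initialization, a KeyedStateBackend is created only if a keySerializer is present.

That KeyedStateBackend is later used to create the requested keyed state.

Without keyBy, for example in a plain RichMapFunction, there is no KeyedStateBackend, and an exception is thrown at runtime.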
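The guard described above can be sketched in a few lines of plain Java. The class below is hypothetical and heavily simplified: a map stands in for the keyed state backend, and a boolean stands in for "a keySerializer was set on the node".

```java
import java.util.HashMap;
import java.util.Map;

final class KeyedStateGuardSketch {
    // Hypothetical stand-in for the keyed state backend; null when the stream is not keyed.
    private final Map<String, Object> keyedStateBackend;

    KeyedStateGuardSketch(boolean hasKeySerializer) {
        // The backend is only created when a key serializer is present.
        this.keyedStateBackend = hasKeySerializer ? new HashMap<>() : null;
    }

    Object getPartitionedState(String stateName) {
        if (keyedStateBackend == null) {
            throw new IllegalStateException(
                    "Cannot create partitioned state: the operator is not keyed.");
        }
        return keyedStateBackend.computeIfAbsent(stateName, n -> new Object());
    }

    public static void main(String[] args) {
        // Keyed operator: the state can be created.
        new KeyedStateGuardSketch(true).getPartitionedState("sum");
        // Non-keyed operator: the request is rejected at runtime.
        try {
            new KeyedStateGuardSketch(false).getPartitionedState("sum");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```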

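2. Are the ListState of keyed state and the ListState of operator state the same thing?

First, what is ListState?

public interface ListState<T> extends MergingState<T, Iterable<T>> {}

It is clearly just an empty interface that adds a constraint by naming. Its inheritance chain runs from State through AppendingState and MergingState to ListState.

The root interface and the intermediate parents each add constraints in the same way: State defines only clear, AppendingState defines get and add, and MergingState serves a similar purpose to ListState.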

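Next, DefaultOperatorStateBackend defines the method that creates the state:

 <S> ListState<S> getListState(ListStateDescriptor<S> stateDescriptor)

It does return a ListState, but remember that this is only an interface. What is actually returned is a PartitionableListState<S>, so let us look at its inheritance.

It implements the ListState interface, and its code is fairly simple: internally it stores the state elements of type S in an ArrayList.

Now back to keyed state. Here is how the outermost interface, KeyedStateStore, is declared: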

 @PublicEvolving
public interface KeyedStateStore {
@PublicEvolving
<T> ListState<T> getListState(ListStateDescriptor<T> stateProperties);

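The declared return type is the same as for operator state, but we already know that it is only an empty interface. What happens in practice?

For that we have to go back to HeapKeyedStateBackend: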

@Override
public <N, T> InternalListState<N, T> createListState(
TypeSerializer<N> namespaceSerializer,
ListStateDescriptor<T> stateDesc) throws Exception {
……
return new HeapListState<>(stateDesc, stateTable, keySerializer, namespaceSerializer);
}

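Skipping the middle part, we see that things have changed: the declared return type is InternalListState, which can be thought of as a subtype of ListState, and the object actually returned is a HeapListState. Its inheritance is worth a look as well.

HeapListState implements InternalListState and thereby, indirectly, ListState. Both of these interfaces are empty, though: they are pure declarations with no behavior or methods of their own.

So, back to the question: are the ListState of keyed state and the ListState of operator state the same thing?

It is still hard to say. Syntactically they are, because it is the very same type. At runtime, however, as shown above, they are two quite different classes.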
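The difference can be made tangible with a small sketch in plain Java. All types below are hypothetical simplifications (SimpleListState for ListState, PartitionableListSketch for PartitionableListState, HeapListSketch for HeapListState). Both classes satisfy the same interface, but one holds a single list per operator instance while the other looks up a separate list for whichever key is current.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified counterpart of ListState.
interface SimpleListState<T> {
    void add(T value);

    List<T> get();
}

// Operator-state flavour: one list per operator instance.
final class PartitionableListSketch<T> implements SimpleListState<T> {
    private final List<T> internalList = new ArrayList<>();

    @Override
    public void add(T value) {
        internalList.add(value);
    }

    @Override
    public List<T> get() {
        return internalList;
    }
}

// Keyed-state flavour: one list per key, selected by the current key.
final class HeapListSketch<K, T> implements SimpleListState<T> {
    private final Map<K, List<T>> table = new HashMap<>();
    private K currentKey;

    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    @Override
    public void add(T value) {
        table.computeIfAbsent(currentKey, k -> new ArrayList<>()).add(value);
    }

    @Override
    public List<T> get() {
        return table.getOrDefault(currentKey, new ArrayList<>());
    }
}

final class ListStateComparison {
    public static void main(String[] args) {
        PartitionableListSketch<String> operatorState = new PartitionableListSketch<>();
        operatorState.add("x");
        operatorState.add("y");
        System.out.println(operatorState.get()); // [x, y]

        HeapListSketch<String, String> keyedState = new HeapListSketch<>();
        keyedState.setCurrentKey("k1");
        keyedState.add("x");
        keyedState.setCurrentKey("k2");
        keyedState.add("y");
        System.out.println(keyedState.get()); // [y]
        keyedState.setCurrentKey("k1");
        System.out.println(keyedState.get()); // [x]
    }
}
```

User code sees the same add and get calls in both cases; what differs is the scope of the list behind them.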

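Someone will surely ask next how PartitionableListState and HeapListState differ. The direct answer, that one is used for operator state and the other for keyed state, will probably not satisfy you. As Confucius would say: madness.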
