Flink - state
public class StreamTaskState implements Serializable, Closeable {

    private static final long serialVersionUID = 1L;

    private StateHandle<?> operatorState;

    private StateHandle<Serializable> functionState;

    private HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> kvStates;

    // ...
}
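Flink distinguishes three kinds of state.
As the class above shows, StreamTaskState is simply a wrapper around all three: operator state, function state, and key/value state.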
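1. KvState
This is the most basic kind of state.
The abstraction comes as a pair, KvState and KvStateSnapshot:
two interfaces that can be converted into each other (snapshot() goes one way, restoreState() the other).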
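The pairing can be illustrated with a stripped-down sketch. SimpleKvState and SimpleSnapshot below are hypothetical names, not Flink classes; they drop the namespace, descriptor and backend type parameters and keep the snapshot in memory. The only point is the round trip from live state to snapshot and back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for KvState / KvStateSnapshot (not Flink classes).
class KvStateRoundTrip {

    /** Live state: mutable and bound to a "current key", like KvState.setCurrentKey. */
    static final class SimpleKvState<K, V> {
        private final Map<K, V> values;
        private K currentKey;

        SimpleKvState(Map<K, V> values) { this.values = values; }

        void setCurrentKey(K key) { this.currentKey = key; }
        V value() { return values.get(currentKey); }
        void update(V value) { values.put(currentKey, value); }

        /** Live state -> snapshot: copy the contents so later updates do not leak in. */
        SimpleSnapshot<K, V> snapshot() {
            return new SimpleSnapshot<>(new HashMap<>(values));
        }
    }

    /** Snapshot: only knows how to rebuild a live state, like restoreState. */
    static final class SimpleSnapshot<K, V> {
        private final Map<K, V> frozen;

        SimpleSnapshot(Map<K, V> frozen) { this.frozen = frozen; }

        SimpleKvState<K, V> restoreState() {
            return new SimpleKvState<>(new HashMap<>(frozen));
        }
    }

    public static void main(String[] args) {
        SimpleKvState<String, Integer> state = new SimpleKvState<>(new HashMap<>());
        state.setCurrentKey("a");
        state.update(1);
        SimpleSnapshot<String, Integer> snap = state.snapshot();
        state.update(2); // happens after the snapshot was taken

        SimpleKvState<String, Integer> restored = snap.restoreState();
        restored.setCurrentKey("a");
        System.out.println(restored.value()); // prints 1, the value as of the snapshot
    }
}
```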
/**
* Key/Value state implementation for user-defined state. The state is backed by a state
* backend, which typically follows one of the following patterns: Either the state is stored
* in the key/value state object directly (meaning in the executing JVM) and snapshotted by the
* state backend into some store (during checkpoints), or the key/value state is in fact backed
* by an external key/value store as the state backend, and checkpoints merely record the
* metadata of what is considered part of the checkpoint.
*
* @param <K> The type of the key.
* @param <N> The type of the namespace.
* @param <S> The type of {@link State} this {@code KvState} holds.
* @param <SD> The type of the {@link StateDescriptor} for state {@code S}.
* @param <Backend> The type of {@link AbstractStateBackend} that manages this {@code KvState}.
*/
public interface KvState<K, N, S extends State, SD extends StateDescriptor<S, ?>, Backend extends AbstractStateBackend> {

    /**
     * Sets the current key, which will be used when using the state access methods.
     *
     * @param key The key.
     */
    void setCurrentKey(K key);

    /**
     * Sets the current namespace, which will be used when using the state access methods.
     *
     * @param namespace The namespace.
     */
    void setCurrentNamespace(N namespace);

    /**
     * Creates a snapshot of this state.
     *
     * @param checkpointId The ID of the checkpoint for which the snapshot should be created.
     * @param timestamp The timestamp of the checkpoint.
     * @return A snapshot handle for this key/value state.
     *
     * @throws Exception Exceptions during snapshotting the state should be forwarded, so the system
     *                   can react to failed snapshots.
     */
    KvStateSnapshot<K, N, S, SD, Backend> snapshot(long checkpointId, long timestamp) throws Exception;

    /**
     * Disposes the key/value state, releasing all occupied resources.
     */
    void dispose();
}
The definition is fairly simple. The key method is snapshot(), which produces a KvStateSnapshot.
public interface KvStateSnapshot<K, N, S extends State, SD extends StateDescriptor<S, ?>, Backend extends AbstractStateBackend>
        extends StateObject {

    /**
     * Loads the key/value state back from this snapshot.
     *
     * @param stateBackend The state backend that created this snapshot and can restore the key/value state
     *                     from this snapshot.
     * @param keySerializer The serializer for the keys.
     * @param classLoader The class loader for user-defined types.
     *
     * @return An instance of the key/value state loaded from this snapshot.
     *
     * @throws Exception Exceptions can occur during the state loading and are forwarded.
     */
    KvState<K, N, S, SD, Backend> restoreState(
            Backend stateBackend,
            TypeSerializer<K> keySerializer,
            ClassLoader classLoader) throws Exception;
}
KvStateSnapshot is the counterpart of KvState; its key method is restoreState().
Take a concrete implementation, FsState, as an example:
public abstract class AbstractFsState<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>>
extends AbstractHeapState<K, N, SV, S, SD, FsStateBackend> {
AbstractFsState extends AbstractHeapState, because FsState also keeps its state cached on the heap; it only needs to write to a file when a snapshot is taken.
So let's look at AbstractHeapState first:
/**
* Base class for partitioned {@link ListState} implementations that are backed by a regular
* heap hash map. The concrete implementations define how the state is checkpointed.
*
* @param <K> The type of the key.
* @param <N> The type of the namespace.
* @param <SV> The type of the values in the state.
* @param <S> The type of State
* @param <SD> The type of StateDescriptor for the State S
* @param <Backend> The type of the backend that snapshots this key/value state.
*/
public abstract class AbstractHeapState<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>, Backend extends AbstractStateBackend>
        implements KvState<K, N, S, SD, Backend>, State {

    /** Map containing the actual key/value pairs */
    // Note the extra namespace level here, which keeps keys from colliding too easily.
    protected final HashMap<N, Map<K, SV>> state;

    /** Serializer for the state value. The state value could be a List<V>, for example. */
    protected final TypeSerializer<SV> stateSerializer;

    /** The serializer for the keys */
    protected final TypeSerializer<K> keySerializer;

    /** The serializer for the namespace */
    protected final TypeSerializer<N> namespaceSerializer;

    /** This holds the name of the state and can create an initial default value for the state. */
    // The StateDescriptor carries metadata about the state, such as its default value.
    protected final SD stateDesc;

    /** The current key, which the next value methods will refer to */
    protected K currentKey;

    /** The current namespace, which the access methods will refer to. */
    protected N currentNamespace = null;

    /** Cache the state map for the current key. */
    protected Map<K, SV> currentNSState;

    /**
     * Creates a new empty key/value state.
     *
     * @param keySerializer The serializer for the keys.
     * @param namespaceSerializer The serializer for the namespace.
     * @param stateDesc The state identifier for the state. This contains name
     *                  and can create a default state value.
     */
    protected AbstractHeapState(TypeSerializer<K> keySerializer,
            TypeSerializer<N> namespaceSerializer,
            TypeSerializer<SV> stateSerializer,
            SD stateDesc) {
        this(keySerializer, namespaceSerializer, stateSerializer, stateDesc, new HashMap<N, Map<K, SV>>());
    }

    // ...
}
AbstractFsState
public abstract class AbstractFsState<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>>
        extends AbstractHeapState<K, N, SV, S, SD, FsStateBackend> {

    /** The file system state backend backing snapshots of this state */
    private final FsStateBackend backend;

    public abstract KvStateSnapshot<K, N, S, SD, FsStateBackend> createHeapSnapshot(Path filePath);

    @Override
    public KvStateSnapshot<K, N, S, SD, FsStateBackend> snapshot(long checkpointId, long timestamp) throws Exception {

        try (FsStateBackend.FsCheckpointStateOutputStream out =
                backend.createCheckpointStateOutputStream(checkpointId, timestamp)) {

            // serialize the state to the output stream
            DataOutputViewStreamWrapper outView = new DataOutputViewStreamWrapper(new DataOutputStream(out));
            outView.writeInt(state.size());
            for (Map.Entry<N, Map<K, SV>> namespaceState: state.entrySet()) {
                N namespace = namespaceState.getKey();
                namespaceSerializer.serialize(namespace, outView);
                outView.writeInt(namespaceState.getValue().size());
                for (Map.Entry<K, SV> entry: namespaceState.getValue().entrySet()) {
                    keySerializer.serialize(entry.getKey(), outView);
                    stateSerializer.serialize(entry.getValue(), outView);
                }
            }
            outView.flush(); // the actual contents are flushed to the file

            // create a handle to the state
            return createHeapSnapshot(out.closeAndGetPath()); // the snapshot itself only needs the path
        }
    }
}
Key/value state itself comes in several flavors: ValueState, ListState, ReducingState and FoldingState.
For simplicity, start with ValueState:
public class FsValueState<K, N, V>
        extends AbstractFsState<K, N, V, ValueState<V>, ValueStateDescriptor<V>>
        implements ValueState<V> {

    @Override
    public V value() {
        if (currentNSState == null) {
            currentNSState = state.get(currentNamespace); // lazily look up the key/value map of the current namespace
        }
        if (currentNSState != null) {
            V value = currentNSState.get(currentKey);
            return value != null ? value : stateDesc.getDefaultValue(); // fall back to the descriptor's default if the value is null
        }
        return stateDesc.getDefaultValue();
    }

    @Override
    public void update(V value) {
        if (currentKey == null) {
            throw new RuntimeException("No key available.");
        }

        if (value == null) {
            clear();
            return;
        }

        if (currentNSState == null) {
            currentNSState = new HashMap<>();
            state.put(currentNamespace, currentNSState);
        }

        currentNSState.put(currentKey, value); // update the value
    }

    @Override
    public KvStateSnapshot<K, N, ValueState<V>, ValueStateDescriptor<V>, FsStateBackend> createHeapSnapshot(Path filePath) {
        // create the snapshot from the file path
        return new Snapshot<>(getKeySerializer(), getNamespaceSerializer(), stateSerializer, stateDesc, filePath);
    }

    // ...
}
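The lookup rules in value() and update() can be tried out in isolation. The class below is a hypothetical, backend-free sketch (HeapValueSketch is not a Flink class); it keeps the same namespace -> key -> value layout, falls back to a default value, and treats update(null) as a clear. Unlike the Flink code it does not cache the per-namespace map, and its clear() is an assumed implementation, since the original clear() is not shown in this post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a namespace-scoped value state (not a Flink class).
class HeapValueSketch<K, N, V> {
    private final Map<N, Map<K, V>> state = new HashMap<>();
    private final V defaultValue;
    private K currentKey;
    private N currentNamespace;

    HeapValueSketch(V defaultValue) { this.defaultValue = defaultValue; }

    void setCurrentKey(K key) { currentKey = key; }
    void setCurrentNamespace(N namespace) { currentNamespace = namespace; }

    /** Returns the value for (currentNamespace, currentKey), or the default if absent. */
    V value() {
        Map<K, V> nsState = state.get(currentNamespace);
        if (nsState == null) {
            return defaultValue;
        }
        V v = nsState.get(currentKey);
        return v != null ? v : defaultValue;
    }

    /** Stores the value; a null value removes the entry instead. */
    void update(V value) {
        if (currentKey == null) {
            throw new IllegalStateException("No key available.");
        }
        if (value == null) {
            clear();
            return;
        }
        state.computeIfAbsent(currentNamespace, ns -> new HashMap<>()).put(currentKey, value);
    }

    /** Assumed behavior: drop the entry, and the namespace map once it is empty. */
    void clear() {
        Map<K, V> nsState = state.get(currentNamespace);
        if (nsState != null) {
            nsState.remove(currentKey);
            if (nsState.isEmpty()) {
                state.remove(currentNamespace);
            }
        }
    }

    public static void main(String[] args) {
        HeapValueSketch<String, String, Long> counter = new HeapValueSketch<>(0L);
        counter.setCurrentNamespace("window-1");
        counter.setCurrentKey("user-42");
        counter.update(counter.value() + 1);
        System.out.println(counter.value()); // prints 1

        counter.setCurrentNamespace("window-2"); // same key, different namespace
        System.out.println(counter.value()); // prints 0, the default
    }
}
```

The same key under two namespaces yields two independent values, which is what the extra map level buys.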
Next, the matching snapshot class, AbstractFsStateSnapshot:
public abstract class AbstractFsStateSnapshot<K, N, SV, S extends State, SD extends StateDescriptor<S, ?>>
        extends AbstractFileStateHandle implements KvStateSnapshot<K, N, S, SD, FsStateBackend> {

    public abstract KvState<K, N, S, SD, FsStateBackend> createFsState(FsStateBackend backend, HashMap<N, Map<K, SV>> stateMap);

    @Override
    public KvState<K, N, S, SD, FsStateBackend> restoreState(
            FsStateBackend stateBackend,
            final TypeSerializer<K> keySerializer,
            ClassLoader classLoader) throws Exception {

        // state restore
        ensureNotClosed();

        try (FSDataInputStream inStream = stateBackend.getFileSystem().open(getFilePath())) {
            // make sure the in-progress restore from the handle can be closed
            registerCloseable(inStream);

            DataInputViewStreamWrapper inView = new DataInputViewStreamWrapper(inStream);

            final int numKeys = inView.readInt();
            HashMap<N, Map<K, SV>> stateMap = new HashMap<>(numKeys);

            for (int i = 0; i < numKeys; i++) {
                N namespace = namespaceSerializer.deserialize(inView);
                final int numValues = inView.readInt();
                Map<K, SV> namespaceMap = new HashMap<>(numValues);
                stateMap.put(namespace, namespaceMap);
                for (int j = 0; j < numValues; j++) {
                    K key = keySerializer.deserialize(inView);
                    SV value = stateSerializer.deserialize(inView);
                    namespaceMap.put(key, value);
                }
            }

            return createFsState(stateBackend, stateMap);
        }
        catch (Exception e) {
            throw new Exception("Failed to restore state from file system", e);
        }
    }
}
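The write loop in AbstractFsState.snapshot() and the read loop in restoreState() agree on one layout: the number of namespaces, then for each namespace its serialized form, its entry count, and the key/value pairs. (The first int is read into a variable called numKeys, but given what the writer emits, state.size(), it is the namespace count.) A minimal sketch of that layout, assuming String namespaces and keys and long values in place of Flink's TypeSerializers, and a byte array in place of the checkpoint file; NestedMapCodec is a hypothetical name:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the snapshot layout; plain java.io instead of Flink serializers.
class NestedMapCodec {

    /** Layout: [#namespaces] then per namespace: [namespace][#entries] ([key][value])* */
    static byte[] write(Map<String, Map<String, Long>> state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(state.size());
            for (Map.Entry<String, Map<String, Long>> ns : state.entrySet()) {
                out.writeUTF(ns.getKey());
                out.writeInt(ns.getValue().size());
                for (Map.Entry<String, Long> entry : ns.getValue().entrySet()) {
                    out.writeUTF(entry.getKey());
                    out.writeLong(entry.getValue());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the layout back in exactly the order it was written. */
    static Map<String, Map<String, Long>> read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int numNamespaces = in.readInt();
            Map<String, Map<String, Long>> state = new HashMap<>(numNamespaces);
            for (int i = 0; i < numNamespaces; i++) {
                String namespace = in.readUTF();
                int numEntries = in.readInt();
                Map<String, Long> namespaceMap = new HashMap<>(numEntries);
                state.put(namespace, namespaceMap);
                for (int j = 0; j < numEntries; j++) {
                    String key = in.readUTF();
                    long value = in.readLong();
                    namespaceMap.put(key, value);
                }
            }
            return state;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> state = new HashMap<>();
        state.computeIfAbsent("window-1", ns -> new HashMap<>()).put("a", 1L);
        state.computeIfAbsent("window-2", ns -> new HashMap<>()).put("a", 7L);

        Map<String, Map<String, Long>> restored = read(write(state));
        System.out.println(restored.equals(state)); // prints true
    }
}
```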
The snapshot class implemented inside FsValueState:
public static class Snapshot<K, N, V> extends AbstractFsStateSnapshot<K, N, V, ValueState<V>, ValueStateDescriptor<V>> {
    private static final long serialVersionUID = 1L;

    public Snapshot(TypeSerializer<K> keySerializer,
            TypeSerializer<N> namespaceSerializer,
            TypeSerializer<V> stateSerializer,
            ValueStateDescriptor<V> stateDescs,
            Path filePath) {
        super(keySerializer, namespaceSerializer, stateSerializer, stateDescs, filePath);
    }

    @Override
    public KvState<K, N, ValueState<V>, ValueStateDescriptor<V>, FsStateBackend> createFsState(FsStateBackend backend, HashMap<N, Map<K, V>> stateMap) {
        return new FsValueState<>(backend, keySerializer, namespaceSerializer, stateDesc, stateMap);
    }
}
2. FunctionState
Compared with KvState, StateHandle is the more general abstraction:
/**
* StateHandle is a general handle interface meant to abstract operator state fetching.
* A StateHandle implementation can for example include the state itself in cases where the state
* is lightweight or fetching it lazily from some external storage when the state is too large.
*/
public interface StateHandle<T> extends StateObject {

    /**
     * This retrieves and return the state represented by the handle.
     *
     * @param userCodeClassLoader Class loader for deserializing user code specific classes
     *
     * @return The state represented by the handle.
     * @throws java.lang.Exception Thrown, if the state cannot be fetched.
     */
    T getState(ClassLoader userCodeClassLoader) throws Exception;
}
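The Javadoc above names two strategies: carry the state inside the handle, or fetch it lazily from external storage. A rough sketch of both, using a hypothetical SimpleHandle interface rather than Flink's StateHandle (no class loader argument, no StateObject), with a temp file and Java serialization standing in for the external store:

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the two handle strategies (not Flink classes).
class StateHandleSketch {

    interface SimpleHandle<T> {
        T getState();
    }

    /** Lightweight state: the handle simply carries the value. */
    static final class InMemoryHandle<T> implements SimpleHandle<T> {
        private final T state;

        InMemoryHandle(T state) { this.state = state; }

        @Override
        public T getState() { return state; }
    }

    /** Large state: the handle only remembers where the bytes live and reads them on demand. */
    static final class FileHandle<T extends Serializable> implements SimpleHandle<T> {
        private final Path path;

        private FileHandle(Path path) { this.path = path; }

        /** Writes the state to a temp file and returns a handle pointing at it. */
        static <T extends Serializable> FileHandle<T> write(T state) {
            try {
                Path file = Files.createTempFile("state-", ".bin");
                file.toFile().deleteOnExit();
                try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                    out.writeObject(state);
                }
                return new FileHandle<>(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        @SuppressWarnings("unchecked")
        public T getState() {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(path))) {
                return (T) in.readObject();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        SimpleHandle<String> small = new InMemoryHandle<>("offset=17");
        SimpleHandle<String> large = FileHandle.write("offset=17");
        System.out.println(small.getState().equals(large.getState())); // prints true
    }
}
```

Callers only see getState(), so which strategy a backend picks stays an implementation detail.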
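3. OperatorState; a typical example is the state of the WindowOperator
OperatorState also uses StateHandle as its snapshot abstraction.
Now let's see how these three kinds of state are snapshotted.
AbstractStreamOperator: looking at its checkpoint-related methods, it only snapshots KvState.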
@Override
public StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception {
    // here, we deal with key/value state snapshots

    StreamTaskState state = new StreamTaskState();

    if (stateBackend != null) {
        HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> partitionedSnapshots =
                stateBackend.snapshotPartitionedState(checkpointId, timestamp);
        if (partitionedSnapshots != null) {
            state.setKvStates(partitionedSnapshots);
        }
    }

    return state;
}

@Override
@SuppressWarnings("rawtypes,unchecked")
public void restoreState(StreamTaskState state) throws Exception {
    // restore the key/value state. the actual restore happens lazily, when the function requests
    // the state again, because the restore method needs information provided by the user function
    if (stateBackend != null) {
        stateBackend.injectKeyValueStateSnapshots((HashMap)state.getKvStates());
    }
}

@Override
public void notifyOfCompletedCheckpoint(long checkpointId) throws Exception {
    if (stateBackend != null) {
        stateBackend.notifyOfCompletedCheckpoint(checkpointId);
    }
}
AbstractUdfStreamOperator
public abstract class AbstractUdfStreamOperator<OUT, F extends Function> extends AbstractStreamOperator<OUT> implements OutputTypeConfigurable<OUT>
It extends AbstractStreamOperator. Its checkpoint-related methods:
@Override
public StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception {
    // first run super.snapshotOperatorState, i.e. snapshot the key/value state
    StreamTaskState state = super.snapshotOperatorState(checkpointId, timestamp);

    if (userFunction instanceof Checkpointed) {
        @SuppressWarnings("unchecked")
        Checkpointed<Serializable> chkFunction = (Checkpointed<Serializable>) userFunction;

        Serializable udfState;
        try {
            udfState = chkFunction.snapshotState(checkpointId, timestamp); // snapshot the function's own state
        }
        catch (Exception e) {
            throw new Exception("Failed to draw state snapshot from function: " + e.getMessage(), e);
        }

        if (udfState != null) {
            try {
                AbstractStateBackend stateBackend = getStateBackend();
                // let the state backend store the state and return a handle to the snapshot
                StateHandle<Serializable> handle =
                        stateBackend.checkpointStateSerializable(udfState, checkpointId, timestamp);
                state.setFunctionState(handle);
            }
            catch (Exception e) {
                throw new Exception("Failed to add the state snapshot of the function to the checkpoint: "
                        + e.getMessage(), e);
            }
        }
    }

    return state;
}

@Override
public void restoreState(StreamTaskState state) throws Exception {
    super.restoreState(state);

    StateHandle<Serializable> stateHandle = state.getFunctionState();

    if (userFunction instanceof Checkpointed && stateHandle != null) {
        @SuppressWarnings("unchecked")
        Checkpointed<Serializable> chkFunction = (Checkpointed<Serializable>) userFunction;

        Serializable functionState = stateHandle.getState(getUserCodeClassloader());
        if (functionState != null) {
            try {
                chkFunction.restoreState(functionState);
            }
            catch (Exception e) {
                throw new Exception("Failed to restore state to function: " + e.getMessage(), e);
            }
        }
    }
}

@Override
public void notifyOfCompletedCheckpoint(long checkpointId) throws Exception {
    super.notifyOfCompletedCheckpoint(checkpointId);

    if (userFunction instanceof CheckpointListener) {
        ((CheckpointListener) userFunction).notifyCheckpointComplete(checkpointId);
    }
}
So this operator snapshots both the key/value state and the state of the user function inside the UDF.
WindowOperator, a typical case of operator state:
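The layering is easier to see with the Flink types stripped away. In the sketch below every name (TaskState, BaseOperator, UdfOperator, CheckpointedFn, Counter) is hypothetical; the base class fills in the key/value part, and the subclass calls super first and then adds the function state if the user function opts in. Unlike Flink, the sketch restores key/value state eagerly and stores plain objects instead of state handles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the super-then-extend snapshot pattern (not Flink classes).
class LayeredSnapshotSketch {

    /** Stand-in for StreamTaskState: one slot per kind of state. */
    static final class TaskState {
        Map<String, Object> kvStates;
        Object functionState;
    }

    /** Stand-in for the Checkpointed interface a user function may implement. */
    interface CheckpointedFn {
        Object snapshotState(long checkpointId);
        void restoreState(Object state);
    }

    /** Base layer: only knows about key/value state. */
    static class BaseOperator {
        final Map<String, Object> kv = new HashMap<>();

        TaskState snapshot(long checkpointId) {
            TaskState state = new TaskState();
            state.kvStates = new HashMap<>(kv);
            return state;
        }

        void restore(TaskState state) {
            kv.clear();
            if (state.kvStates != null) {
                kv.putAll(state.kvStates);
            }
        }
    }

    /** UDF layer: reuses the base snapshot and adds the function's own state. */
    static class UdfOperator extends BaseOperator {
        final Object userFunction;

        UdfOperator(Object userFunction) { this.userFunction = userFunction; }

        @Override
        TaskState snapshot(long checkpointId) {
            TaskState state = super.snapshot(checkpointId); // key/value part first
            if (userFunction instanceof CheckpointedFn) {
                state.functionState = ((CheckpointedFn) userFunction).snapshotState(checkpointId);
            }
            return state;
        }

        @Override
        void restore(TaskState state) {
            super.restore(state);
            if (userFunction instanceof CheckpointedFn && state.functionState != null) {
                ((CheckpointedFn) userFunction).restoreState(state.functionState);
            }
        }
    }

    /** Example user function with a single counter as its state. */
    static final class Counter implements CheckpointedFn {
        long count;

        @Override
        public Object snapshotState(long checkpointId) { return count; }

        @Override
        public void restoreState(Object state) { count = (Long) state; }
    }

    public static void main(String[] args) {
        Counter fn = new Counter();
        fn.count = 3;
        UdfOperator op = new UdfOperator(fn);
        op.kv.put("k", "v");
        TaskState snapshot = op.snapshot(1L);

        Counter freshFn = new Counter();
        UdfOperator freshOp = new UdfOperator(freshFn);
        freshOp.restore(snapshot);
        System.out.println(freshFn.count + " " + freshOp.kv.get("k")); // prints "3 v"
    }
}
```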
public class WindowOperator<K, IN, ACC, OUT, W extends Window>
        extends AbstractUdfStreamOperator<OUT, InternalWindowFunction<ACC, OUT, K, W>>
        implements OneInputStreamOperator<IN, OUT>, Triggerable, InputTypeConfigurable {

    // ...

    public StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception {

        if (mergingWindowsByKey != null) {
            TupleSerializer<Tuple2<W, W>> tupleSerializer = new TupleSerializer<>((Class) Tuple2.class, new TypeSerializer[] {windowSerializer, windowSerializer} );
            ListStateDescriptor<Tuple2<W, W>> mergeStateDescriptor = new ListStateDescriptor<>("merging-window-set", tupleSerializer);
            for (Map.Entry<K, MergingWindowSet<W>> key: mergingWindowsByKey.entrySet()) {
                setKeyContext(key.getKey());
                ListState<Tuple2<W, W>> mergeState = getStateBackend().getPartitionedState(null, VoidSerializer.INSTANCE, mergeStateDescriptor);
                mergeState.clear();
                key.getValue().persist(mergeState);
            }
        }

        StreamTaskState taskState = super.snapshotOperatorState(checkpointId, timestamp);

        AbstractStateBackend.CheckpointStateOutputView out =
                getStateBackend().createCheckpointStateOutputView(checkpointId, timestamp);

        snapshotTimers(out);

        taskState.setOperatorState(out.closeAndGetHandle());

        return taskState;
    }

    @Override
    public void restoreState(StreamTaskState taskState) throws Exception {
        super.restoreState(taskState);

        final ClassLoader userClassloader = getUserCodeClassloader();

        @SuppressWarnings("unchecked")
        StateHandle<DataInputView> inputState = (StateHandle<DataInputView>) taskState.getOperatorState();
        DataInputView in = inputState.getState(userClassloader);

        restoreTimers(in);
    }
}