flink在k8s上的状态值丢失--恢复工作时jobmanagertaskmanager崩溃。

问题描述 投票:0回答:1

当flink作业集群(deploymentpod)在kubernetes上运行时,我们删除了jobmanager和taskmanager(kubectl delete pod XXX)。在pod运行后,我们发现以前的pod中的rocksDB和checkpoint文件路径从PVC挂载的状态不见了,而且工作正常,有什么建议可以恢复pod运行后的状态吗?我仔细检查了代码,发现检查点没有启用。我发现检查点没有启用。这是根本原因的工作不能恢复?

环境设置如下

RocksDBStateBackend backend = new RocksDBStateBackend(checkPointDataUri + "/checkpoint",true);
        backend.setDbStoragePath(checkPointDataUri + "/RocksDB");
        backend.setNumberOfTransferingThreads(1);

        // add state backend
        env.setStateBackend((StateBackend)backend);

能不能像下面这样启用检查点?

    env.enableCheckpointing(1000);
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);
        env.getCheckpointConfig().setCheckpointTimeout(60000);
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
        env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

下面是重启日志。

2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,962 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,941 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,962 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,941 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,942 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,965 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,961 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,965 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,942 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,981 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,944 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
apache-flink flink-streaming flink-cep
1个回答
0
投票

RocksDB和检查点存储在同一个文件系统中是没有意义的。RocksDB应该使用最快的可用本地文件系统--kubernetes外延存储就可以了。而检查点必须存储有人持久,在某种分布式文件系统中。

© www.soinside.com 2019 - 2024. All rights reserved.