Flink SQL job throws "No space left on device" even though enough space is available


Flink version: 1.17.1. Environment: EKS with the Flink Kubernetes Operator.

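We run a Flink SQL job with the RocksDB state backend; its checkpoint state is smaller than 10 GB. We changed the instance type from m5d.2xlarge to r5d.xlarge, and the only change in the Flink YAML was reducing the TaskManager CPU from 4 to 3.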

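For some reason the application could not start from the savepoint and complained that there was no space left. We tried creating a small file in the opt folder, and that worked. In the end we started the job from the last stable checkpoint instead of the savepoint, and it started successfully.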

Question: what else could cause a "No space left on device" error, given that there appears to be enough space in this case?

Exception:

2024-01-08 17:16:50,402 ERROR org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder [] - Caught unexpected exception.
org.rocksdb.RocksDBException: While open a file for appending: /opt/flink/rocksdb/job_7b2937ad2b8a0189e1b27c0103408fc1_op_StreamingJoinOperator_9a0d21fd6364c562c773029964ae8006__2_8__uuid_5f4af36e-57bf-4a4e-b8c2-cc5f628ad226/db/000074.log: No space left on device
at org.rocksdb.RocksDB.write0(Native Method) ~[flink-dist-1.17.1.jar:1.17.1]
at org.rocksdb.RocksDB.write(RocksDB.java:1784) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.flush(RocksDBWriteBatchWrapper.java:116) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.flushIfNeeded(RocksDBWriteBatchWrapper.java:138) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.RocksDBWriteBatchWrapper.put(RocksDBWriteBatchWrapper.java:99) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKVStateData(RocksDBFullRestoreOperation.java:153) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.applyRestoreResult(RocksDBFullRestoreOperation.java:127) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restore(RocksDBFullRestoreOperation.java:102) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:329) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:512) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend.createKeyedStateBackend(EmbeddedRocksDBStateBackend.java:99) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:336) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:353) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:165) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:256) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:734) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:709) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:675) ~[flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) [flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:921) [flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) [flink-dist-1.17.1.jar:1.17.1]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) [flink-dist-1.17.1.jar:1.17.1]
at java.lang.Thread.run(Unknown Source) [?:?]
flink-sql rocksdb
1 Answer

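When you restart a job from a savepoint, the RocksDB state backend has to rebuild the database from scratch, which involves re-creating all the SST files on the TaskManager's local disk. That matches your stack trace, which fails inside RocksDBFullRestoreOperation while writing restored key/value data. When you restart from a checkpoint, the checkpoint already contains the necessary SST files, and Flink can use them directly.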
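To see whether the rebuild is what fills the disk, it may help to measure the volume that actually backs the RocksDB working directory (/opt/flink/rocksdb in your trace) while the restore is running. The following is only a rough sketch, not anything Flink provides: the helper names are made up for illustration, and it assumes a Linux container with Python available. It reports free inodes as well as free bytes, because running out of inodes also produces "No space left on device", and it prints the mount point, because a write test in a sibling directory may land on a different filesystem.

```python
import os


def mount_point(path):
    """Walk up from path until a filesystem mount point is reached."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


def space_report(path):
    """Free/total bytes and inodes for the filesystem that holds path."""
    st = os.statvfs(path)  # POSIX only
    return {
        "mount_point": mount_point(path),
        "free_bytes": st.f_bavail * st.f_frsize,
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
    }


def dir_size_bytes(path):
    """Sum the sizes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # RocksDB may delete files (e.g. during compaction) mid-walk.
                pass
    return total
```

Calling `space_report("/opt/flink/rocksdb")` and `dir_size_bytes("/opt/flink/rocksdb")` every few seconds during a savepoint restore would show how much the directory grows before the failure, and whether bytes or inodes run out first.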
