Kafka node crashes intermittently when data is pushed from a producer


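We have a 3-node Kafka cluster in our environment (version 5.2.1, Apache Kafka version 2.2.0). For some time we have been seeing an exception that occurs intermittently whenever we try to push data from a test producer. The exception is below.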

[Log partition=debug-topic-1, dir=/tmp/kafka-logs] Found deletable segments with base offsets [4] due to retention time 604800000ms breach (kafka.log.Log:66)
[2020-04-20 22:42:39,303] INFO [ProducerStateManager partition=debug-topic-1] Writing producer snapshot at offset 5 (kafka.log.ProducerStateManager:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Rolled new log segment at offset 5 in 1 ms. (kafka.log.Log:66)
[2020-04-20 22:42:39,304] INFO [Log partition=debug-topic-1, dir=/tmp/kafka-logs] Scheduling log segment [baseOffset 4, size 84] for deletion. (kafka.log.Log:66)
[2020-04-20 22:42:39,310] ERROR Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs (kafka.server.LogDirFailureChannel:76)
java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more
[2020-04-20 22:42:39,311] ERROR Uncaught exception in scheduled task 'kafka-log-retention' (kafka.utils.KafkaScheduler:76)
org.apache.kafka.common.errors.KafkaStorageException: Error while deleting segments for debug-topic-1 in dir /tmp/kafka-logs
Caused by: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:409)
    at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
    at java.nio.file.Files.move(Files.java:1395)
    at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:805)
    at org.apache.kafka.common.record.FileRecords.renameTo(FileRecords.java:224)
    at kafka.log.LogSegment.changeFileSuffixes(LogSegment.scala:488)
    at kafka.log.Log.asyncDeleteSegment(Log.scala:1924)
    at kafka.log.Log.deleteSegment(Log.scala:1909)
    at kafka.log.Log.$anonfun$deleteSegments$3(Log.scala:1455)
    at kafka.log.Log.$anonfun$deleteSegments$3$adapted(Log.scala:1455)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at kafka.log.Log.$anonfun$deleteSegments$2(Log.scala:1455)
    at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
    at kafka.log.Log.maybeHandleIOException(Log.scala:2013)
    at kafka.log.Log.deleteSegments(Log.scala:1446)
    at kafka.log.Log.deleteOldSegments(Log.scala:1441)
    at kafka.log.Log.deleteRetentionMsBreachedSegments(Log.scala:1519)
    at kafka.log.Log.deleteOldSegments(Log.scala:1509)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3(LogManager.scala:913)
    at kafka.log.LogManager.$anonfun$cleanupLogs$3$adapted(LogManager.scala:910)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at kafka.log.LogManager.cleanupLogs(LogManager.scala:910)
    at kafka.log.LogManager.$anonfun$startup$2(LogManager.scala:395)
    at kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:114)
    at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: java.nio.file.NoSuchFileException: /tmp/kafka-logs/debug-topic-1/00000000000000000004.log -> /tmp/kafka-logs/debug-topic-1/00000000000000000004.log.deleted
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:396)
        at sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:262)
        at java.nio.file.Files.move(Files.java:1395)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:802)
        ... 30 more

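We have other producers that continuously push data to different topics in the cluster, and they never hit this problem. I have deleted and recreated the topic debug-topic-1 several times to make sure there is no corrupted or stale state in the ZooKeeper and Kafka logs, but the issue still comes back after a while. If anyone has run into a similar problem and managed to resolve it, please let me know.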

apache-kafka kafka-producer-api
1 Answer

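It looks like your machine may have rebooted, or /tmp was cleared in some other way. The NoSuchFileException for /tmp/kafka-logs/debug-topic-1/00000000000000000004.log shows that the segment file was already gone when the retention task tried to rename it to .deleted, which is consistent with something outside Kafka removing it.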

You must change Kafka's log.dirs (and ZooKeeper's dataDir) so that they do not use /tmp.
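To find brokers that still keep data under /tmp, a small check over the properties files can help. The following is only a sketch, not an official Kafka tool: the helper names `parse_properties` and `volatile_dirs` are made up for illustration, the path `/var/lib/kafka/data` is a hypothetical replacement directory, and only `/tmp` is treated as volatile here.

```python
from pathlib import PurePosixPath

# Assumption: only /tmp is treated as volatile; extend this tuple for your hosts.
VOLATILE_ROOTS = (PurePosixPath("/tmp"),)

# Keys whose values are data directories: log.dirs / log.dir in the broker
# config, dataDir in the ZooKeeper config.
DIR_KEYS = ("log.dirs", "log.dir", "dataDir")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def volatile_dirs(text):
    """Return (key, path) pairs whose path is a volatile root or lies under one."""
    props = parse_properties(text)
    found = []
    for key in DIR_KEYS:
        # log.dirs may hold a comma-separated list of directories.
        for raw in props.get(key, "").split(","):
            raw = raw.strip()
            if not raw:
                continue
            path = PurePosixPath(raw)
            if any(root == path or root in path.parents for root in VOLATILE_ROOTS):
                found.append((key, raw))
    return found


if __name__ == "__main__":
    # Hypothetical broker config mixing a volatile and a persistent directory.
    sample = "broker.id=0\nlog.dirs=/tmp/kafka-logs,/var/lib/kafka/data\n"
    for key, path in volatile_dirs(sample):
        print(f"{key} points at a volatile directory: {path}")
```

Run it against the contents of each broker's server.properties and the ZooKeeper properties file. Any reported entry should be moved to a persistent directory, and the broker restarted, before relying on the retention task again.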
