Spark job failing: storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file

Question (4 votes, 2 answers)

I have a Spark (1.4.1) application running on YARN that fails with the following executor log entries:

16/07/21 23:09:08 ERROR executor.CoarseGrainedExecutorBackend: Driver 9.4.136.20:55995 disassociated! Shutting down.
16/07/21 23:09:08 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /dfs1/hadoop/yarn/local/usercache/mitchus/appcache/application_1465987751317_1172/blockmgr-f367f43b-f4c8-4faf-a829-530da30fb040/1c/temp_shuffle_581adb36-1561-4db8-a556-c4ac0e6400ed
java.io.FileNotFoundException: /dfs1/hadoop/yarn/local/usercache/mitchus/appcache/application_1465987751317_1172/blockmgr-f367f43b-f4c8-4faf-a829-530da30fb040/1c/temp_shuffle_581adb36-1561-4db8-a556-c4ac0e6400ed (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(BlockObjectWriter.scala:189)
    at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:328)
    at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:257)
    at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:95)
    at org.apache.spark.util.collection.Spillable$class.maybeSpill(Spillable.scala:83)
    at org.apache.spark.util.collection.ExternalSorter.maybeSpill(ExternalSorter.scala:95)
    at org.apache.spark.util.collection.ExternalSorter.maybeSpillCollection(ExternalSorter.scala:240)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:220)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any clue as to what might be going wrong?

apache-spark yarn
2 Answers

0 votes

You could try increasing spark.yarn.executor.memoryOverhead.
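For example, assuming the job is launched with spark-submit (the 2048 MB value and the jar name below are illustrative, not recommendations):

    spark-submit \
        --master yarn-cluster \
        --conf spark.yarn.executor.memoryOverhead=2048 \
        your-app.jar

In Spark 1.4.1 this setting is in megabytes and defaults to max(384, 0.10 * executorMemory); shuffle-heavy jobs often need more off-heap headroom than that default provides.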


0 votes

The error is caused by the temp shuffle file being deleted. There can be many reasons for this; in my case it was because another executor was killed by YARN. When an executor is killed, a SHUT_DOWN signal is sent to the remaining executors, and the ShutdownHookManager then deletes all the temporary files registered with it. That is why you see this error. So you may want to check your logs for any ShutdownHookManager entries.
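A quick way to check, assuming YARN log aggregation is enabled (the application ID below is copied from the paths in the question's stack trace):

    yarn logs -applicationId application_1465987751317_1172 | grep -i shutdown

If another executor was killed first, you would typically see ShutdownHookManager messages such as "Shutdown hook called" and "Deleting directory ..." shortly before the FileNotFoundException appears in the surviving executors.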
