Spark streaming job fails after running for a few days


I'm facing an issue where my Spark Streaming job keeps failing after running for a few days with the following error:

AM Container for appattempt_1610108774021_0354_000001 exited with exitCode: -104
Failing this attempt.Diagnostics: Container [pid=31537,containerID=container_1610108774021_0354_01_000001] is running beyond physical memory limits. Current usage: 5.8 GB of 5.5 GB physical memory used; 8.0 GB of 27.3 GB virtual memory used. Killing container.
Dump of the process-tree for container_1610108774021_0354_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 31742 31537 31537 31537 (java) 1583676 58530 8499392512 1507368 /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5078m -
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

spark-submit command:

spark-submit --name DWH-CDC-commonJob --deploy-mode cluster --master yarn \
  --conf spark.sql.shuffle.partitions=10 \
  --conf spark.eventLog.enabled=false \
  --conf spark.sql.caseSensitive=true \
  --conf spark.driver.memory=5078M \
  --class com.aos.Loader \
  --jars file:////home/hadoop/lib/* \
  --executor-memory 5000M \
  --conf "spark.alert.duration=4" \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 3 \
  --files /home/hadoop/log4j.properties,/home/hadoop/application.conf \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  streams_2.11-1.0.jar application.conf
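
For context on where the 5.5 GB limit in the diagnostics comes from: in cluster mode the AM container hosts the driver, and YARN sizes that container as spark.driver.memory plus spark.driver.memoryOverhead, which defaults to the larger of 384 MB and 10% of the driver memory. If I read the defaults correctly (nothing in the submit command sets a driver overhead), the numbers line up roughly like this:

spark.driver.memory          = 5078 MB
spark.driver.memoryOverhead  = max(384 MB, 0.10 * 5078 MB) ≈ 508 MB
container request            ≈ 5078 + 508 = 5586 MB, rounded up by YARN to about 5.5 GB

So the container being killed appears to be the driver/AM itself, which suggests the driver-side allowance, not the executors', is the one being exhausted.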

I have already tried increasing spark.executor.memoryOverhead, but the job still fails after a few days. I would like to understand how to arrive at a value that lets it run without any interruption, or whether there is some other configuration I am missing. Spark version: 2.4. AWS EMR: 5.23. Scala: 2.11.12. Two data nodes (4 vCPU, 16 GB memory each).
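
One thing I might try next, if the killed container really is the driver/AM as the math above suggests, is to set the driver overhead explicitly instead of relying on the 10% default, and shift some of the heap into it so the total container request stays the same. This is only a sketch with illustrative numbers (the 4608M/1024M split is a guess, not a tuned recommendation); the remaining flags (jars, files, shuffle partitions, and so on) are omitted here and would stay exactly as in the original command:

spark-submit --name DWH-CDC-commonJob --deploy-mode cluster --master yarn \
  --conf spark.driver.memory=4608M \
  --conf spark.driver.memoryOverhead=1024 \
  --executor-memory 4608M \
  --conf spark.executor.memoryOverhead=1024 \
  --class com.aos.Loader \
  streams_2.11-1.0.jar application.conf

With these numbers the driver container still requests about 4608 + 1024 = 5632 MB ≈ 5.5 GB, so it should fit in the same YARN allocation while giving the off-heap side more headroom.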

scala apache-spark spark-streaming
1 Answer

The NodeManager (NM) is killing your job; check HDFS disk usage on the nodes. Are you shuffling a large amount of data? Also check whether you are flooding HDFS with container logs, and whether this job is running under dynamic or static allocation.
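
A few shell commands that might help check these points (the configuration paths below are the typical EMR 5.x locations and may differ on your cluster):

# disk usage on the node and across HDFS
df -h
hdfs dfsadmin -report

# check whether dynamic allocation is forced on in the cluster defaults
grep -i dynamicallocation /etc/spark/conf/spark-defaults.conf

# find where YARN aggregates container logs, then measure how much space they use
grep -A1 remote-app-log-dir /etc/hadoop/conf/yarn-site.xml
hdfs dfs -du -h /var/log/hadoop-yarn/apps   # this path is an assumption; use the directory printed above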
