Apache Spark driver logs do not specify the reason for stage cancellation


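I am running Apache Spark on YARN on AWS EMR. The cluster has 1 master node and 10 executors. After several hours of processing, the cluster failed, so I went to look at the logs.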

I see that all of the running executors tried to kill their tasks at the same moment (this is the log from one of the executors):

20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 66.0 in stage 2.0 (TID 466), reason: Stage cancelled
20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 65.0 in stage 2.0 (TID 465), reason: Stage cancelled
20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 67.0 in stage 2.0 (TID 467), reason: Stage cancelled
20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 64.0 in stage 2.0 (TID 464), reason: Stage cancelled
20/03/05 00:02:12 ERROR Utils: Aborting a task

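I can see that the reason is Stage cancelled, but I cannot find any details beyond that. I also checked the driver logs, and their last entry was written much earlier than the executor entries above.

So I have 2 questions:

  • Why are the driver logs so much shorter than the executor logs?
  • How can I find the real reason the stage was cancelled?

The last lines of the driver log: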
20/03/04 18:39:40 INFO TaskSetManager: Starting task 159.0 in stage 1.0 (TID 359, ip-172-31-6-236.us-west-2.compute.internal, executor 40, partition 159, RACK_LOCAL, 8421 bytes)
20/03/04 18:39:40 INFO ExecutorAllocationManager: New executor 40 has registered (new total is 40)
20/03/04 18:39:41 INFO BlockManagerMasterEndpoint: Registering block manager ip-172-31-6-236.us-west-2.compute.internal:33589 with 2.8 GB RAM, BlockManagerId(40, ip-172-31-6-236.us-west-2.compute.internal, 33589, None)
20/03/04 18:39:42 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ip-172-31-6-236.us-west-2.compute.internal:33589 (size: 44.7 KB, free: 2.8 GB)
20/03/04 18:39:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-6-236.us-west-2.compute.internal:33589 (size: 37.4 KB, free: 2.8 GB)
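
To put a number on the gap between the two logs, here is a minimal sketch that parses the timestamps and the kill messages from the lines quoted above. It only uses the Python standard library. The function names (last_timestamp, kill_events) are made up for this illustration, and the regular expressions assume the exact line format shown in this post. It does not reveal why the stage was cancelled; it only shows how far the driver log lags behind the executor log and which reason the executors recorded.

```python
import re
from datetime import datetime

# Matches the "yy/MM/dd HH:mm:ss" prefix used by the log lines in this post.
TS_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ")
# Matches the executor-side kill message and captures task, stage, TID, reason.
KILL_RE = re.compile(
    r"Executor is trying to kill task (\S+) in stage (\S+) "
    r"\(TID (\d+)\), reason: (.+)$"
)


def last_timestamp(lines):
    """Return the latest timestamp found in the given log lines, or None."""
    stamps = []
    for line in lines:
        match = TS_RE.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S"))
    return max(stamps) if stamps else None


def kill_events(lines):
    """Return (task, stage, tid, reason) for every kill message in the lines."""
    events = []
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            task, stage, tid, reason = match.groups()
            events.append((task, stage, int(tid), reason.strip()))
    return events


# Lines copied verbatim from the executor and driver logs quoted in the question.
executor_log = [
    "20/03/05 00:02:12 INFO Executor: Executor is trying to kill task 66.0 "
    "in stage 2.0 (TID 466), reason: Stage cancelled",
    "20/03/05 00:02:12 ERROR Utils: Aborting a task",
]
driver_log = [
    "20/03/04 18:39:42 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory "
    "on ip-172-31-6-236.us-west-2.compute.internal:33589 (size: 44.7 KB, free: 2.8 GB)",
    "20/03/04 18:39:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory "
    "on ip-172-31-6-236.us-west-2.compute.internal:33589 (size: 37.4 KB, free: 2.8 GB)",
]

# How long the executor log keeps going after the last driver log entry.
gap = last_timestamp(executor_log) - last_timestamp(driver_log)
print("driver log stops", gap, "before the last executor entry")
for task, stage, tid, reason in kill_events(executor_log):
    print(f"stage {stage}, task {task} (TID {tid}): {reason}")
```

With the full log files, the same functions can be fed open(path) line by line instead of the short lists used here.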
apache-spark yarn amazon-emr
1 answer

Did you ever find an answer? I have the same problem.
