How to capture a Spark error from a shell script

Problem description

I have a pipeline in AWS Data Pipeline that runs a shell script named shell.sh:

$ spark-submit transform_json.py


Running command on cluster...
[54.144.10.162] Running command...
[52.206.87.30] Running command...
[54.144.10.162] Command complete.
[52.206.87.30] Command complete.
run_command finished in 0:00:06.

The AWS Data Pipeline console reports the job as "FINISHED", but in the stderr log I can see that the job actually aborted:

Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxxxx, AWS Error Code: null, AWS Error Message: Not Found...        
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows.
    ...
        20/05/22 11:42:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
        20/05/22 11:42:47 INFO MemoryStore: MemoryStore cleared
        20/05/22 11:42:47 INFO BlockManager: BlockManager stopped
        20/05/22 11:42:47 INFO BlockManagerMaster: BlockManagerMaster stopped
        20/05/22 11:42:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
        20/05/22 11:42:47 INFO SparkContext: Successfully stopped SparkContext
        20/05/22 11:42:47 INFO ShutdownHookManager: Shutdown hook called

I'm somewhat new to Data Pipeline and Spark, and I can't unravel what is actually happening behind the scenes. How do I get the shell script to capture the SparkException?

amazon-web-services apache-spark amazon-data-pipeline
1 Answer

Please try the example below.

Your shell script can capture the error code like this, where a non-zero exit code indicates failure:

$? is the exit status of the most recently executed command; by convention, 0 means success and anything else indicates failure.


spark-submit transform_json.py
# capture the exit status of spark-submit and propagate any failure
ret_code=$?
if [ $ret_code -ne 0 ]; then
    exit $ret_code
fi
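
With this check in place, a failed spark-submit makes shell.sh itself exit non-zero, which Data Pipeline should then report as a failed activity rather than "FINISHED".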

For the exit code to be meaningful, you have to write code in transform_json.py that returns a non-zero exit code, e.g. via sys.exit(-1), when an error occurs. Check this on Python exception handling...

Check this: Exit codes in Python
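
As a minimal sketch of that pattern (the SparkSession setup and the S3 paths here are illustrative assumptions, not taken from the original post), transform_json.py could wrap its work in a try/except and call sys.exit on failure:

import sys

from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("transform_json").getOrCreate()
    try:
        # hypothetical transformation: read JSON and write it back out;
        # replace the placeholder S3 paths with your own
        df = spark.read.json("s3://your-bucket/input/")
        df.write.mode("overwrite").parquet("s3://your-bucket/output/")
    except Exception as err:
        # a SparkException raised during the job (e.g. the 404 above) lands
        # here; report it and exit non-zero so that $? in shell.sh sees the failure
        print("Job failed: {}".format(err), file=sys.stderr)
        spark.stop()
        sys.exit(1)
    spark.stop()

if __name__ == "__main__":
    main()

spark-submit then exits with the same non-zero status, and the shell snippet above turns that into a visible failure.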
