How do I run a PySpark application as a step in Amazon EMR so that the cluster terminates after the application finishes?

Spark version 2.4.5

I have files in an S3 bucket (s3a://tobeprocessed) that need to be processed.

I have a PySpark application that reads the files from that bucket and writes its output to another S3 bucket (s3://processed).
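A minimal sketch of such an application, under assumptions the question does not confirm: the config file passed as the first argument is INI-style with a hypothetical `[paths]` section holding `input` and `output` keys, and the input files are CSV with a header row. The helper names `read_paths` and `main` are hypothetical. The sketch deliberately does not call `.master(...)`, because the step already passes `--master yarn --deploy-mode cluster`; hardcoding `local` in the script is a commonly reported cause of the AM exit code 13 that appears in the log in this question.

```python
import configparser
import sys


def read_paths(cfg_text):
    """Parse the step's config file and return (input_path, output_path).

    The [paths] section and its key names are hypothetical; adapt them to
    whatever configuration_file.cfg actually contains.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser["paths"]["input"], parser["paths"]["output"]


def main(cfg_path):
    # Imported lazily so read_paths() can be used without Spark installed.
    from pyspark.sql import SparkSession

    # No .master(...) here: spark-submit already supplies --master yarn and
    # --deploy-mode cluster, and a hardcoded master would override them.
    spark = SparkSession.builder.appName("etlapp").getOrCreate()

    # On EMR, s3:// paths go through EMRFS, so Spark itself can read the
    # small config file that was passed as an argument.
    cfg_text = "\n".join(spark.sparkContext.textFile(cfg_path).collect())
    input_path, output_path = read_paths(cfg_text)

    # Assumption: the input files are CSV with a header row.
    df = spark.read.option("header", "true").csv(input_path)
    df.write.mode("overwrite").parquet(output_path)

    spark.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    # spark-submit passes the config file path as the first application argument.
    main(sys.argv[1])
```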

I plan to run it as a step on my EMR cluster.

I ran the following command from the terminal to add the step to the cluster:

aws emr add-steps --cluster-id j-xxxxxx --steps Name=etlapp,Jar=command-runner.jar,Args=[spark-submit,--deploy-mode,cluster,--master,yarn,--conf,spark.yarn.submit.waitAppCompletion=true,s3://bucketname/spark_app.py,s3://bucketname/configuration_file.cfg],ActionOnFailure=CONTINUE
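The `--steps` shorthand above corresponds to the EMR API's step structure: a `Name`, an `ActionOnFailure`, and a `HadoopJarStep` whose `Jar` is `command-runner.jar` and whose `Args` are the spark-submit command line. A sketch that builds the same step as a Python dict, keeping the question's placeholder bucket and file names; the helper name `build_spark_step` is hypothetical.

```python
def build_spark_step(name, script_uri, app_args, action_on_failure="CONTINUE"):
    """Return one EMR step definition that runs spark-submit via command-runner.jar.

    Mirrors the --steps shorthand used with `aws emr add-steps`.
    """
    spark_submit = [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--conf", "spark.yarn.submit.waitAppCompletion=true",
        script_uri,   # the PySpark script to run
        *app_args,    # arguments handed to the script (here: the config file)
    ]
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": spark_submit},
    }


step = build_spark_step(
    "etlapp",
    "s3://bucketname/spark_app.py",
    ["s3://bucketname/configuration_file.cfg"],
)
print(step["HadoopJarStep"]["Args"])
```

With boto3 installed, the dict could be submitted with `boto3.client("emr").add_job_flow_steps(JobFlowId="j-xxxxxx", Steps=[step])`. `spark.yarn.submit.waitAppCompletion` already defaults to `true`, so that `--conf` is harmless but redundant.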

I get an error message like this:

STDERR

20/03/10 19:50:46 INFO RMProxy: Connecting to ResourceManager at ip-172-31-27-34.ec2.internal/172.31.27.34:8032
20/03/10 19:50:47 INFO Client: Requesting a new application from cluster with 2 NodeManagers
20/03/10 19:50:47 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
20/03/10 19:50:47 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
20/03/10 19:50:47 INFO Client: Setting up container launch context for our AM
20/03/10 19:50:47 INFO Client: Setting up the launch environment for our AM container
20/03/10 19:50:47 INFO Client: Preparing resources for our AM container
20/03/10 19:50:47 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/03/10 19:50:49 INFO Client: Uploading resource file:/mnt/tmp/spark-4c4ea7ac-b2bb-4a61-929d-c371d87417ff/__spark_libs__2224504543987850085.zip -> hdfs://ip-172-31-27-34.ec2.internal:8020/user/hadoop/.sparkStaging/application_1583867709817_0003/__spark_libs__2224504543987850085.zip
20/03/10 19:50:50 INFO ClientConfigurationFactory: Set initial getObject socket timeout to 2000 ms.
20/03/10 19:50:50 INFO Client: Uploading resource s3://imdbetlapp/complete_etl.py -> hdfs://ip-172-31-27-34.ec2.internal:8020/user/hadoop/.sparkStaging/application_1583867709817_0003/complete_etl.py
20/03/10 19:50:51 INFO S3NativeFileSystem: Opening 's3://imdbetlapp/complete_etl.py' for reading
20/03/10 19:50:51 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/pyspark.zip -> hdfs://ip-172-31-27-34.ec2.internal:8020/user/hadoop/.sparkStaging/application_1583867709817_0003/pyspark.zip
20/03/10 19:50:51 INFO Client: Uploading resource file:/usr/lib/spark/python/lib/py4j-0.10.7-src.zip -> hdfs://ip-172-31-27-34.ec2.internal:8020/user/hadoop/.sparkStaging/application_1583867709817_0003/py4j-0.10.7-src.zip
20/03/10 19:50:52 INFO Client: Uploading resource file:/mnt/tmp/spark-4c4ea7ac-b2bb-4a61-929d-c371d87417ff/__spark_conf__476112427502500805.zip -> hdfs://ip-172-31-27-34.ec2.internal:8020/user/hadoop/.sparkStaging/application_1583867709817_0003/__spark_conf__.zip
20/03/10 19:50:52 INFO SecurityManager: Changing view acls to: hadoop
20/03/10 19:50:52 INFO SecurityManager: Changing modify acls to: hadoop
20/03/10 19:50:52 INFO SecurityManager: Changing view acls groups to: 
20/03/10 19:50:52 INFO SecurityManager: Changing modify acls groups to: 
20/03/10 19:50:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
20/03/10 19:50:53 INFO Client: Submitting application application_1583867709817_0003 to ResourceManager
20/03/10 19:50:53 INFO YarnClientImpl: Submitted application application_1583867709817_0003
20/03/10 19:50:54 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:50:54 INFO Client: 
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1583869853550
     final status: UNDEFINED
     tracking URL: http://ip-172-31-27-34.ec2.internal:20888/proxy/application_1583867709817_0003/
     user: hadoop
20/03/10 19:50:55 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:50:56 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:50:57 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:50:58 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:50:59 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:51:00 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:51:01 INFO Client: Application report for application_1583867709817_0003 (state: ACCEPTED)
20/03/10 19:51:02 INFO Client: Application report for application_1583867709817_0003 (state: FAILED)
20/03/10 19:51:02 INFO Client: 
     client token: N/A
     diagnostics: Application application_1583867709817_0003 failed 2 times due to AM Container for appattempt_1583867709817_0003_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1583867709817_0003_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-172-31-27-34.ec2.internal:8088/cluster/app/application_1583867709817_0003 Then click on links to logs of each attempt.
. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1583869853550
     final status: FAILED
     tracking URL: http://ip-172-31-27-34.ec2.internal:8088/cluster/app/application_1583867709817_0003
     user: hadoop
20/03/10 19:51:02 ERROR Client: Application diagnostics message: Application application_1583867709817_0003 failed 2 times due to AM Container for appattempt_1583867709817_0003_000002 exited with  exitCode: 13
Failing this attempt.Diagnostics: Exception from container-launch.
Container id: container_1583867709817_0003_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 13
For more detailed output, check the application tracking page: http://ip-172-31-27-34.ec2.internal:8088/cluster/app/application_1583867709817_0003 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1583867709817_0003 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1149)
    at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/03/10 19:51:02 INFO ShutdownHookManager: Shutdown hook called
20/03/10 19:51:02 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-4c4ea7ac-b2bb-4a61-929d-c371d87417ff
20/03/10 19:51:02 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-a72b6dba-91bb-46b0-b2c3-893ac3b8581f
Command exiting with ret '1'

Controller

2020-03-10T19:42:51.930Z INFO Ensure step 5 jar file command-runner.jar
2020-03-10T19:42:51.930Z INFO StepRunner: Created Runner for step 5
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --master yarn --conf spark.yarn.submit.waitAppCompletion=true s3://imdbetlapp/complete_etl.py s3://imdbetlapp/Imdb_Etl/aws_config.cfg'
INFO Environment:
  PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin:/opt/aws/bin
  LESS_TERMCAP_md=[01;38;5;208m
  LESS_TERMCAP_me=[0m
  HISTCONTROL=ignoredups
  LESS_TERMCAP_mb=[01;31m
  AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
  UPSTART_JOB=rc
  LESS_TERMCAP_se=[0m
  HISTSIZE=1000
  HADOOP_ROOT_LOGGER=INFO,DRFA
  JAVA_HOME=/etc/alternatives/jre
  AWS_DEFAULT_REGION=us-east-1
  AWS_ELB_HOME=/opt/aws/apitools/elb
  LESS_TERMCAP_us=[04;38;5;111m
  EC2_HOME=/opt/aws/apitools/ec2
  TERM=linux
  runlevel=3
  LANG=en_US.UTF-8
  AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
  MAIL=/var/spool/mail/hadoop
  LESS_TERMCAP_ue=[0m
  LOGNAME=hadoop
  PWD=/
  LANGSH_SOURCED=1
  HADOOP_CLIENT_OPTS=-Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/s-2R9E51P9Z68JU/tmp
  _=/etc/alternatives/jre/bin/java
  CONSOLETYPE=serial
  RUNLEVEL=3
  LESSOPEN=||/usr/bin/lesspipe.sh %s
  previous=N
  UPSTART_EVENTS=runlevel
  AWS_PATH=/opt/aws
  USER=hadoop
  UPSTART_INSTANCE=
  PREVLEVEL=N
  HADOOP_LOGFILE=syslog
  PYTHON_INSTALL_LAYOUT=amzn
  HOSTNAME=ip-172-31-27-34
  HADOOP_LOG_DIR=/mnt/var/log/hadoop/steps/s-2R9E51P9Z68JU
  EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
  EMR_STEP_ID=s-2R9E51P9Z68JU
  SHLVL=5
  HOME=/home/hadoop
  HADOOP_IDENT_STRING=hadoop
INFO redirectOutput to /mnt/var/log/hadoop/steps/s-2R9E51P9Z68JU/stdout
INFO redirectError to /mnt/var/log/hadoop/steps/s-2R9E51P9Z68JU/stderr
INFO Working dir /mnt/var/lib/hadoop/steps/s-2R9E51P9Z68JU
INFO ProcessRunner started child process 21732
2020-03-10T19:42:51.932Z INFO HadoopJarStepRunner.Runner: startRun() called for s-2R9E51P9Z68JU Child Pid: 21732
INFO Synchronously wait child process to complete : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO waitProcessCompletion ended with exit code 1 : hadoop jar /var/lib/aws/emr/step-runner/hadoop-...
INFO total process run time: 22 seconds
2020-03-10T19:43:14.033Z INFO Step created jobs: 
2020-03-10T19:43:14.033Z WARN Step failed with exitCode 1 and took 22 seconds
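The step's `exitCode 1` only says that spark-submit failed; the more informative number is the AM container's `exitCode: 13`. The sketch below extracts that code from a YARN diagnostics line and maps it to the constant names in Spark 2.x's YARN `ApplicationMaster` (worth checking against the source for your exact Spark version). Code 13 is `EXIT_SC_NOT_INITED`, meaning the driver never finished creating its SparkContext, so the driver's own stderr, reachable from the tracking URL, is where the underlying Python error would show up. The helper name `explain_am_exit` is hypothetical.

```python
import re

# Exit codes defined in Spark 2.x's YARN ApplicationMaster (ApplicationMaster.scala).
AM_EXIT_CODES = {
    0: "EXIT_SUCCESS",
    10: "EXIT_UNCAUGHT_EXCEPTION",
    11: "EXIT_MAX_EXECUTOR_FAILURES",
    12: "EXIT_REPORTER_FAILURE",
    13: "EXIT_SC_NOT_INITED (SparkContext was never initialized)",
    14: "EXIT_SECURITY",
    15: "EXIT_EXCEPTION_USER_CLASS",
    16: "EXIT_EARLY",
}


def explain_am_exit(diagnostics):
    """Return (code, name) for the AM container exit code in a YARN diagnostics message."""
    match = re.search(r"exitCode:\s*(\d+)", diagnostics)
    if match is None:
        return None
    code = int(match.group(1))
    return code, AM_EXIT_CODES.get(code, "not an ApplicationMaster-defined code")


line = ("Application application_1583867709817_0003 failed 2 times due to AM Container "
        "for appattempt_1583867709817_0003_000002 exited with  exitCode: 13")
print(explain_am_exit(line))
```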

Can anyone tell me what I'm missing?

amazon-web-services apache-spark hadoop pyspark amazon-emr
1 Answer

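On the EMR console home page, click Create cluster. At the top of the page that opens, click "Go to advanced options". There you will find the option to auto-terminate the cluster after the last step completes.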
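Outside the console, the same behavior is chosen when the cluster is launched: in the `RunJobFlow` API it is `Instances.KeepJobFlowAliveWhenNoSteps = False`, and on the CLI it is `aws emr create-cluster ... --auto-terminate`. So instead of adding the step to a long-running cluster, the cluster is created together with its step. A minimal sketch that builds the keyword arguments for boto3's `run_job_flow`; the helper name, cluster name, instance types, instance count, role names, and release label are all placeholders to adapt.

```python
def build_auto_terminating_cluster_request(script_uri, cfg_uri, release_label):
    """Return keyword arguments for boto3's EMR run_job_flow call.

    KeepJobFlowAliveWhenNoSteps=False is the API counterpart of the console's
    auto-terminate option: the cluster shuts down once it has no steps left.
    Instance types, counts, and role names are placeholders.
    """
    step = {
        "Name": "etlapp",
        # Also tear the cluster down if the step itself fails.
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", "--master", "yarn",
                     script_uri, cfg_uri],
        },
    }
    return {
        "Name": "etlapp-transient",
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,  # 1 master + 2 core nodes
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [step],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_auto_terminating_cluster_request(
    "s3://bucketname/spark_app.py",
    "s3://bucketname/configuration_file.cfg",
    "emr-5.30.0",  # placeholder: pick the EMR release with the Spark version you need
)
```

With boto3 installed, the cluster would be launched with `boto3.client("emr").run_job_flow(**request)`; it runs the step and then terminates, whether the step succeeds or fails.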
