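So I started writing my own Oozie workflow that includes an Apache Spark action. Although I explicitly packaged the source with Scala 2.11.8 and Spark 2.3.0, YARN reports:
java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
I have a Docker container with the Hortonworks HDP Sandbox, running on an EC2 machine with 16 CPU cores and 41 (presumably GB) of memory. I have updated the Oozie share library from the command line. Below are my job.properties and workflow.xml files.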
job.properties
jobTracker=sandbox-hdp.hortonworks.com:8032
master=yarn-cluster
oozie.action.sharelib.for.spark=spark2
oozie.action.sharelib.for.spark.exclusion=oozie/jackson
# Time and schedule details
start_date=2015-01-01T00:00Z
end_date=2015-06-30T00:00Z
frequency=55 23 L * ?
nameNode=hdfs://sandbox-hdp.hortonworks.com:8020
# Workflow to run
wf_application_path=hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/spark_rainfall
# Coordinator to run
oozie.coord.application.path=hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/spark_rainfall
# Datasets
data_definitions=hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/datasets/datasets.xml
# Controls
timeout=10
concurrency_level=1
execution_order=FIFO
materialization_throttle=1
workflow.xml
<workflow-app name="ch08_spark_max_rainfall" xmlns="uri:oozie:workflow:0.5">
    <start to="max_rainfall"/>
    <action name="max_rainfall">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <name>"Spark Ch08 Max Rain Calculator"</name>
            <class>life.jugnu.learnoozie.ch08.MaxRainfall</class>
            <jar>hdfs://sandbox-hdp.hortonworks.com:8020/user/hue/learn_oozie/ch08/rainbow/target/scala-2.11/rainbow_2.11-1.0.14.jar</jar>
            <spark-opts>
                --conf spark.yarn.historyServer.address=http://sandbox-hdp.hortonworks.com:18088
                --conf spark.eventLog.dir=hdfs://sandbox-hdp.hortonworks.com:8020/user/spark/applicationHistory
                --conf spark.eventLog.enabled=true
            </spark-opts>
            <arg>${input}</arg>
            <arg>${output}</arg>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="End"/>
</workflow-app>
I expected this workflow to run without problems and write its results to HDFS on the remote cluster, but the Spark action is killed with the following error message.
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.$colon$colon.hd$1()Ljava/lang/Object;
    at org.apache.spark.deploy.yarn.ApplicationMasterArguments.parseArgs(ApplicationMasterArguments.scala:45)
    at org.apache.spark.deploy.yarn.ApplicationMasterArguments.<init>(ApplicationMasterArguments.scala:34)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:576)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
I had the same problem. It was because we were using Oozie 4.1.0 on the cluster. The Spark action is only available in Oozie 4.2.0+.
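A NoSuchMethodError like this usually means a class was compiled against one version of a library and is running against another. A quick check is to print which jar each class in the stack trace was loaded from. The following is only a minimal sketch: the class name JarLocator and its method locate are made up for illustration, and it assumes you can run it (or call it from your Spark job) on the same classpath as the failing action.

```java
import java.security.CodeSource;

public class JarLocator {

    /** Returns where the named class was loaded from, or a short reason if that is unknown. */
    public static String locate(String className) {
        try {
            // Load without initializing, so static initializers cannot fail here.
            Class<?> cls = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JVM's bootstrap loader have no code source.
            return src == null ? "bootstrap classpath" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // The class named in the NoSuchMethodError, and the class that called it.
        String[] names = {
            "scala.collection.immutable.$colon$colon",
            "org.apache.spark.deploy.yarn.ApplicationMasterArguments"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two classes resolve to jars built for different Scala versions (for example a scala-library 2.10 jar next to Spark jars built for 2.11), that mismatch would be consistent with the error above.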