The jars in my project conflict with the jars in the spark-2.4.0 jars folder. My Retrofit dependency brings in okhttp-3.13.1.jar (verified via mvn dependency:tree), but the Spark installation on the server ships okhttp-3.8.1.jar, and I get a NoSuchMethodError. So I am trying to supply my jar explicitly to override it. When I run the spark-submit command in client mode, it picks up the explicit jar I provide, but in cluster mode this fails to override the jar on the worker nodes: the executors keep using the same old jar shipped with Spark, and Spark throws a NoSuchMethodError. My jar is a fat jar, yet the Spark-provided jar somehow takes precedence over the same jar in mine. It would probably work if I could delete the jar Spark provides, but I cannot do that because other services may be using it.
Here is my command:
./spark-submit --class com.myJob \
  --conf spark.yarn.appMasterEnv.ENV=uat \
  --conf spark.driver.memory=12g \
  --conf spark.executor.memory=40g \
  --conf spark.sql.warehouse.dir=/user/myuser/spark-warehouse \
  --conf "spark.driver.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --conf "spark.executor.extraClassPath=/home/test/okhttp-3.13.1.jar" \
  --jars /home/test/okhttp-3.13.1.jar \
  --conf spark.submit.deployMode=cluster \
  --conf spark.yarn.archive=hdfs://namenode/frameworks/spark/spark-2.4.0-archives/spark-2.4.0-archive.zip \
  --conf spark.master=yarn \
  --conf spark.executor.cores=4 \
  --queue public \
  file:///home/mytest/myjar-SNAPSHOT.jar
import retrofit2.Retrofit;
import retrofit2.converter.jackson.JacksonConverterFactory;
import com.fasterxml.jackson.databind.ObjectMapper;

final Retrofit retrofit = new Retrofit.Builder()
    .baseUrl(configuration.ApiUrl()) // this call throws the NoSuchMethodError
    .addConverterFactory(JacksonConverterFactory.create(new ObjectMapper()))
    .build();
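To confirm which copy of OkHttp actually wins on the driver or an executor, a quick diagnostic like the following can help (a minimal sketch, not part of the original post; ClasspathCheck is a made-up class name):

import okhttp3.OkHttpClient;

public class ClasspathCheck {
    public static void main(String[] args) {
        // Prints the jar the JVM actually loaded OkHttpClient from; if Spark's
        // copy wins, this points at okhttp-3.8.1.jar rather than the --jars one.
        System.out.println(OkHttpClient.class
            .getProtectionDomain().getCodeSource().getLocation());
    }
}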
My mvn dependency:tree does not show any other transitive okhttp jar being pulled in. And everything runs fine locally in IntelliJ as well as with mvn clean install.
I even tried giving the hdfs path to the jar (hdfs://users/myuser/myjars/okhttp-3.13.1.jar) with no luck. Can someone shed some light on this?
If I try --conf "spark.driver.userClassPathFirst=true" --conf "spark.executor.userClassPathFirst=true" together, I get the following exception:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<init>(YarnSparkHadoopUtil.scala:48)
at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.<clinit>(YarnSparkHadoopUtil.scala)
at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply$mcJ$sp(Client.scala:81)
at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
at org.apache.spark.deploy.yarn.Client$$anonfun$1.apply(Client.scala:81)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:80)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassCastException: org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl cannot be cast to org.apache.hadoop.yarn.api.records.Priority
at org.apache.hadoop.yarn.api.records.Priority.newInstance(Priority.java:39)
at org.apache.hadoop.yarn.api.records.Priority.<clinit>(Priority.java:34)
... 15 more
But if I set only --conf "spark.executor.userClassPathFirst=true", the job just hangs.
I have solved this issue using the maven-shade-plugin.
Reference video:
https://youtu.be/WyfHUNnMutg?t=23m1s
I followed the answer given there and added the following. You can even see in the SparkSubmit source code that when you pass --jars, the jars are appended to the overall jar list, so Spark's own jar is never overridden by these options; your jar is only added alongside it.
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- Relocate okio/okhttp3 so they cannot collide with the copies Spark ships -->
        <relocations>
          <relocation>
            <pattern>okio</pattern>
            <shadedPattern>com.shaded.okio</shadedPattern>
          </relocation>
          <relocation>
            <pattern>okhttp3</pattern>
            <shadedPattern>com.shaded.okhttp3</shadedPattern>
          </relocation>
        </relocations>
        <!-- Strip jar signature files, which would otherwise break the merged fat jar -->
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
              <exclude>log4j.properties</exclude>
            </excludes>
          </filter>
        </filters>
      </configuration>
    </execution>
  </executions>
</plugin>
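After mvn clean package, the shade plugin rewrites the bytecode of the fat jar: your source keeps importing okhttp3.*, while the packaged classes reference com.shaded.okhttp3.*. A small sanity check (a sketch under that assumption; ShadeCheck is a hypothetical class):

import okhttp3.OkHttpClient;

public class ShadeCheck {
    public static void main(String[] args) {
        // Run from the shaded jar this prints "com.shaded.okhttp3.OkHttpClient",
        // because the shade plugin relocated the class and rewrote the reference.
        System.out.println(OkHttpClient.class.getName());
    }
}

Since the relocated classes live in a package Spark does not ship, the conflict with okhttp-3.8.1.jar disappears, and the --jars / extraClassPath workarounds are no longer needed.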