如何在AWS EMR上使用sbt程序集在Scala中创建单个jar?遇到重复数据删除:在以下文件中找到了不同的文件内容:错误

问题描述 投票:1回答:1

我在刚刚站起来的AWS EMR集群上,有一个scala文件可以编译,我想将其构建到一个程序集中。但是,当我发布sbt程序集时,会遇到重复数据删除错误。

https://medium.com/@tedherman/compile-scala-on-emr-cb77610559f0我本来有一个指向我的lib到usr lib spark jar的符号链接;

ln -s /usr/lib/spark/jars lib

尽管我已经注意到我的代码通过了sbt编译,无论是否带有此代码。我很困惑为什么/如何解决sbt程序集欺骗错误。我还将注意到,根据文章,我在引导操作中安装了sbt。

[带符号链接

某些重复数据删除似乎是完全相同的重复数据;例如:

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.parquet/parquet-jackson/jars/parquet-jackson-1.10.1.jar:shaded/parquet/org/codehaus/jackson/util/CharTypes.class
[error] /usr/lib/spark/jars/parquet-jackson-1.10.1-spark-amzn-1.jar:shaded/parquet/org/codehaus/jackson/util/CharTypes.class

其他人似乎是相互竞争的版本;

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-core_2.11/jars/spark-core_2.11-2.4.3.jar:org/spark_project/jetty/util/MultiPartOutputStream.class
[error] /usr/lib/spark/jars/spark-core_2.11-2.4.5-amzn-0.jar:org/spark_project/jetty/util/MultiPartOutputStream.class

我不明白为什么会有竞争版本;或默认情况下是这样,还是我做了一些介绍。

没有符号链接

我以为,如果删除此内容,我的问题会更少;尽管我还有骗子(少一些);

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.hadoop/hadoop-yarn-api/jars/hadoop-yarn-api-2.6.5.jar:org/apache/hadoop/yarn/factory/providers/package-info.class
[error] /home/hadoop/.ivy2/cache/org.apache.hadoop/hadoop-yarn-common/jars/hadoop-yarn-common-2.6.5.jar:org/apache/hadoop/yarn/factory/providers/package-info.class

我不明白为什么以上是一个骗子,考虑到一个是hadoop-yarn-api-2.6.5.jar,另一个是hadoop-yarn-common-2.6.5.jar。为何使用不同的名称?

其他似乎是版本;

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/javax.inject/javax.inject/jars/javax.inject-1.jar:javax/inject/Named.class
[error] /home/hadoop/.ivy2/cache/org.glassfish.hk2.external/javax.inject/jars/javax.inject-2.4.0-b34.jar:javax/inject/Named.class

有些文件名相同,但路径/罐子不同...

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.arrow/arrow-format/jars/arrow-format-0.10.0.jar:git.properties
[error] /home/hadoop/.ivy2/cache/org.apache.arrow/arrow-memory/jars/arrow-memory-0.10.0.jar:git.properties
[error] /home/hadoop/.ivy2/cache/org.apache.arrow/arrow-vector/jars/arrow-vector-0.10.0.jar:git.properties

与这些相同...

[error] deduplicate: different file contents found in the following:
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-catalyst_2.11/jars/spark-catalyst_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-core_2.11/jars/spark-core_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class
[error] /home/hadoop/.ivy2/cache/org.apache.spark/spark-graphx_2.11/jars/spark-graphx_2.11-2.4.3.jar:org/apache/spark/unused/UnusedStubClass.class

供参考,一些其他信息

导入到我的Scala对象中

import org.apache.spark.sql.SparkSession
import java.time.LocalDateTime
import com.amazonaws.regions.Regions
import com.amazonaws.services.secretsmanager.AWSSecretsManagerClientBuilder
import com.amazonaws.services.secretsmanager.model.GetSecretValueRequest
import org.json4s.{DefaultFormats, MappingException}
import org.json4s.jackson.JsonMethods._
import com.datarobot.prediction.spark.Predictors.{getPredictorFromServer, getPredictor}

我的build.sbt

libraryDependencies ++= Seq(
  "net.snowflake" % "snowflake-jdbc" % "3.12.5",
  "net.snowflake" % "spark-snowflake_2.11" % "2.7.1-spark_2.4",
  "com.datarobot" % "scoring-code-spark-api_2.4.3" % "0.0.19",
  "com.datarobot" % "datarobot-prediction" % "2.1.4",
  "com.amazonaws" % "aws-java-sdk-secretsmanager" % "1.11.789",
  "software.amazon.awssdk" % "regions" % "2.13.23"
) 

有什么想法?请指教。

apache-spark amazon-ec2 sbt
1个回答
3
投票
“随机示例:”

assemblyMergeStrategy in assembly := { case PathList("META-INF", _) => MergeStrategy.discard case PathList("git.properties", _) => MergeStrategy.discard case "application.conf" => MergeStrategy.concat case "reference.conf" => MergeStrategy.concat case _ => MergeStrategy.first }

© www.soinside.com 2019 - 2024. All rights reserved.