ClassNotFoundException: breeze.storage.Zero$DoubleZero$

I am trying to run distributed KMeans with Spark MLlib, but I get the following error:

Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

I am using Scala 2.13.0, Spark 3.3.0, and Breeze 2.1.0. Does anyone know how to fix this?

scala apache-spark machine-learning distributed-computing scala-breeze
2 Answers

Here is a small example that reproduces the error:

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object example {

  def main(args: Array[String]): Unit = {

    val data = List(Vectors.dense(Array(-1.2067543462416856,1.3095550194913217)),
      Vectors.dense(Array(0.07214871343256794,1.2317180069067792)),
      Vectors.dense(Array(1.2382694463625876,1.498952083293292)),
      Vectors.dense(Array(1.4227882484992194,1.1326606729937694)),
      Vectors.dense(Array(0.028564865614650627,1.1697757168356784)),
      Vectors.dense(Array(1.3008028016732505,1.3992632244080325)),
      Vectors.dense(Array(-0.4515288119480808,-0.44940482288858774)),
      Vectors.dense(Array(1.3912470190900275,-1.2895692645735999)),
      Vectors.dense(Array(-0.5498887597576244,-0.4937628444210279)),
      Vectors.dense(Array(0.03640545102051686,-1.3540754314126295)),
      Vectors.dense(Array(-1.2520223542111055,1.2709646562853476)))

    Logger.getLogger("org").setLevel(Level.OFF)

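    // Local Spark session using all available cores.
    val spark = SparkSession
      .builder()
      .appName("example")
      .config("spark.master", "local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(data)
    // k = 10 clusters, at most 100 iterations.
    val kmeans = KMeans.train(rdd, 10, 100)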
  }
}


This looks like a dependency problem.

In Breeze 1.3 and earlier, breeze.storage.Zero.DoubleZero is defined as

@SerialVersionUID(1L)
implicit object DoubleZero extends Zero[Double] {
  override def zero = 0.0
}

https://github.com/scalanlp/breeze/blob/releases/v1.3/math/src/main/scala/breeze/storage/Zero.scala#L77

breeze.storage.Zero.DoubleZero.getClass yields breeze.storage.Zero$DoubleZero$.

In Breeze 2.0+, however, DoubleZero is defined as

implicit val DoubleZero: Zero[Double] = Zero(0.0)

https://github.com/scalanlp/breeze/blob/releases/v2.0/math/src/main/scala/breeze/storage/Zero.scala#L46

@SerialVersionUID(1L)
case class Zero[@specialized T](zero: T) extends Serializable

Here breeze.storage.Zero.DoubleZero.getClass yields breeze.storage.Zero$mcD$sp (because of @specialized), while Class.forName("breeze.storage.Zero$DoubleZero$") throws ClassNotFoundException.
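
The naming difference can be reproduced without Breeze on the classpath. The following is a minimal sketch, not Breeze's actual source: Zero, ZeroBox, OldStyle, NewStyle, and ClassNameDemo are hypothetical stand-ins, and @specialized is left out for brevity, so the second printed name is the plain case class name rather than a $mcD$sp variant. It assumes the file is compiled as ordinary top-level definitions.

```scala
// Hypothetical stand-ins for the two Breeze designs; not the library's code.
trait Zero[T] { def zero: T }

object OldStyle {
  // Breeze 1.x style: a nested singleton object gets its own class,
  // whose binary name ends in "DoubleZero$".
  implicit object DoubleZero extends Zero[Double] {
    override def zero: Double = 0.0
  }
}

final case class ZeroBox[T](zero: T)

object NewStyle {
  // Breeze 2.x style: a plain val holding a case class instance,
  // so no class named "...DoubleZero$" is generated.
  implicit val DoubleZero: ZeroBox[Double] = ZeroBox(0.0)
}

object ClassNameDemo {
  def main(args: Array[String]): Unit = {
    println(OldStyle.DoubleZero.getClass.getName)
    println(NewStyle.DoubleZero.getClass.getName)
  }
}
```

Code compiled against the first design refers to the DoubleZero$ class by name, which is why it cannot link against a library built with the second design.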

You should find out which of your dependencies still uses Breeze 1.3 or earlier.
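
One way to see which artifact actually supplies a class at runtime is to ask the JVM where it loaded the class from. This is only a sketch using standard JDK reflection calls; WhichJar and locate are hypothetical names, and the two class names in main are just the ones relevant to this question.

```scala
object WhichJar {
  // Returns the location (usually a jar path) a class was loaded from,
  // or None if the class cannot be loaded or has no code source
  // (as is typical for JDK classes).
  def locate(className: String): Option[String] =
    try {
      // initialize = false: load the class without running its static initializer.
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => None
    }

  def main(args: Array[String]): Unit = {
    Seq("breeze.storage.Zero", "org.apache.spark.ml.linalg.SparseMatrix")
      .foreach(name => println(s"$name -> ${locate(name).getOrElse("not loadable")}"))
  }
}
```

The jar file name printed for breeze.storage.Zero normally contains the Breeze version that won dependency resolution. In an sbt build, the evicted task gives a complementary view by listing version conflicts that were resolved.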


Update: thanks for the MCVE.

Debugging shows that the NoClassDefFoundError / ClassNotFoundException is thrown here:

  private lazy val loadableSparkClasses: Seq[Class[_]] = {
    Seq(
      // ...
      "org.apache.spark.ml.linalg.SparseMatrix",
      // ...
    ).flatMap { name =>
      try {
        Some[Class[_]](Utils.classForName(name))
      } catch {
        case NonFatal(_) => None // do nothing
        case _: NoClassDefFoundError if Utils.isTesting => None // See SPARK-23422.
      }
    }
  }

https://github.com/apache/spark/blob/v3.3.0/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala#L521

A simpler reproduction is

Class.forName("org.apache.spark.ml.linalg.SparseMatrix")
// java.lang.NoClassDefFoundError: breeze/storage/Zero$DoubleZero$ ...
// Caused by: java.lang.ClassNotFoundException: breeze.storage.Zero$DoubleZero$ ...

As I said, one of your dependencies uses Breeze 1.3 or earlier, even though you think you are using Breeze 2.1.0. Specifically, org.apache.spark.ml.linalg.SparseMatrix comes from spark-mllib-local, and spark-mllib-local 3.3.0 uses Breeze 1.2:

<dependency>
    <groupId>org.scalanlp</groupId>
    <artifactId>breeze_2.13</artifactId>
    <version>1.2</version>
    <scope>compile</scope>
    <exclusions>
        <exclusion>
            <artifactId>commons-math3</artifactId>
            <groupId>org.apache.commons</groupId>
        </exclusion>
    </exclusions>
</dependency>

https://repo1.maven.org/maven2/org/apache/spark/spark-mllib-local_2.13/3.3.0/spark-mllib-local_2.13-3.3.0.pom

So Spark 3.3.0 (and 3.3.2) is incompatible with Breeze 2.0+. Use Breeze 1.3 or earlier:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0",
  "org.scalanlp" %% "breeze" % "1.3"
)

Then your code runs successfully.
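
Since spark-mllib-local already declares Breeze as a compile-scope dependency, another option is to leave Breeze out of the build and let Spark bring in the version it was built against. A sketch of that build definition, assuming the project targets Scala 2.13 and does not call any Breeze 2.x API directly:

```scala
// build.sbt (sketch): no explicit Breeze dependency.
// spark-mllib depends on spark-mllib-local, which brings in the Breeze
// version that Spark 3.3.0 was compiled against (1.2 according to its POM).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "3.3.0",
  "org.apache.spark" %% "spark-mllib" % "3.3.0"
)
```

This avoids having to keep a hand-picked Breeze version in sync when the Spark version changes.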

https://github.com/scalanlp/breeze/issues/710

Apache Spark - java.lang.NoSuchMethodError: breeze.linalg.Vector$.scalarOf()Lbreeze/linalg/support/ScalarOf

https://github.com/scalanlp/breeze/issues/690

Breeze should be upgraded to 2.0 in Spark 3.4.0:

https://issues.apache.org/jira/browse/SPARK-39616
