Spark Scala MLlib exception: java.lang.IllegalArgumentException

Problem description

I am new to Spark MLlib and am trying to run the Spark code below:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors

val dataset = spark.createDataFrame(
  Seq((0, 18, 1.0, Vectors.dense(0.0, 10.0, 0.5), 1.0))
).toDF("id", "hour", "mobile", "userFeatures", "clicked")

val assembler = new VectorAssembler()
  .setInputCols(Array("hour", "mobile", "userFeatures"))
  .setOutputCol("features")

val output = assembler.transform(dataset)

But I get the following exception:

java.lang.IllegalArgumentException: Data type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> of column userFeatures is not supported.
  at org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:169)
  at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)
  at org.apache.spark.ml.feature.VectorAssembler.transform(VectorAssembler.scala:86)
  ... 51 elided

Tags: scala, apache-spark, apache-spark-ml
1 Answer

This works fine on my machine.

Your code, run in IntelliJ:

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object MllibError {
  val spark = SparkSession
    .builder()
    .appName("MllibError")
    .master("local[*]")
    .config("spark.sql.shuffle.partitions","4") //Change to a more reasonable default number of partitions for our data
    .config("spark.app.id","MllibError") // To silence Metrics warning
    .getOrCreate()
  def main(args: Array[String]): Unit = {
    val dataset = spark.createDataFrame(
      Seq((0, 18, 1.0, Vectors.dense(0.0, 10.0, 0.5), 1.0))
    ).toDF("id", "hour", "mobile", "userFeatures", "clicked")

    val assembler = new VectorAssembler()
      .setInputCols(Array("hour", "mobile", "userFeatures"))
      .setOutputCol("features")

    val output = assembler.transform(dataset)
    output.show(truncate = false)
  }
}

Output:

+---+----+------+--------------+-------+-----------------------+
|id |hour|mobile|userFeatures  |clicked|features               |
+---+----+------+--------------+-------+-----------------------+
|0  |18  |1.0   |[0.0,10.0,0.5]|1.0    |[18.0,1.0,0.0,10.0,0.5]|
+---+----+------+--------------+-------+-----------------------+
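For contrast, a single import swap makes the same program fail with an exception like yours: VectorAssembler belongs to org.apache.spark.ml and only accepts vectors from org.apache.spark.ml.linalg, not from the legacy org.apache.spark.mllib.linalg package. The sketch below is hypothetical (your snippet already imports the correct package), but if you are running in spark-shell, an earlier import of the legacy package may still be in scope. Depending on the Spark version, the printed type is either the legacy VectorUDT class name or its underlying struct<type:tinyint,size:int,indices:array<int>,values:array<double>>, which is what your message shows.

import org.apache.spark.mllib.linalg.Vectors // legacy package, wrong for ml transformers

// Same data as above, but userFeatures is now a legacy mllib vector.
val badDataset = spark.createDataFrame(
  Seq((0, 18, 1.0, Vectors.dense(0.0, 10.0, 0.5), 1.0))
).toDF("id", "hour", "mobile", "userFeatures", "clicked")

// assembler.transform(badDataset) now fails in VectorAssembler.transformSchema:
// java.lang.IllegalArgumentException: Data type ... of column userFeatures is not supported.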

Perhaps you have a problem with your dependencies or library versions. Here is my build.sbt:

name := "scala-programming-for-data-science"

version := "0.1"

scalaVersion := "2.11.10"

// https://mvnrepository.com/artifact/org.apache.spark/spark-mllib
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.2.0"
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.2.0"

I hope this gives you some clues. Regards.
