Error when running logistic regression in PySpark MLlib

Question (votes: 1, answers: 1)

I have a DataFrame (df_ml_nullable) that looks like this:

+-----+--------------------+
|label|            features|
+-----+--------------------+
|  0.0|[127.0,132.0,123....|
|  0.0|[67.0,67.0,67.0,6...|
|  0.0|[-29.0,-30.0,-28....|
|  4.0|[31.0,31.0,31.0,3...|
|  0.0|[39.0,40.0,42.0,4...|
+-----+--------------------+

Here is the schema of this DataFrame, as printed by df_ml_nullable.printSchema():

root
 |-- label: double (nullable = false)
 |-- features: vector (nullable = false)

I tried to run logistic regression like this:

    from pyspark.ml.linalg import Vectors
    from pyspark.ml.classification import LogisticRegression
    lr = LogisticRegression(maxIter=10, regParam=0.01)
    (train_d,test_d)=df_ml_nullable.randomSplit([0.7, 0.3])
    model1 = lr.fit(train_d)

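When I run this, I get the following error: IllegalArgumentException: 'requirement failed: Column features must be of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'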

Has anyone run into this problem?

apache-spark apache-spark-mllib pyspark-dataframes
1 Answer

0 votes

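The problem was the import. I had been importing Vectors from mllib instead of from ml. The estimators in pyspark.ml, such as LogisticRegression, only accept pyspark.ml.linalg vectors. Both vector types are stored as the same struct, which is why the two types in the error message look identical. The correction below solved the problem: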

    # from pyspark.mllib.linalg import Vectors, VectorUDT
    from pyspark.ml.linalg import Vectors, VectorUDT

@Vincent: thanks for the tip.
