RandomForestClassifier has no attribute transform, so how do I get predictions?


How do you get predictions from a RandomForestClassifier? Loosely following the latest docs here, my code looks like...

# Split the data into training and test sets (25% held out for testing)
SPLIT_SEED = 64  # some const seed just for reproducibility
TRAIN_RATIO = 0.75
(trainingData, testData) = df.randomSplit([TRAIN_RATIO, 1-TRAIN_RATIO], seed=SPLIT_SEED)
print(f"Training set ({trainingData.count()}):")
trainingData.show(n=3)
print(f"Test set ({testData.count()}):")
testData.show(n=3)

# Train a RandomForest model.
rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)

rf.fit(trainingData)
#print(rf.featureImportances)

preds = rf.transform(testData)

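When I run this, I get the error

AttributeError: 'RandomForestClassifier' object has no attribute 'transform'

Checking the Python API docs, I don't see anything about generating predictions from a trained model (or feature importances, for that matter). I don't have much experience with mllib, so I'm not sure how to proceed. Does anyone with more experience know what to do?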

pyspark apache-spark-mllib
1 Answer

Look closely at the example in the documentation:

>>> model = rf.fit(td)
>>> model.featureImportances
SparseVector(1, {0: 1.0})
>>> allclose(model.treeWeights, [1.0, 1.0, 1.0])
True
>>> test0 = spark.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
>>> result = model.transform(test0).head()
>>> result.prediction

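You will notice that rf.fit returns a fitted model, a RandomForestClassificationModel, which is a different object from the original RandomForestClassifier. fit does not modify rf in place, so you have to keep its return value.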

That model is what has the transform method, and it also exposes featureImportances.
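To make the split between the two objects concrete, here is a toy sketch in plain Python. ToyMajorityClassifier and ToyMajorityModel are made-up names for illustration only, not Spark classes, and this is not how Spark implements anything. It only mirrors the shape of the API: the estimator has fit, and the object that fit returns has transform.

```python
from collections import Counter


class ToyMajorityModel:
    """Hypothetical fitted model: remembers the majority label and predicts it."""

    def __init__(self, majority_label):
        self.majority_label = majority_label

    def transform(self, rows):
        # Pair every input row with the single learned prediction.
        return [(row, self.majority_label) for row in rows]


class ToyMajorityClassifier:
    """Hypothetical estimator: it only knows how to fit, not how to predict."""

    def fit(self, labels):
        majority_label = Counter(labels).most_common(1)[0][0]
        # The estimator itself is left unchanged; a new model object is returned.
        return ToyMajorityModel(majority_label)


estimator = ToyMajorityClassifier()
model = estimator.fit([1, 0, 1, 1])
preds = model.transform(["a", "b"])

print(hasattr(estimator, "transform"))  # the estimator has no transform
print(preds)
```

Calling estimator.transform(...) here would raise the same kind of AttributeError as in the question, for the same reason.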

So in your code:

# Train a RandomForest model.
rf = RandomForestClassifier(labelCol="labels", featuresCol="features", numTrees=36)

model = rf.fit(trainingData)
#print(model.featureImportances)  # available on the fitted model, not on rf

preds = model.transform(testData)