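I'm new to PySpark and Databricks, and I'm trying to build a logistic regression model (working through the Spark_DS&ML_exercise provided by Databricks itself). After fitting the model to my training data, I'm trying to get the F-measure by threshold from the training summary.
I ran the following code: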
from pyspark.ml.classification import LogisticRegression
# Create initial LogisticRegression model
lr = LogisticRegression(labelCol="label", featuresCol="features", maxIter=10)
# set threshold for the probability above which to predict a 1
lr.setThreshold(train_positive_rate)
# lr.setThreshold(0.5) # could use this if knew you had balanced data
# Train model with Training Data
lrModel = lr.fit(train)
# get training summary used for eval metrics and other params
lrTrainingSummary = lrModel.summary
# Find the best model threshold if you would like to use that instead of the empirical positive rate
fMeasure = lrTrainingSummary.fMeasureByThreshold
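For a binary model, fMeasureByThreshold is a DataFrame with a `threshold` column and an `F-Measure` column, and the comment in the snippet intends to pick the threshold with the highest F-Measure. The sketch below is a Spark-free illustration of that idea, not Spark's implementation. It assumes hypothetical in-memory lists `scores` (predicted probability of class 1) and `labels` (0/1), uses each distinct score as a threshold (predict 1 when score >= threshold), and returns the threshold with the best F-measure.

```python
def f_measure_by_threshold(scores, labels, beta=1.0):
    """Return [(threshold, F-measure)], one pair per distinct score, highest threshold first.

    A sample is predicted positive (1) when its score >= threshold.
    `scores` and `labels` are hypothetical parallel lists used for illustration.
    """
    b2 = beta * beta
    total_pos = sum(labels)
    results = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / total_pos if total_pos else 0.0
        denom = b2 * precision + recall
        f = (1 + b2) * precision * recall / denom if denom else 0.0
        results.append((t, f))
    return results


def best_threshold(scores, labels):
    """Threshold with the highest F-measure (the first one wins on ties)."""
    return max(f_measure_by_threshold(scores, labels), key=lambda pair: pair[1])[0]


scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]
print(best_threshold(scores, labels))  # 0.4
```

With these toy values the F1 scores at thresholds 0.9, 0.8, 0.6, 0.4 and 0.2 are 0.5, 0.8, about 0.667, about 0.857 and 0.75, so 0.4 is chosen. This is the value one would then pass to `lr.setThreshold(...)`.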
But I got this AttributeError:
AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'fMeasureByThreshold'
It seems that fMeasureByThreshold no longer exists. Is that right?
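So, I found the answer. The attribute I was looking for lives in BinaryLogisticRegressionTrainingSummary, not in LogisticRegressionTrainingSummary. The former is the summary for a binomial logistic regression model (mine is multinomial). For the latter, the only way to get an F-measure is per label, using fMeasureByLabel.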
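In PySpark, a binary model's summary exposes `lrModel.summary.fMeasureByThreshold`. For a multinomial model you would instead call `lrModel.summary.fMeasureByLabel()`, which returns one F-measure per label (`beta` defaults to 1.0). To show what that per-label number means, here is a minimal Spark-free sketch. It is not Spark's implementation. It assumes hypothetical lists `labels` and `predictions` holding integer class indices 0, 1, 2, ... and computes one-vs-rest precision, recall and F-measure for each class.

```python
def f_measure_by_label(labels, predictions, beta=1.0):
    """Return one F-measure per class, ordered by class index 0..K-1.

    Assumes labels/predictions are parallel lists of integer class indices
    (hypothetical inputs for illustration, not Spark objects).
    """
    b2 = beta * beta
    n_classes = int(max(max(labels), max(predictions))) + 1
    result = []
    for k in range(n_classes):
        # One-vs-rest counts for class k
        tp = sum(1 for y, p in zip(labels, predictions) if y == k and p == k)
        fp = sum(1 for y, p in zip(labels, predictions) if y != k and p == k)
        fn = sum(1 for y, p in zip(labels, predictions) if y == k and p != k)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = b2 * precision + recall
        result.append((1 + b2) * precision * recall / denom if denom else 0.0)
    return result


labels = [0, 0, 1, 1, 2, 2]
predictions = [0, 1, 1, 1, 2, 0]
print(f_measure_by_label(labels, predictions))  # about [0.5, 0.8, 0.667]
```

Unlike the binary case, there is no threshold to sweep here. Each class gets a single F-measure computed from the model's predicted labels.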