Pyspark ML CrossValidator: evaluating with multiple evaluators

Question

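In sklearn's GridSearchCV, we can give the model several different scorers, and with the refit parameter we pick one of them: the best parameters found for that scorer are then used to refit the model on the entire dataset.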

Is there any way to do something similar with CrossValidator from pyspark's ML package?
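For comparison, here is a minimal sketch of the sklearn behaviour described above, assuming scikit-learn is installed. The iris dataset, LogisticRegression, and the C values are arbitrary choices for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    # Every candidate is scored with all of these scorers...
    scoring={"accuracy": "accuracy", "f1_macro": "f1_macro"},
    # ...but only the scorer named here chooses the params for the final refit
    refit="accuracy",
    cv=5,
)
search.fit(X, y)

print(search.best_params_)                       # chosen by mean CV accuracy
print(search.cv_results_["mean_test_f1_macro"])  # the other metric is still reported
best = search.best_estimator_                    # refitted on all of X, y
```

With several scorers, refit must name one of them; that choice is the single metric used to select and refit the final model.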

Answer 1

According to the documentation here, this happens automatically in the fit() method of CrossValidator:

After identifying the best ParamMap, CrossValidator finally re-fits the Estimator using the best ParamMap and the entire dataset.
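To make the quoted procedure concrete, here is a dependency-free sketch of the same logic: average a metric over k folds for each parameter map, pick the best one, then refit on the whole dataset. It is a simplification, not Spark's code: the helper names (k_fold_splits, cross_validate_and_refit, fit_scaled_mean, mse) are hypothetical, and the folds are assigned round-robin here, whereas Spark assigns rows to folds randomly (controlled by seed).

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) for each of k round-robin folds."""
    splits = []
    for fold in range(k):
        val = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        splits.append((train, val))
    return splits


def cross_validate_and_refit(
    data: Sequence[Any],
    param_maps: Sequence[Dict[str, Any]],
    fit: Callable[[List[Any], Dict[str, Any]], Any],
    evaluate: Callable[[Any, List[Any]], float],
    k: int = 3,
    larger_is_better: bool = True,
):
    # 1. Average the validation metric over the k folds for every param map
    avg_metrics = []
    for params in param_maps:
        scores = []
        for train_idx, val_idx in k_fold_splits(len(data), k):
            model = fit([data[i] for i in train_idx], params)
            scores.append(evaluate(model, [data[i] for i in val_idx]))
        avg_metrics.append(mean(scores))
    # 2. Pick the best param map; the direction plays the role of isLargerBetter()
    candidates = range(len(param_maps))
    if larger_is_better:
        best_index = max(candidates, key=lambda i: avg_metrics[i])
    else:
        best_index = min(candidates, key=lambda i: avg_metrics[i])
    # 3. Refit on the *entire* dataset with the winning params
    best_model = fit(list(data), param_maps[best_index])
    return best_model, best_index, avg_metrics


def fit_scaled_mean(rows: List[float], params: Dict[str, float]) -> float:
    """Toy 'model': a constant prediction equal to shrink * mean(training rows)."""
    return params["shrink"] * mean(rows)


def mse(model: float, rows: List[float]) -> float:
    """Mean squared error of the constant prediction on the validation rows."""
    return mean((r - model) ** 2 for r in rows)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
param_maps = [{"shrink": 0.5}, {"shrink": 1.0}, {"shrink": 2.0}]
best_model, best_index, avg_metrics = cross_validate_and_refit(
    data, param_maps, fit_scaled_mean, mse, k=3, larger_is_better=False
)
print(best_index, best_model)  # 1 3.5
```

Only one metric drives the selection in step 2, which mirrors the fact that CrossValidator is configured with a single evaluator.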

In fact, you can see this happening in the code here, for example:

        bestModel = est.fit(dataset, epm[bestIndex])