I am trying to log a Spark model with the code snippet below. The model metrics and parameters are saved to the MLflow run, but the model itself is not saved under Artifacts. However, logging a Scikit-learn model with mlflow.sklearn.log_model() in the same environment succeeds.
Environment: Databricks 10.4 LTS ML cluster
import mlflow
import mlflow.spark
import numpy as np
from mlflow.tracking.artifact_utils import get_artifact_uri
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

train, test = train_test_random_split(conf, data)
experiment_name = "/mlflow_experiments/debug_spark_model"
mlflow.set_experiment(experiment_name)
evaluator = BinaryClassificationEvaluator()
rf = RandomForestClassifier()
param_grid = (
    ParamGridBuilder()
    .addGrid(rf.numTrees, [15])
    .addGrid(rf.maxDepth, [6])
    .addGrid(rf.minInstancesPerNode, [7])
    .build()
)
cv = CrossValidator(
    estimator=rf,
    estimatorParamMaps=param_grid,
    evaluator=BinaryClassificationEvaluator(metricName="areaUnderROC"),
    numFolds=10,
)
cv_model = cv.fit(train)
# best model
model = cv_model.bestModel
# Param maps are keyed by the estimator's Param objects (rf.numTrees, etc.),
# so index with those rather than with the fitted model's Param copies.
best_param_map = cv_model.getEstimatorParamMaps()[np.argmax(cv_model.avgMetrics)]
model_params_best = {
    "numTrees": best_param_map[rf.numTrees],
    "maxDepth": best_param_map[rf.maxDepth],
    "minInstancesPerNode": best_param_map[rf.minInstancesPerNode],
}
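The best-parameter lookup above follows a common pattern: take the index of the highest cross-validation metric and use it to select the matching entry in the param-map list. A minimal plain-Python sketch of that selection (no Spark required; the metric values and param maps below are made up for illustration):

```python
# Stand-ins for cv_model.avgMetrics and cv_model.getEstimatorParamMaps():
# one average metric per candidate param map, in the same order.
avg_metrics = [0.71, 0.84, 0.78]
param_maps = [
    {"numTrees": 10, "maxDepth": 4, "minInstancesPerNode": 5},
    {"numTrees": 15, "maxDepth": 6, "minInstancesPerNode": 7},
    {"numTrees": 20, "maxDepth": 8, "minInstancesPerNode": 9},
]

# Equivalent of np.argmax(cv_model.avgMetrics) in plain Python:
# the index whose metric value is largest.
best_idx = max(range(len(avg_metrics)), key=avg_metrics.__getitem__)

best_params = param_maps[best_idx]
print(best_params)  # → {'numTrees': 15, 'maxDepth': 6, 'minInstancesPerNode': 7}
```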
model_metrics_best, artifacts_best, predicted_df_best = train_model(
    model, train, test, evaluator
)
with mlflow.start_run(run_name="debug_run_1"):
    run_id = mlflow.active_run().info.run_id
    mlflow.log_params(model_params_best)
    mlflow.log_metrics(model_metrics_best)
    # debug 1
    artifact_path = "best_model"
    mlflow.spark.log_model(spark_model=model, artifact_path=artifact_path)
    source = get_artifact_uri(run_id=run_id, artifact_path=artifact_path)
It gives the following error:
com.databricks.mlflowdbfs.MlflowHttpException: statusCode=404 ReasonPhrase=[Not Found] bodyMessage=[{"error_code":"RESOURCE_DOES_NOT_EXIST","message":"Run 'bfe90fd5074f49c39a475b613d020cbf' not found."}]
I would appreciate any direction on debugging this error, or a solution.
Found a workaround for this error, and for most mlflowdbfs-related errors: disabling mlflowdbfs on the Databricks ML Runtime cluster resolves the error above. Another option is to use a standard (non-ML) Databricks Runtime cluster.
import os

# Disable the mlflowdbfs artifact-upload path; set this before logging the model.
os.environ["DISABLE_MLFLOWDBFS"] = "true"
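Since the variable presumably only takes effect if it is in the environment before mlflow.spark.log_model() uploads artifacts, a quick sanity check at the top of the notebook can confirm the flag is visible to the Python process. The helper name below is my own, not part of MLflow or Databricks:

```python
import os

# Set the kill switch as early as possible in the notebook.
os.environ["DISABLE_MLFLOWDBFS"] = "true"

def mlflowdbfs_disabled() -> bool:
    # Hypothetical helper: True once the flag is set in this process.
    return os.environ.get("DISABLE_MLFLOWDBFS", "").lower() == "true"

print(mlflowdbfs_disabled())  # → True
```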