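I am trying to train a LightGBM model in SageMaker, and I think I'm missing something about how to set the hyperparameters. The model fails during training with this error: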
2024-04-01 01:36:47,011 sagemaker-training-toolkit INFO Failed to parse hyperparameter objective value binary to Json.
This is how I define the training step:
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# AWS_REGION, instance_type, model_path, ROLE_ARN, sagemaker_session and
# step_spark_pre_proc are defined elsewhere in the pipeline.
train_model_id, train_model_version, train_scope = "lightgbm-classification-model", "*", "training"

# Retrieve the training image for the LightGBM classification model
image_uri = sagemaker.image_uris.retrieve(
    framework=None,
    model_id=train_model_id,
    model_version=train_model_version,
    image_scope=train_scope,
    region=AWS_REGION,
    py_version="py3",
    instance_type=instance_type,
)

lgbm_train = Estimator(
    image_uri=image_uri,
    instance_type=instance_type,
    instance_count=1,
    output_path=model_path,
    role=ROLE_ARN,
    sagemaker_session=sagemaker_session,
)

lgbm_train.set_hyperparameters(
    objective="binary",
    early_stopping_round=150,
    num_threads=20,
    learning_rate=0.01,
    is_unbalance=True,
    max_depth=15,
    num_leaves=15,
    num_iterations=500,
)

# Train on the "train" output and validate on the "test" output of the preprocessing step
train_args = lgbm_train.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_spark_pre_proc.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_spark_pre_proc.properties.ProcessingOutputConfig.Outputs[
                "test"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
    }
)
The value binary for the objective parameter is not valid in the XGBoost hyperparameter set.
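For binary classification, you can choose one of the following values as the learning objective:
binary:logistic: logistic regression for binary classification, outputs a probability.
binary:logitraw: logistic regression for binary classification, outputs the score before the logistic transformation.
binary:hinge: hinge loss for binary classification. This predicts 0 or 1 directly instead of producing probabilities.
Your code needs to be adjusted as follows: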
[...]
lgbm_train.set_hyperparameters(
    objective="binary:logistic",
    early_stopping_round=150,
    num_threads=20,
    learning_rate=0.01,
    is_unbalance=True,
    max_depth=15,
    num_leaves=15,
    num_iterations=500,
)
[...]
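You can find an overview of all hyperparameters and their valid values here. The valid values for the learning objective are listed in the XGBoost documentation.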
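To make the difference between the three binary objectives concrete, here is a minimal sketch in plain Python (no XGBoost needed). It assumes all three start from the same raw margin (the log-odds score) and that binary:hinge thresholds that margin at 0. The helpers sigmoid and predict are hypothetical illustrations, not part of any library API.

```python
import math


def sigmoid(margin: float) -> float:
    """Map a raw margin (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


def predict(margin: float, objective: str):
    """Hypothetical helper: what each binary objective reports for one raw margin."""
    if objective == "binary:logitraw":
        return margin  # raw score, before the logistic transform
    if objective == "binary:logistic":
        return sigmoid(margin)  # probability of the positive class
    if objective == "binary:hinge":
        return 1 if margin > 0 else 0  # hard 0/1 label, no probability (assumed threshold 0)
    raise ValueError(f"unsupported objective: {objective}")


# Compare the three outputs for a few raw margins
for m in (-2.0, 0.0, 1.5):
    print(m, [predict(m, o) for o in ("binary:logitraw", "binary:logistic", "binary:hinge")])
```

binary:logistic is simply the sigmoid of the binary:logitraw score, so pick it if you need probabilities (for example to tune a decision threshold on imbalanced data), and binary:hinge only if hard labels are enough.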