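I am trying to trigger a text NER job on the Azure ML service from a Python script, uploading the training and validation folders from local paths to the datastore. The code is as follows: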
import os
from azure.identity import DefaultAzureCredential
from azure.ai.ml import automl, Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import ResourceConfiguration
# Placeholders: replace each value with your own.
os.environ["AZURE_CLIENT_ID"] = "<my_client_id>"
os.environ["AZURE_TENANT_ID"] = "<my_tenant_id>"
os.environ["AZURE_CLIENT_SECRET"] = "<my_client_secret_id>"
subscription_id = "<my_subscription_id>"
resource_group = "<my_resource_group_id>"
workspace = "<my_workspace_id>"
ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)
training_mltable_path = "./training-mltable-folder/"
validation_mltable_path = "./validation-mltable-folder/"
my_training_data_input = Input(type=AssetTypes.MLTABLE, path=training_mltable_path)
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path=validation_mltable_path)
text_ner_job = automl.text_ner(
    name="dpv2-nlp-text-ner-job-01",
    experiment_name="dpv2-nlp-text-ner-experiment",
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
)
text_ner_job.set_limits(timeout_minutes=60)
text_ner_job.resources = ResourceConfiguration(instance_type="Standard_NC6s_v3")
returned_job = ml_client.jobs.create_or_update(text_ner_job)
print(f"Created job: {returned_job}")
ml_client.jobs.stream(returned_job.name)
However, when I run this code, it returns the following error:
Traceback (most recent call last):
...
raise JobException(
azure.ai.ml.exceptions.JobException: Exception :
{
"error": {
"code": "UserError",
"message": "Failed to validate user configuration and data.\n 1. The data file does not exists. Ensure data correctness and availability.",
"message_parameters": {},
"target": "ValidationService",
"details": [
{
"code": "UserError",
"severity": 2,
"message": "The data file does not exists. Ensure data correctness and availability.",
"message_format": "The data file does not exists. Ensure data correctness and availability.",
"message_parameters": {
"0": "System.Collections.Generic.Dictionary`2[System.String,System.String]"
},
"target": "training_data",
"details": [
{
"message": "null",
"message_parameters": {},
"details": []
}
],
"inner_error": {
"code": "BadArgument",
"inner_error": {
"code": "ArgumentInvalid",
"inner_error": {
"code": "DatasetInvalidPath"
}
}
}
}
]
},
"time": "0001-01-01T00:00:00.000Z"
}
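I am fairly confident that the uploaded data is in the correct format for this kind of task, so this is probably an availability issue.
Any ideas on how to resolve this?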
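Your input data needs to be of type AssetTypes.MLTABLE, which means each input folder has to follow the MLTable layout.
The folder should contain a file named MLTable, and that file should list the paths to the data files.
Check your input folders and change them to match this layout.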
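A quick local check can catch a missing MLTable file or a data file that it references before the job is submitted. The sketch below is only an illustration: `check_mltable_folder` is a hypothetical helper, not part of the Azure SDK, and it assumes the MLTable file lists its data under `paths:` with entries such as `- file: ./train.txt` (that file name is made up). It scans for those entries with a regular expression rather than parsing the YAML properly, and it skips remote URIs and glob patterns.

```python
import re
from pathlib import Path

# Matches entries such as "- file: ./train.txt" or "- folder: ./data".
# Assumption: one path entry per line; this is not a full YAML parser.
_PATH_ENTRY = re.compile(r"^\s*-\s*(file|folder)\s*:\s*(.+?)\s*$")


def check_mltable_folder(folder):
    """Return a list of problems found in a local MLTable folder (empty if none)."""
    root = Path(folder)
    mltable_file = root / "MLTable"
    if not mltable_file.is_file():
        return [f"{mltable_file} is missing"]

    problems = []
    entries = 0
    for line in mltable_file.read_text(encoding="utf-8").splitlines():
        match = _PATH_ENTRY.match(line)
        if not match:
            continue
        entries += 1
        target = match.group(2).strip("'\"")
        # Remote URIs and glob patterns cannot be verified with a simple existence check.
        if "://" in target or "*" in target:
            continue
        if not (root / target).exists():
            problems.append(f"{target} is listed in MLTable but not found in {root}")

    if entries == 0:
        problems.append(f"{mltable_file} lists no file or folder paths")
    return problems


if __name__ == "__main__":
    # Folder names taken from the question.
    for folder in ("./training-mltable-folder/", "./validation-mltable-folder/"):
        print(folder, check_mltable_folder(folder) or "looks OK")
```

An empty list only means the folder looks structurally plausible; it says nothing about whether the data itself is valid for NER.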
Alternatively, you can create MLTable-type data as a data asset and use its path, as shown below.
my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/my_training_mltable")
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/my_validation_mltable")
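The two paths above follow the pattern azureml://datastores/<datastore>/paths/<relative path>. As a small convenience, that pattern can be built with a helper such as the one sketched here; `datastore_uri` is a hypothetical name, not an SDK function, and it does nothing beyond string formatting.

```python
def datastore_uri(datastore_name, relative_path):
    """Build an azureml:// datastore URI of the form used in the lines above."""
    cleaned = relative_path.strip("/")
    if not datastore_name or not cleaned:
        raise ValueError("both a datastore name and a relative path are required")
    return f"azureml://datastores/{datastore_name}/paths/{cleaned}"


# Values taken from the answer's own example paths.
training_uri = datastore_uri("workspaceblobstore", "my_training_mltable")
validation_uri = datastore_uri("workspaceblobstore", "my_validation_mltable")
```

The resulting strings can be passed as the `path` argument of `Input(type=AssetTypes.MLTABLE, ...)`, exactly as in the two lines above.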
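To create the data asset:
Go to Data > click Create > choose the source files option > select the workspace blobstore location > upload your files and click Create.
Once it is created, you get a path like the azureml:// paths used above. Use that path in your code.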