Unable to create and run an Azure ML Text NER job from a Python script


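I am trying to trigger a Text NER job on the Azure ML service from a Python script, uploading the training and validation folders from local paths to the datastore. Here is the code: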

import os

from azure.identity import DefaultAzureCredential
from azure.ai.ml import automl, Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import ResourceConfiguration

os.environ["AZURE_CLIENT_ID"] = "<my_client_id>"
os.environ["AZURE_TENANT_ID"] = "<my_tenant_id>"
os.environ["AZURE_CLIENT_SECRET"] = "<my_client_secret_id>"

subscription_id = "<my_subscription_id>"
resource_group = "<my_resource_group_id>"
workspace = "<my_workspace_id>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

training_mltable_path = "./training-mltable-folder/"
validation_mltable_path = "./validation-mltable-folder/"

my_training_data_input = Input(type=AssetTypes.MLTABLE, path=training_mltable_path)
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path=validation_mltable_path)

text_ner_job = automl.text_ner(
    name="dpv2-nlp-text-ner-job-01",
    experiment_name="dpv2-nlp-text-ner-experiment",
    training_data=my_training_data_input,
    validation_data=my_validation_data_input
)

text_ner_job.set_limits(timeout_minutes=60)
text_ner_job.resources = ResourceConfiguration(instance_type="Standard_NC6s_v3")

returned_job = ml_client.jobs.create_or_update(
    text_ner_job
)

print(f"Created job: {returned_job}")

ml_client.jobs.stream(returned_job.name)

However, when I run this code, it fails with the following error:

Traceback (most recent call last):
  ...
    raise JobException(
azure.ai.ml.exceptions.JobException: Exception : 
 {
    "error": {
        "code": "UserError",
        "message": "Failed to validate user configuration and data.\n 1. The data file does not exists. Ensure data correctness and availability.",
        "message_parameters": {},
        "target": "ValidationService",
        "details": [
            {
                "code": "UserError",
                "severity": 2,
                "message": "The data file does not exists. Ensure data correctness and availability.",
                "message_format": "The data file does not exists. Ensure data correctness and availability.",
                "message_parameters": {
                    "0": "System.Collections.Generic.Dictionary`2[System.String,System.String]"
                },
                "target": "training_data",
                "details": [
                    {
                        "message": "null",
                        "message_parameters": {},
                        "details": []
                    }
                ],
                "inner_error": {
                    "code": "BadArgument",
                    "inner_error": {
                        "code": "ArgumentInvalid",
                        "inner_error": {
                            "code": "DatasetInvalidPath"
                        }
                    }
                }
            }
        ]
    },
    "time": "0001-01-01T00:00:00.000Z"
}

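I am fairly confident that the uploaded data is correctly formatted for this kind of task, so this is probably an availability issue.

Any ideas on how to fix this?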

python sdk azure-machine-learning-service named-entity-recognition
1 answer

Your input data needs to be of type AssetTypes.MLTABLE, laid out as shown in the image below.

[Screenshot not preserved]

Each folder should contain a file named MLTable that specifies the path to the data file.
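A minimal sketch of creating that MLTable file from Python, assuming each folder holds a single CoNLL-style text file next to it. The helper write_mltable and the file names train.txt / valid.txt are hypothetical; the paths / file layout is the basic single-file MLTable form, so check the MLTable schema documentation if your data needs more than that.

```python
from pathlib import Path


def write_mltable(folder: str, data_file: str) -> Path:
    """Write a minimal MLTable file that points at one data file in `folder`.

    Hypothetical helper. Assumes `data_file` (e.g. a CoNLL-style train.txt)
    lives in the same folder; the path inside MLTable is relative to the
    MLTable file itself.
    """
    folder_path = Path(folder)
    folder_path.mkdir(parents=True, exist_ok=True)
    # The file must be named exactly "MLTable", with no extension.
    mltable_path = folder_path / "MLTable"
    mltable_path.write_text(f"paths:\n  - file: ./{data_file}\n", encoding="utf-8")
    return mltable_path
```

Usage, with the folder names from the question: write_mltable("./training-mltable-folder", "train.txt") and write_mltable("./validation-mltable-folder", "valid.txt").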

Check your input folders and change them to match the layout described above.
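Before submitting the job, a quick local check can catch the usual causes of DatasetInvalidPath: a missing MLTable file, or a file entry that points at a data file that is not there. This is a hypothetical helper that does a naive line scan (not a full YAML parser) and skips remote URIs such as azureml://.

```python
import re
from pathlib import Path


def check_mltable_folder(folder: str) -> list[str]:
    """Return the problems found in a local MLTable folder (empty list if none)."""
    folder_path = Path(folder)
    mltable = folder_path / "MLTable"
    if not mltable.is_file():
        return [f"{mltable} is missing"]

    problems = []
    # Naive scan for "- file: <path>" entries; not a full YAML parser.
    for line in mltable.read_text(encoding="utf-8").splitlines():
        match = re.match(r"\s*-\s*file:\s*(\S+)", line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        if "://" in ref:
            continue  # remote path; cannot be checked locally
        target = folder_path / ref
        if not target.is_file():
            problems.append(f"{target} referenced in MLTable does not exist")
    return problems
```

Running it on "./training-mltable-folder" and "./validation-mltable-folder" before calling ml_client.jobs.create_or_update should return an empty list for each.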

Alternatively, you can create an MLTable-type data asset and use its path, as shown below.

from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/my_training_mltable")
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/my_validation_mltable")
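The azureml:// paths above follow the pattern azureml://datastores/<datastore_name>/paths/<path_on_datastore>. A small hypothetical helper that builds such a URI, assuming the path is given relative to the datastore root:

```python
def datastore_uri(datastore: str, relative_path: str) -> str:
    """Build an azureml:// URI for a file or folder on a registered datastore.

    Hypothetical helper; a leading slash on `relative_path` is dropped so the
    result always has exactly one slash after "paths".
    """
    return f"azureml://datastores/{datastore}/paths/{relative_path.lstrip('/')}"


# Same URI as the training input above.
training_uri = datastore_uri("workspaceblobstore", "my_training_mltable")
```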

To create the data asset:

Go to Data > click Create > choose the source files option > select the workspace blobstore location > upload your files, then click Create.

[Screenshot not preserved]

Once the asset is created, you will get a path like the one shown in the image below. Use that path in your code.

[Screenshot not preserved]
