Inference failing for a model deployed on SageMaker

Question · Votes: 0 · Answers: 1

I uploaded a TensorFlow model (model.tar.gz) to SageMaker, and then I run the following code for inference:

```python
import boto3
import json

client = boto3.client('sagemaker-runtime')

response = client.invoke_endpoint(
    EndpointName="tensorflow-inference-2024-01-14-14-11-34-629",
    ContentType="application/json",
    Body=mock_request_json  # JSON string payload
)

print(response['Body'].read().decode())
```

I get the following error:

```
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
    "error": "Failed to process element: 0 key: data of 'instances' list. Error: Invalid argument: JSON object: does not have named input: data"
}". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/tensorflow-inference-2024-01-14-14-11-34-629 in account 969632233674 for more information.
```
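For context, this 400 comes from TensorFlow Serving: the key used inside each element of the `instances` list (`data` here) does not match a named input in the model's serving signature. A minimal sketch of building a payload whose key matches the signature; the input name `features` below is a placeholder, and the real name must come from your model's signature:

```python
import json

def build_instances_payload(input_name, rows):
    """Build a TF Serving 'row' format request body.

    Each element of `instances` is a dict keyed by the model's named
    input; the key must match the serving signature exactly, otherwise
    TF Serving rejects the request with a 400.
    """
    return json.dumps({"instances": [{input_name: row} for row in rows]})

# "features" is a hypothetical name -- replace it with the input name
# from your model's serving signature.
body = build_instances_payload("features", [[0.1, 0.2, 0.3]])
print(body)  # {"instances": [{"features": [0.1, 0.2, 0.3]}]}
```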

However, I have an older deployed endpoint that works fine with the same code. I checked everything and it all seems identical except for a minor difference in the folder structure of model.tar.gz. Any ideas what could be causing the error?
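Since the folder structure of model.tar.gz is the one noticed difference: the SageMaker TensorFlow Serving container expects the SavedModel under a numbered version directory (e.g. `1/saved_model.pb`, or `model/1/saved_model.pb` depending on the container version). A small helper to sanity-check an archive's member list; `list_model_archive` is meant to be run against your actual model.tar.gz, and the example member lists at the bottom are illustrative only:

```python
import re
import tarfile

def has_versioned_savedmodel(member_names):
    """True if the member list contains a SavedModel under a numbered
    version directory, e.g. '1/saved_model.pb' or 'model/1/saved_model.pb'."""
    pattern = re.compile(r"^(?:model/)?\d+/saved_model\.pb$")
    return any(pattern.match(name.lstrip("./")) for name in member_names)

def list_model_archive(path):
    """List member names of a model.tar.gz (call this on your real archive)."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()

print(has_versioned_savedmodel(["1/saved_model.pb", "1/variables/variables.index"]))  # True
print(has_versioned_savedmodel(["saved_model.pb", "variables/variables.index"]))      # False
```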

Also, for reference, here is my deployment code:

```python
import sagemaker
from sagemaker.tensorflow import TensorFlowModel
from sagemaker.deserializers import JSONDeserializer
from utilities.sagemaker_utils.training_utils import delete_endpoint, get_last_training_job, deploy_trained_model
import json

model_name = "search-ranking-model"
training_job = "search-ranking-model-2023-12-24-09-01-19-826"  # replace with the actual training job

try:
    predictor = deploy_trained_model(
        model_name=model_name,
        training_job=training_job,
        instance_type='ml.m5.large',
        framework='tensorflow',
        framework_version='2.6',  # replace with the version used during training
        py_version='py38'  # replace with the python version used during training
    )
    predictor.deserializer = JSONDeserializer()
except Exception as e:
    # Log the exception and handle any cleanup if necessary
    print(f"An error occurred during model deployment: {e}")
```

amazon-sagemaker inference
1 Answer

-1 votes

The error appears to come from the inference code in your own container. Could you log each line in the inference container so we can understand this better? The output will show up in the CloudWatch logs. In general, it is best to test locally what input format your model expects. Another option is to test the container together with the model data locally using SageMaker.
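To follow the suggestion of reading the container's output, a sketch for pulling recent endpoint logs with boto3. The log group name `/aws/sagemaker/Endpoints/<endpoint-name>` follows the pattern shown in the error message above; the fetching function requires AWS credentials and is illustrative:

```python
def endpoint_log_group(endpoint_name):
    """SageMaker endpoint logs live under this CloudWatch log group."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"

def fetch_recent_events(endpoint_name, limit=50):
    """Fetch recent log events for an endpoint (requires AWS credentials)."""
    import boto3  # imported here so the helper above stays dependency-free
    logs = boto3.client("logs")
    group = endpoint_log_group(endpoint_name)
    # Most recently active stream first.
    streams = logs.describe_log_streams(
        logGroupName=group, orderBy="LastEventTime", descending=True, limit=1
    )["logStreams"]
    if not streams:
        return []
    events = logs.get_log_events(
        logGroupName=group, logStreamName=streams[0]["logStreamName"], limit=limit
    )["events"]
    return [e["message"] for e in events]

if __name__ == "__main__":
    print(endpoint_log_group("tensorflow-inference-2024-01-14-14-11-34-629"))
```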

I work at AWS; opinions are my own.
