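I have deployed a model on AWS SageMaker using the NVIDIA Triton Inference Server, and I am trying to expose it through a REST API with AWS API Gateway so that customers can access it.
Initially, I wrote code that calls AWS SageMaker directly using a specific MIME type
application/vnd.sagemaker-triton.binary+json;json-header-size={NUMBER}
(see the AWS documentation for details). With this MIME type in the Content-Type header, where {NUMBER}
is the number of bytes to read as JSON, with the binary data following it, everything works perfectly.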
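To make the framing concrete, here is a minimal sketch of how a body in this format can be split once the JSON header size is known. The helper name `split_binary_json` is made up for illustration and is not part of Triton or SageMaker. It only mirrors the idea that the first {NUMBER} bytes are JSON and the rest is raw tensor data.

```python
import json
import struct


def split_binary_json(body: bytes, json_header_size: int):
    """Split a body into its JSON header and the raw binary tail.

    Hypothetical helper: the first `json_header_size` bytes are parsed
    as JSON, everything after them is returned untouched.
    """
    header = json.loads(body[:json_header_size].decode("utf-8"))
    binary_tail = body[json_header_size:]
    return header, binary_tail


# Build a tiny example body: a JSON header followed by one float32 (4 bytes).
header_bytes = json.dumps(
    {
        "inputs": [
            {
                "name": "np_tensor",
                "shape": [1, 1],
                "datatype": "FP32",
                "parameters": {"binary_data_size": 4},
            }
        ]
    }
).encode("utf-8")
payload = struct.pack("<f", -0.0024108887)  # little-endian float32
body = header_bytes + payload

header, tail = split_binary_json(body, len(header_bytes))
value = struct.unpack("<f", tail)[0]
```

If the size is missing or wrong, the receiver cannot tell where the JSON ends and the tensor bytes begin, which is consistent with an error about missing input bytes.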
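Following the instructions in an AWS blog post, I created an API and configured it to proxy my requests to the SageMaker runtime without modification. In addition, I added
application/vnd.sagemaker-triton.binary+json
to the binary media types to ensure that the payload is proxied as binary without being altered.
However, when I test the API Gateway endpoint, I get an error: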
The error message (unexpected size for input 'np_tensor', expecting 4 additional bytes) suggests that the Triton server is not receiving the correct binary data size, possibly due to the way API Gateway is processing the request.
API Gateway does not appear to preserve the
Content-Type=application/vnd.sagemaker-triton.binary+json;json-header-size={NUMBER}
header. Omitting this header when calling the SageMaker endpoint directly produces the same error.
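As a debugging aid, a small check like the following can show whether a given Content-Type value still carries the size parameter. This is only a sketch: `json_header_size` is a hypothetical helper name, and the parsing is deliberately simple (it splits on `;` and `=`).

```python
def json_header_size(content_type: str):
    """Return the json-header-size parameter as an int, or None if absent."""
    # Skip the media type itself and look only at the parameters after ';'.
    for part in content_type.split(";")[1:]:
        key, _, val = part.strip().partition("=")
        if key.lower() == "json-header-size" and val.isdigit():
            return int(val)
    return None


full = "application/vnd.sagemaker-triton.binary+json;json-header-size=123"
bare = "application/vnd.sagemaker-triton.binary+json"

with_param = json_header_size(full)     # parameter present
without_param = json_header_size(bare)  # parameter stripped or never sent
```

Logging this value on both sides of the proxy (where logs are available) is one way to confirm at which hop the parameter disappears.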
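The logs show that the header is present initially, but later entries contain only truncated output, which does not help much.
Here is the code I am using:
Python client code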
import boto3
import botocore.session
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
import numpy as np
import json
import requests
aws_region = 'us-east-1'
# API Gateway URL
url = ""
# SageMaker Endpoint URL (commented out since we're using API Gateway)
# url = ""
# Sample dummy input data for testing
input_data = np.array([[-0.0024108887]]).astype('float32')
# Define the request body for the Triton server
json_request = {
    "inputs": [
        {
            "name": "np_tensor",
            "shape": list(input_data.shape),
            "datatype": "FP32",
            "parameters": {"binary_data_size": input_data.nbytes},
        },
    ],
    "outputs": [
        {"name": "transcription", "parameters": {"binary_data": True}},
    ],
}
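# Convert the request to JSON bytes and append the raw tensor bytes
json_request_bytes = json.dumps(json_request).encode("utf-8")
request_body = json_request_bytes + input_data.tobytes()
# json-header-size counts bytes, so measure the encoded JSON, not the string
header_length = len(json_request_bytes)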
# # Not needed for AWS Gateway
# # AWS session and credentials setup
# session = boto3.Session()
# credentials = session.get_credentials()
# # AWS Request with SigV4 Authentication
# request = AWSRequest(method="POST", url=url, data=request_body)
# SigV4Auth(credentials, 'sagemaker', aws_region).add_auth(request)
# signed_headers = dict(request.headers)
# Prepare headers, including the custom Content-Type header
signed_headers = {}
signed_headers["Content-Type"] = "application/vnd.sagemaker-triton.binary+json;json-header-size={}".format(header_length)
# Send the request and print the response
response = requests.post(
    url,
    headers=signed_headers,
    data=request_body,
)
print(response.content.decode("utf8"))
My question is:
Any insights or suggestions would be greatly appreciated.
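I found the solution, thanks to this answer! Amazon API Gateway does not pass headers through to SageMaker unless explicitly told to (which is a bit counterintuitive, since this is not stated explicitly anywhere). To pass the header: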
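For illustration, one common way to forward a header in API Gateway is a request parameter mapping: the header is declared on the method request and then mapped onto the integration request. The sketch below only builds the two mapping dictionaries. The function name `header_passthrough_params` is hypothetical, and this is an assumption about a typical setup, not necessarily the exact configuration from the referenced answer.

```python
def header_passthrough_params(header_name: str = "Content-Type"):
    """Build request-parameter mappings that forward one header unchanged.

    Hypothetical helper. The returned dicts follow the shape of the
    `requestParameters` arguments accepted by the boto3 API Gateway
    client's put_method and put_integration calls.
    """
    method_key = f"method.request.header.{header_name}"
    integration_key = f"integration.request.header.{header_name}"
    # Declare the header on the method request (True marks it as required).
    method_params = {method_key: True}
    # Map the incoming header onto the outgoing integration request.
    integration_params = {integration_key: method_key}
    return method_params, integration_params


method_params, integration_params = header_passthrough_params()
```

The same mapping can also be entered by hand in the console, under the method request's HTTP headers and the integration request's header mappings. The API has to be redeployed before the change takes effect.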