How do I get a string response from a Triton server on an AWS SageMaker multi-model endpoint?


I have set up a multi-model endpoint in SageMaker using NVIDIA Triton. Sample code is below.

  • model.py and config.pbtxt are for the Python backend. In the config file and in model.py I set the input type to int, and in the execute function I set the output to string. I pass in the tokenized input of the text; I want to use a DistilBART model (https://huggingface.co/docs/transformers/model_doc/distilbert). According to the documentation, the execute function returns a list of pb_utils.InferenceResponse. Is it possible to return a string from this function? If so, could someone give me an example?

model.py

import numpy as np
import sys
import os
import json
from pathlib import Path

import torch

import triton_python_backend_utils as pb_utils

class TritonPythonModel:

    def initialize(self, args):
       ....
       

    def execute(self, requests):
        responses = []
        for request in requests:
            input_ids = pb_utils.get_input_tensor_by_name(request, "input_ids")
            input_ids = input_ids.as_numpy()
            input_ids = torch.as_tensor(input_ids).long().cuda()

            inputs = {'input_ids': input_ids}
            translation = self.model.generate(**inputs, num_beams=1)

       
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[
                    pb_utils.Tensor(
                        "output",
                        ....
                    )
                ]
            )
            responses.append(inference_response)
        return responses
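As a sketch of the string-output part: Triton's Python backend represents TYPE_STRING tensors as NumPy object arrays of UTF-8 byte strings, one element per batch item. Assuming the model's tokenizer exposes `batch_decode` (as Hugging Face tokenizers do), the generated ids could be packed roughly like this; the helper name and the `pb_utils.Tensor` wiring shown in the comments are illustrative, not from the original post:

```python
import numpy as np

def strings_to_triton_array(texts):
    """Pack Python strings into the layout Triton's Python backend
    expects for a TYPE_STRING output: a NumPy object array holding
    UTF-8 encoded byte strings, one element per batch item."""
    return np.array([t.encode("utf-8") for t in texts], dtype=np.object_)

# Inside execute() this array would then be wrapped, e.g.:
#   decoded = self.tokenizer.batch_decode(translation, skip_special_tokens=True)
#   pb_utils.Tensor("OUTPUT", strings_to_triton_array(decoded))

arr = strings_to_triton_array(["hello", "wörld"])
print(arr.dtype, arr[0])
```

The key point is the dtype: a plain `np.array(["hello"])` would produce a fixed-width Unicode array, which Triton does not accept for TYPE_STRING outputs.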

config.pbtxt

name: 'somename'
backend: 'python'
max_batch_size: 16
input [{
  name: "INPUT"
  data_type: TYPE_INT32
  dims: [ -1 ]}
]
output[{
   name: "OUTPUT"
   data_type: TYPE_STRING
   ...
}]
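On the client side, when the endpoint is invoked with the JSON (non-binary) content type, a TYPE_STRING output comes back in the KServe v2 response format with datatype "BYTES" and the strings in the output's "data" field. A minimal sketch, assuming a made-up response body shaped like that format (the model and output names are the ones from the config above):

```python
import json

# Hypothetical response body in the KServe v2 JSON inference
# response shape for a TYPE_STRING ("BYTES") output.
body = json.dumps({
    "model_name": "somename",
    "outputs": [{
        "name": "OUTPUT",
        "datatype": "BYTES",
        "shape": [1],
        "data": ["translated text"]
    }]
})

def extract_strings(response_body, output_name="OUTPUT"):
    """Pull the string data for one named output out of a
    v2 JSON inference response."""
    payload = json.loads(response_body)
    for out in payload["outputs"]:
        if out["name"] == output_name:
            return out["data"]
    raise KeyError(output_name)

print(extract_strings(body))  # ['translated text']
```

If the endpoint instead returns Triton's binary tensor extension, each BYTES element is length-prefixed and must be decoded from the raw payload rather than from JSON.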
Tags: python, machine-learning, nvidia, amazon-sagemaker