I want to deploy a Hugging Face text-embedding model as an endpoint via AWS SageMaker.
Here is my code so far:
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

# sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hub Model configuration. <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID': 'sentence-transformers/all-MiniLM-L12-v2',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction'                            # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,                    # configuration for loading model from Hub
    role=role,                  # iam role with permissions to create an Endpoint
    py_version='py36',
    transformers_version='4.6', # transformers version used
    pytorch_version='1.7',      # pytorch version used
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'
)

data = {
    "inputs": ["This is an example sentence", "Each sentence is converted"]
}

result = predictor.predict(data)
print(len(result[0]))
print(result[0])
While this does successfully deploy an endpoint, it doesn't behave the way it should. I expect each string in the input list to produce one 1x384 list of floats as output. Instead, I get a 7x384 list per sentence. Am I perhaps using the wrong pipeline?
The output you are seeing is the model's default output: one 384-dimensional vector per token (your sentences tokenize to 7 tokens each, hence 7x384), not one vector per sentence. To get the output you expect, you can either adjust it on the client side (after receiving the response), or attach an inference.py script that implements the functions that shape the output, in particular predict_fn and output_fn.
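For the client-side route, here is a minimal sketch (assuming the endpoint returns one (num_tokens x 384) nested list per input sentence, as in `result` above — `to_sentence_vector` is just an illustrative helper name):

```python
import numpy as np

def to_sentence_vector(token_embeddings):
    """Collapse per-token vectors (n_tokens x 384) into one 384-d vector by averaging."""
    return np.asarray(token_embeddings).mean(axis=0)

# one vector per input sentence:
# sentence_vectors = [to_sentence_vector(tokens) for tokens in result]
```

Note that this plain average weights special tokens ([CLS], [SEP]) the same as word tokens; it is a rough approximation of proper mean pooling over the attention mask, which the server-side script can do exactly.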
Example: https://github.com/huggingface/notebooks/tree/main/sagemaker/17_custom_inference_script/code
from transformers import AutoTokenizer, AutoModel
import torch

def model_fn(model_dir):
    # Load model from HuggingFace Hub
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # destruct model and tokenizer
    model, tokenizer = model_and_tokenizer
    # Tokenize sentences
    sentences = data.pop("inputs", data)
    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)
    # Mean-pool token embeddings over the attention mask -> one 384-d vector per sentence
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    embeddings = (output[0] * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
    return {"vectors": embeddings.tolist()}
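The standard sentence-transformers recipe for this model (shown on its model card) is masked mean pooling: average the token vectors, counting only positions where the attention mask is 1. That arithmetic can be sanity-checked locally with plain numpy, no deployment needed — the shapes here are toy values, not the model's:

```python
import numpy as np

def masked_mean_pooling(token_embeddings, attention_mask):
    """Average token vectors per sentence, counting only positions where mask == 1."""
    emb = np.asarray(token_embeddings, dtype=float)            # (batch, seq_len, dim)
    mask = np.asarray(attention_mask, dtype=float)[..., None]  # (batch, seq_len, 1)
    return (emb * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)

# toy batch: 1 sentence, 3 token positions (the last one is padding), dim 4
emb = [[[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]]]
mask = [[1, 1, 0]]
print(masked_mean_pooling(emb, mask))  # -> [[2. 3. 4. 5.]]
```

The padding row (all 9s) does not leak into the result, which is exactly why pooling on the server — where the attention mask is available — is more robust than averaging the raw endpoint output on the client.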