Error deploying Falcon-7B to an AWS SageMaker endpoint after fine-tuning with the SageMaker Python SDK

Question

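I'm currently running into a problem with AWS SageMaker: after fine-tuning a Falcon-7B model with a SageMaker training job, I can't deploy it to a SageMaker endpoint. I'm roughly following this tutorial: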

https://www.philschmid.de/sagemaker-mistral#2-load-and-prepare-the-dataset

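The workflow is fairly predictable: write the training script, set the hyperparameters, create the Hugging Face estimator, then train the model on data in an S3 bucket. That part works fine, and I can store the uncompressed model weights in an S3 bucket.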

From there, I retrieve the LLM image URI:

from sagemaker.huggingface import get_huggingface_llm_image_uri

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri(
  "huggingface",
  version="1.1.0",
  session=sess,
)

# print ecr image uri
print(f"llm image uri: {llm_image}")

Then I create a new Hugging Face model object (HuggingFaceModel) and point its model data at the S3 path:

import json
from sagemaker.huggingface import HuggingFaceModel

model_s3_path = huggingface_estimator.model_data["S3DataSource"]["S3Uri"]

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 1
health_check_timeout = 300

# Define Model and Endpoint configuration parameter
config = {
  'HF_MODEL_ID': "/opt/ml/model", # path to where sagemaker stores the model
  'SM_NUM_GPUS': json.dumps(number_of_gpu), # Number of GPU used per replica
  'MAX_INPUT_LENGTH': json.dumps(1024), # Max length of input text
  'MAX_TOTAL_TOKENS': json.dumps(2048), # Max length of the generation (including input text)
}

# create HuggingFaceModel with the image uri
llm_model = HuggingFaceModel(
  role=role,
  image_uri=llm_image,
  model_data={'S3DataSource':{'S3Uri': model_s3_path,'S3DataType': 'S3Prefix','CompressionType': 'None'}},
  env=config
)

Then I finally deploy the model:

llm = llm_model.deploy(
  initial_instance_count=1,
  instance_type=instance_type,
  container_startup_health_check_timeout=health_check_timeout, # in seconds; 300 s = 5 minutes to load the model
)

At this point I always get an error like this:

UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-tgi-inference-2023-10-21-16-47-53-072: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

When I check the CloudWatch logs, this is the first error that shows up:

RuntimeError: Not enough memory to handle 4096 prefill tokens. You need to decrease `--max-batch-prefill-tokens`

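This doesn't really make sense to me, because I'm using a fairly large ml.g5.12xlarge instance and Falcon-7B is not a particularly large model. In the tutorial, the author successfully deploys Mistral 7B to an even smaller instance (ml.g5.2xlarge). Also, even when I cut the prefill tokens in half, I still get this error: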

RuntimeError: Not enough memory to handle 2048 prefill tokens. You need to decrease `--max-batch-prefill-tokens`

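I've tried a few permutations, including training the model on AWS, pushing it to the Hugging Face Hub, and then deploying it from there (I know that's needlessly convoluted, but I was desperate for a workaround). That didn't work either and returned an error stating: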

ValueError: Unsupported model type falcon

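I also tried training and deploying the model in compressed form (as a model.tar.gz), but that didn't work either and returned the same error. I'm by no means an expert, but I feel this shouldn't be so hard. Has anyone run into this before and solved it? Is this problem unique to the Falcon model family on SageMaker?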

Answer 1

See this example: https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/generativeai/llm-workshop/lab11-llama2/meta-llama-2-7b-lmi.ipynb

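There are two potential issues:

1/ You are setting a large prefill token value, which causes these errors. Start with a small value.
2/ The maximum input + output token count for Llama 2 is 4096, but in my testing it should ideally be around 3500, so be sure to set these values appropriately.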
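As a rough sketch of the "start small" advice, the container environment from the question could be built by a small helper that also checks the token limits against each other before deploying. The helper name `build_tgi_env` and its default values are hypothetical. It also assumes the container reads `MAX_BATCH_PREFILL_TOKENS` from the environment, which is how the TGI launcher documents its `--max-batch-prefill-tokens` flag, and that the launcher's ordering rules are the ones checked here.

```python
import json


def build_tgi_env(max_input_length=1024, max_total_tokens=2048,
                  max_batch_prefill_tokens=1024, num_gpus=1,
                  model_id="/opt/ml/model"):
    """Build the env dict for the LLM container (hypothetical helper)."""
    # The input budget has to leave room for generated tokens.
    if max_input_length >= max_total_tokens:
        raise ValueError("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS")
    # A single prompt must fit into one prefill batch.
    if max_batch_prefill_tokens < max_input_length:
        raise ValueError("MAX_BATCH_PREFILL_TOKENS must be >= MAX_INPUT_LENGTH")
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
        # Assumed env var name; starts small instead of the 4096 seen in the logs.
        "MAX_BATCH_PREFILL_TOKENS": json.dumps(max_batch_prefill_tokens),
    }


config = build_tgi_env()
print(config)
```

The resulting dict can be passed as `env=config` to `HuggingFaceModel`, as in the question. If the endpoint comes up with the small value, it can be raised step by step until the warmup fails again.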

Also, you no longer need to create a tarball to host a model in SageMaker. See: https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-uncompressed.html
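For the uncompressed route, the `model_data` argument is a dict rather than a single `model.tar.gz` URI, as the question already does. A minimal sketch of building it is below. The helper name `uncompressed_model_data` and the bucket path are hypothetical. The trailing-slash handling follows my reading of the SageMaker API reference, which says a key prefix used with `S3Prefix` ends with a forward slash.

```python
def uncompressed_model_data(s3_prefix):
    """Return a model_data dict for uncompressed artifacts (hypothetical helper)."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    # With S3DataType "S3Prefix" the URI is treated as a key prefix,
    # so make sure it ends with a forward slash.
    if not s3_prefix.endswith("/"):
        s3_prefix += "/"
    return {
        "S3DataSource": {
            "S3Uri": s3_prefix,
            "S3DataType": "S3Prefix",
            "CompressionType": "None",
        }
    }


# Hypothetical bucket and prefix, for illustration only.
model_data = uncompressed_model_data("s3://my-bucket/falcon-7b/output/model")
print(model_data["S3DataSource"]["S3Uri"])
```

The returned dict can be passed directly as `model_data=` to `HuggingFaceModel`.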
