I want to deploy an LLM on SageMaker, but it gives me this error. I have also tried different models and still get the same error

Question

I'm deploying the "TheBloke/Llama-2-7b-Chat-GPTQ" model on SageMaker. I'm running the code in a SageMaker notebook instance, and I used an "ml.g4dn.xlarge" instance for the deployment. The code is the same as the code shown under the deploy on Amazon SageMaker button on Hugging Face.

After I run the code, it processes for about 10 minutes and shows this output while processing:

Output:

These dashes show that the model is deploying. After the dashes, I got this error:

Error:

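UnexpectedStatusException: Error hosting endpoint Huggingface-pytorch-tgi-inference-2023-08-24-06-51-13-816: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..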

Answer 1

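Here are some troubleshooting tips:

- Since you are trying a 13B-parameter model, try the instance type that the official blog recommends by default, `ml.g5.12xlarge`. If you don't have quota for `ml.g5.12xlarge` yet, you may need to request it.
- Try changing the Hugging Face version from `0.9.3` to `0.8.2` and see whether that works for you.
- Follow the troubleshooting steps for the "primary container did not pass ping health checks" error.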
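As a rough sanity check on the instance-type tip, you can compare the memory the weights alone need with the GPU memory of the instance. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, the estimate ignores the KV cache, activations and runtime overhead, and "fits" for a multi-GPU instance assumes the model is sharded across all GPUs.

```python
GIB = 1024 ** 3

# (number of GPUs, GiB per GPU) for the two instance types mentioned in this thread.
GPU_SPECS = {
    "ml.g4dn.xlarge": (1, 16),  # 1x NVIDIA T4
    "ml.g5.12xlarge": (4, 24),  # 4x NVIDIA A10G
}


def weights_gib(params_billion: float, bits_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return params_billion * 1e9 * bits_per_param / 8 / GIB


def fits(instance_type: str, params_billion: float, bits_per_param: int) -> bool:
    """True if the weights alone fit in the instance's total GPU memory."""
    gpu_count, gib_per_gpu = GPU_SPECS[instance_type]
    return weights_gib(params_billion, bits_per_param) < gpu_count * gib_per_gpu


print(round(weights_gib(13, 16), 1))      # 13B in fp16 -> 24.2
print(fits("ml.g4dn.xlarge", 13, 16))     # False
print(fits("ml.g5.12xlarge", 13, 16))     # True
print(fits("ml.g4dn.xlarge", 7, 4))       # 7B at 4 bits -> True
```

By this estimate a 4-bit 7B model would fit on a single T4 by weights alone, so if you really are deploying the GPTQ 7B model the cause may lie elsewhere, and the endpoint logs are the place to look.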

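Alternatives:

For deploying large models, it is recommended that you follow the "Deploying uncompressed models" guidance.

You can also check whether the model you are trying to deploy is already available in SageMaker JumpStart.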
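Whichever route you take, the error message itself points to the endpoint's CloudWatch logs, which usually contain the container's actual startup error. Here is a minimal sketch of reading them with boto3. It assumes the standard `/aws/sagemaker/Endpoints/<endpoint-name>` log group; the function names are illustrative, and only the first page of results is read.

```python
from typing import Any, List


def endpoint_log_group(endpoint_name: str) -> str:
    """CloudWatch log group that SageMaker writes endpoint container logs to."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def endpoint_log_messages(logs_client: Any, endpoint_name: str, limit: int = 50) -> List[str]:
    """Return up to `limit` log messages for the endpoint (first page only)."""
    response = logs_client.filter_log_events(
        logGroupName=endpoint_log_group(endpoint_name),
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]


# Usage (needs boto3 and AWS credentials; the endpoint name is a placeholder):
#   import boto3
#   for line in endpoint_log_messages(boto3.client("logs"), "<your-endpoint-name>"):
#       print(line)
```

The client is passed in as an argument so that the helper can be tried with a stub before pointing it at a real account.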
