Problem setting up Llama-2 in Google Colab - cell fails while loading checkpoint shards

Question (votes: 0, answers: 1)

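I'm trying to use the Llama 2 chat model with 7B parameters (via Hugging Face) in Google Colab (Python 3.10.12). I've already obtained my access token through Meta. I'm simply using the code from Hugging Face, together with my access token, to load the model. Here is my code: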

!pip install transformers
 
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

token = "---Token copied from Hugging Face and pasted here---"

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", token=token)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", token=token)

It starts downloading the model, but when it reaches "Loading checkpoint shards" the cell just stops running, with no error.

python huggingface-transformers large-language-model llama

1 Answer

0
votes

The problem is that the Colab instance is running out of CPU RAM.
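One way to confirm this is to check how much system RAM the runtime actually has. The snippet below is a minimal sketch that assumes a Linux runtime (which Colab uses) and relies only on the standard library; the helper name `total_ram_gib` is made up for illustration.

```python
import os


def total_ram_gib() -> float:
    """Return the total physical RAM of the machine in GiB (POSIX systems only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")    # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")   # number of physical pages
    return page_size * num_pages / 2**30


print(f"Total system RAM: {total_ram_gib():.1f} GiB")
```

If the reported figure is below the size of the weights you are about to load, the kernel is likely to be killed during loading without a Python traceback.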

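For the Llama 2 7B model, the float32 weights need about 25 GB (and you need that much CPU RAM as well as the same 25 GB of GPU RAM). The bfloat16 weights need about 13 GB, which at best barely fits in, and can still exhaust, the RAM of a basic Colab CPU instance.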
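These figures follow from the parameter count multiplied by the bytes per parameter. The sketch below reproduces the arithmetic for the weights only; the nominal count of 7 billion parameters is an assumption (the real model has slightly fewer), and the function name is hypothetical.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed for the raw weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 7_000_000_000  # assumed nominal size of a "7B" model

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

The estimate covers only the weights, so actual usage while loading and running the model will be higher.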

See this link for details on the resources required: Huggingface.co/NousResearch/Llama-2-7b-chat-hf/discussions/3
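A common way to reduce the memory needed while loading is to request half precision and let the weights be placed directly on the GPU. The following is a sketch rather than a guaranteed fix: it assumes a GPU runtime, that the `accelerate` package is installed (needed for `device_map="auto"`), and that the GPU has enough memory for the half-precision weights. The function name `load_llama_half_precision` is made up for illustration.

```python
def load_llama_half_precision(token: str, model_id: str = "meta-llama/Llama-2-7b-chat-hf"):
    """Load the tokenizer and model in float16, keeping CPU RAM usage low."""
    # Imported here so that defining the function does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        token=token,
        torch_dtype=torch.float16,  # 2 bytes per parameter instead of 4
        low_cpu_mem_usage=True,     # avoid building a second full copy in CPU RAM
        device_map="auto",          # assumption: accelerate is installed
    )
    return tokenizer, model
```

In a notebook this would be called as `tokenizer, model = load_llama_half_precision(token)`, with `token` being the access token from the question.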
