Why does the Llama 2 7b version work while the 70b version doesn't?


I am using something similar to here to run Llama 2.

from os.path import dirname
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

model = "/Llama-2-70b-chat-hf/"
# model = "/Llama-2-7b-chat-hf/"

# dirname() just strips the trailing slash from the local checkpoint path
tokenizer = LlamaTokenizer.from_pretrained(dirname(model))

model = LlamaForCausalLM.from_pretrained(dirname(model))

eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt")   

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

The 7b version outputs an answer, but the 70b version fails with an error after loading the checkpoint shards. The size mismatch part is repeated many times (with different weights):

Loading checkpoint shards: 100%|███████████████████████████████████████████████| 15/15 [11:56<00:00, 47.78s/it]
Traceback (most recent call last):
  File "/llama2.py", line 52, in <module>
    model = LlamaForCausalLM.from_pretrained(dirname(model))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/llama2/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2795, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/miniconda3/envs/llama2/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3173, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).
    size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([1024, 8192]) from checkpoint, the shape in current model is torch.Size([8192, 8192]).

You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
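
One way to see where the two shapes in the error come from is to read the checkpoint's configuration. A minimal sketch, assuming the checkpoint ships the standard Llama config.json fields (num_key_value_heads is only used by transformers versions with grouped-query attention support):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("/Llama-2-70b-chat-hf/")
print("hidden_size:", config.hidden_size)                  # 8192 for 70b
print("num_attention_heads:", config.num_attention_heads)  # 64 for 70b
# transformers versions without grouped-query attention support ignore
# num_key_value_heads and build k_proj/v_proj as [hidden_size, hidden_size].
kv_heads = getattr(config, "num_key_value_heads", config.num_attention_heads)
head_dim = config.hidden_size // config.num_attention_heads
print("k_proj shape in checkpoint:", (kv_heads * head_dim, config.hidden_size))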


Ignoring the mismatched sizes with ignore_mismatched_sizes=True gives me another error: KeyError: 'lm_head.weight'. But if it runs with 7b, why doesn't it run with 70b?

pytorch torch llm llama
1 Answer

Insufficient hardware

You don't mention anything about the hardware you are running this on, so I can only assume it is the classic case of insufficient hardware. As a rule of thumb, you need at least 1 GB of RAM (ideally VRAM, depending on the architecture) per billion model parameters.
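
To put that rule of thumb into numbers, here is a back-of-the-envelope sketch (weights only; the per-parameter byte counts assume standard dtypes, and activations plus the KV cache need extra memory on top):

def weight_memory_gb(params_billion, bytes_per_param):
    # 1e9 parameters at N bytes each is roughly N GB of weights
    return params_billion * bytes_per_param

for name, params in [("7b", 7), ("70b", 70)]:
    print(name,
          "fp32:", weight_memory_gb(params, 4), "GB |",
          "fp16:", weight_memory_gb(params, 2), "GB |",
          "int8:", weight_memory_gb(params, 1), "GB")

The 1 GB per billion parameters figure corresponds to the int8 row; loading 70b in plain fp16 already needs on the order of 140 GB for the weights alone.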

For a 70b model you should therefore have 70 GB of VRAM (or unified RAM), which in practice usually means 96 GB.
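
If memory is indeed the bottleneck, a common mitigation is to load the weights in half precision and let the accelerate integration place them across the available devices. A minimal sketch, assuming the accelerate package is installed (torch_dtype and device_map are standard from_pretrained arguments):

from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

path = "/Llama-2-70b-chat-hf"  # local checkpoint directory

tokenizer = LlamaTokenizer.from_pretrained(path)
# fp16 halves the weight footprint compared to fp32; device_map="auto"
# shards the model across GPUs and spills the rest to CPU RAM if needed.
model = LlamaForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,
    device_map="auto",
)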
