Getting predictions with DataCollatorForCompletionOnlyLM after fine-tuning Llama 2 with the SFT trainer

Problem description

I am fine-tuning Llama 2 with the SFT trainer, using LoRA together with quantization. My dataset consists of questions structured like this:

<s>[INST] 
<<SYS>> Please select the correct answer from the given multiple Options based on the given Context: <</SYS>>  
 Context: Abrasion is another type of mechanical weathering. With abrasion, one rock bumps against another rock. Gravity causes abrasion as a rock tumbles down a slope. Moving water causes abrasion as it moves rocks so that they bump against one another (Figure 9.3). Strong winds cause abrasion by blasting sand against rock surfaces.  
 Question: Gravity causes erosion by all of the following except  
 Options:(A) glaciers (B) moving air (C) flowing water (D) mass movement  
 Answer: [/INST] D </s>
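
For context, this is roughly how my training setup looks (a minimal sketch, assuming TRL's SFTTrainer and PEFT as of late 2023; the model id, hyperparameters, and the one-row train_dataset are placeholders rather than my actual code):

import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder: any Llama-2 checkpoint

# 4-bit quantization (QLoRA-style), matching "LoRA together with quantization"
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # llama-2 has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Placeholder one-row dataset in the prompt format shown above
train_dataset = Dataset.from_dict(
    {"text": ["<s>[INST] <<SYS>> ... <</SYS>> ... Answer: [/INST] D </s>"]}
)

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    dataset_text_field="text",
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    # data_collator=collator would go here once the collator below works
)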

I am currently using DataCollatorForCompletionOnlyLM so that the loss is computed only on the predicted answer.

Given my instruction structure, should I provide the context, question, and options as the instruction template, like this?

instruction_template = "</SYS>>\n\n Context:"
response_template = "Answer: [/INST]"
collator = DataCollatorForCompletionOnlyLM(instruction_template=instruction_template, response_template=response_template, tokenizer=tokenizer, mlm=False)

Or, using only a response template:

response_template = "Answer: [/INST]"
collator = DataCollatorForCompletionOnlyLM(response_template=response_template, tokenizer=tokenizer, mlm=False)

I have tried several response templates, but I always get this error:

RuntimeError: Could not find response key [835, 4007, 22137, 29901] in token IDs tensor([ 1, 835, ...])
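
One thing I noticed while debugging: the ids in the error message are whatever key the collator is actually searching for, so decoding them shows which template string it expects (a quick check, using the same tokenizer as above):

# Decode the response-key ids from the error to see which template
# string the collator is actually looking for.
print(tokenizer.decode([835, 4007, 22137, 29901]))

If that does not decode to the intended template string, the collator was constructed with a different template than expected.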

Could you point me toward the correct template?

python pytorch nlp huggingface-transformers huggingface-trainer
1 Answer

Unlike most other tokenizers, the llama-2 tokenizer is context-dependent: the same substring can map to different token ids depending on what precedes it. For example:

sent-1: """### User: Hello\n\n### Assistant: Hi, how can I help you?"""
sent-2: "### Assistant:"

Here the token ids for ### Assistant: will differ between the two, because in sent-1 it is preceded by \n (context), while in sent-2 it is not.
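
You can verify this directly; a minimal check (the model id is an assumption, any llama-2 tokenizer will do):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# The same substring is tokenized differently with and without
# preceding context, so the two id lists below will not match:
print(tok.encode("### Assistant:", add_special_tokens=False))
print(tok.encode("\n### Assistant:", add_special_tokens=False))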

Solution: tokenize the response template with the same context it has in your training texts, and pass the resulting token ids to the collator. The Hugging Face TRL documentation explains this exact problem in detail.
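
Applied to the template in the question, that means encoding "Answer: [/INST]" together with some of its preceding context and passing token ids instead of a string, following the pattern from the TRL docs (a sketch; the exact context string and the [2:] slice that strips the context tokens depend on your data, so inspect the ids before relying on them):

from trl import DataCollatorForCompletionOnlyLM

# Encode the template WITH the context it has in the training texts
# (here assumed to be a preceding newline), then drop the leading
# context tokens so only the template's own ids remain.
response_template_with_context = "\n Answer: [/INST]"
response_template_ids = tokenizer.encode(response_template_with_context, add_special_tokens=False)[2:]

collator = DataCollatorForCompletionOnlyLM(response_template=response_template_ids, tokenizer=tokenizer, mlm=False)

Pass this collator to SFTTrainer via data_collator=collator, and keep packing disabled: completion-only masking does not work with packed sequences.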
