Batch- and Epoch-Level Training Metrics with the Transformers Trainer

Problem description

There are various ways to get metrics from transformers.Trainer, but only for evaluation, not for training. I read around and found answers scattered across different posts, such as this one, but I did not find how to get the metrics (loss, accuracy, recall, precision, F1) for an epoch.

I provide a solution in the first answer below.

If there are ways to optimize the code, or errors in it, please let me know; I would be happy to hear about them.
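For context, the standard evaluation-only route mentioned above is to pass a compute_metrics function to the Trainer; it runs only during evaluation, not training. A minimal sketch (the sklearn-based metric choices are my own, not from the original post):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Evaluation-only metrics: called by Trainer.evaluate(), never during training."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

This is exactly the gap the question is about: these numbers are only produced on the eval set, which is why the answer below subclasses the Trainer instead.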

huggingface-transformers metrics text-classification transformer-model huggingface-trainer
1 Answer

0 votes

Here is one way to do it; I also include a way to get the metrics for every batch:

Summary:

  1. Create a custom Trainer class that stores predictions, labels, and losses.
  2. Override the compute_loss() function in the custom Trainer class to capture the predictions, labels, and loss at the end of each batch.
  3. Create a CustomCallback class and customize its on_epoch_end() function to collect, concatenate, and compute the metrics and return them (I used loss, accuracy, recall, precision, and F1).
  4. Add the CustomCallback after defining the trainer in the training function.
Details and code:

Note: in my case I log directly to wandb for plotting instead of returning the values; my classification is binary, so I am not sure whether something needs to change for multi-class prediction; and I run on a single GPU.

Steps 1 and 2:

```python
import torch
import transformers
# import wandb  # only needed if you uncomment the wandb.log() calls below

class CustomTrainer(transformers.Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.epoch_predictions = []
        self.epoch_labels = []
        self.epoch_loss = []

    def compute_loss(self, model, inputs, return_outputs=False):
        """
        Subclassed to compute training accuracy.

        How the loss is computed by Trainer. By default, all models return
        the loss in the first element. Subclass and override for custom behavior.
        """
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)
        if "labels" in inputs:
            preds = outputs.logits.detach()
            # Compute batch accuracy
            acc = (
                (preds.argmax(axis=1) == inputs["labels"])
                .type(torch.float)
                .mean()
                .item()
            )
            # Uncomment if you want to plot the batch accuracy
            # wandb.log({"batch_accuracy": acc})
            # Store predictions and labels for epoch-level metrics
            self.epoch_predictions.append(preds.cpu().numpy())
            self.epoch_labels.append(inputs["labels"].cpu().numpy())
        # Save past state if it exists
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]
        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        # Uncomment if you want to plot the batch loss
        # wandb.log({"batch_loss": loss})
        self.epoch_loss.append(loss.item())  # Store loss for epoch-level metrics
        return (loss, outputs) if return_outputs else loss
```
Step 3:

```python
import numpy as np
import wandb
from transformers import TrainerCallback

class CustomCallback(TrainerCallback):
    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        # Aggregate predictions and labels for the entire epoch
        epoch_predictions = np.concatenate(self._trainer.epoch_predictions)
        epoch_labels = np.concatenate(self._trainer.epoch_labels)
        # Compute accuracy
        accuracy = np.mean(epoch_predictions.argmax(axis=1) == epoch_labels)
        # Compute mean loss
        mean_loss = np.mean(self._trainer.epoch_loss)
        # Log epoch-level metrics
        wandb.log({"epoch_accuracy": accuracy, "epoch_loss": mean_loss})
        # Clear stored predictions, labels, and loss for the next epoch
        self._trainer.epoch_predictions = []
        self._trainer.epoch_labels = []
        self._trainer.epoch_loss = []
        return None
```
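The callback above only logs accuracy and mean loss; the summary also mentions recall, precision, and F1. One way to cover those (a sketch, using sklearn and assuming binary classification as in my setup; the helper name epoch_metrics is mine) is a function that on_epoch_end could call on the concatenated arrays before logging:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def epoch_metrics(epoch_predictions, epoch_labels):
    """Compute epoch-level metrics from the concatenated batch outputs.

    epoch_predictions: (n_samples, n_classes) array of logits
    epoch_labels:      (n_samples,) array of integer labels
    """
    preds = epoch_predictions.argmax(axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        epoch_labels, preds, average="binary", zero_division=0
    )
    return {
        "epoch_accuracy": float(np.mean(preds == epoch_labels)),
        "epoch_precision": float(precision),
        "epoch_recall": float(recall),
        "epoch_f1": float(f1),
    }
```

The returned dict can be passed straight to wandb.log() inside on_epoch_end; for multi-class, average="binary" would presumably need to become "macro" or "weighted".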
Step 4:

```python
...
trainer = CustomTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
)
trainer.add_callback(CustomCallback(trainer))
trainer.train()
...
```