There are several ways to get metrics from transformers.Trainer, but only for evaluation, not for training. I read around and found answers scattered across different posts, such as this one, but I did not find how to get the metrics (loss, accuracy, recall, precision, f1) for an epoch.

I provide a solution in the first answer. If there is anything to optimize or any mistake in the code, please let me know; I would be happy to hear it.

Here is one way to do it. I also include a way to get the metrics for each batch:
Summary:

- Override the compute_loss() function to get the predictions, labels, and loss at the end of each batch.
- Add a CustomCallback class and customize its on_epoch_end() function to collect, concatenate, compute, and return the metrics (I used loss, accuracy, recall, precision, and f1).
Notes: in my case I plot directly with wandb instead of returning the values; my classification is binary, so I am not sure whether something needs to change for multiclass prediction; and I am running on a single GPU.
Steps 1 and 2:
import torch
import transformers


class CustomTrainer(transformers.Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.epoch_predictions = []
        self.epoch_labels = []
        self.epoch_loss = []

    def compute_loss(self, model, inputs, return_outputs=False):
        """
        Subclassed to compute training accuracy.

        How the loss is computed by Trainer. By default, all models return
        the loss in the first element. Subclass and override for custom
        behavior.
        """
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)
        if "labels" in inputs:
            preds = outputs.logits.detach()
            # Batch accuracy
            acc = (
                (preds.argmax(axis=1) == inputs["labels"])
                .type(torch.float)
                .mean()
                .item()
            )
            # Uncomment it if you want to plot the batch accuracy
            # wandb.log({"batch_accuracy": acc})
            # Store predictions and labels for epoch-level metrics
            self.epoch_predictions.append(preds.cpu().numpy())
            self.epoch_labels.append(inputs["labels"].cpu().numpy())
        # Save past state if it exists
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]
        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        # Uncomment it if you want to plot the batch loss
        # wandb.log({"batch_loss": loss})
        self.epoch_loss.append(loss.item())  # Store loss for epoch-level metrics
        return (loss, outputs) if return_outputs else loss
Step 3:
import numpy as np
import wandb
from transformers import TrainerCallback


class CustomCallback(TrainerCallback):
    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        # Aggregate predictions and labels for the entire epoch
        epoch_predictions = np.concatenate(self._trainer.epoch_predictions)
        epoch_labels = np.concatenate(self._trainer.epoch_labels)
        # Compute accuracy
        accuracy = np.mean(epoch_predictions.argmax(axis=1) == epoch_labels)
        # Compute mean loss
        mean_loss = np.mean(self._trainer.epoch_loss)
        # Log epoch-level metrics
        wandb.log({"epoch_accuracy": accuracy, "epoch_loss": mean_loss})
        # Clear stored predictions, labels, and loss for the next epoch
        self._trainer.epoch_predictions = []
        self._trainer.epoch_labels = []
        self._trainer.epoch_loss = []
        return None
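The summary mentions recall, precision, and f1 as well, while the callback above only logs accuracy and loss. A minimal sketch of how the remaining metrics could be computed inside on_epoch_end() from the same stored arrays, using scikit-learn (the helper name epoch_metrics is mine; average="binary" matches my binary setup, and for multiclass you would switch to something like average="macro"):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support


def epoch_metrics(epoch_predictions, epoch_labels):
    """Compute precision, recall, and f1 from stored logits and labels."""
    preds = epoch_predictions.argmax(axis=1)  # logits -> hard class ids
    precision, recall, f1, _ = precision_recall_fscore_support(
        epoch_labels, preds, average="binary"  # use e.g. "macro" for multiclass
    )
    return {"epoch_precision": precision, "epoch_recall": recall, "epoch_f1": f1}
```

The returned dict can be merged into the wandb.log() call alongside epoch_accuracy and epoch_loss.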
Step 4:
...
trainer = CustomTrainer(model=model,
tokenizer=tokenizer,
args=training_args,
compute_metrics=compute_metrics,
train_dataset=train_dataset,
eval_dataset=val_dataset,
data_collator=data_collator)
trainer.add_callback(CustomCallback(trainer))
trainer.train()
...
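Note that the compute_metrics function passed to the trainer above is only applied to the evaluation set; it is unrelated to the per-epoch training metrics handled by the callback. In case it helps, a minimal sketch of such a function (the body is my own example; the Trainer passes it a (logits, labels) pair):

```python
import numpy as np


def compute_metrics(eval_pred):
    """Evaluation-set metrics for Trainer.evaluate()."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # logits -> predicted class ids
    accuracy = float(np.mean(preds == labels))
    return {"accuracy": accuracy}
```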