Error when calling trainer.train() on a custom HF dataset: vars() argument must have __dict__ attribute?

Question (votes: 0, answers: 1)

I am trying to fine-tune the following model (CLIP_ViT + classification head). Here is my model definition:

from torch import nn
from transformers import CLIPProcessor, CLIPVisionModel
from transformers.modeling_outputs import SequenceClassifierOutput


class CLIPNN(nn.Module):

    def __init__(self, num_labels, pretrained_name="openai/clip-vit-base-patch32", dropout=0.1):
        super().__init__()
        self.num_labels = num_labels
        # load pre-trained transformer & processor
        self.transformer = CLIPVisionModel.from_pretrained(pretrained_name)
        self.processor = CLIPProcessor.from_pretrained(pretrained_name)
        # initialize other layers (head after the transformer body)
        self.classifier = nn.Sequential(
            nn.Linear(512, 128, bias=True),
            nn.ReLU(inplace=True),
            nn.Dropout(p=dropout, inplace=False),
            nn.Linear(128, self.num_labels, bias=True))

    # forward must be a method of the class, not nested inside __init__
    def forward(self, inputs, labels=None, **kwargs):
        # inputs are precomputed embeddings; the transformer body is not called here
        logits = self.classifier(inputs)
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
        )

I also have the following definition for the dataset:

import torch


# Dataset, Tensor and LongTensor live in torch, not in torch.nn
class CLIPDataset(torch.utils.data.Dataset):
    def __init__(self, embeddings, labels):
        self.embeddings = embeddings
        self.labels = labels

    def __getitem__(self, idx):
        # each item is a dict of tensors, keyed by name
        item = {"embeddings": torch.Tensor(self.embeddings[idx])}
        item['labels'] = torch.LongTensor([self.labels[idx]])
        return item

    def __len__(self):
        return len(self.labels)

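Note: here I am assuming the model is fed precomputed embeddings and does not compute them itself. I know this is not the right logic if I want to fine-tune the CLIP base model; I am just trying to get my code to work.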

Something like this raises the error:

model = CLIPNN(num_labels=2)
train_data = CLIPDataset(train_data, y_train)
test_data = CLIPDataset(test_data, y_test)

trainer = Trainer(
    model=model, args=training_args, train_dataset=train_data, eval_dataset=test_data
)
trainer.train()

TypeError                                 Traceback (most recent call last)
----> 1 trainer.train()

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1256             self.control = self.callback_handler.on_epoch_begin(args, self.state, self.control)
   1257
-> 1258             for step, inputs in enumerate(epoch_iterator):
   1259
   1260                 # Skip past any already trained steps if resuming training

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    555     def _next_data(self):
    556         index = self._next_index()  # may raise StopIteration
--> 557         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    558         if self._pin_memory:
    559             data = _utils.pin_memory.pin_memory(data)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/transformers/data/data_collator.py in default_data_collator(features, return_tensors)
     64
     65     if return_tensors == "pt":
---> 66         return torch_default_data_collator(features)
     67     elif return_tensors == "tf":
     68         return tf_default_data_collator(features)

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/transformers/data/data_collator.py in torch_default_data_collator(features)
     80
     81     if not isinstance(features[0], (dict, BatchEncoding)):
---> 82         features = [vars(f) for f in features]
     83     first = features[0]
     84     batch = {}

~/anaconda3/envs/pytorch_latest_p37/lib/python3.7/site-packages/transformers/data/data_collator.py in <listcomp>(.0)
     80
     81     if not isinstance(features[0], (dict, BatchEncoding)):
---> 82         features = [vars(f) for f in features]
     83     first = features[0]
     84     batch = {}

TypeError: vars() argument must have __dict__ attribute
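
The last two frames show where the message comes from: when the first item handed to the collator is not a dict (or BatchEncoding), the collator calls vars() on every item, and vars() fails on objects that have no __dict__, such as tuples or lists. Below is a minimal sketch in plain Python that mirrors that check. The function name to_feature_dicts is hypothetical, and the BatchEncoding case is left out to keep the sketch free of dependencies.

```python
def to_feature_dicts(features):
    """Hypothetical stand-in for the check on lines 81-82 of the traceback."""
    if not isinstance(features[0], dict):
        # Non-dict items are converted via vars(), which needs a __dict__.
        features = [vars(f) for f in features]
    return features


# Items that are already dicts pass through unchanged.
dict_batch = [{"embeddings": [0.1, 0.2], "labels": [1]}]
same_batch = to_feature_dicts(dict_batch)

# Tuples have no __dict__, so vars() raises TypeError for them.
tuple_batch = [([0.1, 0.2], 1)]
try:
    to_feature_dicts(tuple_batch)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If type(train_data[0]) in the failing session is not dict, that is probably the first thing to look at.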

Any idea what I am doing wrong?

dictionary machine-learning pytorch dataset huggingface-transformers
1 answer
-1
votes

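You need to set label_names. Note that label_names is a field of TrainingArguments rather than a keyword argument of Trainer, so it goes into the training arguments:

# label_names belongs to TrainingArguments; output_dir here is only a placeholder
training_args = TrainingArguments(output_dir="output", label_names=["labels"])

trainer = Trainer(
    model=model, args=training_args, train_dataset=train_data, eval_dataset=test_data
)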
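
One more thing worth checking: Trainer passes each batch to the model as keyword arguments (model(**batch)), so the keys returned by __getitem__ should match the parameter names of forward. In the question the dataset returns "embeddings" while forward expects inputs. Here is a rough sketch of such a check using only the standard library. The helper missing_forward_args and the class ModelStub are hypothetical and only mimic the signature from the question.

```python
import inspect


def missing_forward_args(forward, item_keys):
    """Return required parameters of `forward` that the item does not provide."""
    skip_kinds = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    missing = []
    for name, param in inspect.signature(forward).parameters.items():
        if name == "self" or param.kind in skip_kinds:
            continue  # self, *args and **kwargs never need to be supplied
        if param.default is inspect.Parameter.empty and name not in item_keys:
            missing.append(name)
    return missing


class ModelStub:
    # Same signature as CLIPNN.forward in the question.
    def forward(self, inputs, labels=None, **kwargs):
        return inputs


# Keys as returned by CLIPDataset.__getitem__ in the question.
item = {"embeddings": [0.1, 0.2], "labels": [1]}
question_missing = missing_forward_args(ModelStub.forward, item.keys())

# Renaming the key to "inputs" would satisfy the signature.
renamed_missing = missing_forward_args(ModelStub.forward, {"inputs", "labels"})

print(question_missing, renamed_missing)
```

Renaming either the dataset key or the forward parameter so the two agree should be enough; which one to rename is a matter of taste.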