Pytorch torchvision, negative samples: ValueError: Expected target boxes to be a tensor of shape [N, 4], got torch.Size([0])

Question

I want to add images that have no bounding boxes to my dataset.

When I add images that have no xml file, I get this error:

ValueError                                Traceback (most recent call last)
Input In [14], in <module>
      4 torch.cuda.empty_cache()
      6 for epoch in range(num_epochs):
      7     # training for one epoch
----> 8     train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
      9     # update the learning rate
     10     lr_scheduler.step()

File /notebooks/ml639a/pt651m/engine.py:31, in train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq, scaler)
     29 targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
     30 with torch.cuda.amp.autocast(enabled=scaler is not None):
---> 31     loss_dict = model(images, targets)
     32     losses = sum(loss for loss in loss_dict.values())
     34 # reduce losses over all GPUs for logging purposes

File /opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
   1106 # If we don't have any hooks, we want to skip the rest of the logic in
   1107 # this function, and just call forward.
   1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110     return forward_call(*input, **kwargs)
   1111 # Do not call functions when jit is used
   1112 full_backward_hooks, non_full_backward_hooks = [], []
...
---> 68         raise ValueError(f"Expected target boxes to be a tensor of shape [N, 4], got {boxes.shape}.")
     69 else:
     70     raise ValueError(f"Expected target boxes to be of type Tensor, got {type(boxes)}.")

ValueError: Expected target boxes to be a tensor of shape [N, 4], got torch.Size([0]).

I saw this: https://github.com/pytorch/vision/releases/tag/v0.6.0

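It is now possible to feed training images to Faster / Mask / Keypoint R-CNN that do not contain any positive annotations. This enables increasing the number of negative samples during training. For those images, the annotations expect a tensor with 0 in the number of objects dimension, ...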
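In other words, a negative image still needs a 2-D boxes tensor; only N (the number of objects) is 0. The sketch below mirrors the shape check implied by the error message in the traceback. `check_boxes` is a hypothetical helper written for illustration, not torchvision's own function.

```python
import torch

def check_boxes(boxes):
    # Hypothetical helper mirroring the check behind the error message:
    # boxes must be a Tensor with exactly 2 dims and a last dim of 4. N may be 0.
    if not isinstance(boxes, torch.Tensor):
        raise ValueError(f"Expected target boxes to be of type Tensor, got {type(boxes)}.")
    if boxes.dim() != 2 or boxes.shape[-1] != 4:
        raise ValueError(f"Expected target boxes to be a tensor of shape [N, 4], got {boxes.shape}.")

check_boxes(torch.zeros((0, 4)))  # passes: zero objects, but still [N, 4]

try:
    check_boxes(torch.zeros(0))   # shape [0]: no box-coordinate dimension
except ValueError as e:
    print(e)  # Expected target boxes to be a tensor of shape [N, 4], got torch.Size([0]).
```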

And this example: https://github.com/pytorch/vision/blob/f9ef235c402f48a335293c626e17bd8504d3af87/test/test_models_detection_negative_samples.py#L16

It is also mentioned here: https://github.com/pytorch/vision/issues/2144 and here: https://discuss.pytorch.org/t/can-i-feed-a-model-with-some-background-only-images/76279/6

This is my __getitem__, based on the references above:

def __getitem__(self, idx):

    img_name = self.imgs[idx]
    image_path = os.path.join(self.files_dir, img_name)
    ...
    # annotation file
    annot_filename = img_name[:-4] + '.xml'
    annot_file_path = os.path.join(self.files_dir, annot_filename)
    boxes = []
    labels = []
    # if there is an xml file then parse it, otherwise build an empty (negative-sample) target
    if os.path.exists(annot_file_path):
        tree = et.parse(annot_file_path)
        root = tree.getroot()
        # cv2 image gives size as height x width
        wt = img.shape[1]
        ht = img.shape[0]
        # box coordinates for xml files are extracted and corrected for image size given
        for member in root.findall('object'):
            labels.append(self.classes.index(member.find('name').text))
            # bounding box
            xmin = int(member.find('bndbox').find('xmin').text)
            xmax = int(member.find('bndbox').find('xmax').text)
            ymin = int(member.find('bndbox').find('ymin').text)
            ymax = int(member.find('bndbox').find('ymax').text)
            xmin_corr = (xmin/wt)*self.width
            xmax_corr = (xmax/wt)*self.width
            ymin_corr = (ymin/ht)*self.height
            ymax_corr = (ymax/ht)*self.height
            boxes.append([xmin_corr, ymin_corr, xmax_corr, ymax_corr])
        # convert boxes into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # getting the areas of the boxes
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((boxes.shape[0],), dtype=torch.int64)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["area"] = area
        target["iscrowd"] = iscrowd
        image_id = torch.tensor([idx])
        target["image_id"] = image_id
    else:  
        image_id = torch.tensor([idx])
        target = {"boxes": torch.zeros((0, 4), dtype=torch.float32),
            "labels": torch.zeros(0, dtype=torch.int64),
            "image_id": torch.tensor([idx]),
            "area": torch.zeros(0, dtype=torch.float32),
            "iscrowd": torch.zeros((0,), dtype=torch.int64)}
        
    if self.transforms:
        sample = self.transforms(image = img_res,
                                    bboxes = target['boxes'],
                                    labels = labels)
        img_res = sample['image']
        target['boxes'] = torch.Tensor(sample['bboxes'])
                    
    return img_res, target
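A likely reason the shape is still [0]: the `else` branch builds a correct [0, 4] tensor, but after `self.transforms(...)` the code rebuilds the boxes with `torch.Tensor(sample['bboxes'])`. Assuming the transform returns an empty list of boxes for an image with no annotations (an assumption about the notebook's transform, which takes `image=`, `bboxes=` and `labels=` keyword arguments), that conversion produces a 1-D tensor. The `sample` dict below is a hypothetical stand-in for the transform output.

```python
import torch

# Hypothetical transform output for a negative image: the box list is empty.
sample = {"bboxes": []}

# The conversion used after the transform in __getitem__.
boxes = torch.Tensor(sample["bboxes"])
print(boxes.shape)  # torch.Size([0]): the [N, 4] structure is lost

# For comparison, a positive image with one [xmin, ymin, xmax, ymax] box.
positive = torch.Tensor([[10.0, 20.0, 50.0, 60.0]])
print(positive.shape)  # torch.Size([1, 4])
```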

All the code is here: https://github.com/dgleba/r655q/blob/main/negim/pt651m_ir4f_gi-negim.ipynb

Can anyone see what I am doing wrong?

Answer 1

Try setting bboxes = torch.zeros(0,4) for images that have no boxes. It worked for me.
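One way to apply this in the question's code, assuming the empty shape comes from converting the transform output: substitute an explicit zeros(0, 4) tensor whenever the box list is empty. `boxes_from_transform` is a hypothetical helper name.

```python
import torch

def boxes_from_transform(bboxes) -> torch.Tensor:
    # Hypothetical helper: convert the transform's box list to a float tensor.
    # An empty list (negative image) becomes an explicit [0, 4] tensor.
    if len(bboxes) == 0:
        return torch.zeros((0, 4), dtype=torch.float32)
    return torch.as_tensor(bboxes, dtype=torch.float32)

# In __getitem__, instead of torch.Tensor(sample['bboxes']):
# target['boxes'] = boxes_from_transform(sample['bboxes'])
print(boxes_from_transform([]).shape)                      # torch.Size([0, 4])
print(boxes_from_transform([(1.0, 2.0, 3.0, 4.0)]).shape)  # torch.Size([1, 4])
```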

Answer 2

torch.Tensor(boxes).reshape(-1, 4) will change it to the expected shape.
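A minimal sketch of why this works for both cases: with -1, reshape infers the first dimension, so an empty 1-D tensor becomes [0, 4] while an existing [N, 4] tensor is unchanged. `to_boxes` is a hypothetical name; `torch.as_tensor(..., dtype=torch.float32)` is used here in place of `torch.Tensor(...)` only to make the dtype explicit.

```python
import torch

def to_boxes(bboxes) -> torch.Tensor:
    # reshape(-1, 4): 0 elements -> shape [0, 4]; N*4 elements -> shape [N, 4]
    return torch.as_tensor(bboxes, dtype=torch.float32).reshape(-1, 4)

print(to_boxes([]).shape)                                 # torch.Size([0, 4])
print(to_boxes([[10, 20, 50, 60], [5, 5, 8, 9]]).shape)  # torch.Size([2, 4])
```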
