I am using the CLIP model for some labelling tasks. But when I use the CLIP model's text encoder, it gives the following error:
<ipython-input-117-4c513cc2d787> in forward(self, batch)
34 print(y.size())
35 print(y.dim())
---> 36 y = self.text_encoder(y)
37 y = self.classifier(y)
38
/usr/local/lib/python3.10/dist-packages/clip/model.py in encode_text(self, text)
345 x = x + self.positional_embedding.type(self.dtype)
346 x = x.permute(1, 0, 2) # NLD -> LND
--> 347 x = self.transformer(x)
348 x = x.permute(1, 0, 2) # LND -> NLD
349 x = self.ln_final(x).type(self.dtype)
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 4 is not equal to len(dims) = 3
The problem is that one image has multiple labels, so I use a collate_fn with pad_sequence in the DataLoader before feeding the batch to the model.
def pad_sequence(batch):
    return torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0)

def my_collate_fn(batch):
    # `batch` is a list of sample dicts, so gather each field across samples
    images = torch.stack([sample['i'].float() for sample in batch])
    labels = pad_sequence([sample['l'].long() for sample in batch])
    return {'i': images, 'l': labels}
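For reference, here is a minimal, self-contained sketch of what this collate step produces; it assumes each image's labels are a variable-size stack of 77-token CLIP-style sequences:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Two samples with a different number of labels per image;
# each label is a 77-token sequence (CLIP's context length).
labels_a = torch.ones(3, 77, dtype=torch.long)  # 3 labels
labels_b = torch.ones(5, 77, dtype=torch.long)  # 5 labels

# Padding aligns the label dimension, yielding [batch, max_labels, 77]
padded = pad_sequence([labels_a, labels_b], batch_first=True, padding_value=0)
print(padded.shape)  # torch.Size([2, 5, 77])
print(padded.dim())  # 3
```

So the label tensor that reaches forward() is 3-D, not the 2-D [batch, 77] that CLIP's encode_text normally receives.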
class CustomCLIP(torch.nn.Module):
    def __init__(self, num_classes: int = 10, bias=False):
        super().__init__()
        #model, _ = clip.load("RN50")

    def forward(self, batch):
        x = batch['i']
        x = self.encoder(x)
        x = self.classifier(x)

        y = batch['l']
        print(y)
        print(y.size())
        print(y.dim())
        y = self.text_encoder(y)  # error on this line
        y = self.classifier(y)

        x_similarity = x @ x.T
        y_similarity = y @ y.T
        targets = F.softmax(
            (x_similarity + y_similarity) / 2 * temperature, dim=-1
        )
        outputs = (y @ x.T) / temperature
        return outputs, targets
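As a side note, the targets/outputs computation at the end of forward() can be checked on toy embeddings. This is a sketch assuming `temperature` is a scalar hyperparameter defined elsewhere (it is not defined in the posted snippet):

```python
import torch
import torch.nn.functional as F

temperature = 1.0  # assumed scalar hyperparameter

x = torch.randn(4, 8)  # toy image embeddings: [batch, dim]
y = torch.randn(4, 8)  # toy text embeddings:  [batch, dim]

x_similarity = x @ x.T  # [4, 4] image-image similarity
y_similarity = y @ y.T  # [4, 4] text-text similarity

# Soft targets: each row is a probability distribution over the batch
targets = F.softmax((x_similarity + y_similarity) / 2 * temperature, dim=-1)
outputs = (y @ x.T) / temperature  # [4, 4] text-image logits
print(targets.shape, outputs.shape)
```

Both results are square [batch, batch] matrices, so this part only works once x and y share a common embedding dimension.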
I have printed the size of y. Its dim() is 3, which matches the required number of dimensions. So why does the error say the input tensor has 4 dimensions?
[[49406, 332, 49407, ..., 0, 0, 0],
[49406, 320, 49407, ..., 0, 0, 0],
[49406, 333, 49407, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0]]], device='cuda:0')
torch.Size([32, 392, 77])
3
Could someone please point out what the problem is and how to fix it? Thanks in advance.
I solved this by applying squeeze() to the tensor so that it matches the required number of dimensions (i.e. 3, where my input was 4). I first checked the input shape, and it was in fact 4-dimensional.
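Two things are worth noting here. First, squeeze() only removes size-1 dimensions, so it fixes a 4-D input that has a singleton axis. Second, an embedding lookup on an N-dimensional index tensor returns N+1 dimensions, which is how even a 3-D token tensor becomes 4-D inside encode_text before the failing permute. A minimal sketch of both effects (the shapes and vocabulary size here are illustrative):

```python
import torch

# A 4-D token tensor with a singleton axis, e.g. [batch, 1, num_labels, 77]
tokens = torch.zeros(32, 1, 392, 77, dtype=torch.long)
print(tokens.dim())           # 4
squeezed = tokens.squeeze(1)  # drop the size-1 axis
print(squeezed.shape)         # torch.Size([32, 392, 77])

# Separately: an embedding lookup adds one dimension, so a 3-D index
# tensor yields a 4-D result, which a 3-element permute cannot handle.
emb = torch.nn.Embedding(49408, 8)
print(emb(squeezed).dim())    # 4
```

This is why flattening the label axis into the batch axis (e.g. reshaping to [batch * num_labels, 77]) is another common way to feed multi-label token batches through a 2-D text encoder.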