I ran into this strange error while building a simple neural network in PyTorch. I don't understand the error, or why the backward pass involves both Long and Float dtypes. Has anyone seen this before? Thanks for the help.
Traceback (most recent call last):
File "test.py", line 30, in <module>
loss.backward()
File "/home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: expected dtype Float but got dtype Long (validate_dtype at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/TensorIterator.cpp:143)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x4e (0x7f5856661b5e in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: at::TensorIterator::compute_types() + 0xce3 (0x7f587e3dc793 in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #2: at::TensorIterator::build() + 0x44 (0x7f587e3df174 in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: at::native::smooth_l1_loss_backward_out(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x193 (0x7f587e22cf73 in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0xe080b7 (0x7f58576960b7 in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::smooth_l1_loss_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, long) + 0x16e (0x7f587e23569e in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0xed98af (0x7f587e71c8af in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe22286 (0x7f587e665286 in /home/liuyun/anaconda3/envs/torch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
Here is the source code:
import torch
import torch.nn as nn
import numpy as np
import torchvision
from torchvision import models
from UTKLoss import MultiLoss
from ipdb import set_trace
# out features [13, 2, 5]
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 20)
model_ft.cuda()
criterion = MultiLoss()
optimizer = torch.optim.Adam(model_ft.parameters(), lr = 1e-3)
image = torch.randn((1, 3, 128, 128)).cuda()
age = torch.randint(110, (1,)).cuda()
gender = torch.randint(2, (1,)).cuda()
race = torch.randint(5, (1,)).cuda()
optimizer.zero_grad()
output = model_ft(image)
age_loss, gender_loss, race_loss = criterion(output, age, gender, race)
loss = age_loss + gender_loss + race_loss
loss.backward()
optimizer.step()
Here is the loss function I defined:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, output, age, gender, race):
        age_pred = output[:, :13]
        age_pred = torch.sum(age_pred, 1)
        gender_pred = output[:, 13:15]
        race_pred = output[:, 15:]
        age_loss = F.smooth_l1_loss(age_pred.view(-1, 1), age.cuda())
        gender_loss = F.cross_entropy(gender_pred, torch.flatten(gender).cuda(), reduction='sum')
        race_loss = F.cross_entropy(race_pred, torch.flatten(race).cuda(), reduction='sum')
        return age_loss, gender_loss, race_loss
Change the criterion call to:
age_loss, gender_loss, race_loss = criterion(output, age.float(), gender, race)
If you look at the error, we can trace it back to:
frame #3: at::native::smooth_l1_loss_backward_out
In the MultiLoss class, smooth_l1_loss is computed against age. So I changed its type to float (since the expected dtype is Float) when passing it to criterion. You can verify that age is torch.long (i.e. torch.int64) by printing age.dtype.
After doing this I no longer get the error. Hope this helps.
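A minimal, self-contained sketch of the fix (stand-in tensors, no CUDA or model needed): a target built with torch.randint comes out as torch.int64, and casting it with .float() gives smooth_l1_loss the Float dtype it expects so backward() runs cleanly.

```python
import torch
import torch.nn.functional as F

pred = torch.randn(4, requires_grad=True)   # Float predictions
target = torch.randint(110, (4,))           # randint yields torch.int64 (Long)
print(target.dtype)                         # torch.int64

# Cast the target to float before the loss, mirroring age.float() above:
loss = F.smooth_l1_loss(pred, target.float())
loss.backward()                             # no dtype error
print(loss.dtype)                           # torch.float32
```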
Check the dtypes of output, age, gender, and race.
There may be a mismatch, e.g. torch.float32 vs. torch.float64.
Cast them all to the same type; that fixes the error.
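For example (a small sketch with placeholder tensors), printing .dtype reveals such a mismatch, and .to() unifies the types before they are combined in a loss:

```python
import torch

a = torch.randn(3, dtype=torch.float32)
b = torch.randn(3, dtype=torch.float64)
print(a.dtype, b.dtype)   # torch.float32 torch.float64

# Cast to a single dtype before mixing the tensors in a loss:
b = b.to(a.dtype)
print((a + b).dtype)      # torch.float32
```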
This error may not be directly related to dtypes. I was training a Hugging Face text-classification pipeline that worked perfectly with a larger dataset, but when I tested it with a small version of that dataset I hit this error.
For me, the problem was that my small dataset had only 1 row of data. Once I passed in a larger dataset (4 rows for training, 2 for testing), the problem went away.
This isn't a very complete solution, since I didn't debug further, but I think it may help others. I imagine that somewhere in Hugging Face's Trainer train code, a Long tensor is produced when there is only 1 data row.