I'm trying to compute the gradients of a PyTorch image-classifier model with respect to each class separately, e.g.
outputs = net(inputs)[0] # assuming we only consider the first sample of the batch
grads = [torch.autograd.grad(outputs[i], inputs, retain_graph=True)
         for i in range(len(outputs))]
However, the torch.autograd.grad documentation states:
'''
Note that in nearly all cases setting this option (retain_graph) to True is not needed and often can be worked around in a much more efficient way
'''
Bing AI suggested using
identity = torch.eye(len(outputs))
outputs.backward(gradient=identity)
but that clearly doesn't work:
RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([9, 9]) and outputs has a shape of torch.Size([9]).
(the image classifier here has 9 classes)
So I'm wondering whether there is a more efficient way in this case, and if so, how to implement it?
Thanks for your help!
The problem you ran into with outputs.backward(gradient=identity) is that identity has the wrong shape. Generally, the gradient argument should have the same shape as the tensor you call backward on. I say "generally" because, technically, a.backward(gradient=b) computes b^T @ Jacobian(a), so the valid shape depends on the Jacobian; but when you call backward on a model's output, you usually want a gradient argument of the same shape as that output.
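As a quick sanity check of that vector-Jacobian-product claim, here is a minimal sketch comparing backward(gradient=b) against an explicitly computed Jacobian (via torch.autograd.functional.jacobian; the function f is just an illustrative choice):

```python
import torch

def f(x):
    return x ** 2  # elementwise square, so the Jacobian is diag(2 * x)

x = torch.randn(3, requires_grad=True)
b = torch.tensor([1.0, 0.5, -2.0])
f(x).backward(gradient=b)  # fills x.grad with b^T @ Jacobian

J = torch.autograd.functional.jacobian(f, x.detach())
print(torch.allclose(x.grad, b @ J))  # → True
```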
Here is the standard way to call backward on a vector output:
import torch
import torch.nn as nn

x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)
input_grad = torch.ones_like(y) # create vector of ones the same shape as y
y.backward(input_grad)
model.weight.grad
> tensor([[ 1.1243, 2.3456, 2.3656, -8.8384],
          [ 1.1243, 2.3456, 2.3656, -8.8384]])
If you want to mask part of the output out of the gradient computation, you can set the corresponding input_grad values to 0.
For example, this masks out the second column of the output tensor:
x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)
input_grad = torch.ones_like(y) # create vector of ones the same shape as y
input_grad[:,1] = 0. # mask the second column of the output
y.backward(input_grad)
# masking changes gradient output
model.weight.grad
> tensor([[ 1.1243, 2.3456, 2.3656, -8.8384],
          [ 0.0000, 0.0000, 0.0000, 0.0000]])
This example only backpropagates through the first four items in the batch:
x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)
input_grad = torch.ones_like(y) # create vector of ones the same shape as y
input_grad[4:] = 0. # only backprop through the first 4 items in the batch
y.backward(input_grad)
# masking changes gradient output
model.weight.grad
> tensor([[-0.2436, 0.8189, -0.1244, -0.4814],
          [-0.2436, 0.8189, -0.1244, -0.4814]])
To backpropagate from a specific slice, grab that slice and use a tensor of the same shape:
x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)
y_slice = y[7] # grab specific batch item
input_grad = torch.ones_like(y_slice) # create vector of ones the same shape as y_slice
y_slice.backward(input_grad)
model.weight.grad
> tensor([[-0.1745, -1.1161, -0.8109, -0.6540],
          [-0.1745, -1.1161, -0.8109, -0.6540]])
Now, for your example of computing the gradients of the different classes with respect to the input, you can do the following:
# inputs must have `requires_grad=True` if you want to backprop into them
x = torch.randn(32, 4, requires_grad=True)
# model has two output classes
model = nn.Linear(4, 2)
# y is size `(32, 2)` for 32 batch items and 2 classes
y = model(x)
grads = []
for batch_idx in range(x.shape[0]): # iterate over batch items
    for class_idx in range(y.shape[1]): # iterate over classes
        y_slice = y[batch_idx, class_idx] # get specific output value
        # compute grad with retain_graph
        # note that the second argument must be `x`; you cannot
        # pass a slice of `x`, because the slice was not used in the
        # compute graph that produced `y`
        grad = torch.autograd.grad(y_slice, x, grad_outputs=torch.ones_like(y_slice), retain_graph=True)
        # the output of grad is a tuple of the form `(grad,)`
        # `grad[0]` grabs the actual grad tensor
        # the grad tensor has the same shape as `x`, but because we backpropagated
        # from `y_slice`, all items except `batch_idx` are zero
        # `grad[0][batch_idx]` gets the gradient of item `batch_idx` wrt `class_idx`
        grad = grad[0][batch_idx]
        # save tuple of (batch_idx, class_idx, grad)
        grads.append((batch_idx, class_idx, grad))
As an interesting exercise, run the code above and look at the gradients for each class:
print([i[-1] for i in grads if i[1]==0])
print([i[-1] for i in grads if i[1]==1])
You'll notice that for a given class, the gradient values are the same for every input item. Think about why that makes sense.
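On the original efficiency question: the loop above makes one autograd.grad call per (batch item, class) pair. If your PyTorch install is recent enough (is_grads_batched was added to torch.autograd.grad in 1.11, so this is an assumption about your version), you can instead batch one one-hot grad_outputs vector per class and get all per-class input gradients in a single call. A sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 4, requires_grad=True)
model = nn.Linear(4, 2)  # two output classes, as in the loop example
y = model(x)             # shape (32, 2)

n_classes = y.shape[1]
# One grad_outputs slice per class: slice c is all-ones in column c, zeros
# elsewhere. `is_grads_batched=True` vmaps over this extra leading dim,
# computing every per-class gradient in a single backward pass.
grad_outputs = torch.eye(n_classes)[:, None, :].expand(n_classes, *y.shape).contiguous()
grads = torch.autograd.grad(y, x, grad_outputs=grad_outputs,
                            is_grads_batched=True)[0]
# grads[c, b] == d y[b, c] / d x[b]; shape (2, 32, 4)
```

For this linear model, grads[c, b] should equal model.weight[c] for every b, which matches the observation in the exercise above.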