Efficient way to compute per-class gradients separately in PyTorch

Question (votes: 0, answers: 1)

I'm trying to compute the gradients of a PyTorch image classifier model with respect to the input, separately for each class, for example:

outputs = net(inputs)[0] # assuming we only consider the first sample of the batch
grads = [torch.autograd.grad(outputs[i], inputs, retain_graph=True) 
         for i in range(len(outputs))]

However, the torch.autograd.grad documentation states: "Note that in nearly all cases setting this option [retain_graph] to True is not needed and often can be worked around in a much more efficient way."

Bing AI suggested using

identity = torch.eye(len(outputs))
outputs.backward(gradient=identity)

but that clearly doesn't work:

RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([9, 9]) and outputs has a shape of torch.Size([9]).

(here the image classifier has 9 classes)

So I'm wondering whether there is a more efficient way in this case, and if so, how to implement it?

Thanks for your help!

python deep-learning pytorch neural-network autograd
1 Answer
0 votes

The problem you're hitting with outputs.backward(gradient=identity) is that identity has the wrong shape.

Generally, the gradient argument should have the same shape as the tensor you call backward on. I say "generally" because, technically, a.backward(gradient=b) computes b^T @ Jacobian(a), so it depends on the shape of the Jacobian, but usually when you call backward on a model's output you want the same shape.
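If you want to see that vector-Jacobian relationship concretely, here's a minimal sketch (assuming torch.autograd.functional.jacobian is available in your PyTorch version) that checks backward(gradient=b) against an explicit product with the Jacobian:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)
x = torch.randn(3, 4, requires_grad=True)

# 1) gradient accumulated by backward(gradient=b)
y = model(x)
b = torch.randn_like(y)  # same shape as y: (3, 2)
y.backward(b)
grad_via_backward = x.grad.clone()

# 2) the same quantity written out as an explicit vector-Jacobian product
# J has shape (3, 2, 3, 4): J[i, o, p, q] = d y[i, o] / d x[p, q]
J = torch.autograd.functional.jacobian(model, x)
grad_via_vjp = torch.einsum('io,iopq->pq', b, J)

print(torch.allclose(grad_via_backward, grad_via_vjp))  # True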

Here's the standard way to call backward on a vector output:

import torch
import torch.nn as nn

x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)

input_grad = torch.ones_like(y) # create vector of ones the same shape as y
y.backward(input_grad)
model.weight.grad

> tensor([[ 1.1243,  2.3456,  2.3656, -8.8384],
          [ 1.1243,  2.3456,  2.3656, -8.8384]])

If you want to mask out part of the output in the gradient computation, you can set the corresponding input_grad values to 0.

For example, this masks the second column of the output tensor:

x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)

input_grad = torch.ones_like(y) # create vector of ones the same shape as y
input_grad[:,1] = 0. # mask the second column of the output

y.backward(input_grad)

# masking changes gradient output
model.weight.grad

> tensor([[ 1.1243,  2.3456,  2.3656, -8.8384],
          [ 0.0000,  0.0000,  0.0000,  0.0000]])

This example only backprops through the first four items in the batch:

x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)

input_grad = torch.ones_like(y) # create vector of ones the same shape as y
input_grad[4:] = 0. # only backprop through the first 4 items in the batch

y.backward(input_grad)

# masking changes gradient output
model.weight.grad

> tensor([[-0.2436,  0.8189, -0.1244, -0.4814],
          [-0.2436,  0.8189, -0.1244, -0.4814]])

To backprop from a specific slice, grab that slice and use a tensor of the same shape:

x = torch.randn(32, 4)
model = nn.Linear(4, 2)
y = model(x)

y_slice = y[7] # grab specific batch item

input_grad = torch.ones_like(y_slice) # create vector of ones the same shape as y_slice

y_slice.backward(input_grad)

model.weight.grad

> tensor([[-0.1745, -1.1161, -0.8109, -0.6540],
          [-0.1745, -1.1161, -0.8109, -0.6540]])

Now, for your example of computing the gradients of different classes with respect to the input, you can do the following:

# inputs must have `requires_grad=True` if you want to backprop into them
x = torch.randn(32, 4, requires_grad=True)

# model has two output classes
model = nn.Linear(4, 2)

# y is size `(32, 2)` for 32 batch items and 2 classes
y = model(x)

grads = []

for batch_idx in range(x.shape[0]): # iterate over batch items
    for class_idx in range(y.shape[1]): # iterate over classes
        y_slice = y[batch_idx, class_idx] # get specific output value

        # compute grad with retain graph
        # note that the second argument must be `x`, you cannot 
        # pass a slice of `x` because it was not used in the 
        # compute graph that produced `y`
        grad = torch.autograd.grad(y_slice, x, grad_outputs=torch.ones_like(y_slice), retain_graph=True)
        
        # the output of `torch.autograd.grad` is a tuple of the form `(grad,)`
        # grad[0] grabs the actual grad tensor
        # the grad tensor has the same shape as `x`, but because we backprop from y_slice,
        # all items except `batch_idx` are zero
        # `grad[0][batch_idx]` gets the gradient of item `batch_idx` wrt `class_idx`
        grad = grad[0][batch_idx]
        
        # save tuple of batch_idx, class_idx, grad
        grads.append((batch_idx, class_idx, grad))

As a fun exercise, run the code above and look at the gradients for each class:

print([i[-1] for i in grads if i[1]==0])
print([i[-1] for i in grads if i[1]==1])

You'll notice that for a given class, the gradient values are the same for every input item. Think about why that makes sense.
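Finally, on the original efficiency question: if you want to avoid the Python loop and the repeated retain_graph=True calls, one possible sketch (assuming a PyTorch version that supports is_grads_batched in torch.autograd.grad, roughly 1.11+) batches one one-hot grad_outputs entry per class into a single call:

import torch
import torch.nn as nn

x = torch.randn(32, 4, requires_grad=True)
model = nn.Linear(4, 2)
y = model(x)  # shape (32, 2)

batch_size, num_classes = y.shape

# one grad_outputs "vector" per class: entry c is all-ones in column c, zeros elsewhere
# shape (num_classes, batch_size, num_classes)
grad_outputs = torch.eye(num_classes).unsqueeze(1).repeat(1, batch_size, 1)

# per_class_grads has shape (num_classes, batch_size, 4):
# per_class_grads[c, i] = d y[i, c] / d x[i]
(per_class_grads,) = torch.autograd.grad(
    y, x, grad_outputs=grad_outputs, is_grads_batched=True
)

Under the hood this uses vmap, so whether it's actually faster than the loop depends on the model and your PyTorch version; it's worth benchmarking both.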
