Problem using .grad inside torch.no_grad()

Question

I wrote a few toy examples to understand how torch.no_grad() works:

# example 1
import torch
a = torch.randn(10, 5, requires_grad = True)
z = a * 3
l = z - 0.5
l.sum().backward()
with torch.no_grad():
  a -= a.grad
  print(a.requires_grad)
# True

So a -= a.grad inside with torch.no_grad() keeps a.requires_grad = True.

# example2
import torch
a = torch.randn(10, 5, requires_grad = True)
z = a * 3
l = z - 0.5
l.sum().backward()
with torch.no_grad():
  a = a - a.grad
  print(a.requires_grad)
# False

But a = a - a.grad inside with torch.no_grad() sets a.requires_grad = False.

# example3
import torch
a = torch.randn(10, 5, requires_grad = True)
z = a * 3
l = z - 0.5
l.sum().backward()
a -= a.grad
print(a.requires_grad)
# RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.

a -= a.grad without with torch.no_grad() throws a RuntimeError (but a -= 1 does not).

I can't find an explanation for the results above. Could someone point me in the right direction? Thanks a lot!

Answer 1

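a.grad is None at first and becomes a tensor the first time a call to backward() computes gradients for a. The grad attribute then contains the computed gradients, and later calls to backward() accumulate (add) gradients into it.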
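A minimal sketch of that lifecycle, assuming a small hypothetical leaf tensor w: .grad starts as None, is filled by the first backward(), and is added to (not overwritten) by the second one.

```python
import torch

# A leaf tensor that requires grad; its .grad starts out as None.
w = torch.tensor([2.0, 3.0], requires_grad=True)
initial = w.grad
print(initial)  # None

# d(sum(3 * w))/dw = 3 for every element
(3 * w).sum().backward()
first = w.grad.clone()
print(first)  # tensor([3., 3.])

# A second backward() on a fresh graph adds to the existing .grad
(3 * w).sum().backward()
second = w.grad.clone()
print(second)  # tensor([6., 6.])

# Gradients must be reset by hand, e.g. before the next update step
w.grad.zero_()
print(w.grad)  # tensor([0., 0.])
```

This accumulation is why training loops usually zero the gradients between steps.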

If you use with torch.no_grad():, PyTorch's autograd stops recording operations inside that block, so you don't get the error.

requires_grad = True is the flag that tells autograd to record operations on a tensor so that gradients can be computed for it.
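A small sketch of what torch.no_grad() changes, using a hypothetical tensor a: inside the block, new results are created with requires_grad=False and an in-place update of the leaf is allowed, while the requires_grad flag on the leaf itself stays as it was.

```python
import torch

a = torch.randn(3, requires_grad=True)
before = a.detach().clone()

with torch.no_grad():
    grad_on_inside = torch.is_grad_enabled()  # False while the block is active
    b = a * 2   # computed without recording any graph
    a += 1      # in-place update of a leaf is allowed here

grad_on_after = torch.is_grad_enabled()  # True again after the block exits
print(grad_on_inside, grad_on_after)  # False True
print(b.requires_grad)  # False: b was produced while recording was off
print(a.requires_grad)  # True: the flag on the leaf itself is untouched
```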

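The error you got in the last example tells you that you cannot perform an in-place operation such as a -= a.grad on a leaf tensor that requires grad, but you can simply use a = a - a.grad instead, for example: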

# example3
import torch
a = torch.randn(10, 5, requires_grad = True)
z = a * 3
l = z - 0.5
l.sum().backward()
a = a - a.grad
print(a.grad)
print(a.requires_grad)

Output:

None
True

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at  aten/src/ATen/core/TensorBody.h:417.)
  return self._grad

Answer 2

I don't think the answer above explains these 3 cases clearly.

Case 1

a -= a.grad is an in-place operation, so it does not change a's requires_grad attribute. Hence a.requires_grad = True.
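A quick way to check Case 1, as a sketch reusing the question's setup: compare the object identity of a before and after the update. The in-place operator modifies the existing tensor, so the name still points at the same leaf and its flags are unchanged.

```python
import torch

a = torch.randn(10, 5, requires_grad=True)
(a * 3 - 0.5).sum().backward()  # a.grad is all 3s

obj_before = id(a)
before = a.detach().clone()
with torch.no_grad():
    a -= a.grad  # in-place: writes into the existing tensor's storage

print(id(a) == obj_before)  # True: same tensor object as before
print(a.requires_grad)      # True: the attribute was never reassigned
print(a.is_leaf)            # True: a is still the original leaf
```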

Case 2

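a = a - a.grad is not an in-place operation, so it creates a new tensor object in memory and rebinds the name a to it. Because that new tensor is created in no-grad mode, a.requires_grad = False.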
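A sketch of Case 2 under the same setup, keeping a second reference (the hypothetical name original) to the leaf so both tensors can be inspected: the out-of-place expression produces a different tensor, and only that new tensor has requires_grad=False.

```python
import torch

a = torch.randn(10, 5, requires_grad=True)
(a * 3 - 0.5).sum().backward()
original = a  # second reference to the original leaf

with torch.no_grad():
    a = a - a.grad  # out-of-place: builds a new tensor and rebinds the name

print(a is original)              # False: a now names a different tensor
print(a.requires_grad)            # False: created while recording was off
print(original.requires_grad)     # True: the original leaf is unchanged
print(original.grad is not None)  # True: its gradient is still there
```

In a training loop this matters: after the rebinding, a is no longer connected to autograd, so later backward() calls would not produce gradients for it.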

Case 3

Outside no_grad mode, you cannot modify a leaf tensor that has requires_grad = True in place.
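A sketch that checks Case 3 for both updates mentioned in the question, with grad mode left on. The helper fresh_leaf is hypothetical and only rebuilds a leaf with a populated .grad for each attempt. Both a -= a.grad and a -= 1 raise the same RuntimeError, so the right-hand side is irrelevant; what matters is the in-place write to a leaf that requires grad.

```python
import torch

def fresh_leaf():
    # Hypothetical helper: a leaf that requires grad, with .grad populated
    a = torch.tensor([1.1], requires_grad=True)
    (a ** 2).sum().backward()
    return a

errors = []

a = fresh_leaf()
try:
    a -= a.grad  # in-place update of a leaf while autograd is recording
except RuntimeError as err:
    errors.append(str(err))

a = fresh_leaf()
try:
    a -= 1       # same situation with a plain Python number
except RuntimeError as err:
    errors.append(str(err))

for msg in errors:
    print(msg)
```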

I also tried the code below:

import torch
a = torch.tensor([1.1], requires_grad=True)
b = a ** 2
b.backward()
a -= 1

It does throw the same RuntimeError!

Hope my answer helps.
