I need some clarification on code written for certain functionality in the fastai2 library. Below is the WeightDropout code from the fastai2 library.
class WeightDropout(Module):
    "A module that wraps another layer in which some weights will be replaced by 0 during training."

    def __init__(self, module, weight_p, layer_names='weight_hh_l0'):
        self.module,self.weight_p,self.layer_names = module,weight_p,L(layer_names)
        for layer in self.layer_names:
            #Makes a copy of the weights of the selected layers.
            w = getattr(self.module, layer)
            delattr(self.module, layer)
            self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))
            setattr(self.module, layer, F.dropout(w.data, p=self.weight_p, training=False))
            if isinstance(self.module, (nn.RNNBase, nn.modules.rnn.RNNBase)):
                self.module.flatten_parameters = self._do_nothing

    def _setweights(self):
        "Apply dropout to the raw weights."
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            setattr(self.module, layer, F.dropout(raw_w.data, p=self.weight_p, training=self.training))

    def forward(self, *args):
        self._setweights()
        with warnings.catch_warnings():
            #To avoid the warning that comes because the weights aren't flattened.
            warnings.simplefilter("ignore")
            return self.module.forward(*args)

    def reset(self):
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            setattr(self.module, layer,
                    F.dropout(raw_w.data, p=self.weight_p, training=False))
        if hasattr(self.module, 'reset'): self.module.reset()

    def _do_nothing(self): pass
In the code above, weights in the weight matrix of the hidden layer are randomly dropped. The part I'm mainly interested in is:
def _setweights(self):
    "Apply dropout to the raw weights."
    for layer in self.layer_names:
        raw_w = getattr(self, f'{layer}_raw')
        setattr(self.module, layer, F.dropout(raw_w.data, p=self.weight_p, training=self.training))
My question is whether this operation of changing the weights is recorded in the gradient computation.
No, assigning new weights is not tracked in the computation graph: the assignment itself has no derivative, so no gradient can flow through it.
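A minimal PyTorch illustration of that point (the variable names here are mine, not fastai's): a tensor reached through `.data` carries no autograd history, and storing a tensor under an attribute records nothing in the graph; only tensor operations create graph edges.

```python
import torch

w = torch.nn.Parameter(torch.randn(4, 2))

# Accessing .data gives the underlying tensor detached from autograd:
# no grad_fn, no history, so nothing can backpropagate through it
detached = w.data
print(detached.grad_fn)        # None
print(detached.requires_grad)  # False

# Any computation built purely from the detached tensor has no path
# back to the parameter w; only tensor operations on tensors that
# require grad create edges in the computation graph
loss = (detached * 2).sum()
print(loss.requires_grad)      # False
```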
Then why does the code work? The model does not overwrite the actual parameters; it performs the computation with a modified version while keeping the original weights intact. It's a little obscure, but the most important part is where the parameters are copied when the model is created:
#Makes a copy of the weights of the selected layers.
w = getattr(self.module, layer)
delattr(self.module, layer)
self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))
What happens here is that for every parameter a copy ending in _raw is created. For example, if you have a linear layer on the model, say self.linear1 = nn.Linear(2, 4), there are two parameters named linear1.weight and linear1.bias. They are now copied to linear1.weight_raw and linear1.bias_raw. To be precise, they are not copied but reassigned to the *_raw attributes, and then the originals are deleted, hence they are moved from the original version to the raw version. The originals need to be deleted, because they should no longer be the parameters that are optimised/learned.
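The move can be sketched in plain PyTorch (the Wrapper class below is illustrative, not fastai's): after the delete-and-register step, only the *_raw version shows up among the learnable parameters.

```python
import torch
import torch.nn as nn

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(2, 4)
        # Move the submodule's weight into a *_raw parameter on the wrapper
        w = self.linear1.weight
        del self.linear1.weight  # linear1.weight is no longer a parameter
        self.register_parameter('weight_raw', nn.Parameter(w.data))

m = Wrapper()
print([name for name, _ in m.named_parameters()])
# ['weight_raw', 'linear1.bias'] -- linear1.weight has been moved
```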
Afterwards, when dropout is applied, the parameters that are optimised/learned (the *_raw versions) remain unchanged, but the weights used for the actual calculation are ones where some elements have been randomly dropped. In the example with the linear layer, doing the calculation manually would look like this:
# A dummy input
input = torch.randn(1, 2)
# The raw parameters of the linear layer, randomly initialised
weight_raw = nn.Parameter(torch.randn(4, 2))
bias_raw = nn.Parameter(torch.randn(4))
# Randomly dropping elements of the parameters with 50% probability
weight = F.dropout(weight_raw, p=0.5)
bias = F.dropout(bias_raw, p=0.5)
# Calculation of the linear layer (forward)
output = torch.matmul(input, weight.transpose(0, 1)) + bias
From this you can see that there is no actual reassignment, just the regular computational flow you are familiar with.
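Extending that manual calculation (same tensor names as above), one can check that gradients do flow back to the raw parameters, because F.dropout itself is a differentiable operation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input = torch.randn(1, 2)
weight_raw = nn.Parameter(torch.randn(4, 2))
bias_raw = nn.Parameter(torch.randn(4))

# Same forward computation as in the manual example above
weight = F.dropout(weight_raw, p=0.5)
bias = F.dropout(bias_raw, p=0.5)
output = torch.matmul(input, weight.transpose(0, 1)) + bias

# Backpropagate: gradients reach the raw parameters through the dropout ops
output.sum().backward()
print(weight_raw.grad.shape)  # torch.Size([4, 2])
print(bias_raw.grad.shape)    # torch.Size([4])
```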
Now you might wonder why these *_raw parameters are created instead of just applying dropout in the forward pass (as in the example above). The reason is to avoid having to reimplement the forward pass; otherwise every module would need its forward method modified, and since forward passes vary widely between modules, that cannot be done in a generic way. This approach essentially hijacks the parameters, so that the forward pass uses the modified versions of them.
Continuing the example from above:
# Using the actual module for the same calculation
linear1 = nn.Linear(2, 4)
# Delete the parameters, so that regular tensors can be assigned to them
# Otherwise it throws an error that the tensor is not an nn.Parameter
del linear1.weight
del linear1.bias
# Assign the parameters with dropped elements
linear1.weight = weight
linear1.bias = bias
# Run the forward pass directly
output_linear1 = linear1(input)
torch.equal(output, output_linear1) # => True
The bottom line is that the parameters are extracted from the module, and the forward pass uses the modified versions for its calculation; after the deletion they are no longer parameters, but intermediate results.
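That last point can be verified directly in PyTorch (a sketch with my own variable names): after the deletion, the assigned tensor is a plain attribute that carries a grad_fn, i.e. an intermediate result of the graph rather than a registered parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

linear1 = nn.Linear(2, 4)
weight_raw = nn.Parameter(linear1.weight.detach().clone())

del linear1.weight                      # remove the original parameter
linear1.weight = F.dropout(weight_raw)  # assign a plain (dropped) tensor

print([name for name, _ in linear1.named_parameters()])  # ['bias']
print(isinstance(linear1.weight, nn.Parameter))          # False
print(linear1.weight.grad_fn is not None)                # True: intermediate result
```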