I need some clarification on code written for certain functionality in the fastai2 library. Below is the WeightDropout code from the fastai2 library.
class WeightDropout(Module):
    "A module that wraps another layer in which some weights will be replaced by 0 during training."

    def __init__(self, module, weight_p, layer_names='weight_hh_l0'):
        self.module,self.weight_p,self.layer_names = module,weight_p,L(layer_names)
        for layer in self.layer_names:
            #Makes a copy of the weights of the selected layers.
            w = getattr(self.module, layer)
            delattr(self.module, layer)
            self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))
            setattr(self.module, layer, F.dropout(w.data, p=self.weight_p, training=False))
            if isinstance(self.module, (nn.RNNBase, nn.modules.rnn.RNNBase)):
                self.module.flatten_parameters = self._do_nothing

    def _setweights(self):
        "Apply dropout to the raw weights."
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            setattr(self.module, layer, F.dropout(raw_w.data, p=self.weight_p, training=self.training))

    def forward(self, *args):
        self._setweights()
        with warnings.catch_warnings():
            #To avoid the warning that comes because the weights aren't flattened.
            warnings.simplefilter("ignore")
            return self.module.forward(*args)

    def reset(self):
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            setattr(self.module, layer,
                    F.dropout(raw_w.data, p=self.weight_p, training=False))
        if hasattr(self.module, 'reset'): self.module.reset()

    def _do_nothing(self): pass
In the code above, weights in the weight matrix of the hidden layer are randomly dropped. The part I'm mainly interested in is:
def _setweights(self):
    "Apply dropout to the raw weights."
    for layer in self.layer_names:
        raw_w = getattr(self, f'{layer}_raw')
        setattr(self.module, layer, F.dropout(raw_w.data, p=self.weight_p, training=self.training))
My question is whether this operation of changing the weights is recorded in the gradient computation.
No, assigning new weights is not tracked in the computation graph: the assignment itself has no derivative, so no gradient can flow through it.
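A minimal PyTorch illustration of that point (the variable names here are mine, not fastai's): a tensor reached through `.data` carries no autograd history, and storing a tensor under an attribute records nothing in the graph; only tensor operations create graph edges.

```python
import torch

w = torch.nn.Parameter(torch.randn(4, 2))

# Accessing .data gives the underlying tensor detached from autograd:
# no grad_fn, no history, so nothing can backpropagate through it
detached = w.data
print(detached.grad_fn)        # None
print(detached.requires_grad)  # False

# Any computation built purely from the detached tensor has no path
# back to the parameter w; only tensor operations on tensors that
# require grad create edges in the computation graph
loss = (detached * 2).sum()
print(loss.requires_grad)      # False
```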
Then why does the code work? The model does not overwrite the actual parameters; it performs the computation with a modified version while keeping the original weights intact. It's a little obscure, but the most important part is where the parameters are copied when the model is created:
#Makes a copy of the weights of the selected layers.
w = getattr(self.module, layer)
delattr(self.module, layer)
self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))
What happens here is that for every parameter a copy ending in _raw is created. For example, if you have a linear layer on the model, say self.linear1 = nn.Linear(2, 4), there are two parameters named linear1.weight and linear1.bias. They are now copied to linear1.weight_raw and linear1.bias_raw. To be precise, they are not copied but reassigned to the *_raw attributes, and then the originals are deleted, hence they are moved from the original version to the raw version. The originals need to be deleted, because they should no longer be the parameters that are optimised/learned.
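The move can be sketched in plain PyTorch (the Wrapper class below is illustrative, not fastai's): after the delete-and-register step, only the *_raw version shows up among the learnable parameters.

```python
import torch
import torch.nn as nn

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(2, 4)
        # Move the submodule's weight into a *_raw parameter on the wrapper
        w = self.linear1.weight
        del self.linear1.weight  # linear1.weight is no longer a parameter
        self.register_parameter('weight_raw', nn.Parameter(w.data))

m = Wrapper()
print([name for name, _ in m.named_parameters()])
# ['weight_raw', 'linear1.bias'] -- linear1.weight has been moved
```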
Afterwards, when dropout is applied, the parameters that are optimised/learned (the *_raw versions) remain unchanged, but the weights used for the actual calculation are ones where some elements have been randomly dropped. In the example with the linear layer, doing the calculation manually would look like this:
# A dummy input
input = torch.randn(1, 2)
# The raw parameters of the linear layer, randomly initialised
weight_raw = nn.Parameter(torch.randn(4, 2))
bias_raw = nn.Parameter(torch.randn(4))
# Randomly dropping elements of the parameters with 50% probability
weight = F.dropout(weight_raw, p=0.5)
bias = F.dropout(bias_raw, p=0.5)
# Calculation of the linear layer (forward)
output = torch.matmul(input, weight.transpose(0, 1)) + bias
From this you can see that there is no actual reassignment, just the regular computational flow you are familiar with.
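Extending that manual calculation (same tensor names as above), one can check that gradients do flow back to the raw parameters, because F.dropout itself is a differentiable operation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input = torch.randn(1, 2)
weight_raw = nn.Parameter(torch.randn(4, 2))
bias_raw = nn.Parameter(torch.randn(4))

# Same forward computation as in the manual example above
weight = F.dropout(weight_raw, p=0.5)
bias = F.dropout(bias_raw, p=0.5)
output = torch.matmul(input, weight.transpose(0, 1)) + bias

# Backpropagate: gradients reach the raw parameters through the dropout ops
output.sum().backward()
print(weight_raw.grad.shape)  # torch.Size([4, 2])
print(bias_raw.grad.shape)    # torch.Size([4])
```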
Now you might wonder why these *_raw parameters are created instead of just applying dropout in the forward pass (as in the example above). The reason is to avoid having to reimplement the forward pass; otherwise every module would need its forward method modified, and since forward passes vary widely between modules, that cannot be done in a generic way. This approach essentially hijacks the parameters, so that the forward pass uses the modified versions of them.
Continuing the example from above:
# Using the actual module for the same calculation
linear1 = nn.Linear(2, 4)
# Delete the parameters, so that regular tensors can be assigned to them
# Otherwise it throws an error that the tensor is not an nn.Parameter
del linear1.weight
del linear1.bias
# Assign the parameters with dropped elements
linear1.weight = weight
linear1.bias = bias
# Run the forward pass directly
output_linear1 = linear1(input)
torch.equal(output, output_linear1) # => True
The bottom line is that the parameters are extracted from the module, and the forward pass uses the modified versions for its calculation; after the deletion they are no longer parameters, but intermediate results.
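That last point can be verified directly in PyTorch (a sketch with my own variable names): after the deletion, the assigned tensor is a plain attribute that carries a grad_fn, i.e. an intermediate result of the graph rather than a registered parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

linear1 = nn.Linear(2, 4)
weight_raw = nn.Parameter(linear1.weight.detach().clone())

del linear1.weight                      # remove the original parameter
linear1.weight = F.dropout(weight_raw)  # assign a plain (dropped) tensor

print([name for name, _ in linear1.named_parameters()])  # ['bias']
print(isinstance(linear1.weight, nn.Parameter))          # False
print(linear1.weight.grad_fn is not None)                # True: intermediate result
```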