Computation graph for setting weights in PyTorch

Question (votes: 1, answers: 1)

I need some clarification regarding code written for certain functionality in the FastAI2 library.

Here is the code for WeightDropout from the FastAI2 library.

class WeightDropout(Module):
    "A module that wraps another layer in which some weights will be replaced by 0 during training."

    def __init__(self, module, weight_p, layer_names='weight_hh_l0'):
        self.module,self.weight_p,self.layer_names = module,weight_p,L(layer_names)
        for layer in self.layer_names:
            #Makes a copy of the weights of the selected layers.
            w = getattr(self.module, layer)
            delattr(self.module, layer)
            self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))
            setattr(self.module, layer, F.dropout(w.data, p=self.weight_p, training=False))
            if isinstance(self.module, (nn.RNNBase, nn.modules.rnn.RNNBase)):
                self.module.flatten_parameters = self._do_nothing

    def _setweights(self):
        "Apply dropout to the raw weights."
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            setattr(self.module, layer, F.dropout(raw_w.data, p=self.weight_p, training=self.training))

    def forward(self, *args):
        self._setweights()
        with warnings.catch_warnings():
            #To avoid the warning that comes because the weights aren't flattened.
            warnings.simplefilter("ignore")
            return self.module.forward(*args)

    def reset(self):
        for layer in self.layer_names:
            raw_w = getattr(self, f'{layer}_raw')
            setattr(self.module, layer, F.dropout(raw_w.data, p=self.weight_p, training=False))
        if hasattr(self.module, 'reset'): self.module.reset()

    def _do_nothing(self): pass

In the code above, weights of the hidden layer's weight matrix are randomly dropped. What I am mainly interested in is:

def _setweights(self):
    "Apply dropout to the raw weights."
    for layer in self.layer_names:
        raw_w = getattr(self, f'{layer}_raw')
        setattr(self.module, layer, F.dropout(raw_w.data, p=self.weight_p, training=self.training))

My question is whether this operation of changing the weights is recorded in the gradient computation.

neural-network pytorch gradient autograd
1 Answer

1 vote

No, assigning new weights is not tracked in the computation graph, because an assignment has no derivative, so it is impossible to obtain gradients through it.
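To illustrate that point with a minimal sketch (not part of the fastai code): an out-of-place operation on a tensor creates a node in the graph, whereas binding the result to a module attribute is plain Python assignment that autograd never sees.

import torch
import torch.nn as nn

w = torch.randn(1, 3, requires_grad=True)

# An operation on the tensor adds a node to the autograd graph ...
y = w * 2
print(y.grad_fn)             # <MulBackward0 object ...>

# ... but assigning the result to an attribute is plain name binding,
# not an operation that autograd records or could differentiate.
holder = nn.Linear(3, 1)     # hypothetical stand-in for the patched module
del holder.weight
holder.weight = y            # same tensor object, same graph, no new node
print(holder.weight is y)    # True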

Why does the code work then? The model does not overwrite the actual parameters; it uses modified versions of them for the calculation while keeping the originals intact. It is a little obscure, but the most important part happens when the parameters are copied at model creation:

#Makes a copy of the weights of the selected layers.
w = getattr(self.module, layer)
delattr(self.module, layer)
self.register_parameter(f'{layer}_raw', nn.Parameter(w.data))

What happens here is that for every parameter a copy ending in _raw is created. For example, if your model has a linear layer (say self.linear1 = nn.Linear(2, 4)), it has two parameters named linear1.weight and linear1.bias. These are now copied to linear1.weight_raw and linear1.bias_raw. To be precise, they are not copied but reassigned to the *_raw attributes, and then the originals are deleted, so they are simply moved from the original attributes to the *_raw versions. The originals need to be removed, since they are no longer the parameters that get optimised/learned.
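To make that concrete, here is a minimal sketch (illustrative names, not fastai's actual API) of what moving weight to weight_raw does to the registered parameters:

import torch.nn as nn
import torch.nn.functional as F

wrapper = nn.Module()      # stands in for the WeightDropout instance
inner = nn.Linear(2, 4)    # stands in for the wrapped module

w = getattr(inner, 'weight')
delattr(inner, 'weight')                                             # 'weight' is no longer a registered parameter
wrapper.register_parameter('weight_raw', nn.Parameter(w.data))       # the learnable copy now lives on the wrapper
setattr(inner, 'weight', F.dropout(w.data, p=0.5, training=False))   # a plain tensor attribute on the inner module

print([n for n, _ in wrapper.named_parameters()])   # ['weight_raw']
print([n for n, _ in inner.named_parameters()])     # ['bias'] -- only the untouched parameter remains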

Afterwards, when the dropout is applied, the parameters that are optimised/learned (the *_raw versions) remain untouched, but the weights used for the actual calculation are versions in which some elements have been randomly dropped. For the linear layer example, doing the calculation manually looks like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

# A dummy input
input = torch.randn(1, 2)

# The raw parameters of the linear layer, randomly initialised
weight_raw = nn.Parameter(torch.randn(4, 2))
bias_raw = nn.Parameter(torch.randn(4))

# Randomly dropping elements of the parameters with 50% probability
weight = F.dropout(weight_raw, p=0.5)
bias = F.dropout(bias_raw, p=0.5)

# Calculation of the linear layer (forward)
output = torch.matmul(input, weight.transpose(0, 1)) + bias

From this you can see that there is no actual reassignment involved, just the regular computational flow you are already familiar with.
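As a quick check of that claim (continuing the same variables, just as a sketch), you can backpropagate through the manual calculation and see that the gradients reach the raw parameters, since F.dropout is an ordinary differentiable operation in the graph:

# Backpropagate through the manual calculation above
output.sum().backward()

print(weight_raw.grad is not None)   # True -- gradients flow through the dropout to the raw weights
print(weight.grad_fn is not None)    # True -- the dropped weights are intermediate results, not leaves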

Now you might wonder why these *_raw parameters are created instead of simply applying dropout in the forward pass (as in the example above). The reason is to avoid having to re-implement the forward pass: otherwise every module's forward method would need to be modified, and since forward passes differ greatly between modules, that cannot be done in a generic way. This approach essentially hijacks the parameters so that the forward pass uses their modified versions.

Continuing the example from above:

# Using the actual module for the same calculation
linear1 = nn.Linear(2, 4)

# Delete the parameters, so that regular tensors can be assigned to them
# Otherwise it throws an error that the tensor is not an nn.Parameter
del linear1.weight
del linear1.bias

# Assign the parameters with dropped elements
linear1.weight = weight
linear1.bias = bias

# Run the forward pass directly
output_linear1 = linear1(input)

torch.equal(output, output_linear1) # => True

The bottom line is that the parameters are extracted from the module, and the forward pass uses modified versions of them (after the dropout) for its calculation; they are no longer parameters but intermediate results.
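Continuing the linear1 example (again just an illustrative check), this is easy to verify:

# Nothing on linear1 itself is registered as a parameter any more ...
print(list(linear1.named_parameters()))        # []

# ... the tensors it computes with are intermediate results of the dropout,
# still connected to the *_raw leaves that an optimiser would actually update.
print(linear1.weight.grad_fn is not None)      # True
print(weight_raw.is_leaf, bias_raw.is_leaf)    # True True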
