I have a set of tasks from different domains, and a base model called MetaModel that performs regression. I want to add a task-specific bias to the parameters of the base MetaModel when training on different tasks. Currently I use the approach shown in the code below, but during gradient backpropagation the gradient of MetaModelWithBias.biases is always None. I would like to know how to update the parameters of MetaModelWithBias.biases so that they become learnable.
The key code for the task-specific bias is:
for param in self.meta_model.parameters():
    param.data += self.biases[task_id]
Full code:
import torch
import torch.nn as nn
import torch.optim as optim

class MetaModelWithBias(nn.Module):
    def __init__(self, meta_model, num_tasks):
        super(MetaModelWithBias, self).__init__()
        self.meta_model = meta_model
        self.biases = nn.ParameterList([nn.Parameter(torch.randn(1)) for _ in range(num_tasks)])

    def forward(self, x, task_id):
        with torch.no_grad():
            for param in self.meta_model.parameters():
                param.data += self.biases[task_id]
        output = self.meta_model(x)
        return output

class MetaModel(nn.Module):
    def __init__(self):
        super(MetaModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x):
        return self.fc(x)

# generate some random data
num_tasks = 5
input_size = 10
num_samples = 100
X = torch.randn(num_samples, input_size)
task_ids = torch.randint(0, num_tasks, (num_samples,))

# create the model
meta_model = MetaModel()
meta_model_with_bias = MetaModelWithBias(meta_model, num_tasks)

# training
num_epochs = 10
optimizer = optim.SGD([
    {'params': meta_model_with_bias.meta_model.parameters()},
    {'params': meta_model_with_bias.biases.parameters()}
], lr=0.01)
for epoch in range(num_epochs):
    outputs = 0
    for i in range(num_samples):
        task_id = task_ids[i]
        output = meta_model_with_bias(X[i].unsqueeze(0), task_id)
        outputs += output
    criterion = nn.MSELoss()
    targets = torch.randn(1, 1)
    loss = criterion(outputs, targets)
    print('\tMetaModelWithBias biases before backward',
          [parms for parms in meta_model_with_bias.biases],
          [parms.grad for parms in meta_model_with_bias.biases])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print('\tMetaModelWithBias biases after backward',
          [parms for parms in meta_model_with_bias.biases],
          [parms.grad for parms in meta_model_with_bias.biases])
    print('meta_model_with_bias params')
    for name, parms in meta_model_with_bias.named_parameters():
        print('-->name:', name, '-->requires_grad:', parms.requires_grad,
              ' -->parameter:', parms,
              ' -->grad_value:', parms.grad)
Then I get:
MetaModelWithBias biases before backward [Parameter containing:
tensor([-0.2255], requires_grad=True), Parameter containing:
tensor([-2.2697], requires_grad=True), Parameter containing:
tensor([-0.7426], requires_grad=True), Parameter containing:
tensor([0.0925], requires_grad=True), Parameter containing:
tensor([-0.0564], requires_grad=True)] [None, None, None, None, None]
MetaModelWithBias biases after backward [Parameter containing:
tensor([-0.2255], requires_grad=True), Parameter containing:
tensor([-2.2697], requires_grad=True), Parameter containing:
tensor([-0.7426], requires_grad=True), Parameter containing:
tensor([0.0925], requires_grad=True), Parameter containing:
tensor([-0.0564], requires_grad=True)] [None, None, None, None, None]
Thanks for your time!
In your version there are two problems that prevent your biases from receiving any gradients. First, you are using torch.no_grad, which disables gradient computation entirely, so within that scope nothing is tracked by design. Even if you remove it, you are still deliberately bypassing gradient computation, because the parameters are overwritten through their data attribute, and updates made through data are never tracked!
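A minimal standalone sketch (not part of the original code; the variable names are illustrative) showing that an update made through .data under torch.no_grad is invisible to autograd:

```python
import torch

w = torch.randn(3, requires_grad=True)   # stands in for a model parameter
b = torch.randn(3, requires_grad=True)   # stands in for a task-specific bias

with torch.no_grad():    # mirrors the forward() in the question
    w.data += b          # no graph edge from b to w is ever recorded

loss = w.sum()
loss.backward()
print(w.grad)            # gradients reach w ...
print(b.grad)            # ... but b.grad stays None: the update was untracked
```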
Not to mention that you are accumulating the biases, i.e. after 10 iterations you end up with params += 10*biases, which seems incorrect. In other words, your implementation never "resets" the parameters...
Now, of course, you may have done this to work around the dreaded "a leaf Variable that requires grad is being used in an in-place operation" error. It occurs when you try to modify a parameter in place, so this fails:
param += self.biases[task_id]
Doing it out of place makes no sense either, since param is just a scoped loop variable and the underlying parameter would remain unchanged.
On reflection, it seems the only way to bring gradient tracking back to the bias addition is to apply the layer manually, in a "functional" way. Instead of mutating the parameters on every iteration, you combine them with the bias at application time (which also solves the accumulation problem noted above). Here is one possible implementation that ensures gradients are computed for the biases:
import torch.nn.functional as F

class MetaModelWithBias(nn.Module):
    def __init__(self, meta_model, num_tasks):
        super(MetaModelWithBias, self).__init__()
        self.meta_model = meta_model
        self.biases = nn.ParameterList([
            nn.Parameter(torch.randn(1)) for _ in range(num_tasks)])

    def forward(self, x, task_id):
        output = self.meta_model(x, self.biases[task_id])
        return output

class MetaModel(nn.Module):
    def __init__(self):
        super(MetaModel, self).__init__()
        self.fc = nn.Linear(10, 1)

    def forward(self, x, bias):
        # apply the layer functionally so the bias stays in the autograd graph
        return F.linear(x, self.fc.weight + bias, self.fc.bias + bias)
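As a quick sanity check, here is the core of the fix stripped down to a single functional call (a standalone sketch, not part of the classes above): the bias participates in the forward graph, so it now receives a gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(10, 1)
task_bias = nn.Parameter(torch.randn(1))

x = torch.randn(4, 10)
# functional application: task_bias is part of the computation graph
out = F.linear(x, fc.weight + task_bias, fc.bias + task_bias)
out.sum().backward()

print(task_bias.grad)    # a real gradient tensor now, not None
```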