I'm training a sequence-to-sequence (seq2seq) model, and I need to train it with different values of input_sequence_length.

For the values 10 and 15 I get acceptable results, but when I try to train with 20 I get a memory error, so I switched to training in batches. However, the model then overfits and the validation loss explodes; even with accumulated gradients I get the same behavior. So I'm looking for hints, and for more accurate ways of doing the update.
if batch_size is not None:
    # number of batch start indices per epoch, used to average the epoch loss
    k = len(list(np.arange(0, (X_train_tensor_1.size()[0] // batch_size - 1), batch_size)))

for epoch in range(num_epochs):
    optimizer.zero_grad()  # gradients are cleared only once per epoch, so they accumulate across batches
    epoch_loss = 0
    # using equidistant batch starts up to the last full batch is much faster
    # than iterating over X.size()[0] directly
    for i in np.arange(0, (X_train_tensor_1.size()[0] // batch_size - 1), batch_size):
        sequence = X_train_tensor[i:i + batch_size, :, :].reshape(-1, sequence_length, input_size).to(device)
        labels = y_train_tensor[i:i + batch_size, :, :].reshape(-1, sequence_length, output_size).to(device)

        # Forward pass
        outputs = model(sequence)
        loss = criterion(outputs, labels)
        epoch_loss += loss.item()

        # Backward and optimize
        loss.backward()
        optimizer.step()

    epoch_loss = epoch_loss / k
    model.eval()
    validation_loss, _ = evaluate(model, X_test_hard_tensor_1, y_test_hard_tensor_1)
    model.train()
    training_loss_log.append(epoch_loss)
    print('Epoch [{}/{}], Train MSELoss: {}, Validation: {}'.format(
        epoch + 1, num_epochs, epoch_loss, validation_loss))
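For comparison, here is a minimal sketch of the same loop with the gradients cleared before every batch, so that each optimizer.step() uses only that batch's gradient (an assumed variant reusing the names from the code above, not the original training setup):

for epoch in range(num_epochs):
    epoch_loss = 0
    for i in np.arange(0, (X_train_tensor_1.size()[0] // batch_size - 1), batch_size):
        sequence = X_train_tensor[i:i + batch_size, :, :].reshape(-1, sequence_length, input_size).to(device)
        labels = y_train_tensor[i:i + batch_size, :, :].reshape(-1, sequence_length, output_size).to(device)
        optimizer.zero_grad()  # clear gradients left over from the previous batch
        outputs = model(sequence)
        loss = criterion(outputs, labels)
        epoch_loss += loss.item()
        loss.backward()   # gradients for this batch only
        optimizer.step()  # update with this batch's gradient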
Edit: these are the parameters I'm training with:
batch_size = 1024
num_epochs = 25000
learning_rate = 10e-04
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction='mean')
I'm also curious about your learning rate. Each call to loss.backward() accumulates gradients. If you set the learning rate expecting a single example at a time, and didn't lower it to account for the batch accumulation, then one of two things will happen.
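To make "accounting for the accumulation" concrete, here is a hedged sketch of deliberate gradient accumulation: each mini-batch loss is divided by the number of accumulated batches, so the summed gradients average out to one large-batch gradient and the learning rate can stay unchanged (accumulation_steps and the batches iterable are hypothetical names, not from the question):

accumulation_steps = 4  # hypothetical: number of mini-batches to accumulate per update
optimizer.zero_grad()
for step, (sequence, labels) in enumerate(batches):  # batches: any iterable of (input, target) pairs
    outputs = model(sequence)
    loss = criterion(outputs, labels) / accumulation_steps  # scale so the accumulated gradient is an average
    loss.backward()  # gradients add up across backward() calls
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one update with the averaged gradient
        optimizer.zero_grad()  # reset before the next accumulation window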