I am trying to implement early stopping so that my neural network model does not overfit. I am fairly sure the logic is correct, but for some reason it does not work. I want the early-stopping function to return True when the validation loss has been greater than the training loss for some number of epochs, but it always returns False, even when the validation loss becomes much greater than the training loss. Can you see where the problem is?
def early_stopping(train_loss, validation_loss, min_delta, tolerance):
    counter = 0
    if (validation_loss - train_loss) > min_delta:
        counter += 1
        if counter >= tolerance:
            return True
for i in range(epochs):
    print(f"Epoch {i+1}")
    epoch_train_loss, pred = train_one_epoch(model, train_dataloader, loss_func, optimiser, device)
    train_loss.append(epoch_train_loss)
    # validation
    with torch.no_grad():
        epoch_validate_loss = validate_one_epoch(model, validate_dataloader, loss_func, device)
    validation_loss.append(epoch_validate_loss)
    # early stopping
    if early_stopping(epoch_train_loss, epoch_validate_loss, min_delta=10, tolerance=20):
        print("We are at epoch:", i)
        break
Edit 2:
def train_validate(model, train_dataloader, validate_dataloader, loss_func, optimiser, device, epochs):
    preds = []
    train_loss = []
    validation_loss = []
    min_delta = 5
    # instantiate once, before the loop, so the counter persists across epochs
    early_stopping = EarlyStopping(tolerance=2, min_delta=5)
    for e in range(epochs):
        print(f"Epoch {e+1}")
        epoch_train_loss, pred = train_one_epoch(model, train_dataloader, loss_func, optimiser, device)
        train_loss.append(epoch_train_loss)
        # validation
        with torch.no_grad():
            epoch_validate_loss = validate_one_epoch(model, validate_dataloader, loss_func, device)
        validation_loss.append(epoch_validate_loss)
        # early stopping
        early_stopping(epoch_train_loss, epoch_validate_loss)
        if early_stopping.early_stop:
            print("We are at epoch:", e)
            break
    return train_loss, validation_loss
While @KarelZe's answer solves your problem sufficiently and elegantly, I would like to offer an arguably better alternative early-stopping criterion.

Your early-stopping criterion is based on how far (and for how long) the validation loss diverges from the training loss. This breaks down when the validation loss is in fact decreasing but simply stays too far above the training loss. The goal of training a model is to drive the validation loss down, not to shrink the gap between training loss and validation loss.

Therefore, I would argue a better early-stopping criterion is to watch the trend of the validation loss alone, i.e., if training is no longer reducing the validation loss, terminate it. Here is an example implementation:
class EarlyStopper:
    def __init__(self, patience=1, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.min_validation_loss = float('inf')

    def early_stop(self, validation_loss):
        if validation_loss < self.min_validation_loss:
            self.min_validation_loss = validation_loss
            self.counter = 0
        elif validation_loss > (self.min_validation_loss + self.min_delta):
            self.counter += 1
            if self.counter >= self.patience:
                return True
        return False
You can use it like this:
early_stopper = EarlyStopper(patience=3, min_delta=10)
for epoch in np.arange(n_epochs):
    train_loss = train_one_epoch(model, train_loader)
    validation_loss = validate_one_epoch(model, validation_loader)
    if early_stopper.early_stop(validation_loss):
        break
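To see the trend-based criterion in action, here is a small self-contained demo (restating the `EarlyStopper` class from above) on a made-up validation-loss sequence in which improvement stalls after the third epoch:

```python
class EarlyStopper:
    def __init__(self, patience=1, min_delta=0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.min_validation_loss = float('inf')

    def early_stop(self, validation_loss):
        if validation_loss < self.min_validation_loss:
            self.min_validation_loss = validation_loss
            self.counter = 0
        elif validation_loss > (self.min_validation_loss + self.min_delta):
            self.counter += 1
            if self.counter >= self.patience:
                return True
        return False

# Made-up validation losses: the best loss (85) is reached at epoch 2,
# after which the loss rises by more than min_delta on each epoch.
losses = [100.0, 90.0, 85.0, 96.0, 97.0, 98.0]
stopper = EarlyStopper(patience=2, min_delta=5)
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.early_stop(loss):
        stopped_at = epoch
        break
print(stopped_at)  # -> 4: the second epoch worse than 85 + 5 trips patience=2
```

Note that the criterion never compares against the training loss at all; only the best validation loss seen so far matters.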
The problem with your implementation is that every time you call early_stopping(), the counter is re-initialized to 0.
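To make this concrete, here is a minimal reproduction: because the counter is reset on every call, it can never exceed 1, so with tolerance=20 the threshold is never reached and the function falls through, returning None (which is falsy):

```python
def early_stopping(train_loss, validation_loss, min_delta, tolerance):
    counter = 0  # local variable: reset to 0 on every call
    if (validation_loss - train_loss) > min_delta:
        counter += 1  # can only ever reach 1
        if counter >= tolerance:
            return True

# The validation loss exceeds the training loss by far more than
# min_delta on every "epoch", yet a stop is never signalled.
results = [early_stopping(100.0, 500.0, min_delta=10, tolerance=20)
           for _ in range(30)]
print(all(r is None for r in results))  # -> True: no call ever returned True
```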
Here is a working solution using an object-oriented approach, with `__init__()` and `__call__()` instead:
class EarlyStopping:
    def __init__(self, tolerance=5, min_delta=0):
        self.tolerance = tolerance
        self.min_delta = min_delta
        self.counter = 0
        self.early_stop = False

    def __call__(self, train_loss, validation_loss):
        if (validation_loss - train_loss) > self.min_delta:
            self.counter += 1
            if self.counter >= self.tolerance:
                self.early_stop = True
Call it like this:
early_stopping = EarlyStopping(tolerance=5, min_delta=10)

for i in range(epochs):
    print(f"Epoch {i+1}")
    epoch_train_loss, pred = train_one_epoch(model, train_dataloader, loss_func, optimiser, device)
    train_loss.append(epoch_train_loss)
    # validation
    with torch.no_grad():
        epoch_validate_loss = validate_one_epoch(model, validate_dataloader, loss_func, device)
    validation_loss.append(epoch_validate_loss)
    # early stopping
    early_stopping(epoch_train_loss, epoch_validate_loss)
    if early_stopping.early_stop:
        print("We are at epoch:", i)
        break
Example:
early_stopping = EarlyStopping(tolerance=2, min_delta=5)

train_loss = [
    642.14990234,
    601.29278564,
    561.98400879,
    530.01501465,
    497.1098938,
    466.92709351,
    438.2364502,
    413.76028442,
    391.5090332,
    370.79074097,
]
validate_loss = [
    509.13619995,
    497.3125,
    506.17315674,
    497.68960571,
    505.69918823,
    459.78610229,
    480.25592041,
    418.08630371,
    446.42675781,
    372.09902954,
]

for i in range(len(train_loss)):
    early_stopping(train_loss[i], validate_loss[i])
    print(f"loss: {train_loss[i]} : {validate_loss[i]}")
    if early_stopping.early_stop:
        print("We are at epoch:", i)
        break
Output:
loss: 642.14990234 : 509.13619995
loss: 601.29278564 : 497.3125
loss: 561.98400879 : 506.17315674
loss: 530.01501465 : 497.68960571
loss: 497.1098938 : 505.69918823
loss: 466.92709351 : 459.78610229
loss: 438.2364502 : 480.25592041
We are at epoch: 6
Since it may help someone like me, I would like to add to the previous answers.

The two answers provided interpret the min_delta parameter differently. In @KarelZe's answer, min_delta is used as the allowed gap between train_loss and validation_loss:
if (validation_loss - train_loss) > self.min_delta:
    self.counter += 1
In @isle_of_gods's answer, on the other hand, min_delta is used to increment the counter when the new validation loss is more than min_delta above the current minimum validation loss:
elif validation_loss > (self.min_validation_loss + self.min_delta):
    self.counter += 1
Neither answer is wrong, since the choice depends on one's needs, but I think it is more intuitive to treat min_delta as the minimum change required to count the model as improved. The documentation of Keras (which is as popular as PyTorch) defines the min_delta parameter of its early-stopping mechanism as follows:

min_delta: minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.

This means that a drop in the validation loss is not counted as an improvement unless the drop is larger than min_delta.
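As a made-up numeric illustration of this semantics: with min_delta=5 and a best-so-far loss of 100, a new loss of 96 (a decrease of only 4) does not qualify as an improvement, while a new loss of 94 does:

```python
def improved(best_loss, new_loss, min_delta):
    # Keras-style semantics: only a decrease larger than min_delta counts
    return (new_loss + min_delta) < best_loss

print(improved(100.0, 96.0, min_delta=5))  # -> False: decrease of 4 is too small
print(improved(100.0, 94.0, min_delta=5))  # -> True: decrease of 6 exceeds min_delta
```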
To stay consistent with the Keras documentation, @isle_of_gods's code could be modified as follows:
import numpy as np

class ValidationLossEarlyStopping:
    def __init__(self, patience=1, min_delta=0.0):
        self.patience = patience  # number of checks with no improvement to allow before stopping
        self.min_delta = min_delta  # the minimum change to be counted as an improvement
        self.counter = 0  # counts the number of checks without improvement
        self.min_validation_loss = np.inf

    # returns True after _patience_ consecutive checks without sufficient improvement
    def early_stop_check(self, validation_loss):
        if (validation_loss + self.min_delta) < self.min_validation_loss:
            self.min_validation_loss = validation_loss
            self.counter = 0  # reset the counter if validation loss decreased by at least min_delta
        elif (validation_loss + self.min_delta) > self.min_validation_loss:
            self.counter += 1  # increase the counter if validation loss did not decrease by min_delta
            if self.counter >= self.patience:
                return True
        return False
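Here is a short self-contained usage sketch (restating the class, with `float('inf')` in place of `np.inf` so it needs no dependency) on a made-up loss sequence; the drop from 90 to 88 is smaller than min_delta, so it does not count as an improvement:

```python
class ValidationLossEarlyStopping:
    def __init__(self, patience=1, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.counter = 0
        self.min_validation_loss = float('inf')

    def early_stop_check(self, validation_loss):
        if (validation_loss + self.min_delta) < self.min_validation_loss:
            self.min_validation_loss = validation_loss
            self.counter = 0
        elif (validation_loss + self.min_delta) > self.min_validation_loss:
            self.counter += 1
            if self.counter >= self.patience:
                return True
        return False

# Made-up validation losses: 100 -> 90 is a real improvement;
# 90 -> 88 and 88 -> 89 are not (change smaller than min_delta).
losses = [100.0, 90.0, 88.0, 89.0]
stopper = ValidationLossEarlyStopping(patience=2, min_delta=5.0)
decisions = [stopper.early_stop_check(loss) for loss in losses]
print(decisions)  # -> [False, False, False, True]
```

With patience=2, the two consecutive non-improving checks (88 and 89 against the best loss of 90) trigger the stop on the fourth epoch.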