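So I'm trying to train a UNET model to segment humans in images. I'm using several callbacks, including ReduceLROnPlateau and EarlyStopping. However, my model isn't learning: the loss plateaus at about 0.48 and training stops, even though the model is not segmenting correctly. I'm happy to share all the relevant material for this problem.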
Callbacks:
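from tensorflow.keras.callbacks import (CSVLogger, EarlyStopping,
                                        ModelCheckpoint, ReduceLROnPlateau)

resume_callbacks = [
    # Saves the full model after every epoch
    ModelCheckpoint('/content/gdrive/MyDrive/unet.h5', monitor="val_loss", verbose=1),
    # Reduces lr when metric stops improving
    ReduceLROnPlateau(monitor='val_loss', patience=3, factor=0.1, verbose=1),
    # Saves only the best model so far (lowest val_loss)
    ModelCheckpoint('/content/gdrive/MyDrive/unet_checkpoint_latest.hdf5', monitor="val_loss", mode="min", save_best_only=True, verbose=1),
    CSVLogger(csv_path),  # csv_path is defined in the hyperparameters below
    EarlyStopping(monitor='val_loss', patience=8),
]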
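To see how `patience=3, factor=0.1` behaves once val_loss stops improving, here is a simplified pure-Python re-implementation of the documented ReduceLROnPlateau rule (mode "min", no cooldown). It is not the Keras source: `simulate_reduce_lr` is a hypothetical helper, and its `min_delta=1e-4` and `min_lr=0.0` defaults are chosen to mirror the Keras defaults.

```python
import math
from typing import List, Sequence


def simulate_reduce_lr(
    val_losses: Sequence[float],
    lr: float,
    factor: float = 0.1,
    patience: int = 3,
    min_delta: float = 1e-4,
    min_lr: float = 0.0,
) -> List[float]:
    """Return the learning rate in effect during each epoch (simplified model, cooldown=0)."""
    best = math.inf
    wait = 0
    lrs = []
    for loss in val_losses:
        lrs.append(lr)  # lr used while this epoch trains
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                # Plateau: shrink lr by `factor`, but never below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs


# A flat val_loss (hypothetical data) starting from lr=1e-4
flat = [0.4836] * 12
print(simulate_reduce_lr(flat, lr=1e-4))               # divided by 10 every 3 epochs
print(simulate_reduce_lr(flat, lr=1e-4, min_lr=1e-6))  # stops shrinking at 1e-6
```

`min_lr` is a real ReduceLROnPlateau argument; setting it keeps the schedule from collapsing toward the ~1e-12 seen in the log. Whether a higher floor actually helps depends on why val_loss is stuck in the first place, which this sketch does not address.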
Fit call:
# Starting/resuming training
model.fit(
    train_dataset,
    validation_data=test_dataset,
    epochs=100,
    steps_per_epoch=train_steps,
    validation_steps=valid_steps,
    callbacks=resume_callbacks,
    initial_epoch=50,
)
Hyperparameters:
input_shape = (256, 256, 3)
batch_size = 8
epochs = 100
lr = 1e-4
model_path = "/content/gdrive/MyDrive/unet.h5"
csv_path = "/content/gdrive/MyDrive/data.csv"
checkpoint_path = "/content/gdrive/MyDrive/unet_checkpoint_latest.hdf5"
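Assuming the run started at `lr = 1e-4` and every drop came from ReduceLROnPlateau with `factor=0.1` (if the model was reloaded from `unet.h5`, the restored optimizer may already carry a reduced learning rate), the learning rate after $n$ reductions is:

$$
\mathrm{lr}_n = \mathrm{lr}_0 \cdot \mathrm{factor}^{\,n} = 10^{-4} \cdot 0.1^{\,n},
\qquad
10^{-4} \cdot 0.1^{\,n} = 10^{-12} \;\Longrightarrow\; n = 8.
$$

So the `lr: 1.0000e-12` reported in the last epoch corresponds to eight reductions, leaving each update roughly eight orders of magnitude smaller than at the start, which is consistent with loss and val_loss barely moving from epoch to epoch.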
Output:
Epoch 57: ReduceLROnPlateau reducing learning rate to 9.99999943962493e-12.
Epoch 57: val_loss did not improve from 0.48361
568/568 [==============================] - 86s 152ms/step - loss: 0.4759 - mean_io_u: 0.3723 - recall: 0.3187 - precision: 0.5634 - val_loss: 0.4836 - val_mean_io_u: 0.3718 - val_recall: 0.3082 - val_precision: 0.5464 - lr: 1.0000e-10
Epoch 58/100
568/568 [==============================] - ETA: 0s - loss: 0.4759 - mean_io_u: 0.3723 - recall: 0.3187 - precision: 0.5638
Epoch 58: saving model to /content/gdrive/MyDrive/unet.h5
Epoch 58: val_loss did not improve from 0.48361
568/568 [==============================] - 86s 152ms/step - loss: 0.4759 - mean_io_u: 0.3723 - recall: 0.3187 - precision: 0.5638 - val_loss: 0.4836 - val_mean_io_u: 0.3718 - val_recall: 0.3094 - val_precision: 0.5461 - lr: 1.0000e-11
Epoch 59/100
568/568 [==============================] - ETA: 0s - loss: 0.4760 - mean_io_u: 0.3723 - recall: 0.3185 - precision: 0.5635
Epoch 59: saving model to /content/gdrive/MyDrive/unet.h5
Epoch 59: val_loss did not improve from 0.48361
568/568 [==============================] - 89s 156ms/step - loss: 0.4760 - mean_io_u: 0.3723 - recall: 0.3185 - precision: 0.5635 - val_loss: 0.4836 - val_mean_io_u: 0.3718 - val_recall: 0.3083 - val_precision: 0.5463 - lr: 1.0000e-11
Epoch 60/100
568/568 [==============================] - ETA: 0s - loss: 0.4760 - mean_io_u: 0.3723 - recall: 0.3184 - precision: 0.5634
Epoch 60: saving model to /content/gdrive/MyDrive/unet.h5
Epoch 60: ReduceLROnPlateau reducing learning rate to 9.999999092680235e-13.
Epoch 60: val_loss did not improve from 0.48361
568/568 [==============================] - 86s 152ms/step - loss: 0.4760 - mean_io_u: 0.3723 - recall: 0.3184 - precision: 0.5634 - val_loss: 0.4836 - val_mean_io_u: 0.3718 - val_recall: 0.3069 - val_precision: 0.5466 - lr: 1.0000e-11
Epoch 61/100
568/568 [==============================] - ETA: 0s - loss: 0.4758 - mean_io_u: 0.3723 - recall: 0.3187 - precision: 0.5636
Epoch 61: saving model to /content/gdrive/MyDrive/unet.h5
Epoch 61: val_loss did not improve from 0.48361
568/568 [==============================] - 86s 151ms/step - loss: 0.4758 - mean_io_u: 0.3723 - recall: 0.3187 - precision: 0.5636 - val_loss: 0.4836 - val_mean_io_u: 0.3718 - val_recall: 0.3079 - val_precision: 0.5464 - lr: 1.0000e-12
<keras.callbacks.History at 0x7f06bcffeaf0>
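Since `CSVLogger(csv_path)` already writes one row per epoch to `data.csv`, the plateau can also be checked after the run. Below is a minimal standard-library sketch; it assumes the CSV has `epoch`, `lr` and `val_loss` columns (the same keys that appear in the progress log), and the helper name `stalled_epochs` and its `min_delta` threshold are hypothetical choices, not part of Keras.

```python
import csv
from typing import Iterable, List, Tuple


def stalled_epochs(lines: Iterable[str], min_delta: float = 1e-4) -> List[Tuple[int, float]]:
    """Return (epoch, lr) for each epoch whose val_loss did not beat the best so far by min_delta.

    `lines` can be an open CSV file or any iterable of CSV lines with a header row.
    """
    best = float("inf")
    stalled = []
    for row in csv.DictReader(lines):
        val_loss = float(row["val_loss"])
        if val_loss < best - min_delta:
            best = val_loss  # genuine improvement, not just noise
        else:
            # No real progress this epoch: record the learning rate it ran with
            stalled.append((int(row["epoch"]), float(row["lr"])))
    return stalled
```

For example, open `csv_path` with `newline=""` and pass the file object to `stalled_epochs`; a long run of stalled epochs with a rapidly shrinking `lr` is the pattern visible in the log above.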