I trained a TensorFlow model for spelling correction. After training for more than 60 epochs it reaches about 82.2% accuracy with a loss of 0.3032, but when I use the model for prediction it does not produce a single correct result. The model was trained on more than 100,000 sentences, about 20% of which contain spelling mistakes. The output is binary data (not one-hot encoded). The model fails to predict correctly even on samples taken from the training data. The model is as follows:
def create_model():
    tf.random.set_seed(42)
    # input_layer = Input(shape=(x_train.shape[1], 1), name='input_layer')  # input layer for use without embedding
    input_layer = Input(shape=(x_train.shape[1],), name='input_layer')  # input layer for use with embedding
    sp_embedding_layer = Embedding(len(char2idx), EMBEDDING_DIM,
                                   embeddings_initializer=initializers.Constant(embed_matrix),
                                   trainable=False)(input_layer)
    x = Bidirectional(LSTM(1024, return_sequences=False))(sp_embedding_layer)
    x = Dense(1920, activation='relu', name='Dense_1')(x)
    x = Dense(2048, activation='relu', name='Dense_2')(x)
    x = Dense(2048, activation='relu', name='Dense_3')(x)
    x = Dense(2048, activation='relu', name='Dense_4')(x)
    x = Dense(2048, activation='relu', name='Dense_5')(x)
    x = Dense(1024, activation='relu', name='Dense_6')(x)
    x = Dense(1024, activation='relu', name='Dense_7')(x)
    x = Dense(y_train.shape[1], name='Output', activation='sigmoid')(x)
    model = models.Model(inputs=input_layer, outputs=x)
    return model
The optimizer and callbacks are as follows:
sf = 4
spb = x_train.shape[0] / BATCH_SIZE  # steps per epoch
sf_epoch = spb * sf
myoptimizer = optimizers.Adam(learning_rate=0.000001)
filepath = "/content/drive/MyDrive/NLP/Models/SCModels/weights.{epoch:02d}.tf"
checkpoint = EpochModelCheckpoint(filepath, frequency=sf, save_weights_only=True)
# checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1,
#                              save_best_only=True, mode='max', save_freq=sf)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, verbose=1,
                              patience=2)
callbacks_list = [checkpoint, reduce_lr]
The training code:
history = n_model.fit(x_train, y_train, validation_data=(x_val, y_val),
                      epochs=60, callbacks=callbacks_list, batch_size=BATCH_SIZE)
The prediction code is as follows:
test_model = create_model()
test_model.load_weights(selected_model)
test_optimizer = optimizers.Adam(learning_rate=0.0983)
test_model.compile(optimizer=test_optimizer,
                   loss=tf.keras.losses.BinaryCrossentropy(),
                   metrics=['accuracy'])
print('input:', sample_sent)
input_seq = np.array(encode(sample_sent, MAX_SENT_LEN + 10)).reshape((1, -1))  # avoid shadowing the built-in `input`
answer = test_model.predict(input_seq)
answer_dec = np.where(answer < 0.5, 0, 1)
print(answer_dec[0].tolist())
print(decode(recover_int_seq(answer_dec[0].tolist())))
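In isolation, the thresholding step above behaves like this (a minimal NumPy sketch with made-up sigmoid probabilities; `encode`/`decode`/`recover_int_seq` are this question's own helpers and are not reproduced):

```python
import numpy as np

# Hypothetical sigmoid outputs for a 6-bit prediction.
answer = np.array([[0.91, 0.12, 0.49, 0.50, 0.73, 0.05]])

# Same rule as in the question: probabilities below 0.5 become 0, the rest 1.
# Note that np.where(answer < 0.5, 0, 1) maps a value of exactly 0.5 to 1.
answer_dec = np.where(answer < 0.5, 0, 1)
print(answer_dec[0].tolist())  # -> [1, 0, 0, 1, 1, 0]
```

Because every bit is thresholded independently, a single bit flipping across 0.5 corrupts the decoded character, which makes this binary output encoding very sensitive to small calibration errors.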
Why are the predictions inaccurate? Any help would be appreciated. Thank you.
It seems to me that you don't really need that many Dense layers. Consider adding LayerNormalization and Dropout layers in place of some of the Dense layers.
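One way that suggestion could look (a sketch only, not a tuned architecture: the layer sizes are illustrative, and `seq_len`, `vocab_size`, `embedding_dim`, and `n_outputs` stand in for the question's `x_train.shape[1]`, `len(char2idx)`, `EMBEDDING_DIM`, and `y_train.shape[1]`):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def create_slim_model(seq_len=50, vocab_size=100, embedding_dim=64, n_outputs=128):
    inputs = layers.Input(shape=(seq_len,), name='input_layer')
    x = layers.Embedding(vocab_size, embedding_dim)(inputs)
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=False))(x)
    # Replace the stack of seven wide Dense layers with a single Dense block
    # regularized by LayerNormalization and Dropout.
    x = layers.Dense(256, activation='relu', name='Dense_1')(x)
    x = layers.LayerNormalization()(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(n_outputs, activation='sigmoid', name='Output')(x)
    return models.Model(inputs=inputs, outputs=outputs)
```

Fewer, regularized layers reduce the risk of the network memorizing the training set while still failing at inference time, and Dropout in particular behaves differently between `fit` and `predict`, which makes train/test discrepancies easier to diagnose.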