我正在尝试在这个数据集上建立LSTM模型。https:/www.kaggle.comrtatmanspeech-accent-archive
这是我正在研究的模型。
def train_lstm_model(X_train, y_train, X_validation, y_validation, EPOCHS, batch_size=128):
# Get row, column, and class sizes
rows = X_train[0].shape[0]
cols = X_train[0].shape[1]
val_rows = X_validation[0].shape[0]
val_cols = X_validation[0].shape[1]
num_classes = len(y_train[0])
input_shape = (rows, cols)
X_train = X_train.reshape(X_train.shape[0], rows, cols)
X_validation = X_validation.reshape(X_validation.shape[0], val_rows, val_cols)
lstm = Sequential()
lstm.add(LSTM(64, return_sequences=True, stateful=False, input_shape=input_shape, activation='tanh'))
lstm.add(LSTM(64, return_sequences=True, stateful=False, activation='tanh'))
lstm.add(LSTM(64, stateful=False, activation='tanh'))
# add dropout to control for overfitting
lstm.add(Dropout(.25))
# squash output onto number of classes in probability space
lstm.add(Dense(num_classes, activation='softmax'))
# adam = optimizers.adam(lr=0.0001)
rmsprop = optimizers.adam(lr=0.002)
lstm.compile(loss='categorical_crossentropy', optimizer=rmsprop, metrics=["accuracy"])
es = EarlyStopping(monitor='acc', min_delta=.005, patience=10, verbose=1, mode='auto')
# Creates log file for graphical interpretation using TensorBoard
tb = TensorBoard(log_dir=LOG_DIR, histogram_freq=0, batch_size=32, write_graph=True, write_grads=True,
write_images=True, embeddings_freq=0, embeddings_layer_names=None,
embeddings_metadata=None)
lstm.fit(X_train, y_train, batch_size=batch_size,
epochs=EPOCHS, validation_data=(X_validation,y_validation),
callbacks=[es,tb])
return lstm
当我运行15个纪元时,我得到了这个验证数据的损失曲线。https:/imgur.comahB4uK
而这是验证数据上的准确性。https:/imgur.coma9knGD
这就是训练精度。https:/imgur.comaHBfgF而这就是训练的损失。https:/imgur.comaJRdQ9。
我只用了数据集中的三个类。
有什么建议,我可以在模型中改进什么?
这些是我遵循的步骤:1. 读取wav文件[每类只读取90个样本]2.计算mel-spec图3.将mel-spec分割成段。[这样可以得到大约11k个样本]3.对mel-spec进行标准化。4.送入网络。