Keras K-fold cross-validation gives a higher MSE

Question

I'm trying to get an accurate MSE on my dataset, using the following code:

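import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense

# `data`, `input_columns`, `output_columns` and the custom `r_squared` metric
# are defined elsewhere in the script (not shown in the question).

# Selecting inputs and outputs (numeric columns only)
inputs = data[input_columns].select_dtypes(include=[np.number])
outputs = data[output_columns].select_dtypes(include=[np.number])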

# Initialize the scaler for inputs and outputs
scaler = MinMaxScaler()
output_scaler = MinMaxScaler()

# KFold
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Metrics
test_losses, test_maes, test_mses, test_mapes, test_r2 = [], [], [], [], []

# Store metrics history for plots
history_list = []

for train, test in kf.split(inputs, outputs):
    # Splitting data
    X_train = inputs.iloc[train]
    X_test = inputs.iloc[test]
    y_train = outputs.iloc[train]
    y_test = outputs.iloc[test]

    # Normalizing inputs
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Normalizing outputs
    y_train_scaled = output_scaler.fit_transform(y_train)
    y_test_scaled = output_scaler.transform(y_test)

    # Model definition
    model = Sequential([
        Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
        Dense(64, activation='relu'),
        Dense(y_train_scaled.shape[1])  # Match number of outputs
    ])

    # Compile the model
    model.compile(optimizer='adam', loss='mse', metrics=['mae', 'mean_squared_error', 'mean_absolute_percentage_error', r_squared])

    # Train the model
    history = model.fit(X_train_scaled, y_train_scaled, epochs=500, batch_size=32, verbose=0, validation_split=0.2)
    history_list.append(history)

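    # Evaluation: values come back as [loss, *metrics] in the order given to compile(),
    # so MSE comes before MAPE here
    test_loss, test_mae, test_mse, test_mape, test_r2_data = model.evaluate(X_test_scaled, y_test_scaled, verbose=0)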
    print(test_loss)
    test_losses.append(test_loss)
    test_maes.append(test_mae)
    test_mses.append(test_mse)
    test_mapes.append(test_mape)
    test_r2.append(test_r2_data)

# Calculate average metrics
avg_test_loss = np.mean(test_losses)
avg_test_mae = np.mean(test_maes)
avg_test_mse = np.mean(test_mses)
avg_test_mape = np.mean(test_mapes)
avg_test_r2 = np.mean(test_r2)
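Because `output_scaler` is refit on every training fold, an MSE computed on the scaled targets is measured in slightly different units from fold to fold, and again different units in the single-split run. A minimal sketch, assuming a single-output target and using noisy copies of the scaled test targets as a hypothetical stand-in for `model.predict(X_test_scaled)`, of reporting MSE in the original units with `inverse_transform`. For `MinMaxScaler` with the default `(0, 1)` range, the scaled MSE equals the original-unit MSE divided by the squared data range of that fold's `y_train`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Hypothetical single-output targets for one fold (stand-ins for y_train / y_test).
y_train = rng.uniform(10.0, 50.0, size=(80, 1))
y_test = rng.uniform(10.0, 50.0, size=(20, 1))

output_scaler = MinMaxScaler()
y_train_scaled = output_scaler.fit_transform(y_train)
y_test_scaled = output_scaler.transform(y_test)

# Stand-in for model.predict(X_test_scaled): true scaled targets plus noise.
y_pred_scaled = y_test_scaled + rng.normal(0.0, 0.05, size=y_test_scaled.shape)

# MSE in scaled units (what model.evaluate reports in the code above).
mse_scaled = mean_squared_error(y_test_scaled, y_pred_scaled)

# MSE in original units: undo the output scaling on the predictions first.
y_pred = output_scaler.inverse_transform(y_pred_scaled)
mse_original = mean_squared_error(y_test, y_pred)

# MinMaxScaler maps y to (y - min) / range, so errors shrink by a factor of 1 / range.
data_range = output_scaler.data_range_[0]
print(mse_scaled, mse_original, mse_original / data_range**2)
```

Averaging per-fold MSEs in original units gives a number that is comparable with the single split only if that one is converted the same way.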

But when I run this, my average MSE is 0.030, whereas without K-fold it is as low as 0.014. Not a single fold even gets an MSE below 0.020.
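The 0.014 comes from one particular test set, while the K-fold figure averages five different test sets, so a single split can land above or below the K-fold average. A minimal sketch of looking at that spread, using synthetic data from `make_regression` and `LinearRegression` as a swapped-in stand-in for the Keras model (so the numbers say nothing about this dataset, and the sketch does not by itself explain a gap of this size):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic regression data (a stand-in for the question's dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()  # stand-in for the Keras model

# One train/test split: a single MSE estimate.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
single_mse = mean_squared_error(y_te, model.fit(X_tr, y_tr).predict(X_te))

# 5-fold CV: five MSE estimates (scikit-learn reports MSE negated as a score).
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_mses = -cross_val_score(model, X, y, cv=kf, scoring="neg_mean_squared_error")

print(f"single split MSE: {single_mse:.2f}")
print(f"per-fold MSE: {np.round(fold_mses, 2)}")
print(f"mean +/- std: {fold_mses.mean():.2f} +/- {fold_mses.std():.2f}")
```

Reporting the per-fold standard deviation alongside the mean shows whether the single-split value falls inside the usual fold-to-fold variation.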

Is this just how K-fold cross-validation behaves in Keras, or am I doing something wrong?

Thanks in advance!

Update: here is the code I use without K-fold (I tried to make it an almost exact copy):

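import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense

# `data`, `input_columns`, `output_columns` and the custom `r_squared` metric
# are defined elsewhere in the script (not shown in the question).

# Selecting inputs and outputs (numeric columns only)
inputs = data[input_columns].select_dtypes(include=[np.number])
outputs = data[output_columns].select_dtypes(include=[np.number])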

# Initialize the scaler for inputs and outputs
scaler = MinMaxScaler()
output_scaler = MinMaxScaler()

# Splitting data (a single 80/20 split)
X_train, X_test, y_train, y_test = train_test_split(inputs, outputs, test_size=0.2, random_state=42)

# Normalizing inputs
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Normalizing outputs
y_train_scaled = output_scaler.fit_transform(y_train)
y_test_scaled = output_scaler.transform(y_test)

# Model definition
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    Dense(64, activation='relu'),
    Dense(y_train_scaled.shape[1])  # Match number of outputs
])

# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae', 'mean_squared_error', 'mean_absolute_percentage_error', r_squared])

# Train the model
history = model.fit(X_train_scaled, y_train_scaled, epochs=500, batch_size=32, verbose=0, validation_split=0.2)

# Evaluation: values come back as [loss, *metrics] in the order given to compile(),
# so MSE comes before MAPE here
test_loss, test_mae, test_mse, test_mape, test_r2_data = model.evaluate(X_test_scaled, y_test_scaled, verbose=0)
print(test_loss)
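One difference between the two versions may be worth checking. The Keras documentation for `fit` says the `validation_split` data is taken from the last samples of `x` and `y`, before shuffling. `KFold.split` yields index arrays in ascending order, so `X_train = inputs.iloc[train]` keeps the original row order and each fold's model never trains on roughly the last 20% of its training rows, which sit at the end of `data`. `train_test_split` shuffles, so its held-out validation rows are a random subset. If `data` is sorted (for example by time or by a target), this may raise the K-fold test error. A sketch that only inspects indices, using a hypothetical helper `keras_style_holdout` that mimics the documented behaviour (Keras's exact rounding may differ):

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

n_samples = 100
indices = np.arange(n_samples)  # row positions in the original (possibly sorted) data


def keras_style_holdout(train_idx, validation_split=0.2):
    """Hypothetical helper: return the *last* rows of the training set,
    mimicking the documented Keras validation_split behaviour."""
    split_at = int(len(train_idx) * (1.0 - validation_split))
    return train_idx[split_at:]


kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(kf.split(indices.reshape(-1, 1))):
    held_out = keras_style_holdout(train_idx)
    # KFold returns train indices in ascending order, so the held-out rows
    # always come from the end of the original data.
    print(f"fold {fold}: sorted={np.all(np.diff(train_idx) > 0)}, "
          f"held-out rows {held_out.min()}..{held_out.max()}")

# train_test_split shuffles, so its training rows (and the held-out tail) are random.
train_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=42)
held_out = keras_style_holdout(train_idx)
print(f"train_test_split: sorted={np.all(np.diff(train_idx) > 0)}, "
      f"held-out rows {held_out.min()}..{held_out.max()}")
```

If this turns out to matter, shuffling each fold's `train` index array once before indexing both `inputs` and `outputs`, or passing an explicit `validation_data` set, may bring the two setups closer.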
Answer

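I think this is because you are using KFold. The scikit-learn documentation says: "KFold divides all the samples in k groups of samples, called folds [...], of equal sizes (if possible)."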
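In practice, "equal sizes (if possible)" means that when the number of rows is not divisible by `n_splits`, scikit-learn's `KFold` gives the first `n_samples % n_splits` folds one extra row. A small check on a hypothetical 103-row array:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((103, 1))  # hypothetical 103 rows: not divisible by 5
kf = KFold(n_splits=5, shuffle=True, random_state=42)

test_sizes = [len(test) for _, test in kf.split(X)]
train_sizes = [len(train) for train, _ in kf.split(X)]
print(test_sizes)   # → [21, 21, 21, 20, 20]
print(train_sizes)  # → [82, 82, 82, 83, 83]
```

Fold sizes depend only on the row count and `n_splits`; `shuffle` changes which rows land in each fold, not how many.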

So when you cross-validate, your training set is smaller, which could explain the higher model error.
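How much smaller depends on `n_splits`: each K-fold training set holds about `(n_splits - 1) / n_splits` of the rows, so `n_splits=5` keeps about 80%, the same proportion as `train_test_split(test_size=0.2)` in the update. A quick size check on a hypothetical 103-row array:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.zeros((103, 1))  # hypothetical dataset size

# Training-set size of the single 80/20 split used in the update.
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)
print("train_test_split(test_size=0.2):", len(X_train))  # → 82

# Distinct training-set sizes for several K-fold settings.
for k in (3, 5, 10):
    sizes = sorted({len(train) for train, _ in KFold(n_splits=k).split(X)})
    print(f"KFold(n_splits={k}):", sizes)
# → [68, 69], [82, 83], [92, 93]
```

With fewer folds (for example `n_splits=3`) the training sets do shrink; with `n_splits=5` they differ from the single split by at most one row here.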

Hope that helps!

© www.soinside.com 2019 - 2024. All rights reserved.