Keras network with a Scikit-Learn pipeline raises ValueError

Question

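I am struggling to build a neural network using Keras with a Scikit-Learn Pipeline for preprocessing. So far I have been able to build the pipeline and an initial (very basic) model architecture, but I run into problems when combining the two. I have been able to use the pipeline with other machine learning models; the problem only appears with the deep learning model.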

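I keep getting the following ValueError:

ValueError: Input 0 of layer sequential_53 is incompatible with the layer: expected axis -1 of input shape to have value 5 but received input with shape (None, 49)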

When I update the model's input_dim to address the initial error, I get a similar error after the first epoch finishes running:

Epoch 1/100
283/300 [============================>..] - ETA: 0s - loss: 0.5751 - binary_accuracy: 0.7925
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in
      1 # Fit Model
----> 2 history = pipeline.fit(X_train, y_train)

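ValueError: Input 0 of layer sequential_54 is incompatible with the layer: expected axis -1 of input shape to have value 49 but received input with shape (None, 5)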

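What is the best way to embed a Keras neural network in an sklearn pipeline when the categorical variables need to be one-hot encoded?

A summary of the code follows: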

# Preprocessing Pipeline
numeric_features = list(X.select_dtypes(include=['number']))
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('normalize', MinMaxScaler(feature_range=(0,1)))])

categorical_features = list(X.select_dtypes(include=['category']))
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)]
)

# Split Data into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)

# Split Training Data into Training and Validation Sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42, shuffle=True)

def CreateModel():
    # Define Model
    model = Sequential([
        layers.Dense(units=32, activation='relu', input_dim=X.shape[-1]),
        
        layers.Dense(units=16, activation='relu'),
        
        layers.Dense(units=1, activation='sigmoid')
    ])

    # Specify Optimizer
    optimizer = optimizers.Adam(epsilon=0.01)

    # Compile the Model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['binary_accuracy'])

    return model

# Add Early Stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=10, min_delta=0.001, restore_best_weights=True)

# Instantiate Baseline Classification Models
clf = KerasClassifier(build_fn=CreateModel, verbose=1, epochs=100, batch_size=16, validation_data=(X_val, y_val), callbacks=[early_stopping])

# Fit to the training set
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', clf)
])

# Fit Model
history = pipeline.fit(X_train, y_train)

Answer 1

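Your input shape is clearly the problem.

You are passing data with 49 features, while the first layer specifies the number of input features as X.shape[-1] (input_dim=X.shape[-1]). Change the input dimension to input_dim=49 or

input_dim=X_train.shape[-1]

.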
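A quick way to check which number the first layer needs is to compare the raw column count with the width of the preprocessor's output. Note that a row-wise split such as train_test_split does not change the number of columns, so it may be worth printing both values before hard-coding either. The sketch below uses a made-up toy frame; the names X_demo and demo_preprocessor and the columns "age" and "city" are hypothetical and are not the asker's data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data: one numeric column and one categorical column
X_demo = pd.DataFrame({
    "age": [20.0, 35.0, 50.0, 65.0],
    "city": pd.Categorical(["a", "b", "c", "a"]),
})

# Same kind of preprocessing as in the question, reduced to the essentials
demo_preprocessor = ColumnTransformer(transformers=[
    ("num", MinMaxScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Width before preprocessing: the number of raw columns
raw_width = X_demo.shape[-1]

# Width after preprocessing: one scaled column plus one column per category
encoded_width = demo_preprocessor.fit_transform(X_demo).shape[-1]

print(raw_width, encoded_width)  # prints: 2 4
```

In the question the corresponding numbers would be 5 raw columns and 49 encoded columns, which matches the two shapes named in the error messages.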

Answer 2

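There is a problem with the input shape (but the fix is not just changing it to X_train.shape[-1]).

You need to get the shape after preprocessing, because there is a one-hot encoder in the pipeline.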
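A minimal sketch of that idea, under stated assumptions: fit the preprocessor on the training split, read the encoded width from its output, and size the network from that width. The second error in the question (expected 49, received (None, 5)) looks consistent with validation_data=(X_val, y_val) reaching the model unprocessed, since a Pipeline only transforms the X passed to fit, so the sketch also transforms the validation split with the same fitted preprocessor. The toy frames and the helper names to_dense, build_model and prep are hypothetical; the architecture mirrors CreateModel from the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def to_dense(matrix):
    """Return a dense float32 array whether the input is sparse or dense."""
    if hasattr(matrix, "toarray"):
        matrix = matrix.toarray()
    return np.asarray(matrix, dtype="float32")


def build_model(n_features):
    """Same layers as CreateModel, but sized from the encoded feature count."""
    from tensorflow import keras  # imported lazily; only needed when training

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(epsilon=0.01),
        loss="binary_crossentropy",
        metrics=["binary_accuracy"],
    )
    return model


# Hypothetical toy splits standing in for X_train and X_val
X_train_demo = pd.DataFrame({"age": [20.0, 35.0, 50.0], "city": ["a", "b", "c"]})
X_val_demo = pd.DataFrame({"age": [40.0, 60.0], "city": ["b", "d"]})

prep = ColumnTransformer(transformers=[
    ("num", MinMaxScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Fit on the training split only, then reuse the fitted transformer on validation
X_train_enc = to_dense(prep.fit_transform(X_train_demo))
X_val_enc = to_dense(prep.transform(X_val_demo))

# The encoded width is what the first layer has to accept
n_features = X_train_enc.shape[-1]

# Training would then look roughly like this (not executed here):
# model = build_model(n_features)
# model.fit(X_train_enc, y_train, validation_data=(X_val_enc, y_val),
#           epochs=100, batch_size=16, callbacks=[early_stopping])
```

Because handle_unknown='ignore' is set, a category seen only in the validation split (here "d") is encoded as all zeros instead of adding a column, so the training and validation arrays keep the same width.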
