GridSearchCV error: ValueError: Sequential model 'sequential' has no defined outputs yet

Question (0 votes, 1 answer)

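I'm trying to tune the hyperparameters of my deep learning neural network on a dataset that I have already done feature engineering on. I kept only the relevant features and scaled the data (using MinMaxScaler). I followed the steps I found online for finding the best parameters: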

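  1. Feature engineering / scaling the data (preprocessing)
  2. Create a build function for the neural network
  3. Create a KerasRegressor object from that neural network
  4. Create a dictionary of the parameters I want to test
  5. Create a GridSearchCV object with the KerasRegressor object as the estimator and the parameter dictionary as param_grid
  6. Fit it on the training set (from train_test_split)
  7. Print best_params_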

However, I ran into an error:

Traceback (most recent call last):
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\externals\loky\process_executor.py", line 428, in _process_worker
    r = call_item()
        ^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\externals\loky\process_executor.py", line 275, in __call__
    return self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\_parallel_backends.py", line 620, in __call__
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 288, in __call__
    return [func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 288, in <listcomp>
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\utils\parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_validation.py", line 732, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\vishv\anaconda3\Lib\site-packages\scikeras\wrappers.py", line 760, in fit
    self._fit(
  File "C:\Users\vishv\anaconda3\Lib\site-packages\scikeras\wrappers.py", line 926, in _fit
    self._check_model_compatibility(y)
  File "C:\Users\vishv\anaconda3\Lib\site-packages\scikeras\wrappers.py", line 549, in _check_model_compatibility
    if self.n_outputs_expected_ != len(self.model_.outputs):
                                       ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\keras\src\models\sequential.py", line 277, in outputs
    raise ValueError(
ValueError: Sequential model 'sequential' has no defined outputs yet.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\vishv\OneDrive\Documents\Projects and Personal Learning\Spotify Top 200 Chart Analysis\prediction_test.py", line 100, in <module>
    grid = grid.fit(X_train,y_train)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\base.py", line 1151, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 898, in fit
    self._run_search(evaluate_candidates)
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 1419, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\model_selection\_search.py", line 845, in evaluate_candidates
    out = parallel(
          ^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\sklearn\utils\parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 1098, in __call__
    self.retrieve()
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\parallel.py", line 975, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\site-packages\joblib\_parallel_backends.py", line 567, in wrap_future_result
    return future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\concurrent\futures\_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\vishv\anaconda3\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
ValueError: Sequential model 'sequential' has no defined outputs yet.

My code is below. Please note that I'm fairly new to machine learning and neural networks:

# DataFrame Libraries
import pandas as pd
import numpy as np
import random as rnd

# Visualization Libraries
import matplotlib.pyplot as plt
from pandasgui import show
import seaborn as sns

# Machine Learning Libraries
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from scikeras.wrappers import KerasRegressor
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.metrics import R2Score
from tensorflow.keras.callbacks import EarlyStopping


# Read in Data
spotify_df = pd.read_csv('spotify_top_songs_audio_features.csv',index_col="id")

# Clean Data
    # Dropping source, mode, key, time_signature (no/little correlation to features)
spotify_df.drop(['source','mode', 'key', 'time_signature'],axis=1,inplace=True)

    # Mapping outlier in artist_names (Tyler, The Creator -> Tyler The Creator) 
def tyler_map(artist_names):
    if 'Tyler, The Creator' in artist_names:
        return artist_names.replace('Tyler, The Creator','Tyler The Creator')
    else:
        return artist_names

spotify_df['artist_names'] = spotify_df['artist_names'].apply(tyler_map)

    # Splitting artist names into lists of each artist + making dummies for each artist
spotify_df['artist_names'] = spotify_df['artist_names'].apply(lambda x:x.split(", "))

artist_dummy = pd.get_dummies(data=spotify_df['artist_names'].explode(),drop_first=True).groupby(level=0).sum()

    # Concat dummies to original list (without artist_names)
spotify_df = pd.concat([spotify_df.drop('artist_names',axis=1),artist_dummy],axis=1)

X = spotify_df.iloc[:,13:]
y = spotify_df['weeks_on_chart']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

scaler = MinMaxScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=25)

def buildModel(optimizer='adam'):
    model = Sequential()
    

    model.add(Dense(234, activation = 'relu'))
    model.add(Dropout(0.1))

    for i in range(2):
        model.add(Dense(78, activation = 'relu'))
        model.add(Dropout(0.1))

        model.add(Dense(78, activation = 'relu'))
        model.add(Dropout(0.2))

    for i in range(5):
        model.add(Dense(39, activation = 'relu'))
        model.add(Dropout(0.1))

        model.add(Dense(39, activation = 'relu'))
        model.add(Dropout(0.2))

    for i in range(3):
        model.add(Dense(13, activation = 'relu'))
        model.add(Dropout(0.1))

        model.add(Dense(13, activation = 'relu'))
        model.add(Dropout(0.2))

    model.add(Dense(1, activation = 'linear'))

    model.compile(optimizer=optimizer,loss='mean_absolute_error',metrics=['mean_absolute_error'])

    return model

nn = KerasRegressor(model=buildModel,epochs=600,callbacks=[early_stop])

parameters = {'batch_size':[30,40,50,60,70],
              'optimizer':['adam','rmsprop','adamw']}

grid = GridSearchCV(estimator=nn,param_grid=parameters,scoring='neg_mean_absolute_error',cv=3)

grid = grid.fit(X_train,y_train)

print(grid.best_params_)
python tensorflow keras neural-network gridsearchcv
1 answer (0 votes)

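If you want to use GridSearchCV, I'd suggest using MLPRegressor from the scikit-learn API, which will be more compatible. (If you end up with many hyperparameters to tune, you can use RandomizedSearchCV instead.)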
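As a rough illustration of that suggestion, here is a minimal sketch of a grid search over an MLPRegressor. The synthetic data, the layer sizes and the reduced grid are assumptions made for the example, not the asker's dataset or architecture. Note that MLPRegressor calls the optimizer `solver`, so the `optimizer` key from the question maps to `solver` here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the scaled features (assumption: 200 rows, 5 features).
rng = np.random.default_rng(42)
X_demo = rng.random((200, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.5, 4.0]) + rng.normal(0, 0.1, 200)

# early_stopping=True holds out part of each training fold, similar in spirit
# to the EarlyStopping callback in the question.
mlp = MLPRegressor(
    hidden_layer_sizes=(32, 16),  # hypothetical architecture
    max_iter=300,
    early_stopping=True,
    random_state=42,
)

# Same idea as the question's grid, kept small so it runs quickly.
param_grid = {
    "batch_size": [30, 50],
    "solver": ["adam", "sgd"],
}

grid = GridSearchCV(
    estimator=mlp,
    param_grid=param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
grid.fit(X_demo, y_demo)

print(grid.best_params_)
```

Because MLPRegressor is a native scikit-learn estimator, no wrapper sits between it and GridSearchCV, so the output-shape check that fails in the traceback never comes into play.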

Also take a look at pipelines in scikit-learn.
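A pipeline is useful here because the scaler is then refit on the training part of every cross-validation fold, instead of once up front. The sketch below is only an illustration under assumed data and parameter ranges; it also uses the randomized search mentioned earlier in this answer, and the step names `scaler` and `mlp` are arbitrary labels chosen for the example.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Unscaled synthetic data (assumption: 200 rows, 5 features in the range 0-100).
rng = np.random.default_rng(0)
X_raw = rng.random((200, 5)) * 100.0
y_raw = 0.05 * X_raw[:, 0] + 0.02 * X_raw[:, 1] + rng.normal(0, 0.1, 200)

# Scaling and the model live in one estimator, so each CV fold scales
# using only its own training rows.
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("mlp", MLPRegressor(max_iter=300, random_state=0)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_distributions = {
    "mlp__hidden_layer_sizes": [(16,), (32, 16), (64, 32)],
    "mlp__batch_size": [30, 40, 50, 60, 70],
    "mlp__alpha": [1e-5, 1e-4, 1e-3],
}

# n_iter controls how many random combinations are tried.
search = RandomizedSearchCV(
    pipe,
    param_distributions=param_distributions,
    n_iter=5,
    scoring="neg_mean_absolute_error",
    cv=3,
    random_state=0,
)
search.fit(X_raw, y_raw)

print(search.best_params_)
```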

Otherwise, you can use a dedicated hyperparameter optimization framework such as Optuna, which works well with TensorFlow.
