"Cannot clone object" error when using Keras and scikit-learn for classification (trying to add cross-validation to a working model) (the train/test split works)


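I have an existing NN model (a Sequential model) trained with a train/test split. I need to add cross-validation to my dataset, and after implementing it I get the following error: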

TypeError: Cannot clone object '<tensorflow.python.keras.engine.sequential.Sequential object at 0x000001B5D2100108>' (type <class 'tensorflow.python.keras.engine.sequential.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

Here is the model code after I added cross-validation to the existing, working train/test split.

Splitting the dataset

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, cross_val_predict # For Cross validation I have added this
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25,random_state=101)

from sklearn import metrics # For Cross validation, I have added this

Scaling the data

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

Creating the model

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation,Dropout

Training the model

from tensorflow.keras.layers import Dropout
model = Sequential()
model.add(Dense(units=70,activation='relu'))
model.add(Dropout(0.7))

model.add(Dense(units=15,activation='relu'))
model.add(Dropout(0.7))

model.add(Dense(units=1,activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

from tensorflow.keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)
model.fit(x=X_train, 
          y=y_train, 
          epochs=600,
          validation_data=(X_test, y_test), verbose=1,
          callbacks=[early_stop]
          )

This is where adding cross_val_predict produces the error:

import pandas as pd  # needed for the loss plot below

predictions = cross_val_predict(model, X_test, y_test, cv=3) # for cross validation ** (model, df, y, cv=3)
model_loss = pd.DataFrame(model.history.history)
model_loss.plot()

Full error

---------------------------------------------------------------------------
Empty                                     Traceback (most recent call last)
~\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
    796             try:
--> 797                 tasks = self._ready_batches.get(block=False)
    798             except queue.Empty:

~\anaconda3\lib\queue.py in get(self, block, timeout)
    166                 if not self._qsize():
--> 167                     raise Empty
    168             elif timeout is None:

Empty: 

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-32-2b7d023d5ca4> in <module>
----> 1 predictions = cross_val_predict(model, X_test, y_test, cv=3) # for cross validation ** (model, df, y, cv=3)

~\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in cross_val_predict(estimator, X, y, groups, cv, n_jobs, verbose, fit_params, pre_dispatch, method)
    753     prediction_blocks = parallel(delayed(_fit_and_predict)(
    754         clone(estimator), X, y, train, test, verbose, fit_params, method)
--> 755         for train, test in cv.split(X, y, groups))
    756 
    757     # Concatenate the predictions

~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
   1002             # remaining jobs.
   1003             self._iterating = False
-> 1004             if self.dispatch_one_batch(iterator):
   1005                 self._iterating = self._original_iterator is not None
   1006 

~\anaconda3\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
    806                 big_batch_size = batch_size * n_jobs
    807 
--> 808                 islice = list(itertools.islice(iterator, big_batch_size))
    809                 if len(islice) == 0:
    810                     return False

~\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in <genexpr>(.0)
    753     prediction_blocks = parallel(delayed(_fit_and_predict)(
    754         clone(estimator), X, y, train, test, verbose, fit_params, method)
--> 755         for train, test in cv.split(X, y, groups))
    756 
    757     # Concatenate the predictions

~\anaconda3\lib\site-packages\sklearn\base.py in clone(estimator, safe)
     65                             "it does not seem to be a scikit-learn estimator "
     66                             "as it does not implement a 'get_params' methods."
---> 67                             % (repr(estimator), type(estimator)))
     68     klass = estimator.__class__
     69     new_object_params = estimator.get_params(deep=False)

The TypeError is:

TypeError: Cannot clone object '<tensorflow.python.keras.engine.sequential.Sequential object at 0x000001577B632148>' (type <class 'tensorflow.python.keras.engine.sequential.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
python keras scikit-learn neural-network sequential
4 answers

4 votes

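The point is that your model is not a sklearn estimator, as the error says (in particular, it lacks the .get_params() method), whereas cross_val_predict() requires a sklearn estimator to be passed to it.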
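To see this requirement in isolation, here is a minimal sketch using only scikit-learn (no TensorFlow). PlainModel and EstimatorLikeModel are hypothetical stand-ins: the first has no get_params(), like a raw Keras Sequential model, so sklearn.base.clone raises the same kind of TypeError; the second inherits get_params()/set_params() from BaseEstimator and clones fine.

```python
from sklearn.base import BaseEstimator, clone


class PlainModel:
    """Hypothetical stand-in for a raw Keras model: no get_params()/set_params()."""

    def __init__(self, units=70):
        self.units = units


class EstimatorLikeModel(BaseEstimator):
    """Same attribute, but BaseEstimator supplies get_params()/set_params()."""

    def __init__(self, units=70):
        self.units = units


try:
    clone(PlainModel())
    plain_clone_error = None
except TypeError as exc:
    # Same failure mode as in the question: clone() needs get_params().
    plain_clone_error = str(exc)

cloned = clone(EstimatorLikeModel(units=15))
print(plain_clone_error)
print(cloned.get_params())  # {'units': 15}
```

cross_val_predict() calls clone() on the estimator for every fold, which is why the error surfaces there rather than during a plain model.fit().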

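One way to solve this is to wrap the Keras model in an object that mimics a regular sklearn estimator, via scikeras.wrappers.KerasClassifier. Once the KerasClassifier is defined, you can use it like a classic sklearn classifier and therefore pass it to cross_val_predict().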

Here is a working example starting from your code snippet:

!pip install scikeras

from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.datasets import make_classification
from sklearn import metrics
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.callbacks import EarlyStopping

X, y = make_classification(n_samples=10000, n_features=70, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)

scaler = MinMaxScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

model = Sequential()
model.add(Dense(units=70,activation='relu'))
model.add(Dropout(0.7))

model.add(Dense(units=15,activation='relu'))
model.add(Dropout(0.7))

model.add(Dense(units=1,activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)
model.fit(x=X_train, 
      y=y_train, 
      epochs=600,
      validation_data=(X_test, y_test), verbose=1,
      callbacks=[early_stop]
)

# define the KerasClassifier object and use it in cross_val_predict
keras_clf = KerasClassifier(model = model, optimizer="adam", epochs=100, verbose=0)

predictions = cross_val_predict(keras_clf, X_train, y_train, cv=3)
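Roughly speaking, a wrapper like KerasClassifier works because it stores its constructor arguments unchanged (so get_params() and clone() work) and exposes fit()/predict(). The sketch below illustrates that general pattern without TensorFlow. ModelWrapper and build_model are hypothetical names, scikit-learn's MLPClassifier is swapped in as a stand-in for the Keras network, and this is not how scikeras is implemented internally.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier


def build_model(hidden_units, max_iter):
    # Hypothetical builder: MLPClassifier stands in for the Keras Sequential model.
    return MLPClassifier(hidden_layer_sizes=(hidden_units, 15), max_iter=max_iter, random_state=0)


class ModelWrapper(ClassifierMixin, BaseEstimator):
    """Hypothetical minimal wrapper that behaves like a sklearn classifier."""

    def __init__(self, hidden_units=70, max_iter=300):
        # Store constructor arguments unchanged so that get_params()/clone() work.
        self.hidden_units = hidden_units
        self.max_iter = max_iter

    def fit(self, X, y):
        # Build a fresh, untrained model on every call, so each CV fold starts from scratch.
        self.model_ = build_model(self.hidden_units, self.max_iter)
        self.model_.fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        return self.model_.predict(X)


X, y = make_classification(n_samples=300, n_features=20, random_state=42)

# clone() succeeds because BaseEstimator provides get_params().
params = clone(ModelWrapper(hidden_units=32)).get_params()

predictions = cross_val_predict(ModelWrapper(), X, y, cv=3)
print(params)
print(predictions.shape)  # (300,)
```

Building the model inside fit() is the design choice that matters for cross-validation: each fold trains its own untrained copy instead of reusing weights from an earlier fit.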

0 votes

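For those who get the following error with the approach above: Cannot clone object, as the constructor either does not set or modifies parameter layers

Change layers from a list of lists to a list of tuples: layers => [(20,), (45, 30, 15), (40, 20)]. Don't forget the comma after (20,), otherwise you will get another error/warning: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: TypeError: 'int' object is not iterable. This happens because (20) without the comma is just the int 20, not a single-element tuple.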
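For context, the message in this answer comes from a sanity check in sklearn.base.clone: after rebuilding the estimator from get_params(), it requires each parameter to be the very same object that was passed to the constructor. The sketch below, with hypothetical classes CopiesLayers and StoresLayers, triggers that check with a constructor that converts its layers argument. Whether this is exactly what happens inside a given Keras wrapper version is an assumption.

```python
from sklearn.base import BaseEstimator, clone


class CopiesLayers(BaseEstimator):
    """Hypothetical wrapper whose __init__ converts its `layers` argument."""

    def __init__(self, layers=(20,)):
        # list(...) creates a new object, which clone()'s sanity check rejects.
        self.layers = list(layers)


class StoresLayers(BaseEstimator):
    """Same wrapper, but __init__ stores `layers` exactly as given."""

    def __init__(self, layers=(20,)):
        self.layers = layers


try:
    clone(CopiesLayers(layers=(45, 30, 15)))
    copy_error = None
except RuntimeError as exc:
    copy_error = str(exc)

ok = clone(StoresLayers(layers=(45, 30, 15)))
print(copy_error)
print(ok.layers)  # (45, 30, 15)
```

On the wrapper side, the usual remedy is to store every constructor argument as-is and do any conversion later, inside fit().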


0 votes

Try this: from scikeras.wrappers import KerasClassifier, KerasRegressor


-2 votes

I think you need to use the Keras parameters.
