Problem using GridSearchCV with RandomForestClassifier

Question

I am using RandomForestClassifier on a heart-disease classification problem. While tuning its hyperparameters I ran into the problem below. I am using sklearn Pipeline and ColumnTransformer for preprocessing.

Error: 720 fits failed out of a total of 2160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
UserWarning: One or more of the test scores are non-finite

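from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Scale the numerical columns
numerical_pipeline = Pipeline(
    steps=[('scaler', StandardScaler())]
)

# One-hot encode the categorical columns
categorical_pipeline = Pipeline(
    steps=[('encoder', OneHotEncoder(handle_unknown='ignore'))]
)

preprocessor = ColumnTransformer(
    [('numerical_pipeline', numerical_pipeline, numerical_features),
     ('categorical_pipeline', categorical_pipeline, categorical_features)]
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

scaled_X_train = preprocessor.fit_transform(X_train)
scaled_X_test = preprocessor.transform(X_test)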

param_grid={'max_depth':[3,5,10,None],
          'n_estimators':[10,100,200],
          'max_features':[1,3,5,7],
          'min_samples_leaf':[1,2,3],
          'min_samples_split':[1,2,3]
       }

grid = GridSearchCV(RandomForestClassifier(),param_grid=param_grid,cv=5,scoring='accuracy',verbose=True,n_jobs=-1)
grid.fit(scaled_X_train,y_train)

Answer 1

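Judging by the error message, some hyperparameter combinations are causing errors: some of your fits run fine, but a portion of them fail. Remove 1 from the list of values for min_samples_split, because as an integer it must be 2 or greater. That is consistent with the counts in the warning: one of the three min_samples_split values is invalid, and 720 is exactly one third of 2160.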
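The arithmetic behind the warning can be checked with a short sketch. It only counts parameter combinations from the grid in the question and does not train anything; the variable names `cv_folds`, `combos`, `total_fits`, and `failing_fits` are introduced here for illustration.

```python
from itertools import product

# Grid copied from the question
param_grid = {
    "max_depth": [3, 5, 10, None],
    "n_estimators": [10, 100, 200],
    "max_features": [1, 3, 5, 7],
    "min_samples_leaf": [1, 2, 3],
    "min_samples_split": [1, 2, 3],
}
cv_folds = 5  # cv=5 in the question

# Enumerate every combination the grid search would try
keys = list(param_grid)
combos = [dict(zip(keys, values)) for values in product(*param_grid.values())]

# Each combination is fitted once per fold
total_fits = len(combos) * cv_folds
# Fits that use the invalid value min_samples_split=1
failing_fits = sum(c["min_samples_split"] == 1 for c in combos) * cv_folds

print(total_fits, failing_fits)  # 2160 720
```

Both numbers match the warning, which suggests min_samples_split=1 accounts for all of the failed fits.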

If that does not resolve the error, add error_score='raise' to GridSearchCV so that the full stack trace is printed when an error occurs.
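A minimal sketch of both suggestions, under some assumptions: the data is synthetic (`make_classification` stands in for the heart-disease dataset), the grid is cut down so it runs quickly, and `run_search` is a helper name made up for this example. Whether min_samples_split=1 is rejected depends on the installed scikit-learn version, so the sketch records the exception instead of assuming one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: not the asker's dataset)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)


def run_search(min_samples_split_values):
    """Run a small grid search; error_score='raise' re-raises the first failing fit."""
    grid = GridSearchCV(
        RandomForestClassifier(n_estimators=10, random_state=0),
        param_grid={
            "max_depth": [3, None],
            "min_samples_split": min_samples_split_values,
        },
        cv=3,
        scoring="accuracy",
        error_score="raise",  # surface the real exception instead of scoring nan
    )
    return grid.fit(X, y)


# With 1 in the list: on versions that reject it, the fit raises with a full traceback
try:
    run_search([1, 2, 3])
    failure = None
except ValueError as exc:
    failure = exc
print("error with min_samples_split=1:", type(failure).__name__)

# With 1 removed, only valid values are searched
fixed = run_search([2, 3])
print("best params:", fixed.best_params_)
```

Once the cause is known, error_score can go back to its default so that a single bad combination does not abort a long search.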
