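I'm using `RandomForestClassifier` on a classification problem related to heart disease. While tuning the hyperparameters of the `RandomForestClassifier`, I run into the problem below. I'm using an sklearn `Pipeline` and a `ColumnTransformer` for preprocessing.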
Error:

```
720 fits failed out of a total of 2160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

UserWarning: One or more of the test scores are non-finite
```
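```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# X, y, numerical_features and categorical_features are defined earlier (not shown)

# Scale numerical columns, one-hot encode categorical columns
numerical_pipeline = Pipeline(
    steps=[('scaler', StandardScaler())]
)
categorical_pipeline = Pipeline(
    steps=[('encoder', OneHotEncoder(handle_unknown='ignore'))]
)
preprocessor = ColumnTransformer(
    [('numerical_pipeline', numerical_pipeline, numerical_features),
     ('categorical_pipeline', categorical_pipeline, categorical_features)]
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
scaled_X_train = preprocessor.fit_transform(X_train)
scaled_X_test = preprocessor.transform(X_test)

param_grid = {'max_depth': [3, 5, 10, None],
              'n_estimators': [10, 100, 200],
              'max_features': [1, 3, 5, 7],
              'min_samples_leaf': [1, 2, 3],
              'min_samples_split': [1, 2, 3]}

grid = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=5,
                    scoring='accuracy', verbose=True, n_jobs=-1)
grid.fit(scaled_X_train, y_train)
```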
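The counts in the warning follow directly from the grid. As a sketch, scikit-learn's `ParameterGrid` enumerates the same candidates that `GridSearchCV` tries; the grid below is copied from the question and `cv = 5` mirrors `cv=5`. Every candidate is fitted once per fold, so the total is the number of candidates times the number of folds.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the question
param_grid = {'max_depth': [3, 5, 10, None],
              'n_estimators': [10, 100, 200],
              'max_features': [1, 3, 5, 7],
              'min_samples_leaf': [1, 2, 3],
              'min_samples_split': [1, 2, 3]}
cv = 5  # number of folds, as in GridSearchCV(..., cv=5)

candidates = list(ParameterGrid(param_grid))
total_fits = len(candidates) * cv
# Fits whose candidate uses min_samples_split=1
bad_fits = sum(p['min_samples_split'] == 1 for p in candidates) * cv
print(len(candidates), total_fits, bad_fits)  # 432 2160 720
```

The 4 × 3 × 4 × 3 × 3 = 432 candidates give 2160 fits, and the 144 candidates with `min_samples_split=1` account for exactly the 720 failures.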
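Judging from the error message, certain hyperparameter combinations make the fit fail: some of your fits run fine, but a fraction of them fail. Remove `1` from the list of values for `min_samples_split`, because an integer value must be 2 or greater. That single invalid value makes up one third of the grid, which matches 720 of 2160 fits failing.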
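A minimal sketch of the fix, assuming a synthetic dataset from `make_classification` as a stand-in for the heart-disease data (`X_demo` and `y_demo` are hypothetical names). It checks that `min_samples_split=1` is rejected with a `ValueError` when the forest is fitted (recent scikit-learn versions raise `InvalidParameterError`, a subclass of `ValueError`), and that the corrected grid only contains valid values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid

# Synthetic stand-in for the real data (assumption: any small dataset behaves the same)
X_demo, y_demo = make_classification(n_samples=100, n_features=8, random_state=0)

# min_samples_split=1 is rejected when fit() validates the parameters
try:
    RandomForestClassifier(n_estimators=5, min_samples_split=1).fit(X_demo, y_demo)
    rejected = False
except ValueError as exc:
    rejected = True
    print(type(exc).__name__, exc)

# Corrected grid: 1 removed from min_samples_split
param_grid = {'max_depth': [3, 5, 10, None],
              'n_estimators': [10, 100, 200],
              'max_features': [1, 3, 5, 7],
              'min_samples_leaf': [1, 2, 3],
              'min_samples_split': [2, 3]}
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates, "candidates,", n_candidates * 5, "fits with cv=5")  # 288 candidates, 1440 fits
```

With the corrected grid, `grid.fit(scaled_X_train, y_train)` from the question should no longer produce NaN scores from this cause.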
If that does not fix the error, add `error_score='raise'` to `GridSearchCV` so that the first failing fit raises its exception and prints the full stack trace.