Problem using GridSearchCV on RandomForestClassifier


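I am working on a heart-disease classification problem with RandomForestClassifier. While tuning its hyperparameters with GridSearchCV, I run into the problem below. I am using sklearn, with a Pipeline and a ColumnTransformer for preprocessing.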

Error: 720 fits failed out of a total of 2160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
UserWarning: One or more of the test scores are non-finite
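
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# X, y, numerical_features and categorical_features come from my dataset (not shown)

# Scale the numerical columns
numerical_pipeline = Pipeline(
    steps=[('scaler', StandardScaler())]
)

# One-hot encode the categorical columns, ignoring categories unseen during fit
categorical_pipeline = Pipeline(
    steps=[('encoder', OneHotEncoder(handle_unknown='ignore'))]
)

preprocessor = ColumnTransformer(
    [('numerical_pipeline', numerical_pipeline, numerical_features),
     ('categorical_pipeline', categorical_pipeline, categorical_features)]
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit the preprocessing on the training split only, then apply it to the test split
scaled_X_train = preprocessor.fit_transform(X_train)
scaled_X_test = preprocessor.transform(X_test)

param_grid = {'max_depth': [3, 5, 10, None],
              'n_estimators': [10, 100, 200],
              'max_features': [1, 3, 5, 7],
              'min_samples_leaf': [1, 2, 3],
              'min_samples_split': [1, 2, 3]
              }

# 5-fold CV over every combination in param_grid, using all CPU cores
grid = GridSearchCV(RandomForestClassifier(), param_grid=param_grid, cv=5,
                    scoring='accuracy', verbose=True, n_jobs=-1)
grid.fit(scaled_X_train, y_train)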
1 Answer

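Judging from the error message, some hyperparameter combinations are invalid: two thirds of your fits run fine, but the rest fail. Remove 1 from the list of values for min_samples_split, because as an integer it must be 2 or greater.

This also matches the counts in the warning: the grid has 4 × 3 × 4 × 3 × 3 = 432 candidates, so 5-fold CV gives 432 × 5 = 2160 fits, and the 144 candidates with min_samples_split=1 account for 144 × 5 = 720 failed fits.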
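A quick way to confirm this diagnosis is to count the grid candidates with sklearn's ParameterGrid and to fit a single forest with min_samples_split=1. This is a minimal sketch assuming a reasonably recent scikit-learn; the synthetic data from make_classification is an assumption standing in for the heart-disease data, which is not shown, and fixed_grid is a hypothetical name for the corrected grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid

# The grid and fold count from the question
param_grid = {'max_depth': [3, 5, 10, None],
              'n_estimators': [10, 100, 200],
              'max_features': [1, 3, 5, 7],
              'min_samples_leaf': [1, 2, 3],
              'min_samples_split': [1, 2, 3]}
cv = 5

# Every candidate is fitted once per fold
candidates = list(ParameterGrid(param_grid))
total_fits = len(candidates) * cv
bad_fits = sum(p['min_samples_split'] == 1 for p in candidates) * cv
print(len(candidates), total_fits, bad_fits)  # 432 2160 720

# A single fit with min_samples_split=1 fails on its own
# (synthetic data is an assumption standing in for the real dataset)
X, y = make_classification(n_samples=50, n_features=8, random_state=0)
try:
    RandomForestClassifier(n_estimators=5, min_samples_split=1,
                           random_state=0).fit(X, y)
    raised = False
except ValueError as exc:
    raised = True
    print(f'{type(exc).__name__}: {exc}')

# Hypothetical corrected grid: only integer values >= 2 for min_samples_split
fixed_grid = {**param_grid, 'min_samples_split': [2, 3]}
print(len(ParameterGrid(fixed_grid)) * cv)  # 1440
```

If 1 was meant as "no restriction", note that 2 is already the default value of min_samples_split, so [2, 3] keeps that case in the search.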

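If that does not fix the error, add error_score='raise' to GridSearchCV so that it prints the full stack trace as soon as a fit fails, instead of just setting that score to nan.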

© www.soinside.com 2019 - 2024. All rights reserved.