Does increasing n_estimators always improve random forest performance?


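I am getting the highest score with the lowest value of n_estimators. As I understand it, more trees should always improve performance. Can anyone explain what is happening here?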

Input:

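# estimate n_estimators
# Note: `iid` and `grid_scores_` come from older scikit-learn releases and were
# removed later; on current versions, drop `iid` and read `cv_results_` instead.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate forest sizes: 20, 50, ..., 770
param_test1 = {'n_estimators': range(20, 800, 30)}

clf = RandomForestClassifier(random_state=10,
                             oob_score=True,
                             max_depth=6,
                             max_features='sqrt')

# 5-fold cross-validated grid search, scored by ROC AUC
gsearch1 = GridSearchCV(
    estimator=clf,
    param_grid=param_test1,
    scoring='roc_auc',
    iid=False,
    cv=5)

# X, y: the asker's feature matrix and labels (not shown in the question)
gsearch1.fit(X, y)
gsearch1.grid_scores_, gsearch1.best_params_, gsearch1.best_score_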

Output:

([mean: 0.87685, std: 0.03149, params: {u'n_estimators': 20},
  mean: 0.87551, std: 0.02979, params: {u'n_estimators': 50},
  mean: 0.87588, std: 0.02970, params: {u'n_estimators': 80},
  mean: 0.87545, std: 0.03043, params: {u'n_estimators': 110},
  mean: 0.87593, std: 0.02979, params: {u'n_estimators': 140},
  mean: 0.87506, std: 0.02913, params: {u'n_estimators': 170},
  mean: 0.87599, std: 0.02890, params: {u'n_estimators': 200},
  mean: 0.87559, std: 0.02875, params: {u'n_estimators': 230},
  mean: 0.87561, std: 0.02890, params: {u'n_estimators': 260},
  mean: 0.87500, std: 0.02867, params: {u'n_estimators': 290},
  mean: 0.87476, std: 0.02848, params: {u'n_estimators': 320},
  mean: 0.87434, std: 0.02800, params: {u'n_estimators': 350},
  mean: 0.87408, std: 0.02823, params: {u'n_estimators': 380},
  mean: 0.87461, std: 0.02789, params: {u'n_estimators': 410},
  mean: 0.87452, std: 0.02764, params: {u'n_estimators': 440},
  mean: 0.87466, std: 0.02775, params: {u'n_estimators': 470},
  mean: 0.87498, std: 0.02805, params: {u'n_estimators': 500},
  mean: 0.87530, std: 0.02797, params: {u'n_estimators': 530},
  mean: 0.87519, std: 0.02760, params: {u'n_estimators': 560},
  mean: 0.87498, std: 0.02789, params: {u'n_estimators': 590},
  mean: 0.87529, std: 0.02784, params: {u'n_estimators': 620},
  mean: 0.87526, std: 0.02792, params: {u'n_estimators': 650},
  mean: 0.87553, std: 0.02807, params: {u'n_estimators': 680},
  mean: 0.87540, std: 0.02794, params: {u'n_estimators': 710},
  mean: 0.87561, std: 0.02786, params: {u'n_estimators': 740},
  mean: 0.87554, std: 0.02814, params: {u'n_estimators': 770}],
 {u'n_estimators': 20},
 0.87684895838888188)
python machine-learning scikit-learn classification random-forest
1 Answer

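Getting the highest score with a small number of trees (n_estimators) in a random forest can happen for several reasons: randomness in the training process, fold-to-fold variability in cross-validation, interactions with the other hyperparameters (here max_depth = 6 and max_features = 'sqrt'), a dataset that is already well served by a small forest, and the limits of a grid search in exploring the hyperparameter space. Overfitting is sometimes named as a cause too, although adding trees to a random forest does not generally increase overfitting by itself. In the output above, the mean scores only range from 0.87408 to 0.87685, while the standard deviation across folds is roughly 0.03, so the differences between settings are well within the cross-validation noise. More trees mainly make the estimate more stable; they do not guarantee a higher score on any particular set of folds.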
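The point about cross-validation variability can be checked against the numbers posted in the question. The sketch below copies the (n_estimators, mean, std) triples from the question's output and compares the gap between the best and worst mean with the fold-level standard deviation. The variable names are illustrative, and the "within one standard deviation of the best" rule is a rough heuristic assumed here, not a formal significance test.

```python
from statistics import mean

# (n_estimators, mean ROC AUC, std across the 5 folds),
# copied from the grid search output in the question.
results = [
    (20, 0.87685, 0.03149), (50, 0.87551, 0.02979), (80, 0.87588, 0.02970),
    (110, 0.87545, 0.03043), (140, 0.87593, 0.02979), (170, 0.87506, 0.02913),
    (200, 0.87599, 0.02890), (230, 0.87559, 0.02875), (260, 0.87561, 0.02890),
    (290, 0.87500, 0.02867), (320, 0.87476, 0.02848), (350, 0.87434, 0.02800),
    (380, 0.87408, 0.02823), (410, 0.87461, 0.02789), (440, 0.87452, 0.02764),
    (470, 0.87466, 0.02775), (500, 0.87498, 0.02805), (530, 0.87530, 0.02797),
    (560, 0.87519, 0.02760), (590, 0.87498, 0.02789), (620, 0.87529, 0.02784),
    (650, 0.87526, 0.02792), (680, 0.87553, 0.02807), (710, 0.87540, 0.02794),
    (740, 0.87561, 0.02786), (770, 0.87554, 0.02814),
]

# Settings with the highest and lowest mean score
best = max(results, key=lambda r: r[1])
worst = min(results, key=lambda r: r[1])

# Gap between the best and worst mean, and the average fold-level std
spread = best[1] - worst[1]
typical_std = mean(r[2] for r in results)

# Heuristic (an assumption of this sketch): treat a setting as
# indistinguishable from the best one if its mean lies within one
# fold-level standard deviation of the best mean.
within_one_std = [n for n, m, _ in results if best[1] - m <= best[2]]

print(f"best:  n_estimators={best[0]}, mean={best[1]:.5f}")
print(f"worst: n_estimators={worst[0]}, mean={worst[1]:.5f}")
print(f"spread of means: {spread:.5f}")
print(f"average std across folds: {typical_std:.5f}")
print(f"settings within one std of the best: {len(within_one_std)} of {len(results)}")
```

Under this heuristic every setting in the grid is indistinguishable from n_estimators = 20, which suggests that the "best" value reported by the search is mostly a matter of chance on these folds.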
