GridSearchCV and tree classifiers


As mentioned in this post:

  # Imports implied by the snippet (not shown in the original post):
  import numpy as np
  from sklearn.model_selection import GridSearchCV
  from sklearn.tree import DecisionTreeClassifier
  from sklearn.metrics import roc_auc_score

  param_grid = {'max_depth': np.arange(3, 10)}
  tree = GridSearchCV(DecisionTreeClassifier(), param_grid)
  tree.fit(xtrain, ytrain)
  tree_preds = tree.predict_proba(xtest)[:, 1]
  tree_performance = roc_auc_score(ytest, tree_preds)

Q1: After performing the steps above and obtaining the best parameters, do I need to fit a tree on all of the data (training + validation) using the learned parameters?

Q2: max_depth is mentioned explicitly in the parameter grid, and its best value can be obtained from tree.best_params_. What about the other parameters the grid search settled on? How do I access those in order to build the final tree?

python scikit-learn tree classification gridsearchcv
1 Answer

To answer your first question: when you create the GridSearchCV object, you can set the parameter refit to True (it is True by default). With refit=True, an estimator is refit on the whole dataset using the best parameters found, and that estimator is accessible through the best_estimator_ attribute. It behaves like a regular estimator and supports the .predict method just like any other sklearn estimator.
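For instance, a minimal sketch on synthetic data (the parameter values here are illustrative, not from the original post), showing that with refit=True the search object itself delegates prediction to the refitted best estimator:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy data just for illustration
X, y = make_classification(random_state=0)

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {'max_depth': [3, 5, 7]},
                      refit=True)  # refit=True is also the default
search.fit(X, y)

# With refit=True, predict on the search object delegates to the
# estimator that was refit on the whole dataset:
assert np.array_equal(search.predict(X), search.best_estimator_.predict(X))
```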

Now for your second question: you can use that same best_estimator_ attribute to access all the parameters of the decision tree that was used to fit the final estimator. But, as noted above, you don't need to pick out the best parameters and build a new classifier yourself, because refit=True already does that for you.

Follow the example code below for a better understanding:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(random_state=0)
param_grid = {'max_depth': np.arange(3, 10), 'min_samples_leaf':np.arange(2,10)}
tree = GridSearchCV(DecisionTreeClassifier(), param_grid)
tree.fit(X, y)
# Output (the repr of the fitted GridSearchCV object):
GridSearchCV(cv=None, error_score=nan,
             estimator=DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features=None,
                                              max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              presort='deprecated',
                                              random_state=None,
                                              splitter='best'),
             iid='deprecated', n_jobs=None,
             param_grid={'max_depth': array([3, 4, 5, 6, 7, 8, 9]),
                         'min_samples_leaf': array([2, 3, 4, 5, 6, 7, 8, 9])},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

# This is what your best estimator looks like
print(tree.best_estimator_)
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
                       max_depth=3, max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=6, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort='deprecated',
                       random_state=None, splitter='best')

# you can directly use it for prediction as shown below
tree.best_estimator_.predict(X) 
array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1,
       0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0,
       0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
       1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
       0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0])
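To make the answer to Q2 concrete, here is a short sketch (repeating the fit above on the same synthetic data, with random_state fixed so the comparison is stable) showing that best_params_ contains every tuned parameter, and that unpacking it into a fresh classifier just reproduces what refit=True already gave you:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Same toy setup as above
X, y = make_classification(random_state=0)
param_grid = {'max_depth': np.arange(3, 10),
              'min_samples_leaf': np.arange(2, 10)}
tree = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid)
tree.fit(X, y)

# best_params_ holds *every* parameter tuned by the grid, not just max_depth
print(tree.best_params_)

# If you did want to build the tree yourself, unpack the dict directly...
manual = DecisionTreeClassifier(random_state=0, **tree.best_params_).fit(X, y)

# ...but this matches what refit=True already produced:
assert manual.get_params() == tree.best_estimator_.get_params()
```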

Hope this helps!
