As mentioned in this post:
param_grid = {'max_depth': np.arange(3, 10)}
tree = GridSearchCV(DecisionTreeClassifier(), param_grid)
tree.fit(xtrain, ytrain)
tree_preds = tree.predict_proba(xtest)[:, 1]
tree_performance = roc_auc_score(ytest, tree_preds)
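Q1: After running the steps above and obtaining the best parameters, do I need to fit a tree on all the data (training + validation) with the learned parameters?
Q2: max_depth is listed explicitly in the parameter grid and can be read from tree.best_params_. What about the other parameters the grid search found? How do I access them to build the final tree?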
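To answer your first question: when you create the GridSearchCV object, you can set the refit parameter to True (True is the default). With refit=True, the search refits an estimator on the whole dataset passed to fit, using the best parameters it found, and exposes that estimator through the best_estimator_ attribute. It behaves like a regular estimator and supports the .predict method like any other sklearn estimator.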
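A minimal sketch of what refit=True gives you, assuming a synthetic dataset from make_classification and a fixed random_state (both are illustrative choices, not part of the question). It checks that calling predict on the search object gives the same result as calling it on best_estimator_:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used here only for illustration
X, y = make_classification(random_state=0)

# refit=True is the default; it is spelled out here for clarity
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": np.arange(3, 10)},
    refit=True,
)
search.fit(X, y)

# The search object delegates predict to the refitted best estimator
same = np.array_equal(search.predict(X), search.best_estimator_.predict(X))
print(same)
```

So in practice you can keep using the fitted GridSearchCV object directly, without building a second tree by hand.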
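Now to your second question: you can use the best_estimator_ attribute itself to access all the parameters of the decision tree model that was used to fit the final estimator. Parameters that are not listed in param_grid are not searched; they keep the values set on the estimator you passed in. As I said earlier, though, you do not need to build a new classifier with the best parameters yourself, because refit=True does that for you.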
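As a rough illustration of the difference between the two attributes, the sketch below compares best_params_ (only the searched parameters) with best_estimator_.get_params() (every parameter of the final tree). The dataset, the random_state, and the names all_params, tuned, and untuned are assumptions made for this example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(random_state=0)  # illustrative data only
param_grid = {
    "max_depth": np.arange(3, 10),
    "min_samples_leaf": np.arange(2, 10),
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid)
search.fit(X, y)

# Only the parameters that were part of the grid
tuned = search.best_params_

# Every parameter of the final, refitted tree
all_params = search.best_estimator_.get_params()

# Parameters that were not searched and kept the estimator's own values
untuned = {k: v for k, v in all_params.items() if k not in tuned}

print(tuned)
print(untuned)
```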
Follow the example code below for a better understanding:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
X, y = make_classification(random_state=0)
param_grid = {'max_depth': np.arange(3, 10), 'min_samples_leaf':np.arange(2,10)}
tree = GridSearchCV(DecisionTreeClassifier(), param_grid)
tree.fit(X, y)
GridSearchCV(cv=None, error_score=nan,
estimator=DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None,
criterion='gini', max_depth=None,
max_features=None,
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1,
min_samples_split=2,
min_weight_fraction_leaf=0.0,
presort='deprecated',
random_state=None,
splitter='best'),
iid='deprecated', n_jobs=None,
param_grid={'max_depth': array([3, 4, 5, 6, 7, 8, 9]),
'min_samples_leaf': array([2, 3, 4, 5, 6, 7, 8, 9])},
pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
scoring=None, verbose=0)
# This is what your best estimator looks like
print(tree.best_estimator_)
DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
max_depth=3, max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=6, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort='deprecated',
random_state=None, splitter='best')
# you can directly use it for prediction as shown below
tree.best_estimator_.predict(X)
array([0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1,
0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0,
0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1,
0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0])
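The same idea can be applied to the train/test workflow from the question. This is only a sketch: the split produced by train_test_split and the fixed random_state values are assumptions, since the question does not show how xtrain and xtest were created.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(random_state=0)  # illustrative data only
xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

param_grid = {"max_depth": np.arange(3, 10)}
tree = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid)

# Cross-validation runs on xtrain; afterwards the best tree is
# refitted on all of xtrain because refit=True by default
tree.fit(xtrain, ytrain)

# predict_proba is forwarded to the refitted best estimator
tree_preds = tree.predict_proba(xtest)[:, 1]
tree_performance = roc_auc_score(ytest, tree_preds)
print(tree.best_params_, tree_performance)
```

The test set stays untouched during the search, so the score is an estimate on held-out data.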
Hope this helps!