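Is there a workaround to get a classification report out of `cross_val_score`? I'm using nested cross-validation, and I can get various scores for the model this way, but I would like to see the classification report for the outer loop. Any suggestions?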
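```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC
# Setup as in the scikit-learn nested CV iris example; `i` is that example's trial index
X_iris, y_iris = load_iris(return_X_y=True)
svr = SVC(kernel="rbf")
p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}
i = 0
# Choose cross-validation techniques for the inner and outer loops,
# independently of the dataset.
# E.g. "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)
# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
```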
I would like to see the classification report here, alongside the score values: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
We can define our own scoring function as follows:
```python
from sklearn.metrics import classification_report, accuracy_score, make_scorer

def classification_report_with_accuracy_score(y_true, y_pred):
    print(classification_report(y_true, y_pred))  # print classification report
    return accuracy_score(y_true, y_pred)  # return accuracy score
```
Now simply call `cross_val_score` with our new scoring function, wrapped with `make_scorer`:
```python
# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv,
                               scoring=make_scorer(classification_report_with_accuracy_score))
print(nested_score)
```
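It prints the classification report of each outer fold as text, while still returning `nested_score` as a numeric array of per-fold accuracies.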
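To see why this works: a scorer built by `make_scorer` is a callable invoked as `scorer(estimator, X, y)`. It calls `estimator.predict(X)` and passes the true and predicted labels to the wrapped function, so anything that function prints is a side effect and only its return value becomes the score. A minimal sketch calling the scorer directly, assuming an `SVC` fitted on the full iris data purely for illustration (scoring on training data is not a meaningful evaluation):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, make_scorer
from sklearn.svm import SVC

def classification_report_with_accuracy_score(y_true, y_pred):
    print(classification_report(y_true, y_pred))  # side effect: print the report
    return accuracy_score(y_true, y_pred)         # the value the scorer returns

scorer = make_scorer(classification_report_with_accuracy_score)

X, y = load_iris(return_X_y=True)
model = SVC(kernel="rbf").fit(X, y)

# The scorer runs model.predict(X) and hands (y, predictions) to our function;
# cross_val_score does the same with each fitted model and its test fold.
score = scorer(model, X, y)
print(score)
```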
When the example at http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html is run with this new scoring function, the last few lines of the output look like this:
```
             precision    recall  f1-score   support
          0       1.00      1.00      1.00        14
          1       1.00      1.00      1.00        14
          2       1.00      1.00      1.00         9
avg / total       1.00      1.00      1.00        37
[ 0.94736842  1.          0.97297297  1.        ]
Average difference of 0.007742 with std. dev. of 0.007688.
```
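If you would rather keep the per-fold reports than read them off the console, one variation (assuming scikit-learn 0.20 or later, where `classification_report` accepts `output_dict=True`) is to append each outer fold's report to a list as a dict. `fold_reports` and `report_and_accuracy` are hypothetical names; the grid is the one from the scikit-learn iris example, and `random_state=0` stands in for the loop index `i`:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, make_scorer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Hypothetical container: one report dict per outer test fold
fold_reports = []

def report_and_accuracy(y_true, y_pred):
    # output_dict=True returns the report as a nested dict instead of a string
    fold_reports.append(classification_report(y_true, y_pred, output_dict=True))
    return accuracy_score(y_true, y_pred)

p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # assumed grid
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)

# n_jobs is left at its default (a single process), so the scorer's appends
# land in this process's fold_reports list
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv,
                               scoring=make_scorer(report_and_accuracy))

for k, report in enumerate(fold_reports):
    print(f"fold {k}: macro F1 = {report['macro avg']['f1-score']:.3f}")
```

Each dict holds one entry per class, keyed by the label as a string, plus averaged entries such as `'macro avg'` and `'weighted avg'`.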
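This is just an addition to Sandipan's answer, since I can't edit it. If we want a single classification report for the whole cross-validation run (computed over the predictions pooled from all outer folds) rather than one per fold, we can use the following code: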
```python
from sklearn.metrics import accuracy_score, classification_report, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Variables for the overall classification report
originalclass = []
predictedclass = []

# Make our custom scorer: record the labels and predictions of each outer test fold
def classification_report_with_accuracy_score(y_true, y_pred):
    originalclass.extend(y_true)
    predictedclass.extend(y_pred)
    return accuracy_score(y_true, y_pred)  # return accuracy score

# svr, p_grid, X_iris, y_iris and i are defined as in the question
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
# Nested CV with parameter optimization; keep the default n_jobs (one process),
# otherwise the appends happen in worker processes and these lists are not filled
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv,
                               scoring=make_scorer(classification_report_with_accuracy_score))
# Classification report over the predictions of all folds of the K-fold CV
print(classification_report(originalclass, predictedclass))
```
The result for the example in Sandipan's answer now looks like this:
```
             precision    recall  f1-score   support
          0       1.00      1.00      1.00        50
          1       0.96      0.94      0.95        50
          2       0.94      0.96      0.95        50
avg / total       0.97      0.97      0.97       150
```
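A different way to get the same kind of pooled report, swapped in here rather than taken from the answer above, is `cross_val_predict`: it returns, for every sample, the prediction of the model fitted (and tuned) on the outer training folds that excluded it. Because nothing is appended to global lists, it also works with `n_jobs` greater than 1. A sketch assuming the iris setup, the grid from the scikit-learn example, and `random_state=0` in place of `i`:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

p_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}  # assumed grid
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)

# One out-of-fold prediction per sample, returned in the original sample order;
# the inner grid search runs inside every outer training split
y_pred = cross_val_predict(clf, X_iris, y_iris, cv=outer_cv)

print(classification_report(y_iris, y_pred))
```

`cross_val_predict` requires each sample to fall in exactly one test fold, so it accepts `StratifiedKFold` but not repeated splitters such as `RepeatedStratifiedKFold`.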
We can collect the predictions from all folds and pass them to `classification_report`:
```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import RepeatedStratifiedKFold

# Assume `model`, the NumPy arrays `X` and `y`, and `y_classes` are already defined
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)

y_true = []
y_pred = []
for train, test in cv.split(X, y):
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]
    # Fit on the training part of this fold and predict its test part
    y_test_pred = model.fit(X_train, y_train).predict(X_test)
    y_true.append(y_test)
    y_pred.append(y_test_pred)

# Pool the per-fold arrays; with n_repeats=3 every sample appears three times,
# so the support column counts each sample once per repeat
y_true = np.concatenate(y_true)
y_pred = np.concatenate(y_pred)

# y_classes contains the class names, in the order of the sorted labels
print(classification_report(y_true, y_pred, target_names=y_classes))
```
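Since a pooled report over a repeated splitter counts each sample once per repeat, you may prefer per-fold averages with a spread. One option, which swaps the per-class report for macro-averaged metrics, is `cross_validate` with several built-in scoring strings. `LogisticRegression` is a hypothetical stand-in for the undefined `model`, and iris stands in for `X` and `y`:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # hypothetical stand-in for `model`

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]

# results maps "test_<name>" to an array with one score per fold (30 here)
results = cross_validate(model, X, y, cv=cv, scoring=scoring)

for name in scoring:
    fold_scores = results[f"test_{name}"]
    print(f"{name}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```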