I just built my first random forest classifier today, and I'm trying to improve its performance. I was reading about how cross-validation helps avoid overfitting the data and thereby yields better results. I implemented StratifiedKFold using sklearn; surprisingly, however, this approach turned out to be less accurate. I've read many posts suggesting that cross-validating is more effective than train_test_split.
Estimator:
rf = RandomForestClassifier(n_estimators=100, random_state=42)
K-Fold:
ss = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_index, test_index in ss.split(features, labels):
    train_features, test_features = features[train_index], features[test_index]
    train_labels, test_labels = labels[train_index], labels[test_index]
    rf.fit(train_features, train_labels)  # fit and evaluate once per fold
    predictions = rf.predict(test_features)
TTS:
train_feature, test_feature, train_label, test_label = \
    train_test_split(features, labels, train_size=0.8, test_size=0.2, random_state=42)
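For completeness, here is a condensed, runnable version of both setups above. Since my actual data isn't shown here, make_classification stands in as a synthetic placeholder, and the per-fold AUROC is averaged explicitly (the loop above leaves the aggregation implicit):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the real features/labels (not shown in the post).
features, labels = make_classification(n_samples=500, n_features=10, random_state=42)

# Cross-validation: fit a fresh model on each fold, score on the held-out fold,
# then average the per-fold scores.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_aucs = []
for train_index, test_index in skf.split(features, labels):
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    rf.fit(features[train_index], labels[train_index])
    proba = rf.predict_proba(features[test_index])[:, 1]
    fold_aucs.append(roc_auc_score(labels[test_index], proba))
cv_auc = np.mean(fold_aucs)

# Single 80/20 hold-out split for comparison.
X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42
)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_tr, y_tr)
tts_auc = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

print(cv_auc, tts_auc)
```

Note that the two numbers are not expected to match exactly: the hold-out score comes from one particular split, while the CV score is averaged over ten.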
Here are the results:
CV:
AUROC: 0.74
Accuracy Score: 74.74%
Specificity: 0.69
Precision: 0.75
Sensitivity: 0.79
Matthews correlation coefficient (MCC): 0.49
F1 Score: 0.77
TTS:
AUROC: 0.76
Accuracy Score: 76.23%
Specificity: 0.77
Precision: 0.79
Sensitivity: 0.76
Matthews correlation coefficient (MCC): 0.52
F1 Score: 0.77
Is this possible, or did I set up my model incorrectly?
Also, is this the right way to use cross-validation?