KFold cross-validation vs. train_test_split


I built my first random forest classifier today and I'm trying to improve its performance. I've been reading about how cross-validation avoids overfitting the data and thereby yields better results. I implemented StratifiedKFold with sklearn; surprisingly, however, this approach turned out to be less accurate. I've read many posts suggesting that cross-validating is much more effective than train_test_split.

Estimator:

rf = RandomForestClassifier(n_estimators=100, random_state=42)

K-fold:

ss = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_index, test_index in ss.split(features, labels):
    train_features, test_features = features[train_index], features[test_index]
    train_labels, test_labels = labels[train_index], labels[test_index]
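(Note that the loop above only splits the data: each iteration overwrites `train_features`/`test_features`, so whatever is fitted afterwards sees just the last fold. A complete per-fold evaluation would fit and score inside the loop and average the results. A minimal sketch, using synthetic data from `make_classification` to stand in for the question's `features`/`labels`:)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the question's features/labels
features, labels = make_classification(n_samples=500, n_classes=2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
ss = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

fold_scores = []
for train_index, test_index in ss.split(features, labels):
    # Fit on the training fold, score on the held-out fold
    rf.fit(features[train_index], labels[train_index])
    fold_scores.append(rf.score(features[test_index], labels[test_index]))

print(f"mean accuracy over {len(fold_scores)} folds: {np.mean(fold_scores):.3f}")
```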

TTS:

train_feature, test_feature, train_label, test_label = \
    train_test_split(features, labels, train_size=0.8, test_size=0.2, random_state=42)
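(For reference, scikit-learn's `cross_val_score` wraps the fold loop and returns one score per fold, which makes the comparison with a single train/test split easier to reason about. A hedged sketch, again using `make_classification` as a stand-in for the real data:)

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the question's features/labels
features, labels = make_classification(n_samples=500, n_classes=2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# One accuracy score per fold; report mean and spread rather than a single number
scores = cross_val_score(rf, features, labels, cv=cv)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread across folds is the point of cross-validation: a single train/test split gives one number that can easily land above or below the cross-validated mean by chance.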

Here are the results:

CV:

AUROC:  0.74
Accuracy Score:  74.74 %.
Specificity:  0.69
Precision:  0.75
Sensitivity:  0.79
Matthews correlation coefficient (MCC):  0.49
F1 Score:  0.77

TTS:

AUROC:  0.76
Accuracy Score:  76.23 %.
Specificity:  0.77
Precision:  0.79
Sensitivity:  0.76
Matthews correlation coefficient (MCC):  0.52
F1 Score:  0.77

Is this possible? Or have I set up my model incorrectly?

Also, is this the correct way to use cross-validation?

python machine-learning scikit-learn cross-validation