Different cross-validation techniques produce identical evaluation metrics

Question · 0 votes · 1 answer

I implemented three ML algorithms (K-Nearest Neighbors, Decision Tree, and Random Forest) and applied four different cross-validation techniques to each (the Hold-Out method, Leave-One-Out, K-Fold cross-validation, and Stratified K-Fold cross-validation). The goal is to evaluate performance metrics and compare the techniques and algorithms. My code runs, but the evaluation metric values are identical across the different techniques. Is it normal for these values to be the same, or am I doing something wrong?

Here is part of my code:

# Imports (assumed; not shown in the original post)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, KFold, StratifiedKFold, cross_val_score
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Initialize classifiers
knn = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
dtree = DecisionTreeClassifier(random_state=42)
rf = RandomForestClassifier(n_estimators=20, criterion='entropy', random_state=0)

classifiers = {'KNN': knn, 'Decision Tree': dtree, 'Random Forest': rf}

# Define cross-validation methods
loo = LeaveOneOut()
kf = KFold(10)
skf = StratifiedKFold(n_splits=5)

cv_methods = {'Hold-Out Method': (X_train, X_test, y_train, y_test),
              'Leave-One-Out Method': loo,
              'K-Fold Cross-Validation': kf,
              'Stratified K-Fold Cross-Validation': skf}

# Perform classification and evaluation for each classifier and cross-validation method
for clf_name, clf in classifiers.items():
    print(f"Classifier: {clf_name}")
    for cv_name, cv_method in cv_methods.items():
        if cv_name == 'Hold-Out Method':
            X_train_cv, X_test_cv, y_train_cv, y_test_cv = cv_method
            clf.fit(X_train_cv, y_train_cv)
            y_pred = clf.predict(X_test_cv)
        else:
            scores = cross_val_score(clf, X, y, cv=cv_method, scoring='accuracy')
            

        # Calculate evaluation metrics
        accuracy = accuracy_score(y_test_cv, y_pred)
        precision = precision_score(y_test_cv, y_pred, average='weighted')
        recall = recall_score(y_test_cv, y_pred, average='weighted')
        f1 = f1_score(y_test_cv, y_pred, average='weighted')
        confusion = confusion_matrix(y_test_cv, y_pred)

Here is the output, which is identical for every cross-validation method within each classifier:

Classifier: KNN
Hold-Out Method Metrics for KNN:
Accuracy: 0.864620939
Precision: 0.8661
Recall: 0.8646
F1 Score: 0.8652
Confusion Matrix:
[[326  41]
 [ 34 153]]

Leave-One-Out Method Metrics for KNN:
Accuracy: 0.864620939
Precision: 0.8661
Recall: 0.8646
F1 Score: 0.8652
Confusion Matrix:
[[326  41]
 [ 34 153]]

K-Fold Cross-Validation Metrics for KNN:
Accuracy: 0.864620939
Precision: 0.8661
Recall: 0.8646
F1 Score: 0.8652
Confusion Matrix:
[[326  41]
 [ 34 153]]

Stratified K-Fold Cross-Validation Metrics for KNN:
Accuracy: 0.864620939
Precision: 0.8661
Recall: 0.8646
F1 Score: 0.8652
Confusion Matrix:
[[326  41]
 [ 34 153]]

Classifier: Decision Tree
Hold-Out Method Metrics for Decision Tree:
Accuracy: 0.980144404
Precision: 0.9801
Recall: 0.9801
F1 Score: 0.9801
Confusion Matrix:
[[363   4]
 [  7 180]]

Leave-One-Out Method Metrics for Decision Tree:
Accuracy: 0.980144404
Precision: 0.9801
Recall: 0.9801
F1 Score: 0.9801
Confusion Matrix:
[[363   4]
 [  7 180]]

K-Fold Cross-Validation Metrics for Decision Tree:
Accuracy: 0.980144404
Precision: 0.9801
Recall: 0.9801
F1 Score: 0.9801
Confusion Matrix:
[[363   4]
 [  7 180]]

Stratified K-Fold Cross-Validation Metrics for Decision Tree:
Accuracy: 0.980144404
Precision: 0.9801
Recall: 0.9801
F1 Score: 0.9801
Confusion Matrix:
[[363   4]
 [  7 180]]

Classifier: Random Forest
Hold-Out Method Metrics for Random Forest:
Accuracy: 0.981949458
Precision: 0.9820
Recall: 0.9819
F1 Score: 0.9819
Confusion Matrix:
[[364   3]
 [  7 180]]

Leave-One-Out Method Metrics for Random Forest:
Accuracy: 0.981949458
Precision: 0.9820
Recall: 0.9819
F1 Score: 0.9819
Confusion Matrix:
[[364   3]
 [  7 180]]

K-Fold Cross-Validation Metrics for Random Forest:
Accuracy: 0.981949458
Precision: 0.9820
Recall: 0.9819
F1 Score: 0.9819
Confusion Matrix:
[[364   3]
 [  7 180]]

Stratified K-Fold Cross-Validation Metrics for Random Forest:
Accuracy: 0.981949458
Precision: 0.9820
Recall: 0.9819
F1 Score: 0.9819
Confusion Matrix:
[[364   3]
 [  7 180]]

Why is this happening?

python machine-learning scikit-learn cross-validation
1 Answer

0 votes

The line

scores = cross_val_score(clf, X, y, cv=cv_method, scoring='accuracy')

does not modify the clf object, i.e. it does not fit it; internally it fits clones of the estimator, one per fold. (I have made this mistake myself a few times. It is a bit misleading, because you see the model being fitted in the console.)
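A quick way to see this, sketched on a synthetic dataset from `make_classification` (the original `X`, `y` are not shown in the post): after `cross_val_score` returns, the original estimator is still unfitted and raises `NotFittedError` on `predict`.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5)

# Fits internal clones of clf, one per fold; clf itself is untouched.
scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')

try:
    clf.predict(X[:1])
except NotFittedError:
    print("clf itself was never fitted")
```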

What is actually happening is that the model is fitted only in this branch:

if cv_name == 'Hold-Out Method':
    X_train_cv, X_test_cv, y_train_cv, y_test_cv = cv_method
    clf.fit(X_train_cv, y_train_cv)
    y_pred = clf.predict(X_test_cv)

and you are then evaluating that same hold-out y_pred four times, once per CV method, which is why all four sets of metrics are identical.

To test this, remove 'Hold-Out Method' from cv_methods, and you will likely get an error saying the model has not been fitted yet.
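One way to restructure the loop so each CV method actually produces its own metrics is `cross_val_predict`, which returns out-of-fold predictions that you can feed into the metric functions. This is a sketch on a synthetic dataset (`make_classification` stands in for the original `X`, `y`, which the post does not show):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5)

cv_methods = {
    'K-Fold': KFold(n_splits=10),
    'Stratified K-Fold': StratifiedKFold(n_splits=5),
    'Leave-One-Out': LeaveOneOut(),
}

for name, cv in cv_methods.items():
    # Out-of-fold predictions: each sample is predicted by a model
    # that never saw it during fitting.
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    print(f"{name}: acc={accuracy_score(y, y_pred):.4f}, "
          f"f1={f1_score(y, y_pred, average='weighted'):.4f}")
```

With this structure the metrics are computed from each method's own predictions, so they will generally differ across CV schemes (small differences are normal; identical values would again be suspicious).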

© www.soinside.com 2019 - 2024. All rights reserved.