Why does my confusion matrix look like this when I have tried all the tests and got reasonable scores?


I am using sklearn's random forest classifier, and I get decent results on everything except the confusion matrix. Here are the code and the results.

[Images in the original post: the label distribution for the training and test sets; the size of the training set; the model; the scores for the trained model; the plot showing the issue.]

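This is not what I expected: the counts shown add up to only about 1/3 of the 677k samples in the training set, and the confusion matrix only shows values for label 0.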

The model:

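import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Record the starting time
start_time = time.time()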

# Random Forest classifier
rf = RandomForestClassifier()

# Define the parameter grid
rf_param_grid = {'n_estimators': [45], 'criterion': ['entropy'], 'max_depth': [30]}

# Grid search
rf_cv = GridSearchCV(rf, rf_param_grid, cv=7)
rf_cv.fit(X_train, y_train)

# Record the ending time
end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time

# Print the results
print("Best Score:", rf_cv.best_score_)
print("Best Parameters:", rf_cv.best_params_)
print("Elapsed Time:", elapsed_time, "seconds")

Here I get good scores, above 98%, for every class:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Make predictions on the training data
y_train_pred = rf_cv.predict(X_train)

# Compute accuracy
accuracy = accuracy_score(y_train, y_train_pred)

# Compute precision, recall, and F1-score for each class
precision = precision_score(y_train, y_train_pred, average=None)
recall = recall_score(y_train, y_train_pred, average=None)
f1 = f1_score(y_train, y_train_pred, average=None)

# Compute macro-averaged precision, recall, and F1-score
macro_precision = precision_score(y_train, y_train_pred, average='macro')
macro_recall = recall_score(y_train, y_train_pred, average='macro')
macro_f1 = f1_score(y_train, y_train_pred, average='macro')

# Print the evaluation metrics
print("Accuracy:", accuracy)
print("Precision (Class 0, 1, 2):", precision)
print("Recall (Class 0, 1, 2):", recall)
print("F1-score (Class 0, 1, 2):", f1)
print("Macro-averaged Precision:", macro_precision)
print("Macro-averaged Recall:", macro_recall)
print("Macro-averaged F1-score:", macro_f1)
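For reference, the same per-class precision, recall and F1 values can be produced in one call with `classification_report`. The sketch below is only an illustration: the synthetic data from `make_classification`, the class weights and the `random_state` values are assumptions standing in for the real dataset, and the hyperparameters are copied from the grid above. It also prints the report for a held-out split, since scores computed on the training data alone tend to be optimistic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 3-class data standing in for the real dataset (an assumption).
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=6,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Same hyperparameters as the single-point grid in the question.
model = RandomForestClassifier(
    n_estimators=45, criterion="entropy", max_depth=30, random_state=0
)
model.fit(X_train, y_train)

# Per-class precision, recall, F1 and support for both splits.
train_report = classification_report(y_train, model.predict(X_train))
test_report = classification_report(y_test, model.predict(X_test))
print("Training set:\n", train_report)
print("Held-out test set:\n", test_report)
```

The `support` column gives the number of true samples per class, which is a quick way to see how many samples each class contributes.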

The confusion matrix, which does not show values for any label other than class 0:

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Generate the confusion matrix
conf_matrix = confusion_matrix(y_train, y_train_pred)

# Define class labels
class_labels = ['Class 0', 'Class 1', 'Class 2']

# Visualize the confusion matrix with class labels
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=class_labels, yticklabels=class_labels)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()
python machine-learning scikit-learn random-forest confusion-matrix
1 Answer

The problem seems to be with matplotlib/seaborn. (I could not reproduce it; you may need to give us a reproducible example with the exact code and dataset you used.)
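One way to tell a plotting problem from a computation problem is to inspect the raw matrix before drawing anything. The sketch below is a minimal illustration under stated assumptions: `y_true` and `y_pred` are randomly generated stand-ins for `y_train` and `y_train_pred`, and the 2% error rate is arbitrary. It checks two properties any correct confusion matrix has: each row sums to the number of true samples in that class, and the diagonal divided by the total equals the accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels standing in for y_train and y_train_pred.
rng = np.random.default_rng(0)
y_true = rng.choice([0, 1, 2], size=1000, p=[0.6, 0.3, 0.1])
y_pred = y_true.copy()
flip = rng.random(1000) < 0.02  # replace roughly 2% of predictions with random labels
y_pred[flip] = rng.choice([0, 1, 2], size=flip.sum())

conf_matrix = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(conf_matrix)  # raw counts, independent of matplotlib/seaborn

# Each row sums to the number of true samples in that class.
row_sums = conf_matrix.sum(axis=1)
class_counts = np.bincount(y_true, minlength=3)
print("Row sums:    ", row_sums)
print("Class counts:", class_counts)

# The diagonal divided by the total equals the accuracy.
acc_from_matrix = np.trace(conf_matrix) / conf_matrix.sum()
print("Accuracy from matrix:", acc_from_matrix)
print("accuracy_score:      ", accuracy_score(y_true, y_pred))
```

If the printed matrix has sensible values in every row while the heatmap shows numbers only for class 0, the matrix is fine and the issue lies in how it is drawn.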

Instead of plotting, you can display/print the confusion matrix as a DataFrame.

import pandas as pd
from sklearn.metrics import confusion_matrix

def get_confusion_matrix_df(classifier, X, y):
    """Return the confusion matrix as a DataFrame."""
    labels = classifier.classes_
    columns_labels = pd.MultiIndex.from_product([["Predicted"], labels])
    index_labels = pd.MultiIndex.from_product([["Actual"], labels])
    prediction = classifier.predict(X)
    matrix = confusion_matrix(y, prediction, labels=labels)
    return pd.DataFrame(matrix, columns=columns_labels, index=index_labels)

# Usage with the fitted grid search from the question
get_confusion_matrix_df(rf_cv, X_train, y_train)

Example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split 
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

get_confusion_matrix_df(model, X_test, y_test)

Result: (output not preserved here)
