Why does my confusion matrix look like this when every metric I tried gives a reasonable score?

Question

I'm using sklearn's random forest classifier, and I get good results on everything except the confusion matrix. Here are the code and the results.

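This is not what I expected, especially since the count shown is only about 1/3 of the 677k samples in the training dataset, and the confusion matrix seems to account only for label 0.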

Model:

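import time

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
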
# Record the starting time
start_time = time.time()

# Random Forest classifier
rf = RandomForestClassifier()

# Define the parameter grid
rf_param_grid = {'n_estimators': [45], 'criterion': ['entropy'], 'max_depth': [30]}

# Grid search
rf_cv = GridSearchCV(rf, rf_param_grid, cv=7)
rf_cv.fit(X_train, y_train)

# Record the ending time
end_time = time.time()

# Calculate the elapsed time
elapsed_time = end_time - start_time

# Print the results
print("Best Score:", rf_cv.best_score_)
print("Best Parameters:", rf_cv.best_params_)
print("Elapsed Time:", elapsed_time, "seconds")

Here I get good scores, above 98% for every class:

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Make predictions on the training data
y_train_pred = rf_cv.predict(X_train)

# Compute accuracy
accuracy = accuracy_score(y_train, y_train_pred)

# Compute precision, recall, and F1-score for each class
precision = precision_score(y_train, y_train_pred, average=None)
recall = recall_score(y_train, y_train_pred, average=None)
f1 = f1_score(y_train, y_train_pred, average=None)

# Compute macro-averaged precision, recall, and F1-score
macro_precision = precision_score(y_train, y_train_pred, average='macro')
macro_recall = recall_score(y_train, y_train_pred, average='macro')
macro_f1 = f1_score(y_train, y_train_pred, average='macro')

# Print the evaluation metrics
print("Accuracy:", accuracy)
print("Precision (Class 0, 1, 2):", precision)
print("Recall (Class 0, 1, 2):", recall)
print("F1-score (Class 0, 1, 2):", f1)
print("Macro-averaged Precision:", macro_precision)
print("Macro-averaged Recall:", macro_recall)
print("Macro-averaged F1-score:", macro_f1)

The confusion matrix, which does not show values for any label other than class 0:

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Generate the confusion matrix
conf_matrix = confusion_matrix(y_train, y_train_pred)

# Define class labels
class_labels = ['Class 0', 'Class 1', 'Class 2']

# Visualize the confusion matrix with class labels
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=class_labels, yticklabels=class_labels)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()

Answer 1

The problem seems to be with matplotlib/seaborn. (I can't reproduce it; you may need to give us a reproducible example with the exact code and dataset you used.)
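
A reproducible example could look something like the sketch below. It uses synthetic data from make_classification as a stand-in for the real dataset (an assumption, since the original data is not available), and the names X_demo, y_demo and clf are made up for illustration. It prints the raw matrix and checks that it agrees with the accuracy and per-class recall, which helps separate a modelling problem from a plotting problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Synthetic 3-class data standing in for the real dataset (an assumption).
X_demo, y_demo = make_classification(
    n_samples=3000,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=0,
)

# Same hyperparameters as the grid in the question, fitted directly.
clf = RandomForestClassifier(
    n_estimators=45, criterion="entropy", max_depth=30, random_state=0
)
clf.fit(X_demo, y_demo)
y_demo_pred = clf.predict(X_demo)

# Raw confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_demo, y_demo_pred, labels=clf.classes_)
print(cm)

# Each row sums to the number of true samples of that class.
row_totals = cm.sum(axis=1)
print("Samples per class:", row_totals)

# The metrics can be recomputed from the matrix itself:
# trace / total is the accuracy, diagonal / row sum is the per-class recall.
accuracy_from_cm = np.trace(cm) / cm.sum()
recall_from_cm = np.diag(cm) / row_totals
print("Accuracy from matrix:", accuracy_from_cm)
print("Recall from matrix:", recall_from_cm)
```

If the printed matrix has sensible counts in every row while the heatmap shows numbers for only one class, the matrix is fine and the issue is in the plotting step.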

Instead of plotting it, you can display/print the confusion matrix as a DataFrame.

import pandas as pd
from sklearn.metrics import confusion_matrix

def get_confusion_matrix_df(classifier, X, y):
    """Return the confusion matrix as a DataFrame."""
    labels = classifier.classes_
    columns_labels = pd.MultiIndex.from_product([["Predicted"], labels])
    index_labels = pd.MultiIndex.from_product([["Actual"], labels])
    prediction = classifier.predict(X)
    matrix = confusion_matrix(y, prediction, labels=labels)
    return pd.DataFrame(matrix, columns=columns_labels, index=index_labels)

get_confusion_matrix_df(rf_cv, X_train, y_train)

Example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split 
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

get_confusion_matrix_df(model, X_test, y_test)

Result: (output not preserved)
