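I am using sklearn's random forest classifier, and I get good results on everything except the confusion matrix. Here are the code and the results.
This is not what I expected, especially since the number of samples shown is only about 1/3 of the 677k in the training dataset, and the confusion matrix only seems to handle label 0.
Model: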
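import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV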
# Record the starting time
start_time = time.time()
# Random Forest classifier
rf = RandomForestClassifier()
# Define the parameter grid
rf_param_grid = {'n_estimators': [45], 'criterion': ['entropy'], 'max_depth': [30]}
# Grid search
rf_cv = GridSearchCV(rf, rf_param_grid, cv=7)
rf_cv.fit(X_train, y_train)
# Record the ending time
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
# Print the results
print("Best Score:", rf_cv.best_score_)
print("Best Parameters:", rf_cv.best_params_)
print("Elapsed Time:", elapsed_time, "seconds")
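Here I get good scores, above 98%, for every class:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Make predictions on the training data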
y_train_pred = rf_cv.predict(X_train)
# Compute accuracy
accuracy = accuracy_score(y_train, y_train_pred)
# Compute precision, recall, and F1-score for each class
precision = precision_score(y_train, y_train_pred, average=None)
recall = recall_score(y_train, y_train_pred, average=None)
f1 = f1_score(y_train, y_train_pred, average=None)
# Compute macro-averaged precision, recall, and F1-score
macro_precision = precision_score(y_train, y_train_pred, average='macro')
macro_recall = recall_score(y_train, y_train_pred, average='macro')
macro_f1 = f1_score(y_train, y_train_pred, average='macro')
# Print the evaluation metrics
print("Accuracy:", accuracy)
print("Precision (Class 0, 1, 2):", precision)
print("Recall (Class 0, 1, 2):", recall)
print("F1-score (Class 0, 1, 2):", f1)
print("Macro-averaged Precision:", macro_precision)
print("Macro-averaged Recall:", macro_recall)
print("Macro-averaged F1-score:", macro_f1)
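The confusion matrix, which does not show values for any label other than class 0:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
# Generate the confusion matrix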
conf_matrix = confusion_matrix(y_train, y_train_pred)
# Define class labels
class_labels = ['Class 0', 'Class 1', 'Class 2']
# Visualize the confusion matrix with class labels
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=class_labels, yticklabels=class_labels)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()
The problem seems to be in matplotlib/seaborn. (I could not reproduce it; you may need to give us a reproducible example with the exact code and dataset you used.)
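One way to separate the two possibilities is to print the raw matrix before any plotting. The sketch below is only an illustration: it uses synthetic data from `make_classification` as a stand-in for the real dataset, and the `*_demo` names are hypothetical. If the printed matrix has non-zero entries in every row while the heatmap annotates only the first, the numbers are fine and the cause lies in the plotting libraries.

```python
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Synthetic 3-class data standing in for the real dataset (an assumption).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

# Same hyperparameters as in the question's grid, without the grid search.
clf_demo = RandomForestClassifier(
    n_estimators=45, criterion="entropy", max_depth=30, random_state=0
)
clf_demo.fit(X_demo, y_demo)

# Rows are true labels, columns are predicted labels.
cm_demo = confusion_matrix(y_demo, clf_demo.predict(X_demo))
print(cm_demo)

# Every sample falls in exactly one cell, so each row must sum to the
# number of samples of that class, and the total to len(y_demo).
print("row sums:    ", cm_demo.sum(axis=1))
print("class counts:", np.bincount(y_demo))
print("total:", cm_demo.sum(), "of", len(y_demo))

# Version worth quoting in a bug report.
print("scikit-learn:", sklearn.__version__)
```

When reporting the issue, it would also help to include `matplotlib.__version__` and `seaborn.__version__` alongside the scikit-learn version.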
Instead of plotting, you can display or print the confusion matrix as a DataFrame.
import pandas as pd
from sklearn.metrics import confusion_matrix
def get_confusion_matrix_df(classifier, X, y):
    """Return the confusion matrix as a DataFrame."""
    labels = classifier.classes_
    columns_labels = pd.MultiIndex.from_product([["Predicted"], labels])
    index_labels = pd.MultiIndex.from_product([["Actual"], labels])
    prediction = classifier.predict(X)
    matrix = confusion_matrix(y, prediction, labels=labels)
    return pd.DataFrame(matrix, columns=columns_labels, index=index_labels)
get_confusion_matrix_df(rf_cv, X_train, y_train)
Example:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
get_confusion_matrix_df(model, X_test, y_test)
Result: