How to plot a ROC curve in Python

Question

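I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false positive rate; however, I cannot figure out how to plot these correctly with matplotlib and calculate the AUC value. How can I do that?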

Answer 1

Assuming your model is a sklearn predictor, you can try the following two ways:

import sklearn.metrics as metrics
# calculate the fpr and tpr for all thresholds of the classification
probs = model.predict_proba(X_test)
preds = probs[:,1]
fpr, tpr, threshold = metrics.roc_curve(y_test, preds)
roc_auc = metrics.auc(fpr, tpr)

# method I: plt
import matplotlib.pyplot as plt
plt.title('Receiver Operating Characteristic')
plt.plot(fpr, tpr, 'b', label = 'AUC = %0.2f' % roc_auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()

# method II: ggplot (the legacy `ggplot` package for Python)
import pandas as pd
from ggplot import *
df = pd.DataFrame(dict(fpr = fpr, tpr = tpr))
ggplot(df, aes(x = 'fpr', y = 'tpr')) + geom_line() + geom_abline(linetype = 'dashed')

or try

ggplot(df, aes(x = 'fpr', ymin = 0, ymax = 'tpr')) + geom_line(aes(y = 'tpr')) + geom_area(alpha = 0.2) + ggtitle("ROC Curve w/ AUC = %s" % str(roc_auc))
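The snippets above assume that `model`, `X_test` and `y_test` already exist. As a minimal end-to-end sketch of method I, the following builds those objects from synthetic data; the dataset, the `LogisticRegression` settings and the output file name `roc_curve.png` are assumptions made only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; substitute your own X and y)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# One (fpr, tpr) point per threshold, then the area under that curve
fpr, tpr, thresholds = roc_curve(y_test, preds)
roc_auc = auc(fpr, tpr)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
ax.plot([0, 1], [0, 1], "r--")  # chance line
ax.set_xlim([0, 1])
ax.set_ylim([0, 1])
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.set_title("Receiver Operating Characteristic")
ax.legend(loc="lower right")
fig.savefig("roc_curve.png")  # hypothetical output path
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving the figure.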

Answer 2

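Here is the simplest way to plot a ROC curve, given a set of ground truth labels and predicted probabilities. The best part is that it plots the ROC curve for all classes, so you get several neat-looking curves as well.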

import scikitplot as skplt
import matplotlib.pyplot as plt

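y_true = ...    # ground truth labels (placeholder: supply your own)
y_probas = ...  # predicted probabilities generated by sklearn classifier
# Note: newer scikit-plot releases deprecate plot_roc_curve in favor of plot_roc
skplt.metrics.plot_roc_curve(y_true, y_probas)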
plt.show()

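plot_roc_curve generates one curve per class. I used the sample digits dataset from scikit-learn, so there are 10 classes, and one ROC curve is plotted for each class.

Disclaimer: note that this uses the scikit-plot library, which I built.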

Answer 3

AUC curve for binary classification using matplotlib

from sklearn import svm, datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
import matplotlib.pyplot as plt

Load the breast cancer dataset

breast_cancer = load_breast_cancer()

X = breast_cancer.data
y = breast_cancer.target

Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33, random_state=44)

Model

clf = LogisticRegression(penalty='l2', C=0.1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

Accuracy

print("Accuracy", metrics.accuracy_score(y_test, y_pred))

AUC curve

y_pred_proba = clf.predict_proba(X_test)[::,1]
fpr, tpr, _ = metrics.roc_curve(y_test,  y_pred_proba)
auc = metrics.roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr,tpr,label="data 1, auc="+str(auc))
plt.legend(loc=4)
plt.show()

Answer 4

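It is not at all clear what the problem is here, but if you have an array true_positive_rate and an array false_positive_rate, then plotting the ROC curve and getting the AUC is as simple as: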

import matplotlib.pyplot as plt
import numpy as np

x = ...  # false_positive_rate (placeholder: supply your own array)
y = ...  # true_positive_rate (placeholder: supply your own array)

# This is the ROC curve
plt.plot(x,y)
plt.show() 

# This is the AUC
auc = np.trapz(y,x)
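To make the trapezoidal rule behind that AUC line concrete, here is a small dependency-free sketch. The helper name `trapezoid_auc` and the five sample points are hypothetical, and the function assumes the false positive rates are sorted in ascending order.

```python
def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr[i], tpr[i]) points."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]                  # step along the x axis
        mean_height = (tpr[i] + tpr[i - 1]) / 2.0    # average of the two y values
        area += width * mean_height
    return area


# Hypothetical ROC points, for illustration only
false_positive_rate = [0.0, 0.0, 0.5, 0.5, 1.0]
true_positive_rate = [0.0, 0.5, 0.5, 1.0, 1.0]

auc_value = trapezoid_auc(false_positive_rate, true_positive_rate)
print(auc_value)  # 0.75
```

Vertical segments (where the false positive rate does not change) have zero width and therefore add nothing to the area.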

Answer 5

Here is Python code for computing the ROC curve (as a scatter plot):

import matplotlib.pyplot as plt
import numpy as np

score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.54, 0.53, 0.52, 0.51, 0.505, 0.4, 0.39, 0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])
y = np.array([1,1,0, 1, 1, 1, 0, 0, 1, 0, 1,0, 1, 0, 0, 0, 1 , 0, 1, 0])

# false positive rate
fpr = []
# true positive rate
tpr = []
# Iterate thresholds from 0.0, 0.01, ... 1.0
thresholds = np.arange(0.0, 1.01, .01)

# get number of positive and negative examples in the dataset
P = sum(y)
N = len(y) - P

# iterate through all thresholds and determine fraction of true positives
# and false positives found at this threshold
for thresh in thresholds:
    FP=0
    TP=0
    for i in range(len(score)):
        if (score[i] > thresh):
            if y[i] == 1:
                TP = TP + 1
            if y[i] == 0:
                FP = FP + 1
    fpr.append(FP/float(N))
    tpr.append(TP/float(P))

plt.scatter(fpr, tpr)
plt.show()
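The loop above rescans every score for each of the 101 thresholds. An alternative sketch, assuming binary labels in {0, 1}, sorts the scores once and emits one ROC point per distinct score value. The helper name `roc_points` is hypothetical and this is not how any particular library implements it.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists with one point per distinct score threshold."""
    # Highest score first: lowering the threshold admits samples in this order
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos

    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is predicted positive
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores
        is_last_of_group = i == len(pairs) - 1 or pairs[i + 1][0] != score
        if is_last_of_group:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


# Tiny hypothetical example
fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(fpr)  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

The resulting lists can be passed to `plt.plot` or `plt.scatter` exactly like the `fpr` and `tpr` lists in the answer.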

Answer 6

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

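y_true = ...    # true labels (placeholder: supply your own)
y_probas = ...  # predicted scores (placeholder: supply your own)
# pos_label=0 treats label 0 as the positive class, so y_probas must score class 0
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_probas, pos_label=0)

# Plot ROC curve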
plt.plot(fpr,tpr)
plt.show() 

# Print AUC
auc = np.trapz(tpr,fpr)
print('AUC:', auc)
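As a cross-check on the trapezoidal value, AUC can also be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as one half). The sketch below uses a hypothetical helper `pairwise_auc` with labels in {0, 1}, and also shows why the choice of positive class (the `pos_label` argument above) matters: swapping the classes turns the AUC into one minus its value.

```python
def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, label in zip(scores, y_true) if label == 1]
    neg = [s for s, label in zip(scores, y_true) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc_value = pairwise_auc(labels, scores)
flipped_auc = pairwise_auc([1 - v for v in labels], scores)  # class 0 as positive
print(auc_value, flipped_auc)  # 0.75 0.25
```

This is quadratic in the number of samples, so it is meant as a sanity check on small data rather than a replacement for the library call.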

Answer 7

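Based on multiple comments from Stack Overflow, the scikit-learn documentation and some other sources, I made a Python package to plot ROC curves (and other metrics) in a really simple way.

To install the package: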

pip install plot-metric

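(More information at the end of the post.)

To plot a ROC curve (the example comes from the documentation):

Binary classification

Let's load a simple dataset and make a train and test set: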

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=1000, n_classes=2, weights=[1,1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=2)

Train a classifier and predict the test set:

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=50, random_state=23)
model = clf.fit(X_train, y_train)

# Use predict_proba to predict probability of the class
y_pred = clf.predict_proba(X_test)[:,1]

You can now use plot_metric to plot the ROC curve:

import matplotlib.pyplot as plt
from plot_metric.functions import BinaryClassification
# Visualisation with plot_metric
bc = BinaryClassification(y_test, y_pred, labels=["Class 1", "Class 2"])

# Figures
plt.figure(figsize=(5,5))
bc.plot_roc_curve()
plt.show()

You can find more examples on GitHub and in the package documentation:

GitHub: https://github.com/yohann84L/plot_metric
Documentation: https://plot-metric.readthedocs.io/en/latest/

Answer 8

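The previous answers assume that you have indeed calculated TP/Sens yourself. Doing this by hand is a bad idea because it is easy to make mistakes in the calculations; use a library function for all of this instead.

The plot_roc example in the scikit-learn documentation does exactly what you need: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

The essential part of the code is: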

  for i in range(n_classes):
      fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
      roc_auc[i] = auc(fpr[i], tpr[i])
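That loop expects `y_test` to be a binarized label matrix (one column per class) and `y_score` to hold one score column per class. A minimal sketch of how those inputs might be prepared follows; the iris dataset and the `LogisticRegression` classifier are assumptions chosen for illustration, not part of the linked example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test_labels = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

# One indicator column per class, in the same order as the score columns
y_test = label_binarize(y_test_labels, classes=clf.classes_)
n_classes = y_test.shape[1]

# One-vs-rest: each class in turn is treated as the positive class
fpr, tpr, roc_auc = {}, {}, {}
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

for i in range(n_classes):
    print("class %d: AUC = %.3f" % (i, roc_auc[i]))
```

Each `fpr[i]`, `tpr[i]` pair can then be drawn with `plt.plot` to get one curve per class.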

Answer 9

There is a library called metriculous that will do that for you:

$ pip install metriculous

Let's first mock some data; this would usually come from the test dataset and the model(s):

import numpy as np

def normalize(array2d: np.ndarray) -> np.ndarray:
    return array2d / array2d.sum(axis=1, keepdims=True)

class_names = ["Cat", "Dog", "Pig"]
num_classes = len(class_names)
num_samples = 500

# Mock ground truth
ground_truth = np.random.choice(range(num_classes), size=num_samples, p=[0.5, 0.4, 0.1])

# Mock model predictions
perfect_model = np.eye(num_classes)[ground_truth]
noisy_model = normalize(
    perfect_model + 2 * np.random.random((num_samples, num_classes))
)
random_model = normalize(np.random.random((num_samples, num_classes)))

Now we can use metriculous to generate a table with various metrics and diagrams, including ROC curves:

import metriculous

metriculous.compare_classifiers(
    ground_truth=ground_truth,
    model_predictions=[perfect_model, noisy_model, random_model],
    model_names=["Perfect Model", "Noisy Model", "Random Model"],
    class_names=class_names,
    one_vs_all_figures=True, # This line is important to include ROC curves in the output
).save_html("model_comparison.html").display()

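The output includes the ROC curves. The plots are zoomable and draggable, and you get further details when hovering over a plot with the mouse.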

Answer 10

You can also follow the official scikit-learn documentation:

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

Answer 11

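I have made a simple function, included in a package, for the ROC curve. I have only just started practicing machine learning, so please let me know if this code has any problems!

Have a look at the GitHub readme file for more details! :)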

https://github.com/bc123456/ROC

from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

def plot_ROC(y_train_true, y_train_prob, y_test_true, y_test_prob):
    '''
    A function to plot the ROC curve for train labels and test labels.
    Use the best threshold found in train set to classify items in test set.
    '''
    fpr_train, tpr_train, thresholds_train = roc_curve(y_train_true, y_train_prob, pos_label =True)
    sum_sensitivity_specificity_train = tpr_train + (1-fpr_train)
    best_threshold_id_train = np.argmax(sum_sensitivity_specificity_train)
    best_threshold = thresholds_train[best_threshold_id_train]
    best_fpr_train = fpr_train[best_threshold_id_train]
    best_tpr_train = tpr_train[best_threshold_id_train]
    y_train = y_train_prob > best_threshold

    cm_train = confusion_matrix(y_train_true, y_train)
    acc_train = accuracy_score(y_train_true, y_train)
    # AUC is computed from the probabilities, not the thresholded predictions
    auc_train = roc_auc_score(y_train_true, y_train_prob)

    print('Train Accuracy: %s ' % acc_train)
    print('Train AUC: %s ' % auc_train)
    print('Train Confusion Matrix:')
    print(cm_train)

    fig = plt.figure(figsize=(10,5))
    ax = fig.add_subplot(121)
    curve1 = ax.plot(fpr_train, tpr_train)
    curve2 = ax.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax.plot(best_fpr_train, best_tpr_train, marker='o', color='black')
    ax.text(best_fpr_train, best_tpr_train, s = '(%.3f,%.3f)' %(best_fpr_train, best_tpr_train))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Train), AUC = %.4f'%auc_train)

    fpr_test, tpr_test, thresholds_test = roc_curve(y_test_true, y_test_prob, pos_label =True)

    y_test = y_test_prob > best_threshold

    cm_test = confusion_matrix(y_test_true, y_test)
    acc_test = accuracy_score(y_test_true, y_test)
    # AUC is computed from the probabilities, not the thresholded predictions
    auc_test = roc_auc_score(y_test_true, y_test_prob)

    print('Test Accuracy: %s ' % acc_test)
    print('Test AUC: %s ' % auc_test)
    print('Test Confusion Matrix:')
    print(cm_test)

    tpr_score = float(cm_test[1][1])/(cm_test[1][1] + cm_test[1][0])
    fpr_score = float(cm_test[0][1])/(cm_test[0][0]+ cm_test[0][1])

    ax2 = fig.add_subplot(122)
    curve1 = ax2.plot(fpr_test, tpr_test)
    curve2 = ax2.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax2.plot(fpr_score, tpr_score, marker='o', color='black')
    ax2.text(fpr_score, tpr_score, s = '(%.3f,%.3f)' %(fpr_score, tpr_score))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Test), AUC = %.4f'%auc_test)
    plt.savefig('ROC', dpi = 500)
    plt.show()

    return best_threshold
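The threshold selection inside plot_ROC maximizes sensitivity + specificity, which picks the same point as maximizing Youden's J statistic, J = TPR - FPR (the two quantities differ by the constant 1). Here is that step in isolation, as a sketch on hypothetical arrays shaped like the output of `roc_curve`:

```python
import numpy as np

# Hypothetical roc_curve-style output, for illustration only
fpr = np.array([0.0, 0.0, 0.5, 0.5, 1.0])
tpr = np.array([0.0, 0.5, 0.5, 1.0, 1.0])
thresholds = np.array([np.inf, 0.8, 0.4, 0.35, 0.1])

# sensitivity + specificity = tpr + (1 - fpr) = J + 1, so both share one argmax
youden_j = tpr - fpr
best_idx = int(np.argmax(youden_j))  # the first maximum wins when there are ties
best_threshold = thresholds[best_idx]

print(best_idx, best_threshold)  # 1 0.8
```

In this example J reaches 0.5 at two thresholds (0.8 and 0.35), and `np.argmax` returns the first, which is the stricter threshold. Whether that tie-break suits your application is a separate decision.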

Answer 12

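When the probabilities still have to be computed from the model, the following gets the AUC value and plots everything in one shot:

# plot_roc_curve was removed in scikit-learn 1.2; RocCurveDisplay replaces it
from sklearn.metrics import RocCurveDisplay

RocCurveDisplay.from_estimator(m, xs, y)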

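When you already have the probabilities, compute the curve and the AUC value separately:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# y_probas: positive-class probabilities, e.g. m.oob_decision_function_[:,1]
fpr,tpr,_ = roc_curve(y,y_probas)
plt.plot(fpr,tpr, label='AUC = ' + str(round(roc_auc_score(y,y_probas), 2)))
plt.legend(loc='lower right')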

Answer 13

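In my code I have X_train and y_train, and the classes are 0 and 1. The clf.predict_proba() method computes the probabilities of both classes for every data point. I compare the probability of class 1 against different threshold values.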

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

probability = clf.predict_proba(X_train)

def plot_roc(y_train, probability):
  threshold_values = np.linspace(0,1,100)       #Threshold values range from 0 to 1
  FPR_list = []
  TPR_list = []

  for threshold in threshold_values:            #For every value of threshold
    y_pred = []                                 #Classify every data point in the test set

#prob is an array consisting of 2 values - Probability of datapoint in Class0 and Class1.
    for prob in probability:
      if ((prob[1])<threshold):                 #Prob of class1 (positive class) 
        y_pred.append(0)                                                  
        continue
      elif ((prob[1])>=threshold): y_pred.append(1)

#Compute the confusion matrix and obtain values of TP, FP, TN, FN
    c_m = confusion_matrix(y_train, y_pred)
    TN = c_m[0][0]                                                          
    FP = c_m[0][1]
    FN = c_m[1][0]      
    TP = c_m[1][1]                                                      

    FPR = FP/(FP + TN)                          #Obtain False Positive Rate                                          
    TPR = TP/(TP + FN)                          #Obtain True Positive Rate                                      

    FPR_list.append(FPR)
    TPR_list.append(TPR)

  fig = plt.figure()
  plt.plot(FPR_list, TPR_list)                                    
  plt.ylabel('TPR')
  plt.xlabel('FPR')
  plt.show()
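The same threshold sweep can be written without the inner Python loops by letting NumPy broadcasting compare every probability against every threshold at once. This is only a sketch: the function name `roc_by_thresholds` is hypothetical, and it assumes labels in {0, 1} with at least one sample of each class.

```python
import numpy as np


def roc_by_thresholds(y_true, prob_pos, n_thresholds=101):
    """FPR and TPR on an evenly spaced threshold grid between 0 and 1."""
    y_true = np.asarray(y_true).astype(bool)
    prob_pos = np.asarray(prob_pos, dtype=float)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)

    # predicted[i, j] is True when sample i is called positive at threshold j
    predicted = prob_pos[:, None] >= thresholds[None, :]

    tp = (predicted & y_true[:, None]).sum(axis=0)   # true positives per threshold
    fp = (predicted & ~y_true[:, None]).sum(axis=0)  # false positives per threshold

    tpr = tp / y_true.sum()
    fpr = fp / (~y_true).sum()
    return fpr, tpr, thresholds


# Made-up labels and class-1 probabilities, for illustration only
fpr, tpr, thresholds = roc_by_thresholds([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(fpr[50], tpr[50])  # values at threshold 0.5
```

With the answer's variables, the call would be `roc_by_thresholds(y_train, probability[:, 1])`, and the returned arrays can be passed to `plt.plot(fpr, tpr)` as before.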

Answer 14

A new open-source library I help maintain offers many ways to test model performance. To see the ROC curve, you can do the following:

from deepchecks.checks import RocReport
from deepchecks import Dataset

RocReport().run(Dataset(df, label='target'), model)

A more elaborate RocReport example can be found in the deepchecks documentation.

Answer 15

Since the ROC curve applies only to binary classification, binarize your data and ravel it:

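from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Binarize data for getting AUC (range() excludes its stop value, hence the + 1)
classes = list(range(y_train.min(), y_train.max() + 1))
y_test_bin = label_binarize(y_test, classes=classes)
y_pred_bin = label_binarize(Predicted_result, classes=classes)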

# Calculate FP , TP rate
fpr, tpr, _ = roc_curve(y_test_bin.ravel(), y_pred_bin.ravel()  )

# Get AUC , 
auc = roc_auc_score(y_test_bin, y_pred_bin, average='micro', multi_class='ovr')
 
#create ROC curve
plt.plot(fpr,tpr , label= f"AUC = {auc}" , )
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0]) 
plt.title('ROC')
plt.legend(loc=7)

plt.show()

Answer 16

As described by W3Schools:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

def plot_roc_curve(true_y, y_prob):
    """
    plots the roc curve based on the probabilities
    """

    fpr, tpr, thresholds = roc_curve(true_y, y_prob)
    plt.plot(fpr, tpr)
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')

plot_roc_curve(y, y_proba)
print(f'model AUC score: {roc_auc_score(y, y_proba)}')
