Does this ROC curve make sense?

Question

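This code computes and plots the true positive rate, false positive rate, true positive count, and false positive count from predicted and true values: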

import matplotlib.pyplot as plt

def get_all_stats(y_true, y_pred):

    def perf_measure(y_true, y_pred):

        TP = 0
        FP = 0
        TN = 0
        FN = 0

        for i in range(len(y_true)): 
            if y_true[i] == 1 and y_pred[i] == 1:
                TP += 1
            if y_pred[i]==1 and y_true[i]!=y_pred[i]:
                FP += 1
            if y_true[i]== 0 and y_pred[i]==0:
                TN += 1
            if y_pred[i]==0 and y_true[i] != y_pred[i]:
                FN += 1

        if(FP == 0) : 
            FPR = 0;
        else : 
            FPR = FP / (FP + TN)

        if(TP == 0) : 
            TPR = 0
        else : 
            TPR = TP / (TP + FN)

        return(TN , FPR, FN , TPR , TP , FP)

    tn, fpr, fn, tpr, tp , fp = perf_measure(y_true, y_pred)

    return tpr , fpr , tp , fp

tpr1 , fpr1 , tp1 , fp1 = get_all_stats(y_true=[1,1,1] , y_pred=[1,0,0])
tpr2 , fpr2 , tp2 , fp2 = get_all_stats(y_true=[1,0,1] , y_pred=[0,1,0])
tpr3 , fpr3 , tp3 , fp3 = get_all_stats(y_true=[0,0,0] , y_pred=[1,0,0])

plt.figure(figsize=(12,6))
plt.tick_params(labelsize=12)

print(tpr1 , fpr1 , tp1 , fp1)
print(tpr2 , fpr2 , tp2 , fp2)
print(tpr3 , fpr3 , tp3 , fp3)

plt.plot([fpr1,fpr2,fpr3], [tpr1 , tpr2, tpr3], color='blue', label='')
plt.ylabel("TPR",fontsize=16)
plt.xlabel("FPR",fontsize=16)
plt.legend()

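The resulting ROC plot is:

To mimic three different false positive and true positive rates at different thresholds, I call the get_all_stats function three times to compute these values: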

tpr1 , fpr1 , tp1 , fp1 = get_all_stats(y_true=[1,1,1] , y_pred=[1,0,0])
tpr2 , fpr2 , tp2 , fp2 = get_all_stats(y_true=[1,0,1] , y_pred=[0,1,0])
tpr3 , fpr3 , tp3 , fp3 = get_all_stats(y_true=[0,0,0] , y_pred=[1,0,0])

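There are 9 instances to be classified as 1 or 0, and their true values are: [1,1,1,1,0,1,0,0,0]

At threshold 1, the predicted values are [1,0,0], and the true values at that threshold are [1,1,1].

At threshold 2, the predicted values are [0,1,0], and the true values at that threshold are [1,0,1].

At threshold 3, the predicted values are [1,0,0], and the true values at that threshold are [0,0,0].

As can be seen, the plot produced for this classifier differs from a "typical" ROC curve:

It first descends, and then the false positive and true positive rates both decrease, which makes the line "move backwards". Have I implemented the ROC curve correctly? Can AUC be computed for this curve?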

Answer 1

OK, I'm motivated to help because you have a lot of rep -> you have helped a lot of other people. Here we go.

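This ROC curve does not make sense. The problem is that you compute FPR/TPR on only a subset of your data at each threshold. At every threshold you should use all of the data to compute FPR and TPR. So your plot appears to have 3 points, but you should only have one point, with the FPR/TPR for y_true = [1,1,1,1,0,1,0,0,0] and y_pred = [1,0,0,0,1,0,1,0,0]. To make sure you have an actual ROC curve, though, you also can't just make up y_pred values at different thresholds; they need to come from actual predicted probabilities that are then thresholded appropriately. I modified your code a bit because I like using numpy; here is how you can compute an ROC curve.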

import numpy as np
import matplotlib.pyplot as plt

# start with the true labels, as you did
y_true = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0])
# and a predicted probability of each being a "1"
# I just used random numbers for these, but you would get them
# from your classifier
predictions = np.array([
    0.07485627, 0.72546085, 0.60287482,
    0.90537829, 0.75789236, 0.01852192,
    0.85425979, 0.36881312, 0.63893516
])

# now define a set of thresholds (the more thresholds, the better
# the curve will look). There's a smarter way to do this in practice
# (you can sort the predicted probabilities and just have one threshold
# between each), but this is just to help with understanding
thresholds = np.linspace(0, 1, 11)  # 0.0, 0.1, ..., 1.0

fprs = []
tprs = []

# we can precompute which inputs are actually 1s/0s and how many of each
true_1_idx = np.where(y_true == 1)[0]
true_0_idx = np.where(y_true == 0)[0]
n_true_1 = len(true_1_idx)
n_true_0 = len(true_0_idx)

for threshold in thresholds:
    # now, for each threshold, we use that on the underlying probabilities
    # to get the actual predicted classes
    pred_classes = predictions >= threshold
    # and compute FPR/TPR from those
    tprs.append((pred_classes[true_1_idx] == 1).sum() / n_true_1)
    fprs.append((pred_classes[true_0_idx] == 1).sum() / n_true_0)

plt.figure(figsize=(12,6))
plt.tick_params(labelsize=12)

plt.plot(fprs, tprs, color='blue')
plt.ylabel("TPR",fontsize=16)
plt.xlabel("FPR",fontsize=16)
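To make the earlier single-point remark concrete, here is a minimal sketch in plain Python that pools the question's three groups into one set of nine labels and computes the one (FPR, TPR) pair they define. The helper name roc_point is made up for this illustration, and the zero-denominator guards simply mirror the question's code.

```python
# Pool the question's three groups back into one set of nine labels.
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0, 0]

def roc_point(y_true, y_pred):
    """Return (FPR, TPR) for one set of hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against a class being absent, as the question's code does.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr

fpr, tpr = roc_point(y_true, y_pred)
print(fpr, tpr)  # prints 0.5 0.2
```

With all nine instances counted together there are 1 true positive, 4 false negatives, 2 false positives, and 2 true negatives, so the hard predictions give exactly one point rather than a curve.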

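Note that an ROC curve is always non-decreasing in TPR (y-axis) as FPR (x-axis) increases; that is, it goes up as you move to the right. This is clear from how thresholding works. At a threshold of 0, all predictions are "1", so we have FPR = TPR = 1. Increasing the threshold gives fewer "1" predictions, so FPR and TPR can only stay the same or decrease.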
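The "smarter way" mentioned in the code comments (one threshold between each pair of sorted scores) and the non-decreasing property can be sketched together in plain Python. This is only an illustration under the assumption that both classes are present; roc_points is a hypothetical helper name, not a library function.

```python
def roc_points(y_true, scores):
    """ROC points from one threshold per distinct score, highest first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Walk the examples from highest to lowest score; lowering the threshold
    # past an example turns its prediction into a "1".
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted "1"
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example sharing this score.
        is_last = rank + 1 == len(order)
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points

y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0]
predictions = [0.07485627, 0.72546085, 0.60287482,
               0.90537829, 0.75789236, 0.01852192,
               0.85425979, 0.36881312, 0.63893516]

points = roc_points(y_true, predictions)
# Each step can only add true positives or false positives, never remove them.
is_monotone = all(a[0] <= b[0] and a[1] <= b[1]
                  for a, b in zip(points, points[1:]))
print(points[0], points[-1], is_monotone)
```

Because every step either adds a true positive or a false positive, the points run from (0, 0) to (1, 1) without ever moving left or down, which is exactly what the question's plot fails to do.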

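Note that even if we used the optimal thresholds, there would still be jumps in the curve: we have a finite amount of data, so any set of thresholds yields only a finite number of distinct TPR/FPR pairs. If you have enough data, though, the curve starts to look smooth. Here I've replaced a few lines in the code above to get a smoother plot: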

n_points = 1000
y_true = np.random.randint(0, 2, size=n_points)
predictions = np.random.random(n_points)
thresholds = np.linspace(0, 1, 1000)

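In case it's not clear, an AUC of 0.5 is the worst you can do, and you can see that this is about what we get with random "predictions". If your AUC is below 0.5, you can flip every prediction to get above 0.5 (and something is probably wrong with your model or training).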
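One way to see why flipping works: AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counted as half. The sketch below uses that pairwise definition directly; auc_pairwise is a name invented for this example, and the quadratic loop is only sensible for small data.

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0]
predictions = [0.07485627, 0.72546085, 0.60287482,
               0.90537829, 0.75789236, 0.01852192,
               0.85425979, 0.36881312, 0.63893516]

auc = auc_pairwise(y_true, predictions)
# Flipping every score reverses every pairwise comparison.
auc_flipped = auc_pairwise(y_true, [1.0 - s for s in predictions])
print(auc, auc_flipped)
```

On the nine random example scores above, 7 of the 20 positive/negative pairs are ordered correctly, so the AUC is 0.35, and flipping the scores gives 0.65. In general, with no ties, the flipped AUC is one minus the original.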

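If you actually want to plot an ROC curve in practice, rather than just writing one yourself to learn more, use sklearn's roc_curve. They also have roc_auc_score to get the AUC for you.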
