Getting thresholds from a ROC curve

Question

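I have some models, and I am using the ROCR package on a vector of predicted class percentages, which gives me a performance object. Plotting the performance object with the specifications "tpr", "fpr" gives me a ROC curve.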

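I'm comparing models at certain thresholds of false positive rate (x). I'd like to get the value of the true positive rate (y) out of the performance object. Even more, I'd like to get the class percentage threshold that was used to generate that point.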

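The index of the false positive rate (x-value) that is closest to the threshold without exceeding it should also be the index of the appropriate true positive rate (y-value). I'm not sure how to get that index.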

And more to the point, how do I get the class probability threshold that was used to make that point?

Answer 1

This is why str is my favorite R function:

library(ROCR)
data(ROCR.simple)
pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"tpr","fpr")
plot(perf)
> str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "False positive rate"
  ..@ y.name      : chr "True positive rate"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:201] 0 0 0 0 0.00935 ...
  ..@ y.values    :List of 1
  .. ..$ : num [1:201] 0 0.0108 0.0215 0.0323 0.0323 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:201] Inf 0.991 0.985 0.985 0.983 ...

Ahah! It's an S4 class, so we can use @ to access the slots. Here's how you make a data.frame:

cutoffs <- data.frame(cut=perf@alpha.values[[1]], fpr=perf@x.values[[1]], 
                      tpr=perf@y.values[[1]])
> head(cutoffs)
        cut         fpr        tpr
1       Inf 0.000000000 0.00000000
2 0.9910964 0.000000000 0.01075269
3 0.9846673 0.000000000 0.02150538
4 0.9845992 0.000000000 0.03225806
5 0.9834944 0.009345794 0.03225806
6 0.9706413 0.009345794 0.04301075

If you have an fpr threshold you want to hit, you can subset this data.frame to find the maximum tpr below that fpr threshold:

cutoffs <- cutoffs[order(cutoffs$tpr, decreasing=TRUE),]
> head(subset(cutoffs, fpr < 0.2))
          cut       fpr       tpr
96  0.5014893 0.1495327 0.8494624
97  0.4997881 0.1588785 0.8494624
98  0.4965132 0.1682243 0.8494624
99  0.4925969 0.1775701 0.8494624
100 0.4917356 0.1869159 0.8494624
101 0.4901199 0.1962617 0.8494624
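
The subsetting step above can be wrapped in a small function. This is only a sketch: point_at_fpr is a hypothetical helper, not part of ROCR, and it assumes a data.frame with the cut, fpr and tpr columns built above. It uses <= rather than <, matching the question's "closest without being above" wording. The demo rows are copied from the head(cutoffs) output shown earlier.

```r
# Hypothetical helper (not part of ROCR): among the operating points whose
# FPR does not exceed max_fpr, return the one with the highest TPR.
point_at_fpr <- function(cutoffs, max_fpr) {
  ok <- cutoffs[cutoffs$fpr <= max_fpr, , drop = FALSE]
  if (nrow(ok) == 0) stop("no operating point with fpr <= max_fpr")
  # Sort by TPR (descending); break ties by preferring the lower FPR.
  ok <- ok[order(-ok$tpr, ok$fpr), , drop = FALSE]
  ok[1, , drop = FALSE]
}

# First six rows of the cutoffs table printed above, for illustration only.
cutoffs_demo <- data.frame(
  cut = c(Inf, 0.9910964, 0.9846673, 0.9845992, 0.9834944, 0.9706413),
  fpr = c(0, 0, 0, 0, 0.009345794, 0.009345794),
  tpr = c(0, 0.01075269, 0.02150538, 0.03225806, 0.03225806, 0.04301075)
)

res <- point_at_fpr(cutoffs_demo, max_fpr = 0.005)
print(res)
```

With the full cutoffs table from the answer, point_at_fpr(cutoffs, 0.2) should pick a row from the same group that head(subset(cutoffs, fpr < 0.2)) lists, and the cut column of that row is the class probability threshold the question asks for.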

Answer 2

The pROC package includes the function coords for computing the best threshold:

library(pROC)
my_roc <- roc(my_response, my_predictor)
coords(my_roc, "best", ret = "threshold")

Answer 3

Two solutions, based on the ROCR and pROC packages:

threshold1 <- function(predict, response) {
    perf <- ROCR::performance(ROCR::prediction(predict, response), "sens", "spec")
    # performance(pred, "sens", "spec") puts sensitivity on y and specificity on x
    df <- data.frame(cut = perf@alpha.values[[1]], sens = perf@y.values[[1]], spec = perf@x.values[[1]])
    df[which.max(df$sens + df$spec), "cut"]
}
threshold2 <- function(predict, response) {
    r <- pROC::roc(response, predict)
    r$thresholds[which.max(r$sensitivities + r$specificities)]
}
data(ROCR.simple, package = "ROCR")
threshold1(ROCR.simple$predictions, ROCR.simple$labels)
#> [1] 0.5014893
threshold2(ROCR.simple$predictions, ROCR.simple$labels)
#> [1] 0.5006387

See also the OptimalCutpoints package, which provides many algorithms for finding an optimal threshold.

Answer 4

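Similar to @Artem's solution. Basically, the optimal threshold on a ROC curve is at the widest part of the curve, that is, the point that gives the maximum TPR while keeping the FPR as low as possible.

So the optimal threshold can also be found by locating the point of maximum separation between TPR and FPR.

Below is a quick solution using the ROSE package: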

library(ROSE)
library(data.table)
threshold_data<-roc.curve(df$response,my_predictor,plotit = TRUE)
#Get TPR, FPR and corresponding threshold from roc.curve function and convert to dataframe 
threshold_data<-data.frame(TPR = threshold_data$false.positive.rate,
                       FPR = threshold_data$true.positive.rate,
                       threshold = threshold_data$thresholds)


# TPR       FPR  threshold       sep
# 1.0000000000 1.0000000       -Inf 0.0000000
# 0.7474009553 0.9820701 0.03405027 0.2346691
# 0.5869626300 0.9478403 0.08923265 0.3608776
# 0.4003933689 0.8777506 0.17368989 0.4773572
# 0.2225344198 0.7571312 0.25101859 0.5345968
# 0.1441416128 0.6495518 0.33035935 0.5054101
# 0.0868221411 0.5281174 0.44915920 0.4412952
# 0.0261309357 0.3390383 0.57857430 0.3129074
# 0.0089912897 0.2257539 0.76554635 0.2167626
# 0.0008429334 0.1140994 0.93730006 0.1132565
# 0.0000000000 0.0000000        Inf 0.0000000

threshold_data<-setDT(threshold_data)
threshold_data[,sep:=abs(FPR-TPR)]
best_threshold<-threshold_data[sep==max(sep),threshold]
#0.2510185

#Same result with package pROC
library(pROC)
my_curve <- roc(df$my_response,my_predictor)
coords(my_curve, "best", ret = "threshold")
#0.2510185
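
The "maximum separation" criterion above is Youden's J, J = TPR - FPR. Since FPR = 1 - specificity, J equals sensitivity + specificity - 1, so maximising it is the same criterion as the sens + spec maximisation in Answer 3. As a minimal sketch of the idea without any ROC package, assuming 0/1 labels and "score >= cutoff means positive", it could be computed in base R as follows. youden_threshold and the toy data are made up for illustration.

```r
# Hypothetical helper: try every observed score as a candidate cutoff and
# return the one that maximises Youden's J = TPR - FPR.
youden_threshold <- function(scores, labels) {
  stopifnot(length(scores) == length(labels), all(labels %in% c(0, 1)))
  cuts <- sort(unique(scores), decreasing = TRUE)
  pos <- labels == 1
  # A case is predicted positive when its score is >= the cutoff.
  tpr <- vapply(cuts, function(ct) mean(scores[pos] >= ct), numeric(1))
  fpr <- vapply(cuts, function(ct) mean(scores[!pos] >= ct), numeric(1))
  j <- tpr - fpr
  best <- which.max(j)  # first maximum if several cutoffs tie
  data.frame(threshold = cuts[best], tpr = tpr[best], fpr = fpr[best], j = j[best])
}

# Toy data, invented for this example.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
labels <- c(1,   1,   0,   1,   1,   0,   0,   0)

res <- youden_threshold(scores, labels)
print(res)
```

This sketch returns an observed score as the cutoff. Packages may report a slightly different value for the same operating point (pROC, for instance, places thresholds between adjacent scores), so small differences like the ones in Answer 3 are expected.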

Answer 5

Following up on Juilee's answer:

The TPR and FPR definitions in the data frame are swapped. Here is the same answer with them corrected.

threshold_data <- data.frame(FPR = threshold_data$false.positive.rate,
                             TPR = threshold_data$true.positive.rate,
                             threshold = threshold_data$thresholds)
