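We have about 200 prediction tables like the ones below (one per iteration, 200 iterations in total). Some of them have AUC = 1 and the rest have AUC < 1. Smoothing the ROC curve of a table with AUC = 1 throws an error, whereas smoothing the ROC curve of a table with AUC < 1 does not.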
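Question:
How can I skip smoothing for the ROC curves with AUC = 1 and smooth only the remaining curves (those with AUC < 1) in R, given that we have 200 prediction tables?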
What I have tried:
I tried an ifelse() based on AUC < 1. It ran correctly for a few iterations, but then failed every time. Please help.
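The ifelse() attempt itself is not shown, so this is only a guess at why it failed: ifelse() is vectorised and returns a result shaped like its test argument. When the yes/no values are list-like objects (a pROC roc object is a list), a length-1 test keeps only the first component of the chosen object. For a single TRUE/FALSE decision per table, if ... else is the right tool. A minimal base-R sketch, using a plain list as a stand-in for an roc object:

```r
# A plain list standing in for a list-like object such as a pROC roc object
obj <- list(auc = 0.58, thresholds = c(-Inf, 0.5, Inf))

# ifelse() returns something shaped like `test` (length 1 here),
# so only the first element of `obj` survives: a length-1 list holding 0.58
bad <- ifelse(TRUE, obj, NA)
str(bad)

# if ... else returns the chosen object unchanged
good <- if (TRUE) obj else NA
identical(good, obj)
```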
A prediction table with AUC < 1 does not throw an error when smoothed:
library(pROC)
roc_tab <- read.table(header = TRUE, text = "
pred A B obs Resample rowIndex
A 1.0 0.0 A Fold01.Rep1 1
A 0.9 0.1 B Fold01.Rep1 20
B 0.4 0.6 B Fold01.Rep1 23
A 0.6 0.4 A Fold02.Rep1 6
B 0.3 0.7 B Fold02.Rep1 17
A 0.7 0.3 B Fold02.Rep1 27
A 0.9 0.1 A Fold03.Rep1 2
B 0.1 0.9 A Fold03.Rep1 3
B 0.2 0.8 B Fold03.Rep1 19
A 0.9 0.1 B Fold03.Rep1 29
")
train_roc <- roc(roc_tab$obs, roc_tab$A, positive = "A", na.rm = TRUE,
direction = ">", auc = TRUE)
train_roc
#
# Call:
# roc.default(response = roc_tab$obs, predictor = roc_tab$A, na.rm = TRUE, direction = ">", auc = TRUE, positive = "A")
#
# Data: roc_tab$A in 4 controls (roc_tab$obs A) > 6 cases (roc_tab$obs B).
# Area under the curve: 0.5833
train_roc_smooth <- roc(roc_tab$obs, roc_tab$A, positive = "A", smooth = TRUE, na.rm = TRUE, direction = ">", auc = TRUE)
train_roc_smooth
#
# Call:
# roc.default(response = roc_tab$obs, predictor = roc_tab$A, na.rm = TRUE, direction = ">", smooth = TRUE, auc = TRUE, positive = "A")
#
# Data: roc_tab$A in 4 controls (roc_tab$obs A) > 6 cases (roc_tab$obs B).
# Smoothing: binormal
# Area under the curve: 0.6093
A prediction table with AUC = 1 throws an error when we try to smooth it, as you can see at the end:
roc_tab1 <- read.table(header = TRUE, text = "
pred A B obs Resample rowIndex
A 1.0 0.0 A Fold01.Rep1 1
A 1.0 0.0 A Fold01.Rep1 20
B 0.0 1.0 B Fold01.Rep1 23
A 1.0 0.0 A Fold02.Rep1 6
B 0.0 1.0 B Fold02.Rep1 17
B 0.0 1.0 B Fold02.Rep1 27
A 1.0 0.0 A Fold03.Rep1 2
A 1.0 0.0 A Fold03.Rep1 3
B 0.0 1.0 B Fold03.Rep1 19
B 0.0 1.0 B Fold03.Rep1 29
")
train_roc <- roc(roc_tab1$obs, roc_tab1$A, positive = "A", na.rm = TRUE, direction = ">", auc = TRUE)
train_roc
#
# Call:
# roc.default(response = roc_tab1$obs, predictor = roc_tab1$A, na.rm = TRUE, direction = ">", auc = TRUE, positive = "A")
#
# Data: roc_tab1$A in 5 controls (roc_tab1$obs A) > 5 cases (roc_tab1$obs B).
# Area under the curve: 1
train_roc_smooth <- roc(roc_tab1$obs, roc_tab1$A, positive = "A", smooth = TRUE, na.rm = TRUE, direction = ">", auc = TRUE)
#
# Error in smooth.roc.binormal(roc, n) :
# ROC curve not smoothable (not enough points).
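Why does this table fail? As far as I understand pROC's binormal smoothing, it fits a straight line to the ROC points on the normal-quantile (probit) scale, i.e. qnorm(sensitivity) against qnorm(specificity). Points where sensitivity or specificity equals 0 or 1 map to infinite values and cannot be used, and a perfectly separating curve such as roc_tab1 has no other points. The helper below, usable_points(), is hypothetical (not part of pROC); it counts the points that stay finite on that scale, assuming the default percent = FALSE (values between 0 and 1):

```r
# Hypothetical helper (not part of pROC): number of ROC points that stay
# finite on the probit scale, i.e. with sensitivity and specificity
# strictly between 0 and 1. Assumes proportions (percent = FALSE).
usable_points <- function(sensitivities, specificities) {
  se <- qnorm(sensitivities)
  sp <- qnorm(specificities)
  sum(is.finite(se) & is.finite(sp))
}

# The three points of a perfect ROC curve, like the one from roc_tab1:
# every point lies on the border of the unit square, so none is usable
usable_points(sensitivities = c(1, 1, 0), specificities = c(0, 1, 1))
```

With pROC loaded, usable_points(train_roc$sensitivities, train_roc$specificities) >= 2 could be used as the smoothing condition instead of checking the AUC, assuming the smoother needs at least two such points to fit a line. Unlike an AUC == 1 check, this would also flag AUC < 1 curves built from very few distinct predictor values.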
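Any help is greatly appreciated. Thanks!
You can write a function that returns either the smoothed or the unsmoothed result depending on the AUC value. From a programming point of view, the code below will do the job. From a statistical point of view, however, I would question an AUC value of 1.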
get_auc <- function(df) {
  result <- roc(df$obs, df$A, positive = "A", na.rm = TRUE, direction = ">", auc = TRUE)
  # check whether smoothing is possible/needed
  if (result$auc == 1) {
    return(1) # if you are only interested in the AUC
    # return(result) # if you are interested in the entire result list returned by roc
  } else {
    smoothed_result <- roc(df$obs, df$A, positive = "A", smooth = TRUE, na.rm = TRUE, direction = ">", auc = TRUE)
    return(as.numeric(smoothed_result$auc)) # if you are only interested in the AUC
    # return(smoothed_result) # if you are interested in the entire result list returned by roc
  }
}
get_auc(roc_tab)
# [1] 0.6093235
get_auc(roc_tab1)
# [1] 1
## if you want to loop through the 200 data frames that you have
# create a list of data frames
dfs <- list(roc_tab, roc_tab1)
# to store the AUC values, create a vector with the same length as the list
n <- length(dfs)
AUC <- rep(NA_real_, n)
# loop through the list and extract the respective AUC values
for (i in seq_len(n)) {
  AUC[i] <- get_auc(dfs[[i]])
}
# list of AUC values
AUC
# [1] 0.6093235 1.0000000
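Smoothing can also be applied after the ROC curve has been computed. The solution is then to do it in two steps, instead of using the smooth argument of the roc function: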
train_roc <- roc(roc_tab$obs, roc_tab$A, positive = "A", na.rm = TRUE,
direction = ">", auc = TRUE)
train_roc_smooth <- if(train_roc$auc < 1) smooth(train_roc) else train_roc
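Rather than predicting in advance which curves can be smoothed, another option (not taken from the answers above) is to attempt the smoothing and fall back to the unsmoothed curve whenever smooth() signals an error, using base R's tryCatch(). In this sketch smooth_or_keep() is a hypothetical helper name; the smoother is passed as an argument (defaulting to pROC::smooth) so that the fallback logic can be checked without pROC:

```r
# Hypothetical helper: try to smooth an ROC curve; if the smoother throws
# (e.g. "ROC curve not smoothable (not enough points)"), keep the original.
smooth_or_keep <- function(roc_obj, smoother = pROC::smooth) {
  tryCatch(
    smoother(roc_obj),
    error = function(e) {
      message("Smoothing skipped: ", conditionMessage(e))
      roc_obj
    }
  )
}

# Possible usage with pROC loaded and the list of tables from the loop above:
# rocs <- lapply(dfs, function(df) roc(df$obs, df$A, na.rm = TRUE, direction = ">"))
# smoothed <- lapply(rocs, smooth_or_keep)
```

Catching the error also covers AUC < 1 curves that fail for the same reason. The drawback is that it would also hide unrelated errors, which is why the sketch prints the error message instead of skipping silently.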