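I'd like to know whether there is a way to tell caret's train() function which class of the outcome variable is the positive one. A minimal example: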
# Packages (caret for train(), dplyr for %>% and mutate())
library(caret)
library(dplyr)
# Settings
ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, summaryFunction = twoClassSummary, classProbs = TRUE)
# Data
data <- mtcars %>% mutate(am = factor(am, levels = c(0,1), labels = c("automatic", "manual"), ordered = TRUE))
# Train
set.seed(123)
model1 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)
# Data (factor ordering switched)
data <- mtcars %>% mutate(am = factor(am, levels = c(1,0), labels = c("manual", "automatic"), ordered = TRUE))
# Train
set.seed(123)
model2 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)
# Specificity and Sensitivity are switched
model1
model2
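If you run the code, you will notice that the Specificity and Sensitivity metrics are swapped between the two models. It looks like train() takes the first level of the factor outcome variable as the positive class. Is there a way to specify the positive class in the function itself, so that I get the same results regardless of how the outcome factor is ordered? I tried adding positive = "manual", but that results in an error.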
The problem is not in train() but in twoClassSummary, which looks like this:
function (data, lev = NULL, model = NULL)
{
lvls <- levels(data$obs)
[...]
out <- c(rocAUC,
sensitivity(data[, "pred"], data[, "obs"],
lev[1]), # Hard coded positive class
specificity(data[, "pred"], data[, "obs"],
lev[2])) # Hard coded negative class
names(out) <- c("ROC", "Sens", "Spec")
out
}
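It is a small wrapper, so we can work around it! The order of the levels passed to sensitivity() and specificity() is hard coded here. To fix this, you can write your own summary function based on twoClassSummary().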
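sensitivity() and specificity() take the positive and the negative level name, respectively (a suboptimal design choice). So we include both as arguments in our custom function. Further down, we pass these arguments on to the respective functions to solve the problem.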
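To illustrate why naming the levels explicitly helps, here is a minimal sketch that calls caret's sensitivity() and specificity() directly. The toy labels and the names obs_chr, pred_chr, lv_a and lv_b are hypothetical and not taken from the question's data. With positive and negative given by name, the two metrics should not depend on how the factor levels are ordered.

```r
library(caret)

# Hypothetical toy labels, made up for illustration only.
obs_chr  <- c("manual", "manual", "automatic", "automatic", "manual")
pred_chr <- c("manual", "automatic", "automatic", "manual", "manual")

# The same labels encoded with two different level orderings.
lv_a <- c("automatic", "manual")
lv_b <- c("manual", "automatic")

# Sensitivity with the positive class named explicitly.
sens_a <- sensitivity(factor(pred_chr, levels = lv_a),
                      factor(obs_chr,  levels = lv_a),
                      positive = "manual")
sens_b <- sensitivity(factor(pred_chr, levels = lv_b),
                      factor(obs_chr,  levels = lv_b),
                      positive = "manual")

# Specificity with the negative class named explicitly.
spec_a <- specificity(factor(pred_chr, levels = lv_a),
                      factor(obs_chr,  levels = lv_a),
                      negative = "automatic")
spec_b <- specificity(factor(pred_chr, levels = lv_b),
                      factor(obs_chr,  levels = lv_b),
                      negative = "automatic")

# Both orderings give the same values.
all.equal(sens_a, sens_b)
all.equal(spec_a, spec_b)
```

In this toy data, two of the three observed "manual" cases and one of the two observed "automatic" cases are predicted correctly, so sensitivity is 2/3 and specificity is 1/2 under either ordering.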
customTwoClassSummary <- function(data, lev = NULL, model = NULL, positive = NULL, negative=NULL)
{
lvls <- levels(data$obs)
if (length(lvls) > 2)
stop(paste("Your outcome has", length(lvls), "levels. The twoClassSummary() function isn't appropriate."))
caret:::requireNamespaceQuietStop("ModelMetrics")
if (!all(levels(data[, "pred"]) == lvls))
stop("levels of observed and predicted data do not match")
rocAUC <- ModelMetrics::auc(ifelse(data$obs == lev[2], 0,
1), data[, lvls[1]])
out <- c(rocAUC,
# Only change happens here!
sensitivity(data[, "pred"], data[, "obs"], positive=positive),
specificity(data[, "pred"], data[, "obs"], negative=negative))
names(out) <- c("ROC", "Sens", "Spec")
out
}
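But how do we specify these options without changing more code in the package? By default, caret does not pass extra options on to the summary function. So we wrap the function in an anonymous function in the call to trainControl():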
ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE,
# Trick: an anonymous wrapper fixes some arguments of a function call
summaryFunction = function(...) customTwoClassSummary(...,
positive = "manual", negative="automatic"),
classProbs = TRUE)
The ... argument ensures that all other arguments that caret passes to the anonymous function are forwarded to customTwoClassSummary().
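The forwarding trick itself does not depend on caret. As a minimal base R sketch (count_positive and count_manual are hypothetical names used only for this illustration), an anonymous wrapper can pin one argument while passing everything else through ...:

```r
# Hypothetical helper: counts how many labels equal the positive class.
count_positive <- function(labels, positive = NULL) {
  sum(labels == positive)
}

# Wrapper that fixes `positive` and forwards all other arguments via `...`.
count_manual <- function(...) count_positive(..., positive = "manual")

labels <- c("manual", "automatic", "manual")

# The caller supplies only the data; `positive` is already fixed.
n_manual <- count_manual(labels)
n_manual
```

The caller (here, us; in the answer, caret) never needs to know about the extra argument, which is why the summary function can be customized without touching the package code.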