Decision tree with `rpart` and `caret` for a target variable split into quartiles

Question

The following code:

library(rpart)
library(caret)
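# Summary function for caret: returns the mean of sensitivity and specificity
# (balanced accuracy, which ranks models the same way as Youden's J = Sens + Spec - 1).
youdenSumary <- function(data, lev = NULL, model = NULL){
  # Only defined for two-class outcomes
  if (length(lev) > 2) {
    stop(paste("Your outcome has", length(lev), "levels. The youdenSumary() function isn't appropriate."))
  }
  if (!all(levels(data[, "pred"]) == lev)) {
    stop("levels of observed and predicted data do not match")
  }
  # lev[1] is treated as the positive class, lev[2] as the negative class
  Sens <- caret::sensitivity(data[, "pred"], data[, "obs"], lev[1])
  Spec <- caret::specificity(data[, "pred"], data[, "obs"], lev[2])
  j <- (Sens + Spec)/2
  out <- c(j, Spec, Sens)
  names(out) <- c("j", "Spec", "Sens")
  out
}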



trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 20,
                       search = "grid",summaryFunction = youdenSumary)

classifier = train(x = training_set[, names(training_set) != "Target"],
                   y = training_set$Target,
                   method = 'rpart',
                   parms = list(split = "gini"),trControl=trctrl,
                   tuneLength = 10,metric = "j")
classifier
# bestTune is a one-row data frame; extract the numeric cp value
complexity_parameter <- classifier$bestTune$cp


folds = createFolds(dataset$Target, k = 10)
cv = lapply(folds, function(x) {
  training_fold = dataset[-x, ]
  test_fold = dataset[x, ]
  classifier = rpart(formula = Target ~ .,
                     data = training_fold,control = rpart.control(cp = complexity_parameter))
  y_pred = predict(classifier, newdata = test_fold[!(names(test_fold)%in%"Target")], type = 'class')
  # Compare the target variable with the predicted values
  # (rows = observed classes, columns = predicted classes)
  cm = table(test_fold[, names(test_fold)%in%"Target"], y_pred)
  accuracy = (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] + cm[1,2] + cm[2,1])
  # The first level is treated as the positive class, as in youdenSumary()
  sensitivity = cm[1,1] / (cm[1,1] + cm[1,2])
  specificity = cm[2,2] / (cm[2,1] + cm[2,2])
  df = data.frame(accuracy = accuracy, sensitivity=sensitivity,
                  specificity=specificity)
  return(df)
})
accuracy = Reduce("+", lapply(cv, "[[", 1))/10
sensitivity = Reduce("+", lapply(cv, "[[", 2))/10
specificity = Reduce("+", lapply(cv, "[[", 3))/10
balanced_accuracy=(sensitivity+specificity)/2

performs a grid search to find the best parameters. Assuming method = "repeatedcv", number = 10 and repeats = 3, three separate 10-fold cross-validations are used as the resampling scheme.
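To make the resampling scheme concrete, here is a minimal base R sketch of what "three separate 10-fold cross-validations" means in terms of held-out index sets. The helper `repeated_folds()` and the sample size of 200 are assumptions made for illustration; this is not caret's internal code, only the general idea.

```r
# Hypothetical helper (not part of caret): build held-out index sets for
# repeated k-fold cross-validation using base R only.
repeated_folds <- function(n, k = 10, repeats = 3, seed = 1) {
  set.seed(seed)
  out <- list()
  for (r in seq_len(repeats)) {
    # Randomly assign each of the n rows to one of k folds.
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      out[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id == f)
    }
  }
  out
}

folds <- repeated_folds(n = 200, k = 10, repeats = 3)
length(folds)  # number of resamples = k * repeats = 30
```

Within each repetition every row is held out exactly once, so each candidate value of the tuning parameter is scored on 30 held-out sets and the scores are averaged.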

We then apply cross-validation again to obtain accuracy, sensitivity, and so on.

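This code is designed for a binary target variable. How can I adapt it to a non-binary target variable, for example a target split into quartiles (i.e. 1, 2, 3, 4)?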
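For reference, a four-level quartile target of this kind could be built roughly as follows. The simulated vector `x` and the labels `Q1` to `Q4` are assumptions made for illustration; the original data is not shown in the question.

```r
set.seed(42)
x <- rnorm(100)  # illustrative continuous outcome (made-up data)

# Break points at the 0%, 25%, 50%, 75% and 100% quantiles.
breaks <- quantile(x, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum value in the first bin.
target <- cut(x, breaks = breaks, include.lowest = TRUE,
              labels = c("Q1", "Q2", "Q3", "Q4"))

table(target)  # class counts per quartile
```

The result is a factor with four levels, so `rpart` would fit a four-class classification tree rather than a binary one.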

r decision-tree rpart
1 Answer

I think you may need to rewrite quite a lot.

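That function only works when caret's spec and sens functions compare binary outcomes. It does not work in the multinomial case. I would look at this great package, https://github.com/WandeRum/multiROC, which covers this in more detail.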
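One possible direction, offered as a sketch rather than a definitive answer, is to compute sensitivity and specificity one-vs-rest for each class and then average them (macro-averaging). The function name `macroSummary`, the toy data and the choice of macro-averaging are all assumptions made here; the code uses base R only and does not rely on the multiROC package mentioned above.

```r
# Hypothetical multiclass replacement for the binary summary function.
# `data` is expected to hold factor columns `obs` and `pred`, which is what
# caret supplies to a summaryFunction; `lev` holds the class levels.
macroSummary <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data[, "obs"])
  obs  <- factor(data[, "obs"],  levels = lev)
  pred <- factor(data[, "pred"], levels = lev)
  cm <- table(pred, obs)      # rows = predicted, columns = observed
  tp <- diag(cm)
  fn <- colSums(cm) - tp      # observed as the class, predicted as another
  fp <- rowSums(cm) - tp      # predicted as the class, observed as another
  tn <- sum(cm) - tp - fn - fp
  sens <- tp / (tp + fn)      # one-vs-rest sensitivity per class
  spec <- tn / (tn + fp)      # one-vs-rest specificity per class
  c(j    = mean((sens + spec) / 2, na.rm = TRUE),
    Spec = mean(spec, na.rm = TRUE),
    Sens = mean(sens, na.rm = TRUE))
}

# Toy check with four made-up quartile classes.
lev4 <- c("Q1", "Q2", "Q3", "Q4")
toy <- data.frame(
  obs  = factor(c("Q1", "Q1", "Q2", "Q2", "Q3", "Q3", "Q4", "Q4"), levels = lev4),
  pred = factor(c("Q1", "Q2", "Q2", "Q2", "Q3", "Q4", "Q4", "Q4"), levels = lev4)
)
res <- macroSummary(toy, lev = lev4)
res
```

A function with this signature can be passed to `trainControl(summaryFunction = ...)` with `metric = "j"`, just like the binary version in the question. In the manual cross-validation loop, the same function could be applied to each fold by passing a data frame with `obs` set to the fold's observed target and `pred` set to `y_pred`. As an alternative, `caret::confusionMatrix()` reports one-vs-all sensitivity and specificity per class when the outcome has more than two levels.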
