Cross-validation with the R ranger package

Question

Hi, I have the following ranger model:

X <- train_df[, -1]
y <- train_df$Price

rf_model <- ranger(Price ~ ., data = train_df, mtry = 11, splitrule = "extratrees", min.node.size = 1, num.trees = 100)

I'm trying to accomplish two things:

  1. Cross-validate across the dataset so I get an averaged performance metric, i.e. an accuracy estimate that stays stable even when the seed value changes (see the sketch after this list).
  2. Set up cross-validation to find the best combination of mtry and num.trees.
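
For point 1, something along these lines is what I have in mind: a rough k-fold loop around ranger that averages the out-of-fold RMSE (a minimal sketch assuming train_df with a numeric Price column, untested):

library(ranger)

set.seed(42)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(train_df)))

rmse_per_fold <- sapply(1:k, function(i) {
  # fit on everything except fold i
  fit <- ranger(Price ~ ., data = train_df[folds != i, ],
                mtry = 11, splitrule = "extratrees",
                min.node.size = 1, num.trees = 100)
  # predict on the held-out fold and compute its RMSE
  pred <- predict(fit, data = train_df[folds == i, ])$predictions
  sqrt(mean((train_df$Price[folds == i] - pred)^2))
})

mean(rmse_per_fold)  # averaged RMSE over the 5 held-out folds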

What I have tried:

The following works for tuning mtry, splitrule and min.node.size, but I cannot add the number of trees to the grid, because doing so gives me an error.

# define the parameter grid to search
param_grid <- expand.grid(mtry = c(1:ncol(X)),
                          splitrule = c("variance", "extratrees", "maxstat"),
                          min.node.size = c(1, 5, 10))

# set up the cross-validation scheme
cv_scheme <- trainControl(method = "cv",
                          number = 5,
                          verboseIter = TRUE)

# perform the grid search using caret
rf_model <- train(x = X,
                  y = y,
                  method = "ranger",
                  trControl = cv_scheme,
                  tuneGrid = param_grid)

# view the best parameter values
rf_model$bestTune
Tags: r, cross-validation, r-caret, r-ranger
1 Answer

A simple approach is to pass num.trees as an additional argument to train and iterate over that parameter.
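
That works because caret's built-in "ranger" method only exposes mtry, splitrule and min.node.size as tuning parameters, which is also why adding num.trees to tuneGrid errors out; you can check this with modelLookup():

library(caret)

# lists the tunable parameters of the built-in method: only mtry,
# splitrule and min.node.size appear, so num.trees cannot go in tuneGrid
modelLookup("ranger")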

Another approach is to create your own custom model; see the caret chapter Using Your Own Model in train.

An RPubs article by Pham Dinh Khanh demonstrates this here.
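
A minimal sketch of such a custom model, with num.trees promoted to a fourth tuning parameter, might look like this (regression only, following the list structure described in that chapter; the name rangerNT and the default grid values are just placeholders):

library(caret)
library(ranger)

rangerNT <- list(
  label      = "ranger with tunable num.trees",
  library    = "ranger",
  type       = "Regression",
  parameters = data.frame(
    parameter = c("mtry", "splitrule", "min.node.size", "num.trees"),
    class     = c("numeric", "character", "numeric", "numeric"),
    label     = c("#Predictors", "Splitting Rule", "Min Node Size", "#Trees")
  ),
  # default grid, only used when no tuneGrid is supplied
  grid = function(x, y, len = NULL, search = "grid") {
    expand.grid(mtry          = caret::var_seq(ncol(x), len = ifelse(is.null(len), 3, len)),
                splitrule     = c("variance", "extratrees"),
                min.node.size = c(1, 5),
                num.trees     = c(100, 500))
  },
  fit = function(x, y, wts, param, lev, last, weights, classProbs, ...) {
    ranger::ranger(x = x, y = y,
                   mtry          = param$mtry,
                   splitrule     = as.character(param$splitrule),
                   min.node.size = param$min.node.size,
                   num.trees     = param$num.trees,
                   ...)
  },
  predict = function(modelFit, newdata, submodels = NULL) {
    predict(modelFit, data = newdata)$predictions
  },
  prob = NULL,
  sort = function(x) x[order(x$num.trees, x$mtry), ]
)

# used like any built-in method, e.g.:
# train(x = X, y = y, method = rangerNT, trControl = cv_scheme,
#       tuneGrid = expand.grid(mtry = c(3, 11), splitrule = "extratrees",
#                              min.node.size = 1, num.trees = c(100, 500)))

The reprex below uses the first, simpler approach instead: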

library(caret)
library(mlbench)
library(ranger)
data(PimaIndiansDiabetes)
x <- PimaIndiansDiabetes[, -ncol(PimaIndiansDiabetes)]
y <- PimaIndiansDiabetes[, ncol(PimaIndiansDiabetes)]

param_grid <- expand.grid(mtry = c(1:4),
                          splitrule = c("variance", "extratrees"),
                          min.node.size = c(1, 5))
cv_scheme <- trainControl(method = "cv",
                          number = 5,
                          verboseIter = FALSE)
models <- list()
for (ntree in c(4, 100)) {
  set.seed(123)
  rf_model <- train(x = x,
                    y = y,
                    method = "ranger",
                    trControl = cv_scheme,
                    tuneGrid = param_grid,
                    num.trees = ntree)
  name <- paste0(ntree, "_tr_model")
  models[[name]] <- rf_model
}

models[["4_tr_model"]]
#> Random Forest 
#> 
#> 768 samples
#>   8 predictor
#>   2 classes: 'neg', 'pos' 
#> 
#> No pre-processing
#> Resampling: Cross-Validated (5 fold) 
#> Summary of sample sizes: 614, 615, 614, 615, 614 
#> Resampling results across tuning parameters:
#> 
#>   mtry  splitrule   min.node.size  Accuracy   Kappa    
#>   1     variance    1                    NaN        NaN
#>   1     variance    5                    NaN        NaN
#>   1     extratrees  1              0.6808675  0.2662428
#>   1     extratrees  5              0.6783125  0.2618862
...

models[["100_tr_model"]]
#> Random Forest 
...
#> 
#>   mtry  splitrule   min.node.size  Accuracy   Kappa    
#>   1     variance    1                    NaN        NaN
#>   1     variance    5                    NaN        NaN
#>   1     extratrees  1              0.7473559  0.3881530
#>   1     extratrees  5              0.7564808  0.4112127
...

Created on 2023-04-19 with reprex v2.0.2
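
To compare the two fits afterwards and pull out the winning settings, caret's getTrainPerf() and resamples() helpers can be used (a quick sketch; because both models were fit after the same set.seed(123), their folds match):

# one-row summary of cross-validated performance per num.trees value
do.call(rbind, lapply(models, getTrainPerf))

# fold-by-fold comparison of the two fits
summary(resamples(models))

# best mtry / splitrule / min.node.size found at num.trees = 100
models[["100_tr_model"]]$bestTune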
