Why does ranger in tidymodels give a different model than calling ranger directly?


I would like to know why I don't get the same model when I use ranger through tidymodels as when I call ranger directly.

Here is a reproducible example:

library(tidymodels)
library(ranger)

# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)

# rf model specs
rf_mod <- 
  rand_forest(trees = 10)  |>  
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE) |> 
  set_mode("classification")

# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train) # OOB=4.81%

# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train, 
       num.trees=10, respect.unordered.factors = TRUE, probability = FALSE) # OOB=5.77%
Tags: r, random-forest, tidymodels
Answer (1 vote)

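You get different models because ranger's seed argument is not set. With the default seed = NULL, ranger generates its internal seed from R's random number generator, so the forest depends on the exact state of that generator at the moment ranger() is actually called; anything that draws random numbers between your set.seed() call and that point leads to a different seed. If you pass the same seed to both approaches, you get the same model fit (note the identical OOB prediction error of 2.88 % in both outputs):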

library(tidymodels)
library(ranger)

# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)
#> Joining with `by = join_by(Sepal.Length, Sepal.Width, Petal.Length,
#> Petal.Width, Species)`

# rf model specs
rf_mod <- 
  rand_forest(trees = 10)  |>  
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE, 
             seed = 1234) |> 
  set_mode("classification")

# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train)
#> parsnip model object
#> 
#> Ranger result
#> 
#> Call:
#>  ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~10,      respect.unordered.factors = ~TRUE, probability = ~FALSE,      seed = ~1234, num.threads = 1, verbose = FALSE) 
#> 
#> Type:                             Classification 
#> Number of trees:                  10 
#> Sample size:                      105 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 1 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             2.88 %

# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train, 
       num.trees=10, respect.unordered.factors = TRUE, probability = FALSE, 
       seed = 1234)
#> Ranger result
#> 
#> Call:
#>  ranger(Species ~ ., data = train, num.trees = 10, respect.unordered.factors = TRUE,      probability = FALSE, seed = 1234) 
#> 
#> Type:                             Classification 
#> Number of trees:                  10 
#> Sample size:                      105 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 1 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             2.88 %

Created on 2023-11-15 with reprex v2.0.2
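The role of the seed argument can be checked with ranger alone. The sketch below is a minimal illustration, assuming only the documented ranger behaviour that seed = NULL derives the seed from R's RNG; the helper oob_error() is hypothetical and exists only for this example. It fits small forests on the full iris data with num.threads = 1 and compares their OOB prediction errors.

```r
library(ranger)

# Hypothetical helper for this sketch: fit a 10-tree forest on iris
# (single thread) and return ranger's OOB prediction error.
oob_error <- function(...) {
  fit <- ranger(Species ~ ., data = iris, num.trees = 10, num.threads = 1, ...)
  fit$prediction.error
}

# seed = NULL (the default): ranger draws its seed from R's RNG,
# so resetting R's RNG to the same state reproduces the same forest.
set.seed(100); a <- oob_error()
set.seed(100); b <- oob_error()
stopifnot(identical(a, b))

# Explicit seed: the state of R's RNG no longer affects the forest.
set.seed(1); c1 <- oob_error(seed = 1234)
set.seed(2); c2 <- oob_error(seed = 1234)
stopifnot(identical(c1, c2))
```

Passing seed explicitly is what makes the tidymodels and direct calls agree. It no longer matters how many random numbers are consumed before ranger() runs.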

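Instead of comparing the printed OOB errors by eye, you can extract the ranger object that parsnip fitted and compare it with the direct fit. This is a sketch under a few assumptions. The split seed 42 is added here only to make the example reproducible. num.threads = 1 is passed to the direct call to mirror the call parsnip generates, which is visible in the Call line of the tidymodels output above. extract_fit_engine() is the parsnip accessor for the underlying engine object.

```r
library(tidymodels)
library(ranger)

set.seed(42)  # added for this sketch so the train split is reproducible
train <- iris |> slice_sample(prop = 0.7)

rf_mod <- rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE,
             seed = 1234) |>
  set_mode("classification")

# Fit through tidymodels (parsnip)
tm_fit <- rf_mod |> fit(Species ~ ., data = train)

# Fit ranger directly with the same seed; num.threads = 1 mirrors parsnip's call
direct_fit <- ranger(Species ~ ., data = train, num.trees = 10,
                     respect.unordered.factors = TRUE, probability = FALSE,
                     seed = 1234, num.threads = 1)

# extract_fit_engine() returns the ranger object stored inside the parsnip fit
engine_fit <- extract_fit_engine(tm_fit)

# Same OOB error and same OOB predictions (NA for rows never out-of-bag)
stopifnot(
  isTRUE(all.equal(engine_fit$prediction.error, direct_fit$prediction.error)),
  isTRUE(all.equal(as.character(engine_fit$predictions),
                   as.character(direct_fit$predictions)))
)
```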