I'd like to know why I don't get the same model when I use ranger through tidymodels as when I call ranger directly.
Here is a reproducible example:
library(tidymodels)
library(ranger)
# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)
# rf model specs
rf_mod <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE) |>
  set_mode("classification")
# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train) # OOB=4.81%
# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train,
       num.trees = 10, respect.unordered.factors = TRUE, probability = FALSE) # OOB=5.77%
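You get different models because ranger's `seed` argument is not set. Calling `set.seed(100)` beforehand is not enough on its own, as your example shows. If you pass the same `seed` to ranger in both cases, you get the same model fit: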
library(tidymodels)
library(ranger)
# load data
data("iris")
train <- iris |> slice_sample(prop = 0.7)
test <- iris |> anti_join(train)
#> Joining with `by = join_by(Sepal.Length, Sepal.Width, Petal.Length,
#> Petal.Width, Species)`
# rf model specs
rf_mod <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE,
             seed = 1234) |>
  set_mode("classification")
# fit model using tidymodels
set.seed(100)
rf_mod |> fit(Species ~ ., data = train)
#> parsnip model object
#>
#> Ranger result
#>
#> Call:
#> ranger::ranger(x = maybe_data_frame(x), y = y, num.trees = ~10, respect.unordered.factors = ~TRUE, probability = ~FALSE, seed = ~1234, num.threads = 1, verbose = FALSE)
#>
#> Type: Classification
#> Number of trees: 10
#> Sample size: 105
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 1
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error: 2.88 %
# fit model using ranger directly
set.seed(100)
ranger(Species ~ ., data = train,
       num.trees = 10, respect.unordered.factors = TRUE, probability = FALSE,
       seed = 1234)
#> Ranger result
#>
#> Call:
#> ranger(Species ~ ., data = train, num.trees = 10, respect.unordered.factors = TRUE, probability = FALSE, seed = 1234)
#>
#> Type: Classification
#> Number of trees: 10
#> Sample size: 105
#> Number of independent variables: 4
#> Mtry: 2
#> Target node size: 1
#> Variable importance mode: none
#> Splitrule: gini
#> OOB prediction error: 2.88 %
Created on 2023-11-15 with reprex v2.0.2
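If you would rather check the match programmatically than compare the printed output by eye, something along these lines should work. This is only a sketch: the `set.seed(1)` before the split and the object names `fit_parsnip`, `fit_ranger`, `oob_parsnip` and `oob_ranger` are my own additions for illustration, not part of the original example. It compares only the OOB prediction error, which is the quantity printed in the output above.

```r
library(tidymodels)
library(ranger)

# Assumption: fix the R seed so the train split itself is reproducible
set.seed(1)
train <- iris |> slice_sample(prop = 0.7)

# Same model spec as above, with ranger's own seed passed explicitly
rf_mod <-
  rand_forest(trees = 10) |>
  set_engine("ranger", respect.unordered.factors = TRUE, probability = FALSE,
             seed = 1234) |>
  set_mode("classification")

# Fit once through tidymodels and once with ranger directly
fit_parsnip <- rf_mod |> fit(Species ~ ., data = train)
fit_ranger <- ranger(Species ~ ., data = train,
                     num.trees = 10, respect.unordered.factors = TRUE,
                     probability = FALSE, seed = 1234)

# extract_fit_engine() returns the underlying ranger object from the parsnip fit
oob_parsnip <- extract_fit_engine(fit_parsnip)$prediction.error
oob_ranger <- fit_ranger$prediction.error

# Compare the OOB prediction errors of the two fits
all.equal(oob_parsnip, oob_ranger)
```

I deliberately left predictions on new data out of the comparison, since ranger may break voting ties at prediction time using a seed of its own, which could make such a check flaky with only 10 trees.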