The problem is that I have already run several xgboost tuning processes, but only saved the results in text form, or more precisely the metadata, i.e. the model parameters and their performance.
It has the following structure:
str(p)
'data.frame': 130 obs. of 10 variables:
$ mtry : int 922 1046 512 1317 675 1303 518 1029 1345 1180 ...
$ min_n : int 34 36 73 89 91 32 73 52 75 93 ...
$ tree_depth : int 44 33 43 37 34 48 25 19 38 41 ...
$ learn_rate : num 0.0236 0.0257 0.0292 0.0254 0.0271 0.023 0.025 0.0226 0.0281 0.0641 ...
$ loss_reduction: num 0.0268 0.745 0.148 0.171 0.0275 ...
$ sample_size : num 0.967 0.947 0.789 0.825 0.973 0.521 0.798 0.813 0.993 0.959 ...
$ .metric : chr "mn_log_loss" "mn_log_loss" "mn_log_loss" "mn_log_loss" ...
$ .estimator : chr "binary" "binary" "binary" "binary" ...
$ mean : num 0.423 0.424 0.424 0.424 0.424 0.425 0.425 0.426 0.427 0.427 ...
$ std_err : num 0.000382 0.000439 0.000408 0.000344 0.000368 0.000407 0.000386 0.000398 0.000392 0.000441 ...
Now I would like to use this metadata as the initial value for a tune_bayes() run, but it fails with:
Error in check_initial():
! initial should be a positive integer or the results of [tune_grid()]
Run rlang::last_trace() to see where the error occurred.
How can I convert it into the matching format without re-running the time-consuming computation?
For comparison, a tune_grid result looks like this:
Tuning results
5-fold cross-validation using stratification
A tibble: 5 × 4
splits id .metrics .notes
1 <split [843580/210897]> Fold1 <tibble [1 × 10]> <tibble [0 × 3]>
2 <split [843582/210895]> Fold2 <tibble [1 × 10]> <tibble [0 × 3]>
3 <split [843582/210895]> Fold3 <tibble [1 × 10]> <tibble [0 × 3]>
4 <split [843582/210895]> Fold4 <tibble [1 × 10]> <tibble [0 × 3]>
5 <split [843582/210895]> Fold5 <tibble [1 × 10]> <tibble [0 × 3]>
Here is the documentation for tune_grid: https://github.com/tidymodels/tune/blob/main/R/tune_grid.R It hasn't gotten me very far.
Thanks!
It's hard to give you an exact answer without any code, but I suspect the problem is that the mtry parameter does not know its upper bound (since that depends on the number of predictors in the data). Grid search can resolve this automatically, but Bayesian optimization needs you to set it.
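For context, here is a minimal sketch (assuming the dials package and the built-in mtcars data, not the poster's data) of why mtry needs finalizing: its default upper bound is unknown until the parameter is tied to a data set.

```r
library(dials)

# mtry()'s upper bound is unknown out of the box:
mtry()  # range: [1, ?]

# finalize() fills in the upper bound from the number of predictor columns
p_final <- finalize(mtry(), mtcars[, -1])  # 10 predictor columns
range_get(p_final)  # lower = 1, upper = 10
```

tune_grid() performs this finalization for you; tune_bayes() does not, which is why the range has to be set by hand.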
You can get the parameter information from the model specification, set the range for mtry, and then use the tuning object (not the data frame) as the input to tune_bayes(), along with that parameter information.
Here is an example:
library(tidymodels)
set.seed(1)
sim_tr <- sim_regression(250)
sim_rs <- vfold_cv(sim_tr)
xgb_spec <-
  boost_tree(mtry = tune(), min_n = tune(), trees = 20) %>%
  set_mode("regression")
set.seed(2)
initial_res <-
  xgb_spec %>%
  tune_grid(
    outcome ~ .,
    resamples = sim_rs,
    grid = 10
  )
#> i Creating pre-processing data to finalize unknown parameter: mtry
# Use the tune object `initial_res` as the input
# Set parameter range for mtry:
xgb_param <-
  xgb_spec %>%
  extract_parameter_set_dials()
# See 'Model parameters needing finalization:' below
xgb_param
#> Collection of 2 parameters for tuning
#>
#> identifier type object
#> mtry mtry nparam[?]
#> min_n min_n nparam[+]
#>
#> Model parameters needing finalization:
#> # Randomly Selected Predictors ('mtry')
#>
#> See `?dials::finalize` or `?dials::update.parameters` for more information.
xgb_param <-
  xgb_param %>%
  update(mtry = mtry(c(1, 20)))
set.seed(3)
bayes_res <-
  xgb_spec %>%
  tune_bayes(
    outcome ~ .,
    resamples = sim_rs,
    initial = initial_res, # <- tune object
    iter = 4,
    # Provide parameter information:
    param_info = xgb_param
  )
show_best(bayes_res)
#> Warning: No value of `metric` was given; metric 'rmse' will be used.
#> # A tibble: 5 × 9
#> mtry min_n .metric .estimator mean n std_err .config .iter
#> <int> <int> <chr> <chr> <dbl> <int> <dbl> <chr> <int>
#> 1 18 9 rmse standard 16.3 10 1.01 Preprocessor1_Model04 0
#> 2 13 5 rmse standard 16.8 10 0.935 Preprocessor1_Model09 0
#> 3 16 10 rmse standard 16.9 10 0.940 Iter2 2
#> 4 20 14 rmse standard 17.0 10 1.26 Iter1 1
#> 5 12 16 rmse standard 17.3 10 1.13 Preprocessor1_Model02 0
Created on 2024-04-23 with reprex v2.1.0