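I want to create an ensemble model with "user-defined weights". If I build several sub-models with tidymodels, I want to produce a final model that gives each sub-model the same weight. The stacks package is great for generating more optimal weights... but sometimes I just want to give every sub-model an equal weight. Also, stacks is great because I can use the "stacked" model object with the DALEXtra package to help explain the final ensemble model.
Here is an example of what I am doing.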
## load in packages
library(tidymodels)
library(stacks)
library(DALEXtra)
# get a sample of the ames dataset
set.seed(1)
df <- ames %>%
  sample_n(500)
# some setup: resampling and a basic recipe
set.seed(1)
df_splits <- initial_split(df)
df_train <- training(df_splits)
df_test <- testing(df_splits)
set.seed(1)
df_folds <- vfold_cv(df_train, v = 4)
rec_small <- recipe(Sale_Price ~ Gr_Liv_Area, data = df)
rec_big <- recipe(Sale_Price ~ BsmtFin_SF_1 + First_Flr_SF + Second_Flr_SF, data = df)
# setting up my one model type
rand_forest_ranger_spec <-
  rand_forest() %>%
  set_engine('ranger') %>%
  set_mode('regression')
# setting up my one workflow set of my two recipes and one model type
wf_rfs <-
  workflow_set(
    preproc = list(rec_small,
                   rec_big),
    models = list(rf = rand_forest_ranger_spec)
  )
# estimating my two random forest models
grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = TRUE
  )
grid_results <-
  wf_rfs %>%
  workflow_map(
    seed = 1503,
    resamples = df_folds,
    control = grid_ctrl
  )
# setting up our stacking
df_st <-
  stacks() %>%
  add_candidates(grid_results)
set.seed(1)
df_model_st <-
  df_st %>%
  blend_predictions()
# looking at final estimated model
df_model_st$equations$numeric
#### I got:
#### -42148.1667470673 + (recipe_1_rf_1_1 * 0.13109783287876) + (recipe_2_rf_1_1 * 1.08833216052151)
#### but what I want is something with user-defined values, like:
#### 0 + (recipe_1_rf_1_1 * .5) + (recipe_2_rf_1_1 * .5)
I can keep using this stacks model and use DALEXtra to help explain the stacks ensemble, along with some global model explanations... something like this:
# Fit an ensemble model using that stacks
df_model_st_fitted <-
df_model_st %>%
fit_members()
# I want to be able to use the cool DALEX tools to explain a user-defined weighted ensemble model
vip_features <- c("Gr_Liv_Area", "BsmtFin_SF_1", "First_Flr_SF", "Second_Flr_SF")
vip_train <-
  df %>%
  select(all_of(vip_features))
# Setting up the explainer
explainer_blended_rf <-
  explain_tidymodels(
    df_model_st_fitted,
    data = vip_train,
    y = df$Sale_Price,
    label = "Blended Random Forest",
    verbose = FALSE
  )
# using the explainer to produce a VIP
vip_example <-
  explain_tidymodels(
    df_model_st_fitted,
    data = vip_train,
    y = df$Sale_Price,
    label = "Blended RF",
    verbose = FALSE
  ) %>%
  model_parts()
plot(vip_example)
# using the explainer to produce AL plots
al_rf <- model_profile(
  explainer = explainer_blended_rf,
  type = "accumulated",
  variables = names(vip_train)
)
plot(al_rf) +
  ggtitle("Accumulated-local profiles")
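In summary... I like stacks because it both creates the weights and creates a model object that can later be used like a tidymodel. However, I don't want the weights that stacks creates; I want to define my own. I don't know whether I should be doing something inside stacks to get the weights I want, or whether I shouldn't bother with stacks at all, since I already know the weights I want. But... I don't know how to create an ensemble model the way stacks does, so that it can be used like a tidymodel later.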
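One approach here is to get the predictions from each model manually, then compute a single vector that averages, element by element, the predictions stored in the list column of the results tibble.
Something like this: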
reduce(results$.pred, \(x, y) x + y) / nrow(results)
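To make the averaging concrete, here is a minimal base R sketch on toy data. The `results` object below is a hypothetical stand-in with one row per sub-model and a list column `.pred` of numeric prediction vectors; the model names and numbers are invented for illustration. It also shows how the same idea might extend to unequal, user-defined weights.

```r
# Toy stand-in for a results tibble: one row per sub-model, with a list
# column `.pred` holding that model's numeric predictions (hypothetical data).
results <- data.frame(model = c("rf_small", "rf_big"))
results$.pred <- list(c(100, 200, 300), c(120, 180, 330))

# Equal weights: element-wise sum across models, divided by the number of models.
equal_avg <- Reduce(`+`, results$.pred) / nrow(results)

# User-defined weights (assumed to sum to 1): scale each vector, then sum.
weights <- c(0.25, 0.75)
weighted_avg <- Reduce(`+`, Map(`*`, results$.pred, weights))
```

With equal weights the two forms agree; the weighted form only matters once the weights differ.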
To get variable importance for the ensemble, the vip package lets you supply a custom prediction wrapper.
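As a rough sketch of what such a wrapper could look like: the hand-rolled ensemble below is just a list of fitted models plus weights, and `predict_ensemble()` is a hypothetical helper with the two-argument `(object, newdata)` shape that vip's `pred_wrapper` argument and DALEX's `predict_function` argument both accept. Plain `lm` fits on `mtcars` stand in for the random forest workflows to keep the example light; with fitted tidymodels workflows you would presumably pull the `.pred` column from each `predict()` result instead.

```r
# Two stand-in sub-models fitted on a built-in dataset (lm is used only to
# keep the sketch light; these are not the question's random forests).
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_big   <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Hypothetical ensemble object: the members plus user-defined weights.
ensemble <- list(
  members = list(fit_small, fit_big),
  weights = c(0.5, 0.5)
)

# Custom prediction wrapper: weighted sum of the members' predictions.
predict_ensemble <- function(object, newdata) {
  preds <- lapply(object$members, predict, newdata = newdata)
  Reduce(`+`, Map(`*`, preds, object$weights))
}

ens_pred <- predict_ensemble(ensemble, mtcars)

# Optional: hand the same wrapper to DALEX, if it is installed.
if (requireNamespace("DALEX", quietly = TRUE)) {
  explainer_ens <- DALEX::explain(
    ensemble,
    data = mtcars[, c("wt", "hp", "disp")],
    y = mtcars$mpg,
    predict_function = predict_ensemble,
    label = "Equal-weight ensemble",
    verbose = FALSE
  )
  vip_ens <- DALEX::model_parts(explainer_ens)
}
```

Because the weights live in the ensemble object rather than in the wrapper, switching from equal to unequal weights only means changing `ensemble$weights`.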