Using user-defined weights for an ensemble model


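I want to create an ensemble model with user-defined weights. If I use tidymodels to build several sub-models, I want to produce a final model that gives each sub-model the same weight. The stacks package is great for estimating more optimal weights, but sometimes I just want every sub-model weighted equally. stacks is also nice because I can pass the stacked model object to the DALEXtra package to help explain the final ensemble.

Here is an example of what I am doing.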

## load in packages
library(tidymodels)
library(stacks)
library(DALEXtra)

# get a sample of the ames dataset
set.seed(1)
df <- ames %>% 
  sample_n(500)

# some setup: resampling and a basic recipe
set.seed(1)
df_splits <- initial_split(df)
df_train <- training(df_splits)
df_test  <- testing(df_splits)

set.seed(1)
df_folds <- vfold_cv(df_train, v = 4)

rec_small <- recipe(Sale_Price ~ Gr_Liv_Area, data = df)
rec_big <- recipe(Sale_Price ~ BsmtFin_SF_1 + First_Flr_SF + Second_Flr_SF, data = df)

# setting up my one model type
rand_forest_ranger_spec <-
  rand_forest() %>%
  set_engine('ranger') %>%
  set_mode('regression')

# setting up my one workflow set of my two recipes and one model type
wf_rfs <- 
  workflow_set(
    preproc = list(rec_small,
                   rec_big), 
    models = list(rf = rand_forest_ranger_spec)
    )

# estimating my two random forest models
grid_ctrl <-
  control_grid(
    save_pred = TRUE,
    parallel_over = "everything",
    save_workflow = TRUE
  )

grid_results <-
  wf_rfs %>%
  workflow_map(
    seed = 1503,
    resamples = df_folds,
    control = grid_ctrl
  )

# setting up our stacking
stacks()

df_st <- 
  stacks() %>%
  add_candidates(grid_results)

set.seed(1)
df_model_st <-
  df_st %>%
  blend_predictions()

# looking at final estimated model
df_model_st$equations$numeric
#### I got
#### -42148.1667470673 + (recipe_1_rf_1_1 * 0.13109783287876) + (recipe_2_rf_1_1 * 1.08833216052151)
#### but what I want is user-defined values, something like
#### 0 + (recipe_1_rf_1_1 * .5) + (recipe_2_rf_1_1 * .5)

I can keep using this stacks model and use DALEXtra to help explain the stacks ensemble, along with some global model explanations, something like this:

# Fit an ensemble model using that stacks
df_model_st_fitted <-
  df_model_st %>% 
  fit_members()

# I want to be able to use the cool DALEX tools to explain a user-defined weighted ensemble model
vip_features <- c("Gr_Liv_Area", "BsmtFin_SF_1", "First_Flr_SF", "Second_Flr_SF")

vip_train <- 
  df %>% 
  select(all_of(vip_features))

# Setting up the explainer
explainer_blended_rf <- 
  explain_tidymodels(
    df_model_st_fitted, 
    data = vip_train, 
    y = df$Sale_Price,
    label = "Blended Random Forest",
    verbose = FALSE
  )

# using the explainer to produce a VIP
vip_example <- 
  explain_tidymodels(
    df_model_st_fitted, 
    data = vip_train, 
    y = df$Sale_Price,
    label = "Blended RF",
    verbose = FALSE
  ) %>% 
  model_parts() 

plot(vip_example)

#using the explainer to produce AL plots
al_rf <- model_profile(explainer = explainer_blended_rf,
                       type = "accumulated",
                       variables = names(vip_train)
)

plot(al_rf) +
  ggtitle("Accumulated-local profiles")

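In short, I like stacks because it both estimates the weights and produces a model object that can later be used like any other tidymodels model. However, I do not want the weights that stacks estimates; I want to set my own. I do not know whether I should do something inside stacks to get the weights I want, or whether I should skip stacks entirely, since I already know the weights I want. In that case, though, I do not know how to build an ensemble model the way stacks does so that it can be used later like a tidymodels model.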

r stack tidymodels ensemble-learning dalex
1 Answer

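One approach here is to get the predictions from each model manually and compute a single vector holding the mean of the predicted values stored in the list column of the results tibble.

Something like this: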

reduce(results$.pred, \(x, y) x + y) / nrow(results)
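
As a small worked illustration of that one-liner, the sketch below uses made-up prediction vectors in place of real model output. The `results` table, its `.pred` list column, the model names, and all numbers are assumptions for illustration only. It shows the equal-weight mean and a variant with user-defined weights, in base R.

```r
# Hypothetical stand-in for a results table: one row per sub-model,
# with that model's predictions stored in a list column named .pred.
results <- data.frame(model = c("rf_small", "rf_big"))
results$.pred <- list(c(100, 200, 300), c(120, 180, 330))

# Equal weights: element-wise sum of the prediction vectors,
# divided by the number of models.
equal_avg <- Reduce(`+`, results$.pred) / nrow(results)

# User-defined weights: bind the predictions column-wise (one column per
# model) and multiply by a weight vector.
w <- c(0.25, 0.75)  # illustrative weights, assumed to sum to 1
weighted_avg <- as.numeric(do.call(cbind, results$.pred) %*% w)

print(equal_avg)
print(weighted_avg)
```

With `w <- c(0.5, 0.5)` the weighted version gives the same result as the plain mean, so the equal-weight case is just one choice of `w`.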

To get variable importance for the stack with the vip package, you can use a custom prediction wrapper.
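
The same wrapper idea can probably be carried over to DALEX, which the question already uses. The sketch below swaps vip for the `predict_function` argument of `DALEX::explain()`. It is not the approach from the answer, and it assumes the two random forests are fitted directly as workflows rather than through stacks. The names `fit_member`, `ensemble`, and `predict_ensemble` are hypothetical helpers introduced here.

```r
library(tidymodels)
library(DALEXtra)  # also attaches DALEX

set.seed(1)
df <- ames %>% sample_n(500)
df_train <- training(initial_split(df))

rf_spec <- rand_forest() %>%
  set_engine("ranger") %>%
  set_mode("regression")

# Hypothetical helper: fit one random forest workflow for a given formula.
fit_member <- function(formula) {
  workflow() %>%
    add_recipe(recipe(formula, data = df_train)) %>%
    add_model(rf_spec) %>%
    fit(data = df_train)
}

# The "ensemble" is just a list: fitted members plus user-defined weights.
ensemble <- list(
  members = list(
    fit_member(Sale_Price ~ Gr_Liv_Area),
    fit_member(Sale_Price ~ BsmtFin_SF_1 + First_Flr_SF + Second_Flr_SF)
  ),
  weights = c(0.5, 0.5)
)

# Custom prediction wrapper: weighted sum of the member predictions.
predict_ensemble <- function(model, newdata) {
  preds <- lapply(model$members, function(m) predict(m, new_data = newdata)$.pred)
  as.numeric(do.call(cbind, preds) %*% model$weights)
}

vip_features <- c("Gr_Liv_Area", "BsmtFin_SF_1", "First_Flr_SF", "Second_Flr_SF")
x_train <- as.data.frame(df_train[, vip_features])

# DALEX::explain() is called with its namespace because dplyr also exports explain().
explainer_weighted <- DALEX::explain(
  ensemble,
  data = x_train,
  y = df_train$Sale_Price,
  predict_function = predict_ensemble,
  label = "Equal-weight RF ensemble",
  verbose = FALSE
)

# Permutation-based variable importance for the weighted ensemble.
vip_weighted <- DALEX::model_parts(explainer_weighted)
plot(vip_weighted)
```

Because the explainer only ever calls `predict_ensemble()`, changing `ensemble$weights` is enough to explain a differently weighted ensemble, and `model_profile()` should accept the same explainer.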
