How do I tune the `step_impute_knn` function from the recipes package?


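I want to use the `step_impute_knn` function from the `recipes` package to impute missing values. This function uses Gower's distance as its distance metric, which suits predictors that are a mix of categorical and continuous data. As far as I can tell, though, the function cannot be used with `tune()`, because tuning has to be done on a (parsnip) model. The only relevant parsnip model is `nearest_neighbor`, which does not offer Gower's distance as an option.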
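For context, the idea behind Gower's distance on mixed data can be sketched in a few lines of base R. This is a simplified illustration only, not the implementation that `recipes` uses; the helper `gower_pair` and the small `toy` data frame are hypothetical names introduced for this example.

```r
# Simplified Gower distance between rows i and j of a data frame.
# Numeric columns: absolute difference scaled by the column's range.
# Other columns: 0 if the values match, 1 otherwise.
# Columns where either value is missing are skipped.
gower_pair <- function(df, i, j) {
  per_column <- vapply(names(df), function(col) {
    a <- df[[col]][i]
    b <- df[[col]][j]
    if (is.na(a) || is.na(b)) return(NA_real_)
    if (is.numeric(df[[col]])) {
      rng <- diff(range(df[[col]], na.rm = TRUE))
      if (rng == 0) 0 else abs(a - b) / rng
    } else {
      as.numeric(a != b)
    }
  }, numeric(1))
  # Average over the columns that could be compared
  mean(per_column, na.rm = TRUE)
}

# Hypothetical toy rows, loosely modelled on the sample data
toy <- data.frame(
  HomePlanet = c("Europa", "Earth", "Europa", NA),
  Age        = c(39, 24, 58, 44),
  VIP        = c("False", "False", "True", "False"),
  stringsAsFactors = FALSE
)

gower_pair(toy, 1, 2)  # HomePlanet differs, VIP matches, ages are 15 apart
gower_pair(toy, 1, 4)  # HomePlanet is skipped because row 4 is missing it
```

Because every column contributes a value between 0 and 1, categorical and continuous predictors can be combined without dummy-coding or manual rescaling.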

Sample data:

train <- structure(list(PassengerId = c("0001_01", "0002_01", "0003_01", 
"0003_02", "0004_01", "0005_01"), HomePlanet = c("Europa", "Earth", 
"Europa", "Europa", "Earth", NA), CryoSleep = c("False", 
"False", "False", "False", "False", "False"), Cabin = c("B/0/P", 
"F/0/S", "A/0/S", "A/0/S", "F/1/S", "F/0/P"), Destination = c("TRAPPIST-1e", 
"TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "TRAPPIST-1e", "PSO J318.5-22"
), Age = c(39, 24, 58, 33, 16, 44), VIP = c("False", "False", 
"True", "False", "False", "False"), RoomService = c(0, 109, 43, 
0, 303, 0), FoodCourt = c(0, 9, 3576, 1283, 70, 483), ShoppingMall = c(0, 
25, 0, 371, 151, 0), Spa = c(0, 549, 6715, 3329, 565, 291), VRDeck = c(0, 
44, 49, 193, 2, 0), Name = c("Maham Ofracculy", "Juanna Vines", 
"Altark Susent", "Solam Susent", "Willy Santantines", "Sandie Hinetthews"
), Transported = c("False", "True", "False", "False", "True", 
"True")), row.names = c(NA, 6L), class = "data.frame")

What I have so far:

library(tidymodels)

train_no_na <- train %>%
  na.omit()

imp_knn_blueprint <- recipe(Transported ~ ., data = train_no_na) %>%
     step_impute_knn(recipe = ., HomePlanet, 
              impute_with = imp_vars(.), neighbors = 5, 
              options = list(nthread = 1, eps = 1e-08))

imp_knn_prep <- prep(imp_knn_blueprint, training = train_no_na)
imp_knn_5 <- bake(imp_knn_prep, new_data = train)

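Is there some way to use the `tidymodels` / `parsnip` workflow to tune the knn function used inside `step_impute_knn`? I tried reading the function's source code, but could not see which engine it uses.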

Edit: To be clear, I want to tune the `neighbors` argument of `step_impute_knn` with some kind of grid search, rather than doing it by hand.

r tidymodels recipe r-parsnip
1 Answer

You can set the `neighbors` argument of `step_impute_knn` to `tune()`, just as you would for other hyperparameters in recipe steps.

library(tidymodels)

# Resamples used to score each candidate value of neighbors
train_folds <- vfold_cv(train_no_na, v = 3)

# Flag neighbors with tune() instead of fixing it at a single value
imp_knn_blueprint <- recipe(Transported ~ ., data = train_no_na) %>%
  step_impute_knn(HomePlanet, 
                  impute_with = imp_vars(all_predictors()), neighbors = tune(), 
                  options = list(nthread = 1, eps = 1e-08))

# The model whose performance is used to compare values of neighbors
log_spec <- logistic_reg()

# Update range as appropriate
knn_params <- extract_parameter_set_dials(imp_knn_blueprint) %>%
  update(neighbors = neighbors(c(1L, 10L)))

knn_grid <- grid_regular(knn_params,
                         levels = c(
                          neighbors = 10
                         ))

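# Bundle the recipe (with the tunable imputation step) and the model
knn_wf <- 
  workflow() %>%
  add_model(log_spec) %>%
  add_recipe(imp_knn_blueprint)

# Evaluate every candidate value of neighbors on each resample
impute_knn_tune <-
  knn_wf %>%
  tune_grid(
    train_folds,
    grid = knn_grid,
    metrics = metric_set(roc_auc, accuracy)
  )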
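Once tuning finishes, the results can be inspected and the best value plugged back into the workflow. The sketch below is a self-contained illustration on simulated data, since the six-row sample is likely too small to resample meaningfully. The `toy` data frame, its column names, and the grid values are hypothetical and not taken from the question.

```r
library(tidymodels)

set.seed(123)

# Hypothetical toy data: numeric and categorical predictors,
# with missing values injected into the categorical column.
n <- 200
toy <- tibble(
  planet = factor(sample(c("Earth", "Europa", "Mars"), n, replace = TRUE)),
  age    = round(runif(n, 16, 60)),
  spend  = rexp(n, rate = 1 / 200)
)
toy$outcome <- factor(
  ifelse(toy$spend + 5 * toy$age + rnorm(n, sd = 100) > 400, "True", "False")
)
toy$planet[sample(n, 20)] <- NA

folds <- vfold_cv(toy, v = 5)

# neighbors is flagged for tuning inside the recipe step
rec <- recipe(outcome ~ ., data = toy) %>%
  step_impute_knn(planet, neighbors = tune())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(logistic_reg())

# Candidate values to try; the column name matches the tuned argument
grid <- tibble(neighbors = c(1L, 3L, 5L, 10L))

res <- tune_grid(
  wf,
  resamples = folds,
  grid = grid,
  metrics = metric_set(roc_auc, accuracy)
)

# One row per candidate value and metric, averaged over the folds
collect_metrics(res)

# Pick the value with the best ROC AUC and lock it into the workflow
best <- select_best(res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best)

# Refit the finalized workflow on the full data set
final_fit <- fit(final_wf, data = toy)
```

This approach chooses `neighbors` by the downstream model's performance, not by how accurately the missing values themselves are recovered. As for the engine question: as far as the recipes documentation describes it, the `options` list is passed to `gower::gower_topn()`, so the neighbor search is not a parsnip engine at all.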