Random forest error in na.fail.default: missing values in object

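I am running a random forest model, and it runs without errors for most of my variables. However, when I include one particular variable, duration_in_program, and run the following code: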

```{r Random Forest Model}
## Run a Random Forest model
mod_rf <-
  train(left_school ~ job_title
        + gender
        + marital_status + age_at_enrollment + monthly_wage + educational_qualification + cityD + cityC.
        + cityB + cityA + duration_in_program, # Equation (outcome and everything else)
        data = train_data, # Training data
        method = "ranger", # random forest (ranger is much faster than rf)
        metric = "ROC", # area under the curve
        trControl = control_conditions,
        tuneGrid = tune_mtry
  )
mod_rf
```

I get the following error:

Error in na.fail.default(list(left_welfare = c(1L, 2L, 2L, 2L, 2L, 2L, : missing values in object
machine-learning random-forest r-caret feature-selection
1 Answer

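Assuming train() comes from the caret package, you can use the na.action argument to specify a function for handling NAs. The default is na.fail, which stops with the error above as soon as the data contains a missing value. na.omit, which drops incomplete rows, is a very common alternative. The randomForest package also provides na.roughfix, which will "impute missing values by median/mode."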

```r
mod_rf <-
  train(left_school ~ job_title
        + gender
        + marital_status + age_at_enrollment + monthly_wage + educational_qualification + cityD + cityC.
        + cityB + cityA + duration_in_program, # Equation (outcome and everything else)
        data = train_data, # Training data
        method = "ranger", # random forest (ranger is much faster than rf)
        metric = "ROC", # area under the curve
        trControl = control_conditions,
        tuneGrid = tune_mtry,
        na.action = na.omit # drop rows with missing values instead of failing
  )
mod_rf
```
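
Before choosing an na.action, it can help to see how many values are actually missing and what each strategy would do to the data. The following is a minimal base R sketch on a made-up data frame: `toy` and its values are hypothetical stand-ins for train_data, and the median fill only approximates what na.roughfix does for numeric columns.

```r
# Toy stand-in for train_data (hypothetical values, not the asker's data)
toy <- data.frame(
  left_school = factor(c("yes", "no", "no", "yes", "no")),
  monthly_wage = c(1200, 1500, NA, 1100, 1300),
  duration_in_program = c(6, NA, 12, NA, 9)
)

# Count missing values per column to see which predictors trigger na.fail
na_counts <- colSums(is.na(toy))
print(na_counts)

# na.omit drops every row that contains at least one NA
complete_rows <- na.omit(toy)
print(nrow(complete_rows))

# Median imputation for numeric columns, similar in spirit to na.roughfix
imputed <- toy
num_cols <- vapply(imputed, is.numeric, logical(1))
imputed[num_cols] <- lapply(imputed[num_cols], function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
})
print(anyNA(imputed))
```

If na.omit would discard a large share of the rows, imputing may be the better trade-off. If only a few rows are affected, dropping them is usually the simpler choice.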