Error message in R: Error in `[.data.frame`(m, labs) : undefined columns selected

Question (votes: 0, answers: 2)

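I am trying to run a regression tree on a dataset using the train function. The dataset had numeric variables, which I converted to categorical (factor) variables in an attempt to resolve the error message. I also used the trainControl function to try to resolve the error. Please help!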

library(caret)
library(rpart)
library(mlbench)
data(Dataset)
set.seed(1)
ctrl <- trainControl(method = "cv", savePredictions = TRUE)
model_T <- train(VALUE~REF_DATE+Sex+`Age at admission`+`Years since admission`+`Income type`+Statistics+UOM, data = Dataset, method = 'rpart2', trControl = ctrl)
model_T

Structure of the dataset:

spec_tbl_df [46,464 x 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ REF_DATE             : Factor w/ 11 levels "2006","2007",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Sex                  : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
$ Age at admission     : Factor w/ 4 levels "1","2","3","4": 4 4 4 4 4 4 4 4 4 4 ...
$ Years since admission: Factor w/ 11 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Income type          : Factor w/ 6 levels "1","2","3","4",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Statistics           : Factor w/ 4 levels "1","2","3","4": 3 3 3 3 3 3 3 3 3 3 ...
$ UOM                  : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
$ VALUE                : num [1:46464] 154640 145895 151290 155340 169745 ...

Tags: r dataframe regression rpart

2 Answers

0 votes

The problem is related to the spaces in the column names.

library(caret)
library(rpart)
library(mlbench)
ctrl <- trainControl(method = "cv",
                     savePredictions =TRUE)
model_T <- train(VALUE~REF_DATE+Sex+`Age at admission`+`Years since admission`+`Income type`+Statistics+UOM, 
                 data = Dataset, method = 'rpart2', trControl = ctrl)
#Error in `[.data.frame`(m, labs) : undefined columns selected 
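
Before renaming anything, it can help to see which columns are affected. The following is a minimal base R sketch, not part of caret or rpart: `non_syntactic_names` is a hypothetical helper and `example_df` is a small stand-in that only mirrors a few of the question's column names.

```r
# Hypothetical helper (not from caret or rpart): return the column names
# that differ from what make.names() would produce, i.e. the names that
# are not syntactically valid in R.
non_syntactic_names <- function(df) {
  nms <- names(df)
  nms[nms != make.names(nms)]
}

# Small stand-in data frame that mirrors some of the question's columns.
example_df <- data.frame(
  VALUE = c(1.5, 2.5),
  Sex = factor(c("1", "2")),
  `Age at admission` = factor(c("4", "4")),
  `Income type` = factor(c("6", "6")),
  check.names = FALSE  # keep the spaces instead of letting data.frame() repair them
)

non_syntactic_names(example_df)
# [1] "Age at admission" "Income type"
```

Any column reported here is a candidate for renaming before it is used in the model formula.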

If we use a dataset with clean names, i.e. with the spaces replaced by underscores and so on, it should work. Here we use clean_names from the janitor package to do that:

library(janitor)
Dataset2 <- clean_names(Dataset)
names(Dataset2)
#[1] "value"                 "ref_date"              "sex"                   "age_at_admission"      "years_since_admission" "income_type"           "statistics"            "uom"    

Now create the model:

model_T2 <- train(value~ref_date+sex+ age_at_admission+years_since_admission+income_type+statistics+uom, 
                  data = Dataset2, method = 'rpart2', trControl = ctrl)

Output:

> model_T2
CART 

200 samples
  7 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 180, 180, 180, 180, 180, 180, ... 
Resampling results across tuning parameters:

  maxdepth  RMSE       Rsquared    MAE      
  1         0.9669617  0.03721968  0.7642369
  2         0.9674085  0.02626375  0.7656366
  6         1.0268165  0.03139845  0.8033324

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was maxdepth = 1.

Data

library(tibble)  # provides tibble()
set.seed(123)
Dataset <- tibble(VALUE = rnorm(200), REF_DATE = factor(rep(c(2006, 2007), each = 100)), Sex = factor(sample(1:4, size = 200, replace = TRUE)),
                  `Age at admission` = factor(sample(1:4, size = 200, replace = TRUE)),
                  `Years since admission` = factor(sample(1:11, size = 200, replace = TRUE)), 
                  `Income type` = factor(sample(1:6, size = 200, replace = TRUE)),
                  Statistics = factor(sample(1:4, size = 200, replace = TRUE)),
                  UOM = factor(sample(1:2, size = 200, replace = TRUE))
                  )

0 votes

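Apparently, some of the data column names contain spaces, which makes them syntactically invalid names in R. Also, beware of ',' which is valid in data frame column names but not in model formulas.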

Besides the function and library from akrun's answer, you can also use the make.names() function from the base package, as follows:

names(Dataset) <- make.names(names(Dataset))

Once you fix the names, the error message will go away and your model will run.
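
As a rough illustration of what make.names() does to the column names from the question, here is a small base R sketch. The vector `old_names` is typed in by hand from the structure shown in the question, and building the formula with reformulate() is an optional convenience assumed here, not something either answer uses.

```r
# Column names as listed in the question's structure output (typed by hand).
old_names <- c("VALUE", "REF_DATE", "Sex", "Age at admission",
               "Years since admission", "Income type", "Statistics", "UOM")

# make.names() replaces characters that are invalid in R names (such as
# spaces) with dots; unique = TRUE also guards against duplicated names.
new_names <- make.names(old_names, unique = TRUE)
new_names[4:6]
# [1] "Age.at.admission"      "Years.since.admission" "Income.type"

# Optional: build the model formula from the cleaned names instead of
# typing it out, so no backticks are needed.
predictors <- setdiff(new_names, "VALUE")
model_formula <- reformulate(predictors, response = "VALUE")
model_formula
```

After renaming the columns this way, the predictors in the train() call would be written with dots (for example Age.at.admission) rather than backtick-quoted names containing spaces.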
