How do I resolve the "undefined columns selected" error in R?

Question

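I am not very familiar with the caret R package, but I plan to use it in my project for prediction with lasso or random forest. I tried random forest on my data and got the following strange error: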

>     Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : 
>       undefined columns selected
>     In addition: There were 50 or more warnings (use warnings() to see the first 50)

I don't understand why this happens. Why do I get this error, and how can I make this work?

Minimal reproducible data

Here is the minimal reproducible data:

mydf = structure(list(taken_time = c(15L, 5L, 39L, 
-21L, 46L, 121L, 9L, 100L, 70L, 92L, 31L, 37L), ap6xl = c(203.2893857, 
4.858269406, 200, 14220, 218.2215352, 115.5227706, 4.858269406, 
516.18125, 72.06166523, 4.858269406, 96.68516046, 386.1480917
), pct5 = c(732.074484, 25.67901235, 1900, 120.0477168, 3621.328567, 
79.30561111, 8376.70314, 4183.709089, 59.77649029, 997.7490228, 
118.9774144, 171.2285804), crp4 = c(196115424.7, 1073624.455, 
10007, 1457496.474, 10343851.7, 81288042.73, 320405225.1, 334807893.9, 
112950094.2, 15775090.31, 3008739.881, 127837638.1), age = c(52L, 
74L, 52L, 67L, 82L, 67L, 71L, 84L, 58L, 52L, 81L, 60L), gender = structure(c(2L, 
2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("F", 
"M"), class = "factor"), inpatient_readmission_time_rtd = c(79.78819444, 
57.59068053, 57.59068053, 57.59068053, 57.59068053, 9.893055556, 
150.1951389, 57.59068053, 134.05625, 57.59068053, 65.16041667, 
17.46527778), infection_flag = c(0L, 0L, 1L, 1L, 0L, 1L, 0L, 
1L, 1L, 1L, 1L, 0L), temperature_value = c(98.9, 98.9, 98, 101.3, 
99.5, 98.1, 98.7, 97.1, 98.1, 98.2, 100.4, 98.8), heartrate_value = c(106, 
61, 78, 91, 120, 68, 93.55081001, 122, 110, 75, 116, 111), pH_result_time_rta = c(11, 
85.50402145, 85.50402145, 85.50402145, 85.50402145, 85.50402145, 
85.50402145, 85.50402145, 85.50402145, 85.50402145, 50, 85.50402145
), gcst_value = c(15, 15, 15, 14.63769293, 15, 14.63769293, 15, 
15, 15, 14.63769293, 15, 15)), row.names = c(NA, 12L), class = "data.frame")

My attempt

Here is what I tried, but caret just complains with the error above. Why? Any ideas?

library(caret)

fitControl <- trainControl(method = "repeatedcv",number = 10,repeats = 10, search = "random")
model_cv <- train(mydf$gcst_value ~ .,data = dat,method = "randomforest",
                  trControl = fitControl,na.action = na.omit)

immunoscore = predict(model_cv, mydf)

Update

Here is my R session:

> > sessionInfo()
> R version 3.6.3 (2020-02-29)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 18362)
>
> Matrix products: default
>
> Random number generation:
>  RNG:     Mersenne-Twister
>  Normal:  Inversion
>  Sample:  Rounding
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] randomForest_4.6-14 data.table_1.12.8   stringr_1.4.0       ranger_0.12.1       caret_6.0-86
> [6] ggplot2_3.3.0       lattice_0.20-38     jsonlite_1.6.1      dplyr_0.8.5
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.3           pillar_1.4.3         compiler_3.6.3       gower_0.2.1          plyr_1.8.6
>  [6] class_7.3-15         iterators_1.0.12     tools_3.6.3          elasticnet_1.1.1     rpart_4.1-15
> [11] ipred_0.9-9          lubridate_1.7.4      lifecycle_0.2.0      tibble_2.1.3         gtable_0.3.0
> [16] nlme_3.1-144         pkgconfig_2.0.3      rlang_0.4.5          Matrix_1.2-18        foreach_1.5.0
> [21] rstudioapi_0.11      prodlim_2019.11.13   withr_2.1.2          pROC_1.16.2          generics_0.0.2
> [26] recipes_0.1.10       stats4_3.6.3         nnet_7.3-12          grid_3.6.3           tidyselect_1.0.0
> [31] glue_1.3.2           R6_2.4.1             survival_3.1-8       lava_1.6.7           reshape2_1.4.3
> [36] purrr_0.3.3          magrittr_1.5         lars_1.2             ModelMetrics_1.2.2.2 splines_3.6.3
> [41] MASS_7.3-51.5        scales_1.1.0         codetools_0.2-16     assertthat_0.2.1     timeDate_3043.102
> [46] colorspace_1.4-1     stringi_1.4.6        munsell_0.5.0        crayon_1.3.4
1 Answer

You need to fix two issues:

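  • Every column used in the formula must come from data. The formula names the response as mydf$gcst_value instead of as a plain column of the data.frame passed in the data argument, and that is what triggers the error in your question.
  • "randomforest" is not a valid model name. Use "rf" in the method argument.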
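To see why the first point matters, here is a minimal base R sketch. It assumes, based on the call shown in the error message, that the variable names are taken from the formula with all.vars() and then used to subset the data; the toy data frame and its column names are made up for illustration and are not the asker's data.

```r
# Hypothetical stand-in data frame (not the data from the question)
toy <- data.frame(y = c(1, 2, 3, 4), a = c(2, 4, 6, 8), b = c(1, 0, 1, 0))

# Writing the response as toy$y makes the name "toy" one of the formula's variables
bad_vars <- all.vars(terms(toy$y ~ ., data = toy))
print(bad_vars)

# Subsetting by those names fails because "toy" is not a column of the data frame
bad_result <- tryCatch(
  toy[, bad_vars, drop = FALSE],
  error = function(e) conditionMessage(e)
)
print(bad_result)

# With a bare column name, every variable in the formula is a real column
good_vars <- all.vars(terms(y ~ ., data = toy))
good_result <- toy[, good_vars, drop = FALSE]
print(names(good_result))
```

The same reasoning applies to the question: with mydf$gcst_value on the left-hand side, the name mydf ends up among the columns being requested.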

Fix for the issues above (see the notes below):

    # Repeated 10-fold cross-validation with random search over tuning parameters
    fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
                               search = "random")
    # Use the bare column name as the response and pass the data frame via data
    model_cv <- train(gcst_value ~ ., data = mydf, method = "rf",
                      trControl = fitControl,
                      na.action = na.omit)
    immunoscore <- predict(model_cv, mydf)

Summary:

    > summary(model_cv)
                    Length Class  Mode
    call              4    -none- call
    type              1    -none- character
    predicted        12    -none- numeric
    mse             500    -none- numeric
    rsq             500    -none- numeric
    oob.times        12    -none- numeric
    importance       11    -none- numeric
    importanceSD      0    -none- NULL
    localImportance   0    -none- NULL
    proximity         0    -none- NULL

RMSE

NOTE

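  1. The validity of this model is the responsibility of the original poster.

  2. The warnings are probably caused by model validity issues. I have left them out of this answer.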

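More notes

On inspecting the warning messages (see note 1 above):

>     50: In randomForest.default(x, y, mtry = param$mtry, ...) :
>       The response has five or fewer unique values.  Are you sure you want to do regression?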
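A quick way to check what this warning refers to is to count the distinct values of the response. The sketch below copies the gcst_value vector from the question's data; whether regression or classification is the better fit for this variable remains the poster's call.

```r
# gcst_value as given in the question's mydf
gcst_value <- c(15, 15, 15, 14.63769293, 15, 14.63769293, 15,
                15, 15, 14.63769293, 15, 15)

# Number of distinct response values; the warning fires at five or fewer
n_unique <- length(unique(gcst_value))
print(n_unique)

# Frequency of each distinct value
value_counts <- table(gcst_value)
print(value_counts)
```

With only two distinct values in twelve rows, the response behaves more like a class label than a continuous outcome, which is why randomForest asks whether regression is really intended.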
