Building a model with the caret package in R Markdown


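I've run into a problem and need help. I'm working on my first data science project, and building the model raises an error. I'm following the Edureka tutorial (https://www.edureka.co/blog/data-science-projects/), and the error seems to come from this code: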

set.seed(30162)
trnctrl <- trainControl(method="cv", number=10) 
boostfit <- train(incomelevel ~ age + educationnum + relationship + workclass +
                    occupation + relationship + maritalstatus +
                    hoursperweek + capitalgain + capitalloss +
                    race + nativecountry,
                  trcontrol = trnctrl,
                  method="gbm", 
                  data=trainset, 
                  verbose=FALSE)

I checked whether the incomelevel variable has any missing values:

table(complete.cases(trainset$incomelevel))
 TRUE 
31978 

The error:

Error in na.fail.default(list(incomelevel = c(1L, 1L, 1L, 1L, 1L, 1L, : missing values in object
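
A note on the check above: `complete.cases(trainset$incomelevel)` inspects only the response column, whereas `na.fail` (the default `na.action` of caret's formula interface for `train`) rejects a missing value in any variable named in the formula. The following is a minimal base-R sketch of that behaviour, using a small hypothetical data frame `toy` rather than the real `trainset`. Whether the predictors in `trainset` actually contain `NA`s is an assumption that the per-column count would confirm or rule out.

```r
# Hypothetical toy data standing in for trainset (not the asker's data).
toy <- data.frame(
  incomelevel = factor(c("<=50K", ">50K", "<=50K", ">50K"),
                       levels = c("<=50K", ">50K")),
  age         = c(39L, 50L, 38L, 53L),
  workclass   = factor(c("Private", NA, "State-gov", "Private"))
)

# Checking only the response passes, exactly as in the question.
response_complete <- all(complete.cases(toy$incomelevel))

# Counting NAs per column reveals the gap in a predictor.
na_counts <- colSums(is.na(toy))

# na.fail on the full model frame reproduces the error message.
err <- tryCatch(
  {
    model.frame(incomelevel ~ age + workclass, data = toy, na.action = na.fail)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# One possible remedy (an assumption, not the tutorial's method):
# drop rows with any NA before modelling.
toy_complete <- na.omit(toy)
```

On the real data, the equivalent check would be `colSums(is.na(trainset))`. Separately, caret spells the resampling argument `trControl`; R argument matching is case-sensitive, so the `trcontrol` in the call above is not matched to it.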

glimpse(trainset)

Observations: 31,978
Variables: 14
$ age           <int> 39, 50, 38, 53, 28, 37, 49, 52, 31, 42, 37, 30, 23, 32, 34, 25, 32, 38, 43, 40, 54, 35, 43, 59,...
$ workclass     <fct> State-gov, Self-emp-not-inc, Private, Private, Private, Private, Private, Self-emp-not-inc, Pri...
$ education     <fct> Bachelors, Bachelors, HS-grad, 11th, Bachelors, Masters, 9th, HS-grad, Masters, Bachelors, Some...
$ educationnum  <int> 13, 13, 9, 7, 13, 14, 5, 9, 14, 13, 10, 13, 13, 12, 4, 9, 9, 7, 14, 16, 9, 5, 7, 9, 13, 9, 10, ...
$ maritalstatus <fct> Never-married, Married-civ-spouse, Divorced, Married-civ-spouse, Married-civ-spouse, Married-ci...
$ occupation    <fct> Adm-clerical, Exec-managerial, Handlers-cleaners, Handlers-cleaners, Prof-specialty, Exec-manag...
$ relationship  <fct> Not-in-family, Husband, Not-in-family, Husband, Wife, Wife, Not-in-family, Husband, Not-in-fami...
$ race          <fct> White, White, White, Black, Black, White, Black, White, White, White, Black, Asian-Pac-Islander...
$ sex           <fct> Male, Male, Male, Male, Female, Female, Female, Male, Female, Male, Male, Male, Female, Male, M...
$ capitalgain   <int> 2174, 0, 0, 0, 0, 0, 0, 0, 14084, 5178, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ capitalloss   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2042, 0, 0, 0, 0, 0, 0, 0, 0,...
$ hoursperweek  <int> 40, 13, 40, 40, 40, 40, 16, 45, 50, 40, 80, 40, 30, 50, 45, 35, 40, 50, 45, 60, 20, 40, 40, 40,...
$ nativecountry <fct> United-States, United-States, United-States, United-States, Cuba, United-States, Jamaica, Unite...
$ incomelevel   <fct> <=50K, <=50K, <=50K, <=50K, <=50K, <=50K, <=50K, >50K, >50K, >50K, >50K, >50K, <=50K, <=50K, <=...
r model r-caret caret
1 Answer

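Looking at this question again, I think the data needs some more preprocessing, namely one-hot encoding. Judging from the error message, it complains that incomelevel does not hold integer values. Since you only show a small snippet of the data, all we can see is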

incomelevel   <fct> <=50K, >50K, ...

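This makes me think the <= and > symbols may also be causing trouble. As mentioned above, your model needs integers. Try writing a small piece of code that converts incomelevel (<=50K, >50K, ...) into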

| Row No. | <=50K | >50K | more(?) |
| ------: |-----: |----: |-------: |
|       1 |     0 |    1 |       0 |
etc...

Then append the new table to your data and delete the original incomelevel column.
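
The conversion described above can be sketched in base R with `model.matrix`. This is only an illustration on a hypothetical data frame `toy` (not the asker's `trainset`), and the source does not confirm that this step resolves the error. Note also that caret generally expects a factor outcome for classification, so indicator columns like these may be more relevant for predictors than for the response.

```r
# Hypothetical toy data standing in for trainset (not the asker's data).
toy <- data.frame(
  age         = c(39L, 50L, 52L, 31L),
  incomelevel = factor(c("<=50K", "<=50K", ">50K", ">50K"),
                       levels = c("<=50K", ">50K"))
)

# One 0/1 indicator column per factor level; "- 1" drops the intercept
# so that every level gets its own column.
onehot <- model.matrix(~ incomelevel - 1, data = toy)
colnames(onehot) <- levels(toy$incomelevel)

# Append the indicator columns and drop the original factor column.
encoded <- cbind(
  toy[setdiff(names(toy), "incomelevel")],
  as.data.frame(onehot)
)
```

Each row of `onehot` has exactly one 1, in the column of that row's level.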

If this turns out to be the problem, you may need to treat all of the factor columns the same way.

IMHO, it is best to convert string data into integers that are only 0 or 1.
