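I'm using the Kaggle Pokemon dataset to practice KNN imputation with preProcess(), but when I do, I get the message below at the predict() step. I'd like to know whether the data format I'm using is wrong, or whether some columns have an unsuitable class. Here is my code: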
library(dplyr)
library(ggplot2)
library(tidyr)
library(reshape2)
library(caret)
library(skimr)
library(psych)
library(e1071)
library(data.table)
pokemon <- read.csv("https://www.dropbox.com/s/znbta9u9tub2ox9/pokemon.csv?dl=1")
pokemon = tbl_df(pokemon)
# select relevant features
df <- select(pokemon, hp, weight_kg, height_m, sp_attack, sp_defense, capture_rate)
pre_process_missing_data <- preProcess(df, method="knnImpute")
classify_legendary <- predict(pre_process_missing_data, newdata = df)
And I get this error message:
Error: Must subset rows with a valid subscript vector.
x Subscript `nn$nn.idx` must be a simple vector, not a matrix.
Run `rlang::last_error()` to see where the error occurred.
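The input to preProcess() needs to be a plain data.frame. Here df is a tibble (created by tbl_df()), which is what triggers the subscript error. Converting it works: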
pre_process_missing_data <- preProcess(as.data.frame(df), method="knnImpute")
classify_legendary <- predict(pre_process_missing_data, newdata = df)
classify_legendary
> classify_legendary
# A tibble: 801 x 6
       hp weight_kg height_m sp_attack sp_defense capture_rate
    <dbl>     <dbl>    <dbl>     <dbl>      <dbl> <chr>
 1 -0.902    -0.498   -0.429    -0.195     -0.212 45
 2 -0.337    -0.442   -0.152     0.269      0.325 45
 3  0.415     0.353    0.774      1.57       1.76 45
 4  -1.13    -0.484   -0.522    -0.349     -0.748 45
 5 -0.412    -0.388  -0.0591     0.269     -0.212 45
 6  0.340     0.266    0.496      2.71       1.58 45
 7 -0.939    -0.479   -0.615    -0.659     -0.247 45
 8 -0.375    -0.356   -0.152    -0.195      0.325 45
 9  0.378     0.221    0.404      1.97       1.58 45
10 -0.902    -0.535   -0.800     -1.59      -1.82 255
# ... with 791 more rows
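On the "class" part of the question: in the output above, capture_rate is printed as <chr> and still holds its raw values (45, 255), while the other columns were centered and scaled. preProcess() ignores non-numeric columns, so that column was neither scaled nor used for imputation. A <chr> column coming out of read.csv() usually means at least one entry is not a plain number. Below is a minimal sketch of how one might check and coerce the column before imputing. It uses a small made-up data frame rather than the real file, and the non-numeric entry "30 (special form)" is a hypothetical placeholder, not a value taken from the dataset. It also assumes the RANN package is installed, since caret relies on it for method = "knnImpute".

```r
library(caret)  # preProcess(); method = "knnImpute" also needs the RANN package installed

# Toy stand-in for a few Pokemon columns (made-up values, NOT the real data).
# "30 (special form)" is a hypothetical non-numeric entry used for illustration.
toy <- data.frame(
  hp           = c(45, 60, 80, 39, 58, 78, 44, 59),
  weight_kg    = c(6.9, 13.0, 100.0, 8.5, NA, 90.5, 9.0, 22.5),
  capture_rate = c("45", "45", "190", "45", "120", "45", "255", "30 (special form)"),
  stringsAsFactors = FALSE
)

# 1. Inspect the class of every column; only numeric columns are pre-processed
print(sapply(toy, class))

# 2. Coerce the character column to numeric.
#    Entries that cannot be parsed become NA (the coercion warning is suppressed here).
toy$capture_rate <- suppressWarnings(as.numeric(toy$capture_rate))

# 3. Fit on a plain data.frame (not a tibble) and apply the imputation.
#    knnImpute also centers and scales the numeric columns.
pp <- preProcess(as.data.frame(toy), method = "knnImpute", k = 3)
imputed <- predict(pp, newdata = as.data.frame(toy))

# 4. Confirm that no missing values remain
print(anyNA(imputed))
```

Whether turning the unparseable entries into NA and imputing them is appropriate depends on what those entries mean in the real data, so it is worth looking at them first, e.g. with unique(df$capture_rate).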