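I am running knn regression on my data and would like to:
a) find an optimal k through repeatedcv cross-validation;
b) use PCA with a 90% variance threshold to reduce dimensionality when building the knn model.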
library(caret)
library(dplyr)
set.seed(0)
data = cbind(rnorm(20, 100, 10), matrix(rnorm(400, 10, 5), ncol = 20)) %>%
  data.frame()
colnames(data) = c('True', paste0('Day', 1:20))
tr = data[1:15, ]  # training set
tt = data[16:20, ] # test set
train.control = trainControl(method = "repeatedcv", number = 5, repeats = 3)
k = train(True ~ .,
          method = "knn",
          tuneGrid = expand.grid(k = 1:10), # trying to find the optimal k from 1:10
          trControl = train.control,
          preProcess = c('scale', 'pca'),
          metric = "RMSE",
          data = tr)
My questions are:
(1) I noticed that someone suggested changing the PCA parameter in trainControl:
ctrl <- trainControl(preProcOptions = list(thresh = 0.8))
mod <- train(Class ~ ., data = Sonar, method = "pls",
trControl = ctrl)
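If I change the parameter in trainControl, is PCA still performed as part of the KNN fit? This is similar to another question.
(2) I found another example that fits my situation. I would like to change the threshold to 90%, but I don't know where to set it in caret's train function, especially since I also need the scale option.
I apologize for the long description and the scattered references. Thank you in advance!
(Thanks to Camille for the suggestions that made the code work!)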
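To answer your questions:
(1) "I noticed that someone suggested changing the PCA parameter in trainControl:
mod <- train(Class ~ ., data = Sonar, method = "pls", trControl = ctrl)
If I change the parameter in trainControl, is PCA still performed as part of the KNN fit?"
Yes, as long as 'pca' is still listed in the preProcess argument of train(). preProcOptions in trainControl only sets the threshold; it does not switch PCA on by itself. For example: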
train.control = trainControl(method = "repeatedcv", number = 5, repeats = 3,
                             preProcOptions = list(thresh = 0.9))
k = train(True ~ .,
          method = "knn",
          tuneGrid = expand.grid(k = 1:10),
          trControl = train.control,
          preProcess = c('scale', 'pca'),
          metric = "RMSE",
          data = tr)
You can check the result in the preProcess element of the fitted object:
k$preProcess
Created from 15 samples and 20 variables
Pre-processing:
- centered (20)
- ignored (0)
- principal component signal extraction (20)
- scaled (20)
PCA needed 9 components to capture 90 percent of the variance
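To see roughly where a component count like this comes from, here is a minimal base-R sketch that rebuilds the question's training data and finds the smallest number of principal components whose cumulative variance reaches the threshold. It uses prcomp directly and is only an illustration of the idea; the names x, cum_var and n_comp are made up here, and caret's internal rule may differ in detail (for example, in how ties or a minimum component count are handled).

```r
# Rebuild the training predictors from the question (base R only).
set.seed(0)
data <- data.frame(cbind(rnorm(20, 100, 10),
                         matrix(rnorm(400, 10, 5), ncol = 20)))
colnames(data) <- c("True", paste0("Day", 1:20))
x <- data[1:15, -1]  # 15 training rows, 20 predictors

# Centre and scale, then run PCA.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the components.
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components that reaches the 90% threshold.
thresh <- 0.9
n_comp <- which(cum_var >= thresh)[1]
print(n_comp)
```

Raising thresh keeps more components and lowering it keeps fewer, which is the effect of changing thresh in preProcOptions.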
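That also covers (2). The alternative is to call preProcess on its own, passing thresh directly, and then train on the transformed data: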
mdl = preProcess(tr[,-1],method=c("scale","pca"),thresh=0.9)
mdl
Created from 15 samples and 20 variables
Pre-processing:
- centered (20)
- ignored (0)
- principal component signal extraction (20)
- scaled (20)
PCA needed 9 components to capture 90 percent of the variance
train.control = trainControl(method = "repeatedcv", number = 5, repeats = 3)
k = train(True ~ .,
          method = "knn",
          tuneGrid = expand.grid(k = 1:10),
          trControl = train.control,
          metric = "RMSE",
          data = predict(mdl, tr))
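With the standalone approach, the test set has to go through the same preProcess object before prediction, because the model was trained on principal components rather than on the original Day columns. A minimal sketch of that step, assuming the same data setup as in the question (the names fit, tt_pca and pred are introduced here for illustration):

```r
library(caret)

# Same data as in the question, built with base R.
set.seed(0)
data <- data.frame(cbind(rnorm(20, 100, 10),
                         matrix(rnorm(400, 10, 5), ncol = 20)))
colnames(data) <- c("True", paste0("Day", 1:20))
tr <- data[1:15, ]   # training set
tt <- data[16:20, ]  # test set

# Fit scaling + PCA on the training predictors only.
mdl <- preProcess(tr[, -1], method = c("scale", "pca"), thresh = 0.9)

# Train knn on the transformed training data.
train.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3)
fit <- train(True ~ .,
             method = "knn",
             tuneGrid = expand.grid(k = 1:10),
             trControl = train.control,
             metric = "RMSE",
             data = predict(mdl, tr))

# Apply the training-set transformation to the test predictors, then predict.
tt_pca <- predict(mdl, tt[, -1])
pred <- predict(fit, newdata = tt_pca)
print(pred)
```

One difference between the two approaches: here the PCA is estimated once on the whole training set, whereas passing preProcess to train() re-estimates it inside each resample, so the cross-validated RMSE from the two versions need not match.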