Meaning of preProc = c("center", "scale") in the caret package (R), and min-max normalization

Question

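I would like to understand how preProc is used in the train() function from caret. I am fitting a neural network with method = "neuralnet" inside train(). The code comes from this question.

This is the code: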

nn <- train(medv ~ ., 
            data = df, 
            method = "neuralnet", 
            tuneGrid = grid,
            metric = "RMSE",
            preProc = c("center", "scale", "nzv"), #good idea to do this with neural nets - your error is due to non scaled data
            trControl = trainControl(
              method = "cv",
              number = 5,
              verboseIter = TRUE)
            )

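The original data was not scaled, so the advice was to scale it before fitting the neural network.

However, the preProc argument contains three elements: center, scale, and nzv. I have trouble interpreting these values because I do not know why they are there. In addition, I would like to scale/normalize my data with min-max scaling. This would be the function: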

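# Column-wise maxima and minima (assuming pk_dc_only$C has the same columns as df)
maxs = apply(pk_dc_only$C, 2, max)
mins = apply(pk_dc_only$C, 2, min)
# Subtract the minimum and divide by the range, giving values in [0, 1]
scaled = as.data.frame(scale(df, center = mins, scale = maxs - mins))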

Is it possible to normalize the data with min-max scaling inside preProc?

If so, how can I undo the scaling when predicting?

r neural-network normalization r-caret
1 Answer

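Of the three options in c("center", "scale", "nzv"), the first two center and scale the data. From the vignette:

method = "center" subtracts the mean of the predictor's data (again from the data in x) from the predictor values, while method = "scale" divides by the standard deviation.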
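As a rough illustration of what these two steps do, here is a minimal base-R sketch. The matrix x and its values are made up for this example, and the sketch only mirrors the arithmetic described in the quote; it is not caret's internal code.

```r
# Toy predictor matrix (made-up values, for illustration only)
x <- cbind(a = c(1, 2, 3, 4, 5), b = c(10, 20, 40, 80, 160))

# "center": subtract each column's mean
centered <- sweep(x, 2, colMeans(x), "-")

# "scale": divide each column by its standard deviation
sds <- apply(x, 2, sd)
standardized <- sweep(centered, 2, sds, "/")

# Each column now has mean 0 and standard deviation 1
round(colMeans(standardized), 10)
apply(standardized, 2, sd)

# Base R's scale() performs the same two steps
all.equal(standardized, scale(x), check.attributes = FALSE)
```

In a real workflow the means and standard deviations would be estimated on the training data and reused unchanged for any new data.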

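nzv excludes variables with near-zero variance, meaning they are almost constant and most likely not useful for prediction. For min-max scaling, there is an option:

The "range" transformation scales the data to be within rangeBounds. If new samples have values larger or smaller than those in the training set, values will be outside of this range.

Trying it on the BostonHousing data: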
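The second sentence of that quote can be seen in a small base-R sketch. The numbers and the helper rescale01 are hypothetical and only imitate the arithmetic of a [0, 1] range transformation; they are not part of caret.

```r
# Made-up training and new values for a single predictor
train_x <- c(2, 4, 6, 10)
new_x   <- c(1, 6, 12)

# Min and max are learned from the training set only
lo <- min(train_x)
hi <- max(train_x)

# Hypothetical helper: map [lo, hi] onto [0, 1]
rescale01 <- function(v, lo, hi) (v - lo) / (hi - lo)

rescale01(train_x, lo, hi)  # 0.00 0.25 0.50 1.00
rescale01(new_x, lo, hi)    # -0.125 0.500 1.250
```

The new values 1 and 12 lie outside the training range of 2 to 10, so they map below 0 and above 1.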

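library(mlbench)
library(caret)
data(BostonHousing)

# Random 400-row training subset (no seed is set, so results vary between runs)
idx = sample(nrow(BostonHousing), 400)
df = BostonHousing[idx, ]
df$chas = as.numeric(df$chas)   # factor -> numeric (levels become 1 and 2)

# Learn the min-max ranges from the training data (all columns, including medv)
pre_mdl = preProcess(df, method = "range")

# G is the tuning grid, not defined in this snippet; for method = "neuralnet"
# caret expects the columns layer1, layer2 and layer3
nn <- train(medv ~ ., data = predict(pre_mdl, df),
            method = "neuralnet", tuneGrid = G,
            metric = "RMSE",
            trControl = trainControl(method = "cv", number = 5, verboseIter = TRUE))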

nn$preProcess
Created from 400 samples and 13 variables

Pre-processing:
  - ignored (0)
  - re-scaling to [0, 1] (13)

summary(nn$finalModel$data)


          crim                zn             indus             chas       
 Min.   :0.000000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.000821   1st Qu.:0.0000   1st Qu.:0.1646   1st Qu.:0.0000  
 Median :0.002454   Median :0.0000   Median :0.2969   Median :0.0000  
 Mean   :0.042130   Mean   :0.1309   Mean   :0.3804   Mean   :0.0625  
 3rd Qu.:0.039150   3rd Qu.:0.2000   3rd Qu.:0.6466   3rd Qu.:0.0000  
 Max.   :1.000000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
      nox               rm              age              dis         
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
 1st Qu.:0.1276   1st Qu.:0.4470   1st Qu.:0.4032   1st Qu.:0.08522  
 Median :0.2819   Median :0.5076   Median :0.7503   Median :0.20133  
 Mean   :0.3363   Mean   :0.5232   Mean   :0.6647   Mean   :0.25146  
 3rd Qu.:0.4918   3rd Qu.:0.5880   3rd Qu.:0.9361   3rd Qu.:0.38622  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
      rad              tax            ptratio             b         
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.1304   1st Qu.:0.1770   1st Qu.:0.5106   1st Qu.:0.9475  
 Median :0.1739   Median :0.2729   Median :0.6862   Median :0.9861  
 Mean   :0.3676   Mean   :0.4171   Mean   :0.6243   Mean   :0.8987  
 3rd Qu.:1.0000   3rd Qu.:0.9141   3rd Qu.:0.8085   3rd Qu.:0.9983  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
     lstat           .outcome     
 Min.   :0.0000   Min.   :0.0000  
 1st Qu.:0.1492   1st Qu.:0.2683  
 Median :0.2705   Median :0.3644  
 Mean   :0.3069   Mean   :0.3902  
 3rd Qu.:0.4220   3rd Qu.:0.4450  
 Max.   :1.0000   Max.   :1.0000 

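I am not quite sure what you mean by "undo the scaling when predicting". Perhaps you mean converting the predictions back to the original scale: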

test = BostonHousing[-idx,]
test$chas = as.numeric(test$chas)
test_medv = test$medv
test = predict(pre_mdl,test)

The ranges are stored in the preProcess object, under ranges:

pre_mdl$ranges
         crim  zn indus chas   nox    rm   age     dis rad tax ptratio      b
[1,]  0.00632   0  0.46    1 0.385 3.561   2.9  1.1691   1 187    12.6   0.32
[2,] 88.97620 100 27.74    2 0.871 8.780 100.0 12.1265  24 711    22.0 396.90
     lstat medv
[1,]  1.73    5
[2,] 36.98   50

So we write a wrapper:

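# Map values from the [0, 1] scale back to the original scale of `column`
convert_response = function(value, mdl, method, column) {
  bounds = mdl[[method]][, column]   # c(min, max) stored by preProcess
  value * diff(bounds) + min(bounds)
}

plot(test_medv,
     convert_response(predict(nn, test), pre_mdl, "ranges", "medv"),
     ylab = "predicted")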

[Figure: predicted medv (y-axis, "predicted") plotted against test_medv (x-axis), both on the original scale]
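To check that the wrapper really inverts the min-max transform, a round trip can be run without caret. In this sketch mock_mdl is a hypothetical stand-in that mimics only the ranges element of pre_mdl, using the medv bounds 5 and 50 printed above; the values in original are made up.

```r
# Same wrapper as in the answer, repeated so this block runs on its own
convert_response <- function(value, mdl, method, column) {
  bounds <- mdl[[method]][, column]
  value * diff(bounds) + min(bounds)
}

# Stand-in for pre_mdl (assumption: only the "ranges" element is mimicked),
# using the medv bounds printed above: min 5, max 50
mock_mdl <- list(ranges = cbind(medv = c(5, 50)))

original <- c(5, 21.2, 36.5, 50)
scaled   <- (original - 5) / (50 - 5)   # forward min-max transform

restored <- convert_response(scaled, mock_mdl, "ranges", "medv")
all.equal(restored, original)   # TRUE
```

The same idea applies to model output: predictions made on the [0, 1] scale are multiplied by the training range and shifted by the training minimum.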
