Rescale certain columns to a specific mean and standard deviation in R


Given the data frame below, how can I rescale v5 so that it has a mean of 100 and a standard deviation of 15?

head(df, n=5)

Output:

v1  v2    v3   v4   v5 
65  1   121.12  4   27
98  1   89.36   4   25
85  1   115.44  4   27
83  1   99.45   3   25
115 1   92.75   4   27
98  0   107.90  1   18

I have tried the psych package, but the last column of the resulting df is not correct:

library(psych)
library(tidyverse)
v5.rescaled <- df %>% rescale(df$v5, mean = 100, sd = 15)
df$v5.rescaled

Output:

t.t.scale.x.....sd...mean.
121.11985               
89.35994                
115.43986               
99.44991                
92.74993                

But in head(df, n=5), the rescaled v5 is not correct:

    v1  v2     v3   v4  v5        v5.rescaled
1   65  1   121.12  4   27  <data.frame [5 × 1]>
2   98  1   89.36   4   25  <data.frame [5 × 1]>
3   85  1   115.44  4   27  <data.frame [5 × 1]>
4   83  1   99.45   3   25  <data.frame [5 × 1]>
5   115 1   92.75   4   27  <data.frame [5 × 1]>
r dplyr psych
1 Answer
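  1. Please try to post a valid reprex next time. It saves others the trouble of copying your input data by hand. It is also unclear how your first code block, which refers to a df with columns v1-v5, relates to the later code block that refers to df$mother.iq.
  2. The help file for psych::rescale() states specifically that the input x should be a matrix or a data frame. I suspect this is why the output you get is not what you expect.
  3. While psych::rescale() can be used, a better and more flexible option may be to drop the dependency on the {psych} package altogether and simply rescale the columns manually as needed. The reprex below illustrates both approaches: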
# load libraries
library(tidyverse)

# define data as per OP
df <- data.frame(
          v1 = c(65L, 98L, 85L, 83L, 115L, 98L),
          v2 = c(1L, 1L, 1L, 1L, 1L, 0L),
          v3 = c(121.12, 89.36, 115.44, 99.45, 92.75, 107.9),
          v4 = c(4L, 4L, 4L, 3L, 4L, 1L),
          v5 = c(27L, 25L, 27L, 25L, 27L, 18L)
)

# rescale via psych::rescale using entire data frame
df %>% psych::rescale(mean = 100, sd = 15)
#>          v1        v2        v3        v4        v5
#> 1  77.38682 106.12372 119.90143 108.25723 109.31746
#> 2 106.46091 106.12372  82.24089 108.25723 100.71673
#> 3  95.00748 106.12372 113.16617 108.25723 109.31746
#> 4  93.24541 106.12372  94.20546  95.87139 100.71673
#> 5 121.43847 106.12372  86.26070 108.25723 109.31746
#> 6 106.46091  69.38138 104.22535  71.09970  70.61416

# if you only want to do this for specific columns, do it manually by targeting
# columns using dplyr::mutate_at(), an anonymous function, and scale (from base
# R); as.numeric() drops the one-column matrix that scale() returns:
df %>% 
  mutate_at(vars(v4, v5), function(x) as.numeric(scale(x)) * 15 + 100)
#>    v1 v2     v3        v4        v5
#> 1  65  1 121.12 108.25723 109.31746
#> 2  98  1  89.36 108.25723 100.71673
#> 3  85  1 115.44 108.25723 109.31746
#> 4  83  1  99.45  95.87139 100.71673
#> 5 115  1  92.75 108.25723 109.31746
#> 6  98  0 107.90  71.09970  70.61416
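
The manual approach above can also be written without any packages, which makes it easy to check that the result really has the requested mean and standard deviation. The following is only a minimal sketch in base R: `rescale_to` is a hypothetical helper name (it is not part of {psych} or {dplyr}), and it assumes the column is numeric with a non-zero standard deviation.

```r
# Hypothetical helper: standardise to z-scores, then stretch and shift.
# Assumes x is numeric and sd(x) is not zero.
rescale_to <- function(x, new_mean = 100, new_sd = 15) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  z * new_sd + new_mean
}

# Same v3 and v5 values as in the question
df <- data.frame(
  v3 = c(121.12, 89.36, 115.44, 99.45, 92.75, 107.9),
  v5 = c(27L, 25L, 27L, 25L, 27L, 18L)
)

# Rescale only v5 and store it as a plain numeric column
df$v5.rescaled <- rescale_to(df$v5)

# Both checks should hold up to floating-point error
all.equal(mean(df$v5.rescaled), 100)
all.equal(sd(df$v5.rescaled), 15)
```

Because the helper returns an ordinary numeric vector, the new column is a regular column rather than a nested data frame, which is the symptom shown in the question.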