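I have a data frame with about 700 columns, where columns 1:699 are features and the last column is the group name (A or B). For each feature column, I want to run a correlation between the groups.
My data look like this: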
speed distance mpg car_group
120 3000 25 A
110 3040 35 A
. . .
. . .
. . .
. . .
70 4000 50 B
73 5000 30 B
The code that I have written:
data %>%
  pivot_longer(!car_group, names_to = 'Feature', values_to = 'value') %>%
  nest(Feature) %>%
  dplyr::mutate(
    fit = map(data, ~cor.test(car_group,value,method = 'spearman', data=.x)),
    tidied = map(fit,tidy),
  ) %>%
  unnest(tidied)
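The cor.test function cannot fit the correlation because it needs two numeric columns, and car_group is a group label rather than a number.
What is the best way to implement this?
Thank you very much!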
I wrote a package called {longpairs} to handle this situation.
It can be installed from GitHub:
remotes::install_github("the-mad-statter/longpairs")
With it you can do:
library(purrr)
library(dplyr)
library(longpairs)
data <- data.frame(
  id = rep(1:4, 3),
  group = rep(LETTERS[1:3], each = 4),
  y1 = rnorm(12),
  y2 = rnorm(12)
)
# 2 features
# for 3 groups
# with 4 observations in each group
head(data)
#> id group y1 y2
#> 1 1 A 0.3229226 -0.8212847
#> 2 2 A -0.4579473 0.9818359
#> 3 3 A -0.7886976 -1.3323891
#> 4 4 A -0.3926829 -1.6587242
#> 5 1 B -0.7830501 0.5354096
#> 6 2 B 1.8048759 -0.3728410
# names of feature columns
features <- setdiff(names(data), c("id", "group"))
# row bind (i.e., map_dfr()) lp_cor() output across features
# because lp_cor() is only programmed to deal with one feature
map_dfr(
  features,
  ~ {
    bind_cols(
      data.frame(y = .),
      lp_cor(data, !!., group, id)
    )
  }
)
#> y name1 name2 estimate statistic p.value parameter conf.low
#> 1 y1 group==A group==B 0.17197091 0.24688163 0.82802909 2 -0.94536521
#> 2 y1 group==A group==C -0.04436289 -0.06280044 0.95563711 2 -0.96433405
#> 3 y1 group==B group==C 0.32814896 0.49127667 0.67185104 2 -0.92450976
#> 4 y2 group==A group==B -0.61703119 -1.10887147 0.38296881 2 -0.99064517
#> 5 y2 group==A group==C -0.54078977 -0.90921376 0.45921023 2 -0.98824198
#> 6 y2 group==B group==C 0.95562028 4.58739129 0.04437972 2 -0.06702336
#> conf.high method alternative p.flag n1 n2
#> 1 0.9723491 Pearson's product-moment correlation two.sided 4 4
#> 2 0.9575509 Pearson's product-moment correlation two.sided 4 4
#> 3 0.9801246 Pearson's product-moment correlation two.sided 4 4
#> 4 0.8453892 Pearson's product-moment correlation two.sided 4 4
#> 5 0.8751564 Pearson's product-moment correlation two.sided 4 4
#> 6 0.9990998 Pearson's product-moment correlation two.sided * 4 4
#> m1 m2 s1 s2 message.class message
#> 1 -0.3291013 -0.4756453 0.4679772 1.7625710 <NA> <NA>
#> 2 -0.3291013 -0.2154266 0.4679772 1.1180152 <NA> <NA>
#> 3 -0.4756453 -0.2154266 1.7625710 1.1180152 <NA> <NA>
#> 4 -0.7076405 0.1726153 1.1778676 0.5892729 <NA> <NA>
#> 5 -0.7076405 0.5184370 1.1778676 1.1827297 <NA> <NA>
#> 6 0.1726153 0.5184370 0.5892729 1.1827297 <NA> <NA>
Created on 2024-04-12 with reprex v2.1.0
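For readers who would rather not install a package, the same idea can be roughly sketched in base R. This is only a sketch under assumptions: it assumes, as in the reprex above, that observations can be paired across groups through an id column (the data in the question show no such column, so one would have to exist or be created), and pairwise_cor() is a hypothetical helper written for illustration, not a function from {longpairs}. The simulated data are placeholders as well.

```r
# Placeholder data shaped like the reprex above:
# 2 features, 3 groups, 4 paired observations (ids) per group.
set.seed(1)
data <- data.frame(
  id = rep(1:4, 3),
  group = rep(LETTERS[1:3], each = 4),
  y1 = rnorm(12),
  y2 = rnorm(12)
)

# Names of the feature columns, and every unordered pair of groups.
features <- setdiff(names(data), c("id", "group"))
group_pairs <- combn(sort(unique(data$group)), 2, simplify = FALSE)

# Hypothetical helper (not part of {longpairs}): correlate one feature
# between two groups, pairing observations by id.
pairwise_cor <- function(df, feature, pair, method = "spearman") {
  g1 <- df[df$group == pair[1], c("id", feature)]
  g2 <- df[df$group == pair[2], c("id", feature)]
  # After the merge, column 2 holds group 1 values and column 3 group 2 values.
  both <- merge(g1, g2, by = "id", suffixes = c(".1", ".2"))
  fit <- cor.test(both[[2]], both[[3]], method = method)
  data.frame(
    feature = feature,
    group1 = pair[1],
    group2 = pair[2],
    estimate = unname(fit$estimate),
    p.value = fit$p.value,
    n = nrow(both)
  )
}

# One row per feature and group pair.
results <- do.call(rbind, lapply(features, function(f) {
  do.call(rbind, lapply(group_pairs, function(p) pairwise_cor(data, f, p)))
}))
rownames(results) <- NULL
results
```

With only two groups (A and B), group_pairs has a single entry, so results has one row per feature. Passing method = "pearson" should give the same kind of estimate as the lp_cor() output shown above.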