Run a Spearman correlation for each feature with the following data structure using tidyverse


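I have a data frame with about 700 columns, where columns 1:699 are features and the last column is the group name (A or B). What I intend to do is run multiple correlations between the groups for each feature column.

My data looks like this: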

   speed  distance  mpg car_group
    120   3000      25   A
    110   3040      35   A
    .      .        .
    .      .        .
    .      .        .
    .      .        .
    70   4000      50    B
    73   5000      30    B


 The code that I have written: 
 data %>% 
     pivot_longer(!car_group, names_to = 'Feature', values_to = 'value') %>%
     nest(Feature) %>% 
     dplyr::mutate(
        fit = map(data, ~cor.test(car_group,value,method = 'spearman', data=.x)),
        tidied = map(fit,tidy),
      ) %>%
    unnest(tidied)
     

The cor.test function cannot fit the correlation because it needs two numeric columns.

What is the best way to implement this?

Thank you very much!

r tidymodels tidy
1 Answer

I wrote a package called {longpairs} to handle this situation.

It can be installed from GitHub:

remotes::install_github("the-mad-statter/longpairs")

With it you can do:

library(purrr)
library(dplyr)
library(longpairs)

data <- data.frame(
  id = rep(1:4, 3),
  group = rep(LETTERS[1:3], each = 4),
  y1 = rnorm(12),
  y2 = rnorm(12)
)

# 2 features 
# for 3 groups 
# with 4 observations in each group
head(data)
#>   id group         y1         y2
#> 1  1     A  0.3229226 -0.8212847
#> 2  2     A -0.4579473  0.9818359
#> 3  3     A -0.7886976 -1.3323891
#> 4  4     A -0.3926829 -1.6587242
#> 5  1     B -0.7830501  0.5354096
#> 6  2     B  1.8048759 -0.3728410

# names of feature columns
features <- setdiff(names(data), c("id", "group"))

# row bind (i.e., map_dfr()) lp_cor() output across features 
# because lp_cor() is only programmed to deal with one feature
map_dfr(
  features,
  ~ {
    bind_cols(
      data.frame(y = .),
      lp_cor(data, !!., group, id)
    )
  }
)
#>    y    name1    name2    estimate   statistic    p.value parameter    conf.low
#> 1 y1 group==A group==B  0.17197091  0.24688163 0.82802909         2 -0.94536521
#> 2 y1 group==A group==C -0.04436289 -0.06280044 0.95563711         2 -0.96433405
#> 3 y1 group==B group==C  0.32814896  0.49127667 0.67185104         2 -0.92450976
#> 4 y2 group==A group==B -0.61703119 -1.10887147 0.38296881         2 -0.99064517
#> 5 y2 group==A group==C -0.54078977 -0.90921376 0.45921023         2 -0.98824198
#> 6 y2 group==B group==C  0.95562028  4.58739129 0.04437972         2 -0.06702336
#>   conf.high                               method alternative p.flag n1 n2
#> 1 0.9723491 Pearson's product-moment correlation   two.sided         4  4
#> 2 0.9575509 Pearson's product-moment correlation   two.sided         4  4
#> 3 0.9801246 Pearson's product-moment correlation   two.sided         4  4
#> 4 0.8453892 Pearson's product-moment correlation   two.sided         4  4
#> 5 0.8751564 Pearson's product-moment correlation   two.sided         4  4
#> 6 0.9990998 Pearson's product-moment correlation   two.sided      *  4  4
#>           m1         m2        s1        s2 message.class message
#> 1 -0.3291013 -0.4756453 0.4679772 1.7625710          <NA>    <NA>
#> 2 -0.3291013 -0.2154266 0.4679772 1.1180152          <NA>    <NA>
#> 3 -0.4756453 -0.2154266 1.7625710 1.1180152          <NA>    <NA>
#> 4 -0.7076405  0.1726153 1.1778676 0.5892729          <NA>    <NA>
#> 5 -0.7076405  0.5184370 1.1778676 1.1827297          <NA>    <NA>
#> 6  0.1726153  0.5184370 0.5892729 1.1827297          <NA>    <NA>
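
Note that the output above reports Pearson's correlation, while the question asked for Spearman. As a minimal sketch that does not rely on {longpairs}, the same between-group comparison can be written with tidyr, purrr, and broom, assuming (as in the example data) that an `id` column pairs observations across groups and that there are exactly two groups named A and B. The simulated data and the name `spearman_by_feature` are illustrative only.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(1)
# Simulated data (an assumption): 2 features, 2 groups, 6 paired observations
data <- data.frame(
  id = rep(1:6, 2),
  group = rep(c("A", "B"), each = 6),
  y1 = rnorm(12),
  y2 = rnorm(12)
)

spearman_by_feature <- data %>%
  # one row per id / group / feature
  pivot_longer(-c(id, group), names_to = "feature", values_to = "value") %>%
  # one column per group, so rows pair A and B values by id
  pivot_wider(names_from = group, values_from = value) %>%
  # one nested data frame per feature
  nest(data = -feature) %>%
  mutate(
    fit = map(data, ~ cor.test(.x$A, .x$B, method = "spearman")),
    tidied = map(fit, tidy)
  ) %>%
  select(feature, tidied) %>%
  unnest(tidied)

spearman_by_feature
```

The result has one row per feature with the Spearman estimate and p-value. With 699 features, the p-values would normally need a multiple-testing adjustment, for example with `p.adjust()`.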

Created on 2024-04-12 with reprex v2.1.0
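
If the data has no `id` column to pair observations, as in the question, another reading of the problem is to correlate each feature with group membership itself. The question's call fails because `car_group` is not numeric and because `data =` only applies to the formula interface of `cor.test()`. A hedged sketch of that reading, assuming two groups coded as 0/1 (the data and the names `cars_data`, `group_num`, and `group_cor` are hypothetical):

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(42)
# Hypothetical stand-in for the asker's data: numeric features plus a two-level group
cars_data <- data.frame(
  speed = rnorm(20, 100, 15),
  distance = rnorm(20, 4000, 500),
  mpg = rnorm(20, 30, 5),
  car_group = rep(c("A", "B"), each = 10)
)

group_cor <- cars_data %>%
  # code the group as 0 (A) / 1 (B) so cor.test() receives two numeric vectors
  mutate(group_num = as.numeric(car_group == "B")) %>%
  pivot_longer(-c(car_group, group_num),
               names_to = "Feature", values_to = "value") %>%
  nest(data = -Feature) %>%
  mutate(
    # exact = FALSE because the 0/1 coding is heavily tied
    fit = map(data, ~ cor.test(.x$group_num, .x$value,
                               method = "spearman", exact = FALSE)),
    tidied = map(fit, tidy)
  ) %>%
  select(Feature, tidied) %>%
  unnest(tidied)

group_cor
```

Here `nest(data = -Feature)` replaces the question's `nest(Feature)`, so that each feature gets its own nested data frame to pass to `cor.test()`.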
