How do I run a correlation test for each group in R and store the results in a list?


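Note: I tried to solve this using the answers to "In R, correlation test between two columns, for each of the groups in a third column", but without success.

I have the following data: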

> x = data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
                 group = rep(c("A", "B"), 5),
                 y = runif(10))
> x
   year group          y
1  2019     A 0.26550866
2  2019     B 0.37212390
3  2020     A 0.57285336
4  2020     B 0.90820779
5  2021     A 0.20168193
6  2021     B 0.89838968
7  2022     A 0.94467527
8  2022     B 0.66079779
9  2023     A 0.62911404
10 2023     B 0.06178627

I want to run a correlation test between year and the variable y for each group. I can do this one group at a time like this:

> library(dplyr)
> A = x %>% filter(group == "A")
> B = x %>% filter(group == "B")
> 
> cor.test(A$year, A$y)

    Pearson's product-moment correlation

data:  A$year and A$y
t = 0.67259, df = 3, p-value = 0.5494
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.7644073  0.9430671
sample estimates:
      cor 
0.3619872

> cor.test(B$year, B$y)

    Pearson's product-moment correlation

data:  B$year and B$y
t = 1.9909, df = 3, p-value = 0.1406
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3822520  0.9826436
sample estimates:
      cor 
0.7544519 

I tried to generalize this with a group_by statement, for the case where I have many groups.

My (unsuccessful) attempts were:

> # Unsuccessful attempt 1
>
> x %>% group_by(group) %>% 
   group_map(~cor.test(x$year, x$y))

[[1]]

    Pearson's product-moment correlation

data:  x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5955411  0.6614485
sample estimates:
       cor 
0.05453354 


[[2]]

    Pearson's product-moment correlation

data:  x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.5955411  0.6614485
sample estimates:
       cor 
0.05453354

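This is incorrect: both list elements contain the test on the full data set (df = 8) rather than a separate test for each group.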

> # Unsuccessful attempt 2
> # (using https://stackoverflow.com/questions/14030697/in-r-correlation-test-between-two-columns-for-each-of-the-groups-in-a-third-co)
> library(plyr)
> daply(x, .(group), function(y) cor.test(y$year, y$y))
> # Error message

Is there a way to get a list of correlation tests, one per group?

r tidyverse grouping correlation hypothesis-test
1 Answer

You can store each test in a list column with summarize():

library(dplyr)
df <-  data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
                  group = rep(c("A", "B"), 5),
                  y = runif(10))

test_list <- df %>% 
             group_by(group) %>% 
             summarize(cor_test=list(cor.test(year, y)))

The cor.test result for group A can then be retrieved as follows:

test_list$cor_test[[1]]

    Pearson's product-moment correlation

data:  year and y
t = -0.051541, df = 3, p-value = 0.9621
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8886893  0.8754974
sample estimates:
        cor 
-0.02974395
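
If a plain named list is preferred over a list column, the following is a minimal sketch of two alternatives. It assumes the question's data frame x (rebuilt here with a fixed seed, which is an addition for reproducibility) and uses group_map() with the per-group data .x instead of the full data frame, which appears to be what went wrong in the first attempt. The names tests, tests_base, p_values and estimates are illustrative only.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, added only so the sketch is reproducible
x <- data.frame(year  = rep(2019:2023, each = 2),
                group = rep(c("A", "B"), 5),
                y     = runif(10))

# group_map() hands each group's rows to the lambda as .x.
# Referring to x inside the lambda would test the full data frame every time.
grouped <- x %>% group_by(group)
tests <- grouped %>% group_map(~ cor.test(.x$year, .x$y))

# group_map() returns an unnamed list; label it with the group keys.
names(tests) <- group_keys(grouped)$group

# Base R equivalent: split by group, then apply cor.test() to each piece.
tests_base <- lapply(split(x, x$group), function(d) cor.test(d$year, d$y))

# Pull one number per group out of the list of "htest" objects.
p_values  <- sapply(tests, function(t) t$p.value)
estimates <- sapply(tests, function(t) unname(t$estimate))
```

With either version, tests$A or tests_base[["B"]] gives the full cor.test output for that group, and p_values and estimates are named numeric vectors with one entry per group.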