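Note: I tried to solve this using the question "In R, correlation test between two columns, for each of the groups in a third column", but without success.
I have the following data: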
> x = data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
group = rep(c("A", "B"), 5),
y = runif(10))
> x
year group y
1 2019 A 0.26550866
2 2019 B 0.37212390
3 2020 A 0.57285336
4 2020 B 0.90820779
5 2021 A 0.20168193
6 2021 B 0.89838968
7 2022 A 0.94467527
8 2022 B 0.66079779
9 2023 A 0.62911404
10 2023 B 0.06178627
I would like to run a correlation test between year and the variable y for each group. I can do this one group at a time as follows:
> library(dplyr)
> A = x %>% filter(group == "A")
> B = x %>% filter(group == "B")
>
> cor.test(A$year, A$y)
Pearson's product-moment correlation
data: A$year and A$y
t = 0.67259, df = 3, p-value = 0.5494
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.7644073 0.9430671
sample estimates:
cor
0.3619872
> cor.test(B$year, B$y)
Pearson's product-moment correlation
data: B$year and B$y
t = 1.9909, df = 3, p-value = 0.1406
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.3822520 0.9826436
sample estimates:
cor
0.7544519
I am trying to generalize this with a group_by statement, for the case where I have many groups.
My (unsuccessful) attempts are:
> # Unsuccessful attempt 1
>
> x %>% group_by(group) %>%
group_map(~cor.test(x$year, x$y))
[[1]]
Pearson's product-moment correlation
data: x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5955411 0.6614485
sample estimates:
cor
0.05453354
[[2]]
Pearson's product-moment correlation
data: x$year and x$y
t = 0.15447, df = 8, p-value = 0.8811
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5955411 0.6614485
sample estimates:
cor
0.05453354
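This is incorrect: both list elements are identical and are computed on all 10 rows (df = 8) rather than on each group separately.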
> # Unsuccessful attempt 2
> # (using https://stackoverflow.com/questions/14030697/in-r-correlation-test-between-two-columns-for-each-of-the-groups-in-a-third-co)
> library(plyr)
> daply(x, .(group), function(y) cor.test(y$year, y$y))
> # Error message
Is there a way to obtain a list of correlation tests, one per group?
You can use this simple code:
library(dplyr)
df <- data.frame(year = c(2019, 2019, 2020, 2020, 2021, 2021, 2022, 2022, 2023, 2023),
group = rep(c("A", "B"), 5),
y = runif(10))
test_list <- df %>%
group_by(group) %>%
summarize(cor_test=list(cor.test(year, y)))
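As an alternative, the group_map attempt from the question can probably be made to work with a small change. Inside group_map, the formula argument .x stands for the rows of the current group, whereas x$year always refers to the full data frame, which is why both results in the question were identical. The sketch below is a minimal illustration under a few assumptions of mine: the fixed seed, the data frame name df, and the step that names the list by group are not part of the original post.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(
  year  = rep(2019:2023, each = 2),
  group = rep(c("A", "B"), 5),
  y     = runif(10)
)

# .x is the subset of rows for the current group; writing df$year here
# would use all rows and repeat the mistake from the question.
tests <- df %>%
  group_by(group) %>%
  group_map(~ cor.test(.x$year, .x$y))

# group_map() returns an unnamed list, so attach the group keys as names.
names(tests) <- df %>% group_by(group) %>% group_keys() %>% pull(group)

tests[["A"]]  # htest object for group A
```

Each element of tests is an ordinary htest object, so it prints and behaves like the result of a direct cor.test call on that group.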
You can then retrieve the cor.test result for group A from test_list as follows:
test_list$cor_test[[1]]
Pearson's product-moment correlation
data: year and y
t = -0.051541, df = 3, p-value = 0.9621
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.8886893 0.8754974
sample estimates:
cor
-0.02974395
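If a compact table is more convenient than a list of printed tests, the individual fields of each htest object can be pulled out with sapply. The following is a rough sketch rather than a definitive recipe: the seed, the results data frame and its column names are my own choices. It also calls dplyr::summarise explicitly, because when plyr is attached after dplyr (as in the question's second attempt) the unqualified summarize resolves to plyr's version, which ignores the grouping.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original answer
df <- data.frame(
  year  = rep(2019:2023, each = 2),
  group = rep(c("A", "B"), 5),
  y     = runif(10)
)

# One row per group, with the full htest object stored in a list column.
test_list <- df %>%
  group_by(group) %>%
  dplyr::summarise(cor_test = list(cor.test(year, y)))

# Extract selected fields from each htest object into plain columns.
results <- data.frame(
  group     = test_list$group,
  estimate  = sapply(test_list$cor_test, function(t) unname(t$estimate)),
  p_value   = sapply(test_list$cor_test, function(t) t$p.value),
  conf_low  = sapply(test_list$cor_test, function(t) t$conf.int[1]),
  conf_high = sapply(test_list$cor_test, function(t) t$conf.int[2])
)

results
```

Keeping the list column in test_list means the complete test objects remain available, while results holds only the numbers that are typically reported.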