Create a data frame with correlations and p-values by group?

Question (votes: 0, answers: 1)

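I am trying to correlate several variables within specific groups (COUNTY) in R. With the approach below I can get the correlations for each column, but I can't find a way to save the p-values in a table for each group. Any suggestions?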

Example data:

crops <- data.frame(
    COUNTY = sample(37001:37900), 
    CropYield = sample(c(1:100), 10, replace = TRUE), 
    MaxTemp =sample(c(40:80), 10, replace = TRUE),
    precip =sample(c(0:10), 10, replace = TRUE), 
    ColdDays =sample(c(1:73), 10, replace = TRUE))

Example code:

crops %>% 
     group_by(COUNTY) %>%
     do(data.frame(Cor=t(cor(.[,2:5], .[,2]))))

^ This gives me the correlation for each column, but I also need the p-value for each one. Ideally, the final output would look like this.

Desired Output

r statistics correlation p-value
1 Answer (votes: 0)
You only have 1 observation per COUNTY, so this will not work. I set up an example with more observations per COUNTY:

set.seed(111)
crops <- data.frame(
    COUNTY = sample(37001:37002, 10, replace = TRUE),
    CropYield = sample(c(1:100), 10, replace = TRUE),
    MaxTemp = sample(c(40:80), 10, replace = TRUE),
    precip = sample(c(0:10), 10, replace = TRUE),
    ColdDays = sample(c(1:73), 10, replace = TRUE))
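As a quick sanity check on the group-size point, the following base R sketch counts the rows per COUNTY and lists any county with fewer than 3 rows. With the default Pearson method, cor.test() stops with a "not enough finite observations" error when a group has fewer than 3 complete pairs. The names n_per_county and too_small are illustrative only, and the data frame is a trimmed copy of the example data.

```r
set.seed(111)
# Trimmed copy of the example data: two counties, ten rows in total
crops <- data.frame(
  COUNTY = sample(37001:37002, 10, replace = TRUE),
  CropYield = sample(1:100, 10, replace = TRUE)
)

# Number of rows available in each county
n_per_county <- table(crops$COUNTY)

# Counties with fewer than 3 rows cannot be passed to cor.test()
too_small <- names(n_per_county)[n_per_county < 3]

print(n_per_county)
print(too_small)
```

If too_small is not empty, those counties would need to be filtered out (or given more data) before testing.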

I think you need to convert to long format and run cor.test for each COUNTY and variable:

library(dplyr)
library(tidyr)

# Correlate CropYield with one variable; keep the estimate and the p-value
calcor <- function(da) {
    data.frame(cor.test(da$CropYield, da$value)[c("estimate", "p.value")])
}

crops %>%
    pivot_longer(-c(COUNTY, CropYield)) %>%
    group_by(COUNTY, name) %>%
    do(calcor(.))

# A tibble: 6 x 4
# Groups:   COUNTY, name [6]
  COUNTY name     estimate p.value
   <int> <chr>       <dbl>   <dbl>
1  37001 ColdDays    0.466   0.292
2  37001 MaxTemp    -0.225   0.628
3  37001 precip     -0.356   0.433
4  37002 ColdDays    0.888   0.304
5  37002 MaxTemp     0.941   0.220
6  37002 precip     -0.489   0.674

The above gives you the correlation of each variable with crop yield for each county. Now you just need to convert it to wide format:

crops %>%
    pivot_longer(-c(COUNTY, CropYield)) %>%
    group_by(COUNTY, name) %>%
    do(calcor(.)) %>%
    pivot_wider(values_from = c(estimate, p.value), names_from = name)

  COUNTY estimate_ColdDa… estimate_MaxTemp estimate_precip p.value_ColdDays
   <int>            <dbl>            <dbl>           <dbl>            <dbl>
1  37001            0.466           -0.225          -0.356            0.292
2  37002            0.888            0.941          -0.489            0.304
# … with 2 more variables: p.value_MaxTemp <dbl>, p.value_precip <dbl>
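As a side note, do() is marked as superseded in dplyr 1.0.0 and later, so the same long-then-wide approach can also be sketched with summarise(). This is an alternative, not the answer's own code. It assumes dplyr >= 1.0.0 and tidyr >= 1.0.0, and it builds COUNTY with rep() (an assumption made here so that each county is guaranteed 5 rows); the names result and test are illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(111)
# Assumption: rep() gives every county exactly 5 rows, enough for cor.test()
crops <- data.frame(
  COUNTY = rep(c(37001L, 37002L), each = 5),
  CropYield = sample(1:100, 10, replace = TRUE),
  MaxTemp = sample(40:80, 10, replace = TRUE),
  precip = sample(0:10, 10, replace = TRUE),
  ColdDays = sample(1:73, 10, replace = TRUE)
)

result <- crops %>%
  # One row per COUNTY / variable / observation
  pivot_longer(-c(COUNTY, CropYield)) %>%
  group_by(COUNTY, name) %>%
  # Run cor.test() once per group and keep the whole test object
  summarise(test = list(cor.test(CropYield, value)), .groups = "drop") %>%
  # Pull the correlation estimate and the p-value out of each test object
  mutate(
    estimate = sapply(test, function(ht) unname(ht$estimate)),
    p.value = sapply(test, function(ht) ht$p.value)
  ) %>%
  select(-test) %>%
  # One row per COUNTY, one estimate and one p-value column per variable
  pivot_wider(names_from = name, values_from = c(estimate, p.value))

print(result)
```

Storing the test object in a list column means cor.test() runs only once per group, and other fields such as conf.int could be extracted the same way.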
