How can I use group_by and mutate together to calculate the mean of certain columns?


I am trying to calculate means for four groups. My data frame looks similar to the following:

Sex <- c("F", "F", "M", "M", "F")
Phenotype <- c(Control, Experimental, Experimental, Control, Control)
MOp_Amygdala <- c("10", "15", "2", "6", "8")
MOp_Thalamus <- c("19", "12", "4", "4", "6")
MOp_Cerebellum <- c("34", "45", "67", "78", "99")
MOq_Cortex <- c("2", "5", "6", "17", "2")
MOq_Striatum  <- c("100", "101", "102", "106", "200")

df <- data.frame(Sex, Phenotype, MOp_Amygdala, MOp_Thalamus, MOp_Cerebellum, MOq_Cortex, MOq_Striatum)

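I want to find the mean of the amygdala, thalamus, and cerebellum columns for each of my four groups: M-Control, M-Experimental, F-Control, and F-Experimental.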

This is what I have tried so far:

Q1 <- data %>% 
  group_by(Sex, Phenotype)%>%
  select(starts_with("MOp")) %>%
  rowwise() %>%
  mutate(Group_Means = mean(c(MOp_Amygdala, MOp_Thalamus, MOp_Cerebellum))) #redundant 

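The main problem with my output is that group_by does not seem to work. I end up with 5 observations, one per sample, instead of 4 observations (M-Control, M-Experimental, F-Control, and F-Experimental).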

r group-by mean data-cleaning data-wrangling
1 Answer

You can calculate the mean for each group by summarising the data frame.
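To illustrate why the attempt in the question keeps one row per sample: mutate() adds a column but never reduces the number of rows, and rowwise() makes each calculation run per row rather than per group, whereas summarise() collapses each group to a single row. The following is a minimal sketch on a small made-up data frame; `toy`, `value`, and `group_mean` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the asker's real data
toy <- data.frame(
  Sex       = c("F", "F", "M", "M", "F"),
  Phenotype = c("Control", "Experimental", "Experimental", "Control", "Control"),
  value     = c(10, 15, 2, 6, 8)
)

# mutate() keeps every row; the group mean is repeated within each group
with_mutate <- toy %>%
  group_by(Sex, Phenotype) %>%
  mutate(group_mean = mean(value)) %>%
  ungroup()

# summarise() returns one row per Sex/Phenotype combination
with_summarise <- toy %>%
  group_by(Sex, Phenotype) %>%
  summarise(group_mean = mean(value), .groups = "drop")

nrow(with_mutate)     # 5 rows, one per sample
nrow(with_summarise)  # 4 rows, one per group
```

So if one row per group is the goal, summarise() is probably the verb to reach for instead of mutate().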

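I modified your input data:

  1. The Phenotype values are now quoted
  2. The numeric data are no longer wrapped in quotes
  3. I generated some extra rows for this demonstration
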
Sex <- rep(c("F", "F", "M", "M", "F"), 5)
Phenotype <- rep(c('Control', 'Experimental', 'Experimental', 'Control', 'Control'), 5)
MOp_Amygdala <- c(10, 15, 2, 6, 8, sample(seq(1,20,1), 20, replace = TRUE))
MOp_Thalamus <- c(19, 12, 4, 4, 6, sample(seq(1,20,1), 20, replace = TRUE))
MOp_Cerebellum <- c(34, 45, 67, 78, 99, sample(seq(20,100,1), 20, replace = TRUE))
MOq_Cortex <- c(2, 5, 6, 17, 2, sample(seq(1,20,1), 20, replace = TRUE))
MOq_Striatum  <- c(100, 101, 102, 106, 200, sample(seq(100,200,1), 20, replace = TRUE))

df <- data.frame(Sex, Phenotype, MOp_Amygdala, MOp_Thalamus, MOp_Cerebellum, MOq_Cortex, MOq_Striatum)

library(tidyverse)

glimpse(df)
#> Rows: 25
#> Columns: 7
#> $ Sex            <chr> "F", "F", "M", "M", "F", "F", "F", "M", "M", "F", "F", …
#> $ Phenotype      <chr> "Control", "Experimental", "Experimental", "Control", "…
#> $ MOp_Amygdala   <dbl> 10, 15, 2, 6, 8, 16, 6, 14, 3, 2, 16, 20, 15, 15, 2, 8,…
#> $ MOp_Thalamus   <dbl> 19, 12, 4, 4, 6, 14, 9, 12, 2, 9, 17, 17, 4, 7, 16, 9, …
#> $ MOp_Cerebellum <dbl> 34, 45, 67, 78, 99, 73, 21, 94, 30, 75, 54, 80, 48, 27,…
#> $ MOq_Cortex     <dbl> 2, 5, 6, 17, 2, 6, 8, 5, 10, 4, 7, 14, 8, 1, 12, 11, 12…
#> $ MOq_Striatum   <dbl> 100, 101, 102, 106, 200, 192, 193, 162, 121, 198, 109, …
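In the question the measurements were stored as strings (for example "10"), and mean() on character data returns NA with a warning. If the real data arrive that way, one possible fix is to convert the measurement columns before summarising. This is only a sketch: `raw` and `clean` are hypothetical names, and it assumes every column whose name starts with "MO" holds values that parse cleanly as numbers.

```r
library(dplyr)

# Hypothetical data with numbers stored as character strings, as in the question
raw <- data.frame(
  Sex          = c("F", "M"),
  MOp_Amygdala = c("10", "2"),
  MOp_Thalamus = c("19", "4")
)

# Convert every column whose name starts with "MO" from character to numeric
clean <- raw %>%
  mutate(across(starts_with("MO"), as.numeric))

str(clean)
```

Any value that cannot be parsed as a number would become NA here, so it may be worth checking for NAs after the conversion.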

Here is one way to calculate the per-group mean of those three columns:

df %>%
  summarise(across(starts_with('MOp'), mean),
            .by = c(Sex, Phenotype))
#>   Sex    Phenotype MOp_Amygdala MOp_Thalamus MOp_Cerebellum
#> 1   F      Control          7.9         12.3           61.5
#> 2   F Experimental         14.6         13.2           52.6
#> 3   M Experimental         12.4         10.8           73.6
#> 4   M      Control          7.2          5.4           50.4

Created on 2023-07-24 with reprex v2.0.2
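The `.by` argument is only available in dplyr 1.1.0 and later. On an older version, a roughly equivalent sketch uses group_by() followed by summarise(). The example below runs on a small stand-in data frame (`df_small`, built from the five rows in the question) and also adds a single combined mean per group, which is an assumption about what the `Group_Means` column in the question was meant to hold.

```r
library(dplyr)

# Stand-in data: the five rows from the question, with numeric values
df_small <- data.frame(
  Sex            = c("F", "F", "M", "M", "F"),
  Phenotype      = c("Control", "Experimental", "Experimental", "Control", "Control"),
  MOp_Amygdala   = c(10, 15, 2, 6, 8),
  MOp_Thalamus   = c(19, 12, 4, 4, 6),
  MOp_Cerebellum = c(34, 45, 67, 78, 99)
)

# One row per Sex/Phenotype group, with the mean of each MOp column
group_means <- df_small %>%
  group_by(Sex, Phenotype) %>%
  summarise(across(starts_with("MOp"), mean), .groups = "drop")

# Optional: average the three per-region means into one value per group
group_means <- group_means %>%
  mutate(Group_Mean = (MOp_Amygdala + MOp_Thalamus + MOp_Cerebellum) / 3)

group_means
```

One difference to be aware of: group_by() sorts the result by the grouping keys, while `.by` keeps the groups in order of first appearance, so the row order can differ between the two approaches.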
