(R, dplyr) Select multiple columns that start with the same string and summarise the mean and 90% CI by group

Question

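I'm new to the tidyverse. Conceptually, I want to compute the mean and 90% CI of every column whose name starts with "ab", grouped by "case". I have tried many approaches, but none of them seem to work. My real data has many columns, so listing them explicitly is not an option.

The test data is as follows: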

library(tidyverse)

dat <- tibble(case= c("case1", "case1", "case2", "case2", "case3"), 
              abc = c(1, 2, 3, 1, 2), 
              abe = c(1, 3, 2, 3, 4), 
              bca = c(1, 6, 3, 8, 9))

The code below is what I conceptually want to do, but it obviously does not work:

dat %>% group_by(`case`) %>% 
  summarise(mean=mean(select(starts_with("ab"))), 
            qt=quantile(select(starts_with("ab"), prob=c(0.05, 0.95))))

What I would like to get is something like this:

   case abc_mean abe_mean abc_lb abc_ub abe_lb abe_ub
  <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 case1      1.5      2.0   1.05   1.95   1.10   2.90
2 case2      2.0      2.5   1.10   2.90   2.05   2.95
3 case3      2.0      4.0   2.00   2.00   4.00   4.00

Answer 1

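Another option is summarise_at. vars(starts_with("ab")) selects the columns, and funs(...) applies the summary functions. Note that funs() has been deprecated since dplyr 0.8.0, so this version is mainly relevant to older installations.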

library(tidyverse)

dat2 <- dat %>% 
  group_by(case) %>% 
  summarise_at(vars(starts_with("ab")), funs(mean = mean(.),
                                             lb = quantile(., probs = 0.05),
                                             ub = quantile(., probs = 0.95))) 
dat2
# # A tibble: 3 x 7
#    case abc_mean abe_mean abc_lb abe_lb abc_ub abe_ub
#   <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 case1      1.5      2.0   1.05   1.10   1.95   2.90
# 2 case2      2.0      2.5   1.10   2.05   2.90   2.95
# 3 case3      2.0      4.0   2.00   4.00   2.00   4.00

Update

An updated option using the across function:

dat2 <- dat %>% 
  group_by(case) %>% 
  summarise(across(starts_with("ab"), .fns = list(
    mean = ~mean(.x),
    lb = ~quantile(.x, probs = 0.05),
    ub = ~quantile(.x, probs = 0.95))
  ))

dat2
# # A tibble: 3 × 7
#   case  abc_mean abc_lb abc_ub abe_mean abe_lb abe_ub
#   <chr>    <dbl>  <dbl>  <dbl>    <dbl>  <dbl>  <dbl>
# 1 case1      1.5   1.05   1.95      2     1.1    2.9 
# 2 case2      2     1.1    2.9       2.5   2.05   2.95
# 3 case3      2     2      2         4     4      4   
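
If the same summary is needed for several prefixes or data sets, the across call above could be wrapped in a small function. The following is only a sketch: summarise_prefix and its arguments (group, prefix, lower, upper) are hypothetical names introduced here for illustration, not part of dplyr. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

dat <- tibble::tibble(case = c("case1", "case1", "case2", "case2", "case3"),
                      abc = c(1, 2, 3, 1, 2),
                      abe = c(1, 3, 2, 3, 4),
                      bca = c(1, 6, 3, 8, 9))

# Hypothetical helper (not a dplyr function): for every column whose name
# starts with `prefix`, compute the mean and two quantiles within each
# level of the grouping column named by `group`.
summarise_prefix <- function(data, group, prefix, lower = 0.05, upper = 0.95) {
  data %>%
    group_by(across(all_of(group))) %>%
    summarise(
      across(starts_with(prefix),
             list(mean = ~ mean(.x),
                  # unname() drops the "5%" / "95%" labels quantile() attaches
                  lb = ~ unname(quantile(.x, probs = lower)),
                  ub = ~ unname(quantile(.x, probs = upper)))),
      .groups = "drop"
    )
}

res <- summarise_prefix(dat, group = "case", prefix = "ab")
res
```

Because the quantile levels are arguments, a different interval (for example lower = 0.025, upper = 0.975) only requires changing the call, not the function body.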

Answer 2

You were very close; just move select before summarise. We then use summarise_all and specify the appropriate functions inside funs.

dat %>%
    group_by(case) %>%
    select(starts_with('ab')) %>%
    summarise_all(funs('mean' = mean, 'ub' = quantile(., .95), 'lb' = quantile(., .05)))

# # A tibble: 3 x 7
#    case abc_mean abe_mean abc_ub abe_ub abc_lb abe_lb
#   <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 case1      1.5      2.0   1.95   2.90   1.05   1.10
# 2 case2      2.0      2.5   2.90   2.95   1.10   2.05
# 3 case3      2.0      4.0   2.00   4.00   2.00   4.00

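We use summarise_all rather than summarise because we want to perform the same operations on several columns. Using summarise_all requires less typing than a summarise call in which each column and each operation is specified separately.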
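
For comparison, here is a sketch of what the fully explicit summarise call would look like for the test data, with every column and every operation written out by hand. It reuses the dat tibble from the question; the result name explicit is just an illustrative choice.

```r
library(dplyr)

dat <- tibble::tibble(case = c("case1", "case1", "case2", "case2", "case3"),
                      abc = c(1, 2, 3, 1, 2),
                      abe = c(1, 3, 2, 3, 4),
                      bca = c(1, 6, 3, 8, 9))

# Every column/statistic pair is spelled out, so the call grows by three
# lines for each additional "ab" column in the real data.
explicit <- dat %>%
  group_by(case) %>%
  summarise(abc_mean = mean(abc),
            abe_mean = mean(abe),
            abc_ub = quantile(abc, probs = 0.95),
            abe_ub = quantile(abe, probs = 0.95),
            abc_lb = quantile(abc, probs = 0.05),
            abe_lb = quantile(abe, probs = 0.05))

explicit
```

With only two matching columns this is manageable, but it does not scale to the many columns mentioned in the question, which is why the column-selecting variants are preferable there.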
