Create multiple columns with R dplyr mutate using across instead of a loop?

Question

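I am trying to use R's dplyr package to create several new columns, one per year in my dataset, each holding the sum of that year's quarter-end columns (March, June, September, December). The only way I have found to do this "efficiently" is a for loop. But something tells me there is an alternative, more efficient or simply better way to go about it (perhaps I should be using a map function here, but I am not sure?). Here is a reproducible toy example: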

library(tidyverse)
library(glue)

# Create a toy example and print the resulting tibble
set.seed(100) # make results reproducible by setting seed
vars <- c("AgeGroup", paste0(month.abb[seq(3, 12, 3)], "_", rep(15:17, each = 4)))

(df <- cbind(LETTERS[1:5], matrix(rpois(n = (length(vars) - 1) * 5, 30), nrow = 5)) %>% 
    data.frame() %>%
    setNames(vars) %>% 
    tibble() %>% 
    mutate(across(-1, as.integer))
  )

This sets up the example (reproducible) dataset as:

# A tibble: 5 × 13
  AgeGroup Mar_15 Jun_15 Sep_15 Dec_15 Mar_16 Jun_16 Sep_16 Dec_16 Mar_17 Jun_17 Sep_17 Dec_17
  <chr>     <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
1 A            27     26     33     36     34     25     27     37     37     32     37     30
2 B            21     32     24     31     25     39     32     20     30     32     25     26
3 C            34     28     30     23     25     29     35     26     19     30     28     29
4 D            30     32     29     34     31     29     35     37     28     34     31     50
5 E            31     33     27     31     23     26     29     28     28     26     19     37

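So what I want to do is create one new variable per year ('15, '16 and '17), called sum_15, sum_16 and sum_17, each being the sum of all month values from the variables whose names end with the corresponding two-digit year (e.g. ends_with("15"), ends_with("16"), ends_with("17")).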

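I have been able to get the desired result with the code below, but I would rather not use a loop if I can sensibly apply an across statement or possibly a map function (or some other approach you might suggest):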

# This works, but I'd rather not use a for loop if I can avoid it:
for (i in 15:17) {
  df <- df %>% mutate("sum_{i}" := rowSums(across(ends_with(glue("_{i}")))))
}

#write out the df that displays what I am trying to achieve
df %>% select(AgeGroup, starts_with("sum"))

# A tibble: 5 × 4
  AgeGroup sum_15 sum_16 sum_17
  <chr>     <dbl>  <dbl>  <dbl>
1 A           122    123    136
2 B           108    116    113
3 C           115    115    106
4 D           125    132    143
5 E           122    106    110

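I have looked at other examples on SO, but all the ones I found were overly simplistic and seemed to create only one variable at a time by writing each one out by hand in the mutate statement, something like: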

df %>% mutate(sum15 = rowSums(across(ends_with("_15"))),
              sum16 = rowSums(across(ends_with("_16"))),
              sum17 = rowSums(across(ends_with("_17"))),
              )

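This is obviously not what I want, since it is basically a more manual way of doing what I already do with the for loop.

Can anyone offer suggestions on how to improve this code and avoid the for loop?

Many thanks!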

Answer 1

One possibility:

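df %>%
  bind_cols(
    df %>%
      # one row per AgeGroup / column pair, e.g. ("A", "Mar_15", 27)
      pivot_longer(-AgeGroup) %>%
      # "Mar_15" -> "sum_15": keep everything from the underscore onwards
      mutate(yr = paste0("sum", str_sub(name, start = 4))) %>%
      # weighted count = sum of value within each AgeGroup / yr
      count(AgeGroup, yr, wt = value) %>% 
      pivot_wider(names_from = yr, values_from = n) %>%
      # bind_cols() matches rows by position, and count() sorts by AgeGroup,
      # so this lines up because df is already ordered by AgeGroup
      select(-AgeGroup)
  )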


# A tibble: 5 × 16
  AgeGroup Mar_15 Jun_15 Sep_15 Dec_15 Mar_16 Jun_16 Sep_16 Dec_16 Mar_17 Jun_17 Sep_17 Dec_17 sum_15 sum_16 sum_17
  <chr>     <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
1 A            27     26     33     36     34     25     27     37     37     32     37     30    122    123    136
2 B            21     32     24     31     25     39     32     20     30     32     25     26    108    116    113
3 C            34     28     30     23     25     29     35     26     19     30     28     29    115    115    106
4 D            30     32     29     34     31     29     35     37     28     34     31     50    125    132    143
5 E            31     33     27     31     23     26     29     28     28     26     19     37    122    106    110
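
Since the question asks specifically about avoiding the loop with map, here is a minimal sketch of that route. It is not part of the answer above, and it assumes purrr is available (it is loaded by tidyverse). The names years, sums and result are only illustrative, and the data is typed in from the tibble printed in the question so that the snippet runs on its own.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy data copied from the tibble printed in the question
df <- tribble(
  ~AgeGroup, ~Mar_15, ~Jun_15, ~Sep_15, ~Dec_15, ~Mar_16, ~Jun_16, ~Sep_16, ~Dec_16, ~Mar_17, ~Jun_17, ~Sep_17, ~Dec_17,
  "A", 27L, 26L, 33L, 36L, 34L, 25L, 27L, 37L, 37L, 32L, 37L, 30L,
  "B", 21L, 32L, 24L, 31L, 25L, 39L, 32L, 20L, 30L, 32L, 25L, 26L,
  "C", 34L, 28L, 30L, 23L, 25L, 29L, 35L, 26L, 19L, 30L, 28L, 29L,
  "D", 30L, 32L, 29L, 34L, 31L, 29L, 35L, 37L, 28L, 34L, 31L, 50L,
  "E", 31L, 33L, 27L, 31L, 23L, 26L, 29L, 28L, 28L, 26L, 19L, 37L
)

# Year suffixes, named after the columns we want to create
years <- 15:17
names(years) <- paste0("sum_", years)

# One rowSums() per year; map() keeps the names, so the result is a
# named list of numeric vectors: sum_15, sum_16, sum_17
sums <- map(years, function(y) {
  rowSums(select(df, ends_with(paste0("_", y))))
})

# The sums were computed from df itself, so they are in df's row order
result <- bind_cols(df, as_tibble(sums))

result %>% select(AgeGroup, starts_with("sum"))
```

As with the for loop in the question, rowSums() returns doubles, so the new columns come out as <dbl> rather than <int>. Wrapping the call in as.integer() would be one way to change that if it matters.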
