Pivot wider using two separate keys simultaneously in dplyr


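I need to convert a tidy dataset to a wider format using dplyr, based on two sets of keys. I'm not very familiar with pivoting terminology, so forgive me if "key" is not the right term. Here is some toy data to illustrate. The data come from two fictional participants, each of whom completed three different measures on each of four days. For each measure we also have a total score across the four days, stored in the tot column. Within each participant and measure, that value is the same on all four days.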

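library(dplyr)
library(tidyr)  # provides pivot_wider(), used below

# Scores are random, so the printed values differ from run to run
df <- data.frame(id = rep(c("DFE3",
                            "DFE76"),
                          each = 12),
                 measure = rep(letters[1:3],
                               each = 4,
                               length.out = 24),
                 day = rep(1:4,
                           times = 3,
                           length.out = 24),
                 score = sample(0:5,
                                24,
                                replace = TRUE)) %>%
  arrange(id, measure, day) %>%
  group_by(id, measure) %>%
  mutate(tot = sum(score)) %>%  # total per participant and measure
  ungroup()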

df

# # A tibble: 24 x 5
#      id  measure   day score   tot
#   <fct>  <fct>   <int> <int> <int>
#  1 DFE3  a           1     5    12
#  2 DFE3  a           2     2    12
#  3 DFE3  a           3     5    12
#  4 DFE3  a           4     0    12
#  5 DFE3  b           1     1     9
#  6 DFE3  b           2     2     9
#  7 DFE3  b           3     5     9
#  8 DFE3  b           4     1     9
#  9 DFE3  c           1     0    15
# 10 DFE3  c           2     5    15
# # i 14 more rows
# # i Use `print(n = ...)` to see more rows

Now what I want to do is pivot so that I get one score column for each combination of measure and day, and one tot column for each measure only.

When I run this code...

df %>%
  pivot_wider(names_from = c(measure,
                             day),
              values_from = c(score, tot)) 

# A tibble: 2 x 25
# id      score_a_1 score_a_2 score_a_3 score_a_4 score_b_1 score_b_2 score_b_3 score_b_4 score_c_1 score_c_2 score_c_3 score_c_4 tot_a_1
# <fct>       <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>   <int>
# 1 DFE3          2         1         3         3         4         4         5         0         2         0         3         5       9
# 2 DFE76         1         4         4         2         1         2         2         4         2         3         2         5      11
# # i 11 more variables: tot_a_2 <int>, tot_a_3 <int>, tot_a_4 <int>, tot_b_1 <int>, tot_b_2 <int>, tot_b_3 <int>, tot_b_4 <int>,
# #   tot_c_1 <int>, tot_c_2 <int>, tot_c_3 <int>, tot_c_4 <int>

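...it gives me what I want for the score variable, spread across measure and day, but it does the same thing to the tot column, which is not what I want (there should be only three tot columns per id, one for each measure).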

Is there any way to do both of these at once using pivot_wider?
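For context, a minimal sketch of why the tot columns multiply: pivot_wider() combines every column listed in values_from with every combination of the names_from columns, so with 3 measures and 4 days each value column produces 12 wide columns. The data below uses fixed scores rather than sample(), which is an assumption made only so the counts are reproducible.

```r
library(dplyr)
library(tidyr)

# Fixed toy data (assumption: deterministic scores instead of sample())
df <- data.frame(id = rep(c("DFE3", "DFE76"), each = 12),
                 measure = rep(rep(letters[1:3], each = 4), times = 2),
                 day = rep(1:4, times = 6),
                 score = rep(0:5, times = 4)) %>%
  group_by(id, measure) %>%
  mutate(tot = sum(score)) %>%
  ungroup()

wide <- df %>%
  pivot_wider(names_from = c(measure, day),
              values_from = c(score, tot))

# 1 id column + 12 score columns + 12 tot columns
ncol(wide)
# Number of wide columns created from each values_from column
sum(startsWith(names(wide), "score_"))
sum(startsWith(names(wide), "tot_"))
```

Because names_from applies to all of values_from at once, getting tot by measure alone needs a separate pivot with names_from = measure.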

r dplyr pivot-table
1 Answer
library(dplyr); library(tidyr)
df %>%
  select(-tot) %>%
  pivot_wider(names_from = c(measure, day), values_from = score) %>%
  left_join(df %>%
              distinct(id, measure, tot) %>%
              pivot_wider(names_from = measure, values_from = tot))

Result

Joining with `by = join_by(id)`
# A tibble: 2 × 16
  id      a_1   a_2   a_3   a_4   b_1   b_2   b_3   b_4   c_1   c_2   c_3   c_4     a     b     c
  <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 DFE3      0     4     0     4     4     0     3     2     5     2     5     1     8     9    13
2 DFE76     4     5     5     5     2     2     3     5     4     4     2     2    19    12    12
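In the result above the tot columns are named just a, b and c, which is easy to confuse with the score columns. A minimal variation on the same two-pivots-then-join idea, assuming prefixed names are preferred: the names_prefix argument of pivot_wider() adds "score_" and "tot_", and by = "id" makes the join key explicit so no joining message is printed. The fixed scores and the names scores_wide, tots_wide and result are illustrative, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Fixed toy data (assumption: deterministic scores instead of sample())
df <- data.frame(id = rep(c("DFE3", "DFE76"), each = 12),
                 measure = rep(rep(letters[1:3], each = 4), times = 2),
                 day = rep(1:4, times = 6),
                 score = rep(0:5, times = 4)) %>%
  group_by(id, measure) %>%
  mutate(tot = sum(score)) %>%
  ungroup()

# One score column per measure/day combination
scores_wide <- df %>%
  select(-tot) %>%
  pivot_wider(names_from = c(measure, day),
              values_from = score,
              names_prefix = "score_")

# One tot column per measure; distinct() collapses the four
# identical daily rows into one row per id and measure
tots_wide <- df %>%
  distinct(id, measure, tot) %>%
  pivot_wider(names_from = measure,
              values_from = tot,
              names_prefix = "tot_")

result <- left_join(scores_wide, tots_wide, by = "id")
names(result)
```

The distinct() step relies on tot being constant within each id and measure; if it were not, the second pivot would see several values per cell and return list-columns with a warning.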