R: Create a pivot table that calculates percentages for multiple groups


Here is my sample dataset:

sample <- structure(list(Week    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                         Project = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "A", "A", "A"),
                         Status  = c("Active", "Rescheduled", "Active", "Cancelled", "Active", "Cancelled", "Cancelled",
                                     "Rescheduled", "Active", "Active", "Rescheduled", "Cancelled", "Cancelled", "Active")),
                    .Names = c("Week", "Project", "Status"),
                    class = "data.frame", row.names = c(NA, -14L))
> sample
   Week Project      Status
1     1       A      Active
2     1       A Rescheduled
3     1       A      Active
4     1       A   Cancelled
5     1       B      Active
6     1       B   Cancelled
7     1       B   Cancelled
8     1       B Rescheduled
9     2       C      Active
10    2       C      Active
11    2       C Rescheduled
12    2       A   Cancelled
13    2       A   Cancelled
14    2       A      Active

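I want to create a data frame that, for each week, calculates every project's active rate and 1 - active rate (i.e. the non-active rate), with NA wherever a project has no rows in that week.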

So the table should look like this:

  Week A_active_rate A_Non_active_rate B_active_rate B_Non_active_rate C_active_rate C_Non_active_rate
1    1          0.50              0.50          0.25              0.75            NA                NA
2    2          0.33              0.67            NA                NA          0.67              0.33
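To make the definition concrete: within each Week/Project group, the active rate is the share of rows whose `Status` is `"Active"`, which is simply the mean of the logical vector `Status == "Active"`. The base-R sketch below (no packages; the helper column `is_active` is a hypothetical name, not from the question) computes that long-format table before any pivoting. For example, Week 1 / Project A has 2 active rows out of 4, giving 0.5.

```r
# Same data as in the question, written with data.frame() for brevity
sample <- data.frame(
  Week    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Project = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "A", "A", "A"),
  Status  = c("Active", "Rescheduled", "Active", "Cancelled", "Active", "Cancelled", "Cancelled",
              "Rescheduled", "Active", "Active", "Rescheduled", "Cancelled", "Cancelled", "Active")
)

# Hypothetical helper column: TRUE for "Active", FALSE for every other status
sample$is_active <- sample$Status == "Active"

# The mean of a logical vector is the share of TRUE values, i.e. the active rate
rates <- aggregate(is_active ~ Week + Project, data = sample, FUN = mean)
names(rates)[names(rates) == "is_active"] <- "active_rate"

# The non-active rate is the complement
rates$non_active_rate <- 1 - rates$active_rate

print(rates)
```

Only Week/Project pairs that actually occur appear here (4 rows); the NA cells in the desired table only show up once this long table is spread into one column per project.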
1 Answer

Here is one approach:

library(tidyverse)

sample <- structure(list(Week    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                         Project = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "A", "A", "A"),
                         Status  = c("Active", "Rescheduled", "Active", "Cancelled", "Active", "Cancelled", "Cancelled",
                                     "Rescheduled", "Active", "Active", "Rescheduled", "Cancelled", "Cancelled", "Active")),
                    .Names = c("Week", "Project", "Status"),
                    class = "data.frame", row.names = c(NA, -14L))

sample %>%
  # Collapse every non-"Active" status (Rescheduled, Cancelled) into "Non_Active"
  mutate(active_or_not = ifelse(Status == "Active", "Active", "Non_Active")) %>%
  # Rows per Week/Project/status class (the `.by` argument needs dplyr >= 1.1.0)
  mutate(count = n(), .by = c(Week, Project, active_or_not)) %>%
  # Divide by the rows per Week/Project to turn the count into a share
  mutate(perc = count / n(), .by = c(Week, Project)) %>%
  # All rows in a group carry the same perc, so values_fn = mean just collapses
  # the duplicates; Week/Project combinations with no rows are filled with NA
  pivot_wider(names_from = c(Project, active_or_not),
              id_cols = Week,
              values_from = perc,
              values_fn = mean,
              values_fill = NA,
              names_glue = "{.name}_rate")
#> # A tibble: 2 × 7
#>    Week A_Active_rate A_Non_Active_rate B_Active_rate B_Non_Active_rate
#>   <dbl>         <dbl>             <dbl>         <dbl>             <dbl>
#> 1     1         0.5               0.5            0.25              0.75
#> 2     2         0.333             0.667         NA                NA   
#> # ℹ 2 more variables: C_Active_rate <dbl>, C_Non_Active_rate <dbl>

Created on 2024-04-11 with reprex v2.1.0
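A possible variant, sketched under the assumption that dplyr >= 1.1.0 (for `.by`) and tidyr >= 1.2.0 (for `names_vary`) are installed. Summarising once per Week/Project avoids the duplicate rows that `values_fn = mean` has to collapse above, and `names_glue`, `names_vary = "slowest"` and `round()` aim to reproduce the lower-case column names, column order and two-decimal values from the question. This is an alternative sketch, not the answerer's code.

```r
library(dplyr)
library(tidyr)

sample <- data.frame(
  Week    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Project = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "A", "A", "A"),
  Status  = c("Active", "Rescheduled", "Active", "Cancelled", "Active", "Cancelled", "Cancelled",
              "Rescheduled", "Active", "Active", "Rescheduled", "Cancelled", "Cancelled", "Active")
)

wide <- sample %>%
  # One row per Week/Project: share of "Active" rows and its complement
  summarise(active_rate     = mean(Status == "Active"),
            Non_active_rate = 1 - active_rate,
            .by = c(Week, Project)) %>%
  # One column per Project/rate pair; Week/Project pairs with no rows become NA
  pivot_wider(names_from  = Project,
              values_from = c(active_rate, Non_active_rate),
              names_glue  = "{Project}_{.value}",
              # keep each project's two rate columns next to each other
              names_vary  = "slowest") %>%
  # Round to two decimals as in the desired table (round() leaves NA as NA)
  mutate(across(-Week, ~ round(.x, 2)))

print(wide)
```

Rounding is done last so the non-active rate is still computed from the unrounded active rate; drop the final `mutate()` if full precision is preferred.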
