Here is my sample dataset:
sample <- structure(list(Week = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
Project = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "A", "A", "A" ),
Status= c( "Active","Rescheduled","Active", "Cancelled", "Active", "Cancelled", "Cancelled",
"Rescheduled", "Active", "Active", "Rescheduled", "Cancelled", "Cancelled", "Active")),
.Names = c("Week","Project","Status"),
class = "data.frame" , row.names = c(NA, -14L))
> sample
   Week Project      Status
1     1       A      Active
2     1       A Rescheduled
3     1       A      Active
4     1       A   Cancelled
5     1       B      Active
6     1       B   Cancelled
7     1       B   Cancelled
8     1       B Rescheduled
9     2       C      Active
10    2       C      Active
11    2       C Rescheduled
12    2       A   Cancelled
13    2       A   Cancelled
14    2       A      Active
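I want to create a data frame that gives, for each week, every project's active rate and 1 - active rate (= non-active rate), with NA where a project has no rows in that week.
So the table would look like this: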
  Week A_active_rate A_Non_active_rate B_active_rate B_Non_active_rate C_active_rate C_Non_active_rate
1    1          0.50              0.50          0.25              0.75            NA                NA
2    2          0.33              0.67            NA                NA          0.67              0.33
Here is one approach:
library(tidyverse)
sample <- structure(list(Week = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
Project = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "A", "A", "A" ),
Status= c( "Active","Rescheduled","Active", "Cancelled", "Active", "Cancelled", "Cancelled",
"Rescheduled", "Active", "Active", "Rescheduled", "Cancelled", "Cancelled", "Active")),
.Names = c("Week","Project","Status"),
class = "data.frame" , row.names = c(NA, -14L))
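sample %>%
  # Collapse every status other than "Active" into "Non_Active"
  mutate(active_or_not = ifelse(Status == "Active", "Active", "Non_Active")) %>%
  # Rows per Week/Project/active_or_not group (.by needs dplyr >= 1.1.0)
  mutate(count = n(), .by = c(Week, Project, active_or_not)) %>%
  # Share of that group within its Week/Project
  mutate(perc = count / n(), .by = c(Week, Project)) %>%
  # One column per Project/state pair; values_fn = mean collapses the
  # repeated (identical) perc values that land in the same cell
  pivot_wider(names_from = c(Project, active_or_not),
              id_cols = Week,
              values_from = perc,
              values_fn = mean,
              values_fill = NA,
              names_glue = "{.name}_rate")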
#> # A tibble: 2 × 7
#>    Week A_Active_rate A_Non_Active_rate B_Active_rate B_Non_Active_rate
#>   <dbl>         <dbl>             <dbl>         <dbl>             <dbl>
#> 1     1         0.5               0.5            0.25              0.75
#> 2     2         0.333             0.667         NA                NA
#> # ℹ 2 more variables: C_Active_rate <dbl>, C_Non_Active_rate <dbl>
Created on 2024-04-11 with reprex v2.1.0
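
As a possible variation on the approach above, the rate can be computed directly as the mean of a logical vector, which avoids the intermediate counts. This is only a sketch, not the answer's original method: the object name `rates` is made up for illustration, the column names follow the spelling in the question (`A_active_rate`, `A_Non_active_rate`), and `names_vary = "slowest"` assumes tidyr 1.2.0 or later.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt with data.frame() for brevity
sample <- data.frame(
  Week    = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  Project = c("A", "A", "A", "A", "B", "B", "B", "B",
              "C", "C", "C", "A", "A", "A"),
  Status  = c("Active", "Rescheduled", "Active", "Cancelled",
              "Active", "Cancelled", "Cancelled", "Rescheduled",
              "Active", "Active", "Rescheduled",
              "Cancelled", "Cancelled", "Active")
)

# `rates` is a hypothetical name used only in this sketch
rates <- sample %>%
  group_by(Week, Project) %>%
  # mean() of a logical vector is the share of TRUE values
  summarise(active_rate = mean(Status == "Active"), .groups = "drop") %>%
  mutate(Non_active_rate = 1 - active_rate) %>%
  pivot_wider(
    id_cols     = Week,
    names_from  = Project,
    values_from = c(active_rate, Non_active_rate),
    names_glue  = "{Project}_{.value}",
    names_vary  = "slowest"  # keep each project's two columns adjacent
  )

rates
```

One difference in behaviour: when a project has rows in a week but none (or all) of them are "Active", this version reports 0 and 1, whereas counting the two states separately leaves the missing state as NA. Week/project combinations with no rows at all are NA in both versions.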