I have a data frame like this:
group1 <- c('A','A','A','A',
            'B','B','B','B',
            'C','C','C','C')
group2 <- c(1, 2, 3, 4,
            1, 2, 3, 4,
            1, 2, 3, 4)
indicator <- c(1, 1, 1, 1,
               1, 1, NA, 1,
               NA, 1, 1, NA)
df <- data.frame(group1, group2, indicator)
I want to create a new column that evaluates the indicator value for each group2 value within each group1 value, using the following logic:
(1) First value of 1 = "New"
(2) First value of 1 after NA = "Return"
(3) All other values of 1 = "Normal"
(4) All NAs = "None"
The resulting data frame should look like this:
group1 group2 indicator category
A      1      1         New
A      2      1         Normal
A      3      1         Normal
A      4      1         Normal
B      1      1         New
B      2      1         Normal
B      3      NA        None
B      4      1         Return
C      1      NA        None
C      2      1         Return
C      3      1         Normal
C      4      NA        None
The part that is tripping me up is evaluating this column. How can I generate the desired data frame?
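Here is one approach that uses a few helper columns. I have left them in so you can see what is going on, but you can of course drop them once the calculation is done: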
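library(dplyr)  # .by and .default require dplyr >= 1.1.0

mutate(df,
  # running count of NAs seen so far within the group
  na_counter = cumsum(is.na(indicator)),
  # running count of 1s seen so far (NA is treated as 0)
  one_counter = cumsum(coalesce(indicator, 0) == 1),
  # running count of 1s that occur after the group's first NA
  one_after_first_na_counter = cumsum(na_counter > 0 & coalesce(indicator, 0) == 1),
  # conditions are checked in order; the first match wins
  category = case_when(
    is.na(indicator) ~ "None",
    one_after_first_na_counter == 1 ~ "Return",
    one_counter == min(one_counter[one_after_first_na_counter != 1]) ~ "New",
    .default = "Normal"
  ),
  .by = group1
)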
#    group1 group2 indicator na_counter one_counter one_after_first_na_counter category
# 1       A      1         1          0           1                          0      New
# 2       A      2         1          0           2                          0   Normal
# 3       A      3         1          0           3                          0   Normal
# 4       A      4         1          0           4                          0   Normal
# 5       B      1         1          0           1                          0      New
# 6       B      2         1          0           2                          0   Normal
# 7       B      3        NA          1           2                          0     None
# 8       B      4         1          1           3                          1   Return
# 9       C      1        NA          1           0                          0     None
# 10      C      2         1          1           1                          1   Return
# 11      C      3         1          1           2                          2   Normal
# 12      C      4        NA          2           2                          2     None
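
One thing to be aware of: the helper-column version labels only the first 1 after a group's first NA as "Return". If rule (2) is meant to apply after every NA gap (that is an assumption; the example data cannot distinguish the two readings), a base R sketch along these lines might work instead. `classify_indicator` is a hypothetical helper name, not a function from any package.

```r
# Rebuild the example data from the question.
df <- data.frame(
  group1    = rep(c("A", "B", "C"), each = 4),
  group2    = rep(1:4, times = 3),
  indicator = c(1, 1, 1, 1,
                1, 1, NA, 1,
                NA, 1, 1, NA)
)

# Hypothetical helper: classify the indicator vector of ONE group.
# Assumes indicator only ever holds 1 or NA, as in the question.
classify_indicator <- function(x) {
  is_one     <- !is.na(x) & x == 1
  # TRUE where the previous row in the group is NA (FALSE for the first row)
  prev_is_na <- c(FALSE, head(is.na(x), -1))

  out <- rep("Normal", length(x))
  out[is_one & cumsum(is_one) == 1] <- "New"     # first 1 in the group
  out[is_one & prev_is_na]          <- "Return"  # a 1 directly after an NA (overrides "New")
  out[is.na(x)]                     <- "None"    # NA rows
  out
}

# Apply the helper per group1 and put the results back in the original row order.
df$category <- unsplit(
  lapply(split(df$indicator, df$group1), classify_indicator),
  df$group1
)

print(df)
```

On the example data this should give the same category column as the desired output. For a hypothetical vector such as `c(1, NA, 1, NA, 1)` it labels both later 1s as "Return", whereas the helper-column version would label the last one "Normal". With dplyr 1.1.0 or later the same helper could presumably be called as `mutate(df, category = classify_indicator(indicator), .by = group1)`.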