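I have a data frame with 2 columns (ID and Year). I want to create a third column called "FLAG" whose value is based on the following conditions (all evaluated per ID):
Here is an example of what I want the data frame to look like in the end.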
data <- data.frame(
  "ID" = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D"),
  "Year" = c(2019, 2021, 2022, 2019, 2020, 2021, 2019, 2020, 2021, 2022, 2018, 2019),
  "Flag" = c("Gap", "Gap", "Gap", "Ap21", "Ap21", "Ap21", "Ap22", "Ap22", "Ap22", "Ap22",
             "notupdated", "notupdated")
)
A dplyr solution:
library(dplyr)
data |>
  mutate(
    Flag = case_when(
      all(c(2020, 2021, 2022) %in% Year) ~ "Ap22",
      all(c(2020, 2021) %in% Year) ~ "Ap21",
      any(c(2020, 2021) %in% Year) ~ "Gap",
      .default = "notupdated"
    ),
    .by = ID
  )
Output:
   ID Year       Flag
1   A 2019        Gap
2   A 2021        Gap
3   A 2022        Gap
4   B 2019       Ap21
5   B 2020       Ap21
6   B 2021       Ap21
7   C 2019       Ap22
8   C 2020       Ap22
9   C 2021       Ap22
10  C 2022       Ap22
11  D 2018 notupdated
12  D 2019 notupdated
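The `.by` and `.default` arguments used above were introduced in dplyr 1.1.0, and the native pipe `|>` needs R 4.1 or later. For older setups, the same logic can probably be written with `group_by()` and a `TRUE ~` fallback. This is a sketch under that assumption, not part of the original answer; the name `result` is only illustrative.

```r
library(dplyr)

# Example data from the question, without the expected Flag column
data <- data.frame(
  ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D"),
  Year = c(2019, 2021, 2022, 2019, 2020, 2021, 2019, 2020, 2021, 2022, 2018, 2019)
)

result <- data %>%
  group_by(ID) %>%                      # replaces .by = ID
  mutate(
    Flag = case_when(
      all(c(2020, 2021, 2022) %in% Year) ~ "Ap22",
      all(c(2020, 2021) %in% Year) ~ "Ap21",
      any(c(2020, 2021) %in% Year) ~ "Gap",
      TRUE ~ "notupdated"               # replaces .default = "notupdated"
    )
  ) %>%
  ungroup()                             # drop the grouping again

print(as.data.frame(result))
```

Unlike `.by`, `group_by()` leaves the result grouped, which is why the sketch ends with `ungroup()`.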
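If the rules are supposed to work independently of each other, here is an alternative.
In particular, the rule "present in 2020 or 2021 (but not both), then output 'Gap'" needs more than just
any(c(2020, 2021) %in% Year)
Example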
library(dplyr)
data %>%
  mutate(
    one = Year == 2020, two = Year == 2021, three = Year == 2022,
    Flag = case_when(
      (!any(one) | !any(two)) & (any(one) | any(two)) ~ "Gap"
    ),
    .by = ID
  ) %>%
  select(-c(one, two, three))
   ID Year Flag
1   A 2019  Gap
2   A 2021  Gap
3   A 2022  Gap
4   B 2019 <NA>
5   B 2020 <NA>
6   B 2021 <NA>
7   C 2019 <NA>
8   C 2020 <NA>
9   C 2021 <NA>
10  C 2022 <NA>
11  D 2018 <NA>
12  D 2019 <NA>
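The "one but not both" condition is an exclusive or, so base R's `xor()` expresses it directly. A minimal sketch follows; the helper name `has_gap` is hypothetical and not part of the answer.

```r
# Hypothetical helper: TRUE when exactly one of 2020 / 2021 appears in `years`
has_gap <- function(years) {
  xor(2020 %in% years, 2021 %in% years)
}

has_gap(c(2019, 2021, 2022))  # ID "A": 2021 only      -> TRUE
has_gap(c(2019, 2020, 2021))  # ID "B": both years     -> FALSE
has_gap(c(2018, 2019))        # ID "D": neither year   -> FALSE
```

Inside the grouped `mutate()`, the equivalent condition would be `xor(any(one), any(two))`.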
Applied to all rules
library(dplyr)
data %>%
  mutate(
    one = Year == 2020, two = Year == 2021, three = Year == 2022,
    Flag = case_when(
      (!any(one) | !any(two)) & (any(one) | any(two)) ~ "Gap",
      any(one) & any(two) & !any(three) ~ "Ap21",
      any(one) & any(two) & any(three) ~ "Ap22",
      !any(one) & !any(two) & !any(three) ~ "notupdated"
    ),
    .by = ID
  ) %>%
  select(-c(one, two, three))
   ID Year       Flag
1   A 2019        Gap
2   A 2021        Gap
3   A 2022        Gap
4   B 2019       Ap21
5   B 2020       Ap21
6   B 2021       Ap21
7   C 2019       Ap22
8   C 2020       Ap22
9   C 2021       Ap22
10  C 2022       Ap22
11  D 2018 notupdated
12  D 2019 notupdated
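The sequential and the independent rule sets agree on the example data, but they are not equivalent in general. The base R sketch below restates both per ID (the function names and the `only_2022` input are hypothetical, chosen for illustration) and compares them on an ID that has 2022 but neither 2020 nor 2021, a case the example data does not cover.

```r
# Sequential rules, mirroring the order of the first answer's case_when()
flag_sequential <- function(years) {
  if (all(c(2020, 2021, 2022) %in% years)) "Ap22"
  else if (all(c(2020, 2021) %in% years)) "Ap21"
  else if (any(c(2020, 2021) %in% years)) "Gap"
  else "notupdated"
}

# Independent rules, mirroring the second answer; NA when no rule matches
flag_independent <- function(years) {
  one <- 2020 %in% years
  two <- 2021 %in% years
  three <- 2022 %in% years
  if (xor(one, two)) "Gap"
  else if (one && two && !three) "Ap21"
  else if (one && two && three) "Ap22"
  else if (!one && !two && !three) "notupdated"
  else NA_character_
}

only_2022 <- c(2019, 2022)    # hypothetical ID, not in the example data
flag_sequential(only_2022)    # falls through to "notupdated"
flag_independent(only_2022)   # no rule matches, so NA
```

Which behaviour is wanted for such an ID depends on how the original conditions are meant to be read.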
I think @jan's answer is great; this is just some reasoning about the problem, and it is more complicated.
library(cgwtools) # provides an rle for sequences, `seqle`, very handy...
length(cgwtools::seqle(data$Year[which(data$ID == 'A')])$lengths)
[1] 2
# any gap is a gap
(cgwtools::seqle(data$Year[which(data$ID == 'B')])$values)+(cgwtools::seqle(data$Year[which(data$ID == 'B')])$lengths) -1
[1] 2021
(cgwtools::seqle(data$Year[which(data$ID == 'C')])$values)+(cgwtools::seqle(data$Year[which(data$ID == 'C')])$lengths) -1
[1] 2022
(cgwtools::seqle(data$Year[which(data$ID == 'D')])$values)+(cgwtools::seqle(data$Year[which(data$ID == 'D')])$lengths) -1
[1] 2019
# and use for Flag logic
It is not base R, but seqle really does come in handy.
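The snippet above stops at "and use for Flag logic". One way that step could look, using only base R, is sketched below: `diff()` stands in for `seqle` to count runs of consecutive years, and the last year of a single run selects the flag. The helper name `flag_from_runs` is hypothetical, and the reading that any break in the sequence means "Gap" is taken from the comment "any gap is a gap".

```r
# Example data from the question, without the expected Flag column
data <- data.frame(
  ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "D", "D"),
  Year = c(2019, 2021, 2022, 2019, 2020, 2021, 2019, 2020, 2021, 2022, 2018, 2019)
)

# Hypothetical helper: derive one flag from the years of a single ID
flag_from_runs <- function(years) {
  years <- sort(unique(years))
  n_runs <- 1 + sum(diff(years) != 1)  # a new run starts wherever the step is not 1
  last_year <- max(years)
  if (n_runs > 1) "Gap"                # assumption: any gap is a gap
  else if (last_year == 2022) "Ap22"
  else if (last_year == 2021) "Ap21"
  else "notupdated"
}

# One flag per ID, then mapped back onto every row of that ID
flags <- vapply(split(data$Year, data$ID), flag_from_runs, character(1))
data$Flag <- unname(flags[as.character(data$ID)])
print(data)
```

On the example data this reproduces the expected flags, but it would also flag a break in earlier years (say 2017 followed by 2019) as "Gap", which may or may not match the intended rules.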