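I am trying to create an incrementing counter in an ID column based on several conditions being met in two other columns, and then "reset" those conditions to determine the next increment of the ID. This is time-series data, so row order matters (I have not included the timestamp column).
I will provide a toy dataset. It has three columns: Location, Activity, and ID. At the moment my ID column is empty, but I have filled in values here to illustrate my conditions. I want to initialize ID at 1 and then check whether a D occurs. That is my first condition. Next, I need to check whether an A occurs after the D, and that A must also be at Location 2. Once both this condition and the D condition are met, I want to increment ID by 1 on the next row. From that row on, I want to "reset" the conditions that have been met and again check row by row whether a D occurs; at the first instance of an A at Location 2 after that D, I want to increment the ID by 1 on the following row. This repeats until the end of the dataset.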
df <- data.frame(
  Location = c(2, 3, 3, 2, 1, 2, 2, 2, 1, 3, 3, 1, 2, 3, 2, 2, 1, 2, 3, 2, 1),
  Activity = c("A", "B", "C", "D", "D", "B", "A", "A", "B", "A", "C", "D", "A", "B", "B", "D", "A", "D", "D", "A", "C"),
  ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4)
)
# Print the dataframe to view its structure
print(df)
   Location Activity ID
1         2        A  1
2         3        B  1
3         3        C  1
4         2        D  1
5         1        D  1
6         2        B  1
7         2        A  1
8         2        A  2
9         1        B  2
10        3        A  2
11        3        C  2
12        1        D  2
13        2        A  2
14        3        B  3
15        2        B  3
16        2        D  3
17        1        A  3
18        2        D  3
19        3        D  3
20        2        A  3
21        1        C  4
...
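I have tried several iterations of conditional logic, but they all seem to fail. My best attempt is below, but it does not produce the ID column I expect.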
# Function to increment ID based on conditions
increment_id_based_on_conditions <- function(df) {
  df$ID[1] <- 1 # Initialize the first ID
  # Initialize control variables
  waiting_for_a <- FALSE
  last_id <- 1
  for (i in 1:nrow(df)) {
    if (waiting_for_a && df$Activity[i] == "A" && df$Location[i] == 2) {
      last_id <- last_id + 1 # Increment ID after conditions are met
      waiting_for_a <- FALSE # Reset condition
    } else if (df$Activity[i] == "D") {
      waiting_for_a <- TRUE # Set condition to start waiting for "A" at Location 2
    }
    df$ID[i] <- last_id # Update ID column
  }
  df$ID <- c(df$ID[-1], NA) # Shift ID down by one row and make last ID NA
  return(df)
}
# Apply the function to dataset
df_with_ids <- increment_id_based_on_conditions(df)
# View the updated dataset
print(df_with_ids)
   Location Activity ID
1         2        A  1
2         3        B  1
3         3        C  1
4         2        D  1
5         1        D  1
6         2        B  2
7         2        A  2
8         2        A  2
9         1        B  2
10        3        A  2
11        3        C  2
12        1        D  3
13        2        A  3
14        3        B  3
15        2        B  3
16        2        D  3
17        1        A  3
18        2        D  3
19        3        D  4
20        2        A  4
21        1        C NA
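This solution creates a group for each "D" (a running count of the D rows) and, within each group, finds the position of the first "2A" row, that is, Activity A at Location 2. With that information, a unique ID can be built. See: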
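library(dplyr)
# Assumption: the code below expects lower-case column names, as in the output
# that follows, so rename the question's columns and drop its pre-filled ID.
df <- select(df, location = Location, activity = Activity)
# Temporary row index, used to remember which rows trigger an increment
df <- mutate(df, id = row_number())
aux <- df %>%
  # d_group counts how many "D" rows have occurred so far
  mutate(d_group = cumsum(if_else(activity == "D", 1, 0))) %>%
  # Keep only the first row of each (d_group, location, activity) combination
  distinct(d_group, location, activity, .keep_all = TRUE) %>%
  # First "A" at location 2 after at least one "D"
  filter(location == 2, activity == "A", d_group > 0) %>%
  pull(id)
# Increment on the row after each trigger row
df <- mutate(df, id = cumsum(if_else(dplyr::lag(id) %in% aux, 1, 0)) + 1)
rm(aux)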
# ---------
> df
   location activity id
1         2        A  1
2         3        B  1
3         3        C  1
4         2        D  1
5         1        D  1
6         2        B  1
7         2        A  1
8         2        A  2
9         1        B  2
10        3        A  2
11        3        C  2
12        1        D  2
13        2        A  2
14        3        B  3
15        2        B  3
16        2        D  3
17        1        A  3
18        2        D  3
19        3        D  3
20        2        A  3
21        1        C  4
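
For comparison, the same rule can probably also be written as a plain loop in base R. The following is only a sketch under stated assumptions, not the dplyr approach above: `assign_ids`, `seen_d`, `current_id`, and `expected_id` are hypothetical names introduced for illustration. The loop writes the current ID to the row before it updates the state, so an increment first shows up on the row after the matching "A" at Location 2, and no shifting of the column is needed afterwards.

```r
# Toy data from the question; the expected IDs are kept separately for checking.
df <- data.frame(
  Location = c(2, 3, 3, 2, 1, 2, 2, 2, 1, 3, 3, 1, 2, 3, 2, 2, 1, 2, 3, 2, 1),
  Activity = c("A", "B", "C", "D", "D", "B", "A", "A", "B", "A", "C",
               "D", "A", "B", "B", "D", "A", "D", "D", "A", "C")
)
expected_id <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4)

# Hypothetical helper: returns one ID per row.
assign_ids <- function(location, activity) {
  id <- integer(length(activity))
  current_id <- 1L   # ID given to the row being visited
  seen_d <- FALSE    # TRUE once a "D" has occurred since the last increment
  for (i in seq_along(activity)) {
    # Assign first, so a trigger row still gets the old ID
    id[i] <- current_id
    if (activity[i] == "D") {
      seen_d <- TRUE
    } else if (seen_d && activity[i] == "A" && location[i] == 2) {
      # First "A" at Location 2 after a "D": later rows get the next ID
      current_id <- current_id + 1L
      seen_d <- FALSE  # reset and wait for the next "D"
    }
  }
  id
}

df$ID <- assign_ids(df$Location, df$Activity)
# Stops with an error if the result differs from the IDs given in the question
stopifnot(all(df$ID == expected_id))
```

Passing the two columns as vectors, rather than the whole data frame, keeps the helper independent of the column names, so it should work with either the capitalized or the lower-case version of the data.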