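I can't work out the right code for the following problem:
I have a dataset (df) with sections (rows) per ROV transect. I want to create a new ID column (i.e. a new grouping variable) based on the sections and on the variables Year and ROV. Each ID should cover the first 5 sections (rows), then the next 5 sections, and so on, until either Substrate or ROV changes. So not every new ID will consist of 5 elements.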
dataframe <- data.frame(
Year = c("2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2023","2023","2023"),
ROV=c("2","2","2","2","2","2","2","2","4","4","4","4","4","4","4","4"),
Section=c("3","4","5","6","7","8","9","10","1","2","3","4","5","6","7","8"),
Substrate=c("Mud","Mud","Mud","Mud","Mud","Mud","Mud","Mud","Bedrock","Bedrock","Bedrock","Bedrock","Dead Lophelia", "Dead Lophelia", "Bedrock","Bedrock"))
result <- data.frame(Year = c("2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2021","2023","2023","2023"),
ROV=c("2","2","2","2","2","2","2","2","4","4","4","4","4","4","4","4"),
Section = c("3","4","5","6","7","8","9","10","1","2","3","4","5","6","7","8"),
Substrate= c("Mud","Mud","Mud","Mud","Mud","Mud","Mud","Mud","Bedrock","Bedrock","Bedrock","Bedrock","Dead Lophelia", "Dead Lophelia", "Bedrock","Bedrock"),
ID=c("1","1","1","1","1","2","2","2","3","3","3","3","4","6","7","7" ))
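The code below creates an ID for every 5 rows (= sections), but it does not handle changes in ROV or Substrate. I have tried slice() but did not get the desired output. Is there a dplyr solution?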
dataframe$ID <- 1 + seq(0, nrow(dataframe) - 1) %/% 5
Here is one approach using dplyr:
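library(dplyr)  # the .by argument requires dplyr >= 1.1.0

dataframe %>%
  # Label rows within each ROV/Substrate/Year group in blocks of 5: A, A, A, A, A, B, ...
  mutate(subgroup = rep(LETTERS, each = 5, length.out = n()),
         .by = c(ROV, Substrate, Year)) %>%
  # Number each ROV/Substrate/Year/subgroup combination in order of first appearance
  mutate(ID = cur_group_id(),
         .by = c(ROV, Substrate, Year, subgroup)) %>%
  select(-subgroup)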
#>    Year ROV Section     Substrate ID
#> 1  2021   2       3           Mud  1
#> 2  2021   2       4           Mud  1
#> 3  2021   2       5           Mud  1
#> 4  2021   2       6           Mud  1
#> 5  2021   2       7           Mud  1
#> 6  2021   2       8           Mud  2
#> 7  2021   2       9           Mud  2
#> 8  2021   2      10           Mud  2
#> 9  2021   4       1       Bedrock  3
#> 10 2021   4       2       Bedrock  3
#> 11 2021   4       3       Bedrock  3
#> 12 2021   4       4       Bedrock  3
#> 13 2021   4       5 Dead Lophelia  4
#> 14 2023   4       6 Dead Lophelia  5
#> 15 2023   4       7       Bedrock  6
#> 16 2023   4       8       Bedrock  6
Created on 2024-01-09 with reprex v2.0.2
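One caveat with the approach above: because it groups by the values of ROV, Substrate and Year, rows are pooled even when they are not adjacent. If a substrate can reappear later in the same transect, the two stretches would share subgroups. A possible variant, sketched here under that assumption, is to key on consecutive runs with consecutive_id() (available in dplyr 1.1.0 and later). The data frame `toy` and the helper columns `run` and `chunk` are made up for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical toy data: Mud occurs in two separate runs within one transect.
toy <- data.frame(
  Year      = rep("2021", 9),
  ROV       = rep("2", 9),
  Section   = as.character(1:9),
  Substrate = c(rep("Mud", 6), "Bedrock", "Mud", "Mud")
)

toy_ids <- toy %>%
  # run: increases whenever Year, ROV or Substrate differs from the previous row
  mutate(run = consecutive_id(Year, ROV, Substrate)) %>%
  # chunk: 0 for the first 5 rows of a run, 1 for the next 5, and so on
  mutate(chunk = (row_number() - 1) %/% 5, .by = run) %>%
  # ID: increases whenever the (run, chunk) pair changes
  mutate(ID = consecutive_id(run, chunk)) %>%
  select(-run, -chunk)

toy_ids$ID
#> [1] 1 1 1 1 1 2 3 4 4
```

Here the sixth Mud row gets its own ID, and the Mud rows after Bedrock start a new one instead of being merged with it. On the example data from the question, where no Year/ROV/Substrate combination reappears after an interruption, this variant should give the same IDs as the output shown above. Using integer division for the chunk also avoids running out of LETTERS in very long runs.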