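I have an ordered data frame with a categorical variable (specifically, bird family) in one of its columns. Each species occupies one row, and because of the particular ordering, some bird families have all of their species grouped together in the data frame, while others are interrupted by species from other families. I want to subset each block (of the same family) one at a time in a for loop, so that I can do some further processing before moving on to the next block. I can't use unique(), because it ignores the fact that some families are interrupted and puts all of their rows into a single subset. Here is an example subset of the dataset: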
structure(list(jetzspp = c("Acanthisitta_chloris", "Xenicus_gilviventris",
"Ampelioides_tschudii", "Pipreola_aureopectus", "Pipreola_chlorolepidota",
"Xenopipo_holochlora", "Xenopipo_uniformis", "Tityra_cayana",
"Tityra_inquisitor", "Tachuris_rubrigastra", "Conopias_parvus"
), iocorder = c("Passeriformes", "Passeriformes", "Passeriformes",
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes",
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes"
), bird_family = c("Acanthisittidae", "Acanthisittidae", "Cotingidae",
"Cotingidae", "Cotingidae", "Pipridae", "Pipridae", "Cotingidae",
"Cotingidae", "Tyrannidae", "Tyrannidae")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
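Here I want to subset Acanthisittidae, then Cotingidae (only the three grouped together), then Pipridae, then Cotingidae again (the next two grouped together), and so on.
I would also like to isolate the names of the bird families that are interrupted somewhere in the dataset.
I haven't found a function that tracks these transitions. It might be possible (?) to make the name of each interrupted family block unique and then loop over those names, but for a database that is 3k+ rows long with many interruptions this isn't efficient, because I would have to change the names back afterwards.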
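For reference, base R's `rle()` (run-length encoding) does record exactly these transitions, so a run index can be kept in a separate column instead of renaming anything. A minimal sketch, assuming `bird_family` is a character vector (`rle()` rejects factors, so convert with `as.character()` first); `run_id` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: one integer id per run of identical consecutive values.
# Assumes x is an atomic vector such as character; rle() errors on factors.
run_id <- function(x) {
  runs <- rle(x)  # runs$lengths = size of each run, runs$values = value of each run
  rep(seq_along(runs$lengths), times = runs$lengths)  # expand back to one id per row
}

families <- c("Acanthisittidae", "Acanthisittidae", "Cotingidae",
              "Cotingidae", "Cotingidae", "Pipridae", "Pipridae", "Cotingidae",
              "Cotingidae", "Tyrannidae", "Tyrannidae")

run_id(families)      # ids 1 1 2 2 2 3 3 4 4 5 5
rle(families)$values  # family of each run, in row order
```

With the question's data this would be used as `df$id <- run_id(df$bird_family)`, leaving `bird_family` itself untouched.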
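This may help with part of your question. The consecutive_id() function in dplyr (available since dplyr 1.1.0) assigns a unique id to each run of consecutive identical values in bird_family. You can then use those ids in your loop: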
library(dplyr)
df <- structure(list(jetzspp = c("Acanthisitta_chloris", "Xenicus_gilviventris",
"Ampelioides_tschudii", "Pipreola_aureopectus", "Pipreola_chlorolepidota",
"Xenopipo_holochlora", "Xenopipo_uniformis", "Tityra_cayana",
"Tityra_inquisitor", "Tachuris_rubrigastra", "Conopias_parvus"
), iocorder = c("Passeriformes", "Passeriformes", "Passeriformes",
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes",
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes"
), bird_family = c("Acanthisittidae", "Acanthisittidae", "Cotingidae",
"Cotingidae", "Cotingidae", "Pipridae", "Pipridae", "Cotingidae",
"Cotingidae", "Tyrannidae", "Tyrannidae")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
# Add unique id to each 'run' of values
df <- df %>%
  mutate(id = consecutive_id(bird_family))
df
jetzspp iocorder bird_family id
1 Acanthisitta_chloris Passeriformes Acanthisittidae 1
2 Xenicus_gilviventris Passeriformes Acanthisittidae 1
3 Ampelioides_tschudii Passeriformes Cotingidae 2
4 Pipreola_aureopectus Passeriformes Cotingidae 2
5 Pipreola_chlorolepidota Passeriformes Cotingidae 2
6 Xenopipo_holochlora Passeriformes Pipridae 3
7 Xenopipo_uniformis Passeriformes Pipridae 3
8 Tityra_cayana Passeriformes Cotingidae 4
9 Tityra_inquisitor Passeriformes Cotingidae 4
10 Tachuris_rubrigastra Passeriformes Tyrannidae 5
11 Conopias_parvus Passeriformes Tyrannidae 5
# Loop through df
for (i in unique(df$id)) {
  print(subset(df, id == i))
}
jetzspp iocorder bird_family id
1 Acanthisitta_chloris Passeriformes Acanthisittidae 1
2 Xenicus_gilviventris Passeriformes Acanthisittidae 1
jetzspp iocorder bird_family id
3 Ampelioides_tschudii Passeriformes Cotingidae 2
4 Pipreola_aureopectus Passeriformes Cotingidae 2
5 Pipreola_chlorolepidota Passeriformes Cotingidae 2
jetzspp iocorder bird_family id
6 Xenopipo_holochlora Passeriformes Pipridae 3
7 Xenopipo_uniformis Passeriformes Pipridae 3
jetzspp iocorder bird_family id
8 Tityra_cayana Passeriformes Cotingidae 4
9 Tityra_inquisitor Passeriformes Cotingidae 4
jetzspp iocorder bird_family id
10 Tachuris_rubrigastra Passeriformes Tyrannidae 5
11 Conopias_parvus Passeriformes Tyrannidae 5
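Two follow-ups, sketched in base R under the assumption that the data frame has a run id like the one built above. The example rebuilds a plain data.frame (jetzspp and bird_family copied from the question, iocorder omitted) so it runs on its own, and computes the id with cumsum() instead of consecutive_id(); for this data both give the same numbering. split() builds every block in one pass instead of rescanning the whole data frame on each iteration, and any family spread over more than one id is one of the interrupted families the question asks about.

```r
# Plain data.frame version of the question's example (iocorder omitted).
df <- data.frame(
  jetzspp = c("Acanthisitta_chloris", "Xenicus_gilviventris",
              "Ampelioides_tschudii", "Pipreola_aureopectus", "Pipreola_chlorolepidota",
              "Xenopipo_holochlora", "Xenopipo_uniformis", "Tityra_cayana",
              "Tityra_inquisitor", "Tachuris_rubrigastra", "Conopias_parvus"),
  bird_family = c("Acanthisittidae", "Acanthisittidae", "Cotingidae",
                  "Cotingidae", "Cotingidae", "Pipridae", "Pipridae", "Cotingidae",
                  "Cotingidae", "Tyrannidae", "Tyrannidae")
)

# Run id: increments each time bird_family differs from the previous row.
df$id <- cumsum(c(TRUE, head(df$bird_family, -1) != tail(df$bird_family, -1)))

# 1. Split once into a list of contiguous blocks (ordered by id), then loop.
blocks <- split(df, df$id)
for (block in blocks) {
  # further processing of one contiguous block goes here
  print(block)
}

# 2. Families occurring in more than one block, i.e. interrupted families.
runs_per_family <- tapply(df$id, df$bird_family, function(x) length(unique(x)))
interrupted <- names(runs_per_family)[runs_per_family > 1]
interrupted  # "Cotingidae" for this example
```

With dplyr and the id column from the answer above, the second step could also be written as df %>% distinct(bird_family, id) %>% count(bird_family) %>% filter(n > 1).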