How to subset values that are clumped together (by row) in a for loop, and isolate values that are not clumped together


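I have an ordered data frame with a categorical variable in one column (specifically, bird family). Because the rows are sorted in a particular way, with one species per row, some bird families have all their species clumped together in the data frame, while others are interrupted by species from other families. What I want to do is subset each clump (same family) one at a time in a for loop, so that I can do some further processing before moving on to the next clump. I can't use unique(), because that ignores the fact that some families are interrupted and puts all of their rows into a single subset. Here is a sample subset of the dataset: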

structure(list(jetzspp = c("Acanthisitta_chloris", "Xenicus_gilviventris", 
"Ampelioides_tschudii", "Pipreola_aureopectus", "Pipreola_chlorolepidota", 
"Xenopipo_holochlora", "Xenopipo_uniformis", "Tityra_cayana", 
"Tityra_inquisitor", "Tachuris_rubrigastra", "Conopias_parvus"
), iocorder = c("Passeriformes", "Passeriformes", "Passeriformes", 
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes", 
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes"
), bird_family = c("Acanthisittidae", "Acanthisittidae", "Cotingidae", 
"Cotingidae", "Cotingidae", "Pipridae", "Pipridae", "Cotingidae", 
"Cotingidae", "Tyrannidae", "Tyrannidae")), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

Here, I would like to subset Acanthisittidae, then Cotingidae (only the three rows that are clumped together), then Pipridae, then Cotingidae again (the next two rows that are clumped together), and so on.

Additionally, I would like to isolate the names of the bird families that are interrupted in the dataset.

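I have not found a function that tracks transitions. Giving each interrupted block of a family a unique name and then looping over those names is possible (?), but it is not efficient for a dataset that is 3k+ rows long with many interruptions, because I would have to change the names back afterwards.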

r dataframe dplyr subset
1 answer

This may help with part of your problem.

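The consecutive_id() function from dplyr (available since dplyr 1.1.0) assigns a unique id to each run of consecutive identical values in bird_family, so an interrupted family gets a different id for each of its blocks. You can then use these ids in the loop: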

library(dplyr)

df <- structure(list(jetzspp = c("Acanthisitta_chloris", "Xenicus_gilviventris", 
"Ampelioides_tschudii", "Pipreola_aureopectus", "Pipreola_chlorolepidota", 
"Xenopipo_holochlora", "Xenopipo_uniformis", "Tityra_cayana", 
"Tityra_inquisitor", "Tachuris_rubrigastra", "Conopias_parvus"
), iocorder = c("Passeriformes", "Passeriformes", "Passeriformes", 
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes", 
"Passeriformes", "Passeriformes", "Passeriformes", "Passeriformes"
), bird_family = c("Acanthisittidae", "Acanthisittidae", "Cotingidae", 
"Cotingidae", "Cotingidae", "Pipridae", "Pipridae", "Cotingidae", 
"Cotingidae", "Tyrannidae", "Tyrannidae")), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

# Add unique id to each 'run' of values
df <- df %>%  
  mutate(id = consecutive_id(bird_family))

df
                   jetzspp      iocorder     bird_family      id
1     Acanthisitta_chloris Passeriformes Acanthisittidae       1
2     Xenicus_gilviventris Passeriformes Acanthisittidae       1
3     Ampelioides_tschudii Passeriformes      Cotingidae       2
4     Pipreola_aureopectus Passeriformes      Cotingidae       2
5  Pipreola_chlorolepidota Passeriformes      Cotingidae       2
6      Xenopipo_holochlora Passeriformes        Pipridae       3
7       Xenopipo_uniformis Passeriformes        Pipridae       3
8            Tityra_cayana Passeriformes      Cotingidae       4
9        Tityra_inquisitor Passeriformes      Cotingidae       4
10    Tachuris_rubrigastra Passeriformes      Tyrannidae       5
11         Conopias_parvus Passeriformes      Tyrannidae       5

# Loop through df
for(i in unique(df$id)) {
  
  print(subset(df, df$id == i))
  
}

               jetzspp      iocorder     bird_family id
1 Acanthisitta_chloris Passeriformes Acanthisittidae  1
2 Xenicus_gilviventris Passeriformes Acanthisittidae  1
                  jetzspp      iocorder bird_family id
3    Ampelioides_tschudii Passeriformes  Cotingidae  2
4    Pipreola_aureopectus Passeriformes  Cotingidae  2
5 Pipreola_chlorolepidota Passeriformes  Cotingidae  2
              jetzspp      iocorder bird_family id
6 Xenopipo_holochlora Passeriformes    Pipridae  3
7  Xenopipo_uniformis Passeriformes    Pipridae  3
            jetzspp      iocorder bird_family id
8     Tityra_cayana Passeriformes  Cotingidae  4
9 Tityra_inquisitor Passeriformes  Cotingidae  4
                jetzspp      iocorder bird_family id
10 Tachuris_rubrigastra Passeriformes  Tyrannidae  5
11      Conopias_parvus Passeriformes  Tyrannidae  5
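
For the second part of the question (isolating the names of the interrupted families), the same run ids can be reused: a family is interrupted if it appears under more than one id. The following is only a sketch under that assumption, not something the original answer covered. The variable names `interrupted`, `n_runs`, `runs` and `interrupted_base` are made up for illustration, and the data frame is rebuilt here with only the bird_family column so that the snippet runs on its own.

```r
library(dplyr)

# Same bird_family column as in the example data (other columns omitted)
df <- data.frame(
  bird_family = c("Acanthisittidae", "Acanthisittidae", "Cotingidae",
                  "Cotingidae", "Cotingidae", "Pipridae", "Pipridae",
                  "Cotingidae", "Cotingidae", "Tyrannidae", "Tyrannidae"),
  stringsAsFactors = FALSE
)

# Label each run of consecutive identical values (needs dplyr >= 1.1.0)
df <- df %>%
  mutate(id = consecutive_id(bird_family))

# A family is interrupted if it is spread over more than one run
interrupted <- df %>%
  distinct(bird_family, id) %>%             # one row per (family, run)
  count(bird_family, name = "n_runs") %>%   # number of runs per family
  filter(n_runs > 1) %>%
  pull(bird_family)

# Base R cross-check: rle() collapses each run to a single value, so a
# family that repeats in runs$values occurs in several separate runs
runs <- rle(df$bird_family)
interrupted_base <- unique(runs$values[duplicated(runs$values)])

print(interrupted)
print(interrupted_base)
```

On the example data both approaches should return only Cotingidae. The rle() version needs no packages, which may be convenient if upgrading dplyr is not an option, but it expects a character (not factor) column.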