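I have a tibble with dimensions 1042 x 64. The columns are amphibian families, and the rows hold the names of all species in each family. The first 5 rows and 2 columns look like this: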
> amphilist[1:5,1:2]
# A tibble: 5 × 2
`Allophrynidae_(1_genus;_3_species)` `Alsodidae_(3_genera;_26_species)`
<chr> <chr>
1 Allophryne_relicta Alsodes_australis
2 Allophryne_resplendens Alsodes_barrioi
3 Allophryne_ruthveni Alsodes_cantillanensis
4 NA Alsodes_coppingeri
5 NA Alsodes_gargola
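The families contain different numbers of species: the largest has 1,042 species and the smallest only 1. Except for the single family with 1,042 species, every column is padded with NA to fill all 1,042 rows. I need to randomly select a certain number of species from each family for the next step of my analysis, but I keep getting NA for all columns, even for the one that contains no NA. Here is what I have done so far:
I wrote a loop to get the species richness per family (spcR) and saved it in the data frame "species_no". I then used an "ifelse" clause to set the number of species I need from each family and saved that in the same data frame.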
amphilist <- read_xlsx("amphilist.xlsx", col_names = TRUE)
families <- colnames(amphilist)
family_n <- ncol(amphilist)
spcR <- vector(length = family_n)
for(i in 1:length(families)) {
families.i <- families[i]
spcR[i] <- colSums(amphilist[,families.i] > 0, na.rm = TRUE)
}
species_no <- data.frame(families, spcR)
species_no$choose <- ifelse(species_no$spcR > 50, ceiling(species_no$spcR/10),
ifelse(species_no$spcR >= 5 & species_no$spcR <= 50,
5, species_no$spcR))
> species_no[1:3,]
families spcR choose
1 Allophrynidae_(1_genus;_3_species) 3 3
2 Alsodidae_(3_genera;_26_species) 26 5
3 Alytidae_(2_subfamilies;_3_genera;_12_species) 12 5
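From here on I am stuck and get NA results. I created a vector with the required number of elements per family, but I cannot make the random selection work. I want to draw from each column the number of species defined by the choose_no vector, ignoring the NAs.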
choose_no <- species_no$choose
set.seed(43)
for(i in 1:length(families)) {
families.i <- families[i]
choose_no.i <- choose_no[i]
rand_amphilist <- amphilist[sample(amphilist[,i],
size = choose_no.i), ]
}
Can anyone help me? Thanks a lot!
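Three things in the loop above look likely to cause trouble: `amphilist[, i]` on a tibble returns a one-column tibble rather than a character vector, the sampled species names are then used as row indices into `amphilist`, and `rand_amphilist` is overwritten on every iteration. A minimal base R sketch of the same loop that avoids these points is given here. The toy data, the column names `Fam_A` and `Fam_B`, and the `choose_no` values are invented for illustration and are not the real data.

```r
# Toy data in the same shape as the question: one column per family, NA-padded.
# Names and values are invented for illustration only.
amphilist <- data.frame(
  Fam_A = c("a1", "a2", "a3", NA, NA, NA, NA),
  Fam_B = c("b1", "b2", "b3", "b4", "b5", "b6", "b7")
)
choose_no <- c(3, 5)  # assumed per-family sample sizes

set.seed(43)

# One list element per family, because the sample sizes differ
rand_amphilist <- vector("list", ncol(amphilist))
names(rand_amphilist) <- colnames(amphilist)

for (i in seq_along(rand_amphilist)) {
  species.i <- amphilist[[i]]                # [[ ]] returns a plain character vector
  species.i <- species.i[!is.na(species.i)]  # drop the NA padding
  # sample positions rather than values, then look the species up
  idx <- sample.int(length(species.i), size = choose_no[i])
  rand_amphilist[[i]] <- species.i[idx]
}

rand_amphilist
```

The result is a named list rather than a rectangular table, since each family contributes a different number of species.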
# SETUP
# load lib
library(tidyverse)
# example data
amphilist <- tribble(
~"Allophrynidae_(1_genus;_3_species)", ~"Alsodidae_(3_genera;_26_species)"
, "Allophryne_relicta" , "Alsodes_australis"
, "Allophryne_resplendens" , "Alsodes_barrioi"
, "Allophryne_ruthveni" , "Alsodes_cantillanensis"
, NA , "Alsodes_coppingeri"
, NA , "Alsodes_gargola" )
# make it long: 8 rows rather than 5 x 2 = 10, because the NA entries are dropped
amphilist_long <- amphilist |> pivot_longer(cols=everything(),
names_to = "category",
values_to = "entry") |> filter(!is.na(entry))
# randomly sample 2 entries from each category
set.seed(42) # for reproducibility
# the main event
(samp_list <- slice_sample(amphilist_long,
n=2,
by = category))
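The example above draws a fixed n = 2 per family. Here is a sketch of how the same long format could handle a different sample size per family, using the thresholds from the question (more than 50 species: one tenth, rounded up; 5 to 50 species: 5; fewer than 5: all of them). The toy data and the names `Fam_A`, `Fam_B` and `samp_var` are made up for illustration. Shuffling each group with `slice_sample(prop = 1)` and then keeping the first `choose` rows is one possible approach, not the only one.

```r
library(dplyr)
library(tidyr)

# Toy data in the same shape as the question (invented values)
amphilist <- tibble(
  Fam_A = c("a1", "a2", "a3", NA, NA, NA, NA),
  Fam_B = c("b1", "b2", "b3", "b4", "b5", "b6", "b7")
)

amphilist_long <- amphilist |>
  pivot_longer(cols = everything(),
               names_to = "category",
               values_to = "entry") |>
  filter(!is.na(entry))

set.seed(43)  # for reproducibility

samp_var <- amphilist_long |>
  group_by(category) |>
  mutate(
    spcR = as.numeric(n()),  # species richness of the family
    choose = case_when(
      spcR > 50 ~ ceiling(spcR / 10),
      spcR >= 5 ~ 5,
      TRUE      ~ spcR
    )
  ) |>
  slice_sample(prop = 1) |>            # shuffle the rows within each family
  filter(row_number() <= choose) |>    # keep the first `choose` shuffled rows
  ungroup()

samp_var
```

With this toy data, `Fam_A` has 3 species and keeps all of them, while `Fam_B` has 7 and keeps 5.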