Randomly sampling a specific number of elements from columns of different lengths


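I have a tibble (data frame) with dimensions 1042x64. The columns are amphibian families, and the rows hold the names of all species in each family. The first 5 rows and 2 columns look like this: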

> amphilist[1:5,1:2]
# A tibble: 5 × 2
  `Allophrynidae_(1_genus;_3_species)` `Alsodidae_(3_genera;_26_species)`
  <chr>                                <chr>                             
1 Allophryne_relicta                   Alsodes_australis                 
2 Allophryne_resplendens               Alsodes_barrioi                   
3 Allophryne_ruthveni                  Alsodes_cantillanensis            
4 NA                                   Alsodes_coppingeri                
5 NA                                   Alsodes_gargola 

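The families contain different numbers of species: the largest has 1,042 species and the smallest only 1. Except for the single family with 1,042 species, every column is padded with NA to fill all 1,042 rows. I need to randomly draw a certain number of species from each family for the next step of my analysis, but I keep getting NA for every column, even for the one that contains no NA. Here is what I have done so far: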

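I wrote a loop that computes the species richness per family (spcR) and stores it in the data frame "species_no". I then used an "ifelse" expression to set the number of species I need from each family and saved it in the same data frame: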

library(readxl)  # provides read_xlsx()

amphilist <- read_xlsx("amphilist.xlsx", col_names = TRUE)

families <- colnames(amphilist)
family_n <- ncol(amphilist)
spcR <- vector(length = family_n)

for(i in 1:length(families)) {
  families.i <- families[i]
  spcR[i] <- colSums(amphilist[,families.i] > 0, na.rm = TRUE)
}

species_no <- data.frame(families, spcR)
species_no$choose <- ifelse(species_no$spcR > 50, ceiling(species_no$spcR/10), 
                            ifelse(species_no$spcR >= 5 & species_no$spcR <= 50,
                                   5, species_no$spcR))

> species_no[1:3,]
                                        families spcR choose
1             Allophrynidae_(1_genus;_3_species)    3      3
2               Alsodidae_(3_genera;_26_species)   26      5
3 Alytidae_(2_subfamilies;_3_genera;_12_species)   12      5

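This is where I get stuck and end up with NA. I created a vector holding the required number of elements, but I cannot get the random selection to work. From each column, I want to draw the number of species given by the choose_no vector, ignoring the NAs: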

choose_no <- species_no$choose
set.seed(43)
for(i in 1:length(families)) {
  families.i <- families[i]
  choose_no.i <- choose_no[i]
  rand_amphilist <- amphilist[sample(amphilist[,i], 
                                     size = choose_no.i), ]
}

Can anyone help me? Thank you very much!

r sorting random sample
1 Answer
# SETUP 
# load lib
library(tidyverse)

# example data
amphilist <- tribble(
~"Allophrynidae_(1_genus;_3_species)", ~"Alsodidae_(3_genera;_26_species)"
, "Allophryne_relicta"                ,  "Alsodes_australis"                
, "Allophryne_resplendens"            ,  "Alsodes_barrioi"                  
, "Allophryne_ruthveni"               ,  "Alsodes_cantillanensis"          
, NA                                ,  "Alsodes_coppingeri"               
, NA                                ,  "Alsodes_gargola" )

# reshape to long format; dropping the NA entries leaves 8 rows (not 5 x 2 = 10)
amphilist_long <- amphilist |> pivot_longer(cols=everything(),
                          names_to = "category",
                          values_to = "entry") |> filter(!is.na(entry))


# randomly sample 2 entries from each category
set.seed(42) # for reproducibility

# the main event (the `by` argument needs dplyr >= 1.1.0)
(samp_list <- slice_sample(amphilist_long,
                           n=2,
                           by = category))