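I am currently using the code below to rank data in R by the minimum to maximum value within each group.
If multiple species share the same minimum value, the species with the lower second value (if one exists) is assigned the lower rank.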
library(tidyverse)

species <- data.frame(
  species = c("dog", "dog", "cat", "cat", "fish", "fish", "lion"),
  overall.percentage = c(12, 13, 20, 12, 20, 50, 12)
)

# Return the rank-th smallest value of x, or -Inf if x has fewer values
find_min <- function(x, rank) {
  if (rank > length(x)) return(-Inf)
  x[row_number(x) == rank]
}

# Rank species by their smallest value, breaking ties on the second smallest
rank <- species |>
  summarise(
    min_1 = find_min(overall.percentage, 1L),
    min_2 = find_min(overall.percentage, 2L),
    .by = species
  ) |>
  mutate(rank = row_number(pick(min_1, min_2))) |>
  select(species, rank)

# Attach each species' rank to the original rows
species |>
  left_join(rank, join_by(species))
#>   species overall.percentage  rank
#>   <chr>                <dbl> <int>
#> 1 dog                     12     2
#> 2 dog                     13     2
#> 3 cat                     20     3
#> 4 cat                     12     3
#> 5 fish                    20     4
#> 6 fish                    50     4
#> 7 lion                    12     1
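To make the tie-break concrete, here is a small sketch that calls the helper directly on two species' values from the example. The variable names cat_values and lion_values are only for illustration and are not part of the pipeline above. Because a missing second value becomes -Inf, lion (min_2 = -Inf) sorts ahead of dog (min_2 = 13) even though both have a minimum of 12.

```r
library(dplyr)

# Helper copied from the code above: the value at ascending position `rank`,
# or -Inf when there are fewer than `rank` values.
find_min <- function(x, rank) {
  if (rank > length(x)) return(-Inf)
  x[row_number(x) == rank]
}

cat_values  <- c(20, 12)  # cat's two values (illustrative name)
lion_values <- 12         # lion has a single value (illustrative name)

find_min(cat_values, 1L)   # smallest value for cat
#> [1] 12
find_min(cat_values, 2L)   # second-smallest value for cat
#> [1] 20
find_min(lion_values, 2L)  # lion has no second value
#> [1] -Inf
```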
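I would like to refactor the code so that when multiple species also share the second/third/Nth minimum value, the species with the lower (N+1)th value is assigned the lower rank.
When two species share identical minimum values, the species with fewer values is assigned the lower rank.
So the output for:
species <- data.frame(
  species = c("dog", "dog", "dog", "cat", "cat", "cat", "lion", "lion"),
  overall.percentage = c(11, 12, 14, 11, 12, 13, 11, 12)
)
would be: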
#>   species overall.percentage  rank
#>   <chr>                <dbl> <int>
#> 1 dog                     11     3
#> 2 dog                     12     3
#> 3 dog                     14     3
#> 4 cat                     11     2
#> 5 cat                     12     2
#> 6 cat                     13     2
#> 7 lion                    11     1
#> 8 lion                    12     1
Using base R, you could do this with rank and reorder, then convert the factor order to integers:
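# Score each species by the sum of its within-group ranks, reorder the factor
# levels by that score, then read the level order off as integers.
# Note: rank() is computed within each species here, so the score reflects
# how many values a species has rather than the values themselves.
species$rank <-
  with(species,
    species |>
      reorder(X = overall.percentage, FUN = \(xs) sum(rank(xs))) |>
      as.integer()
  )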
> species
  species overall.percentage rank
1     dog                 11    3
2     dog                 12    3
3     dog                 14    3
4     cat                 11    2
5     cat                 12    2
6     cat                 13    2
7    lion                 11    1
8    lion                 12    1
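One thing to watch with the reorder call: since sum(rank(xs)) is evaluated per species, cat and dog get the same score in this example and end up in the expected order only because tied levels keep their alphabetical order. As a minimal sketch of a variant that compares the sorted values position by position, the following assumes that a missing position should count as -Inf, as in the question's find_min. The helper name rank_groups is hypothetical, and this is one possible approach rather than a definitive one.

```r
# Example data from the question
species <- data.frame(
  species = c("dog", "dog", "dog", "cat", "cat", "cat", "lion", "lion"),
  overall.percentage = c(11, 12, 14, 11, 12, 13, 11, 12)
)

# Hypothetical helper: rank groups by their sorted values, compared
# position by position (1st smallest, then 2nd smallest, and so on).
rank_groups <- function(group, value) {
  vals <- lapply(split(value, group), sort)  # ascending values per group
  n <- max(lengths(vals))
  # Pad shorter groups with -Inf so a group that runs out of values sorts first
  padded <- lapply(vals, \(x) c(x, rep(-Inf, n - length(x))))
  # One vector per position, each holding that position's value for every group
  cols <- lapply(seq_len(n), \(i) vapply(padded, `[`, numeric(1), i))
  ord <- do.call(order, cols)                # lexicographic order of the groups
  match(group, names(vals)[ord])             # each row gets its group's rank
}

species$rank <- rank_groups(species$species, species$overall.percentage)
species$rank
#> [1] 3 3 3 2 2 2 1 1
```

On the first data set in the question this gives lion 1, dog 2, cat 3, fish 4, matching the output of the tidyverse version. Species whose values are identical at every position still receive distinct ranks, in the order split() lists them.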