在 R 中,我想根据人们的排名偏好以及他们的表现将项目分配给他们。假设我有 5 个项目和 3 个人。在这种情况下,三个人都想要项目 A,因为这是他们的首选,但 Anna 应该得到它,因为她的绩效得分最高。现在她不在等式中,James 和 Billy 都在争夺 Billy 应该得到的项目 B,因为他有更好的绩效衡量标准。我怎么能在 R 中做到这一点?我会在现实中有更多的项目和人。
Project Rank Person Performance
A 1 Billy 95
B 2 Billy 95
C 3 Billy 95
D 4 Billy 95
E 5 Billy 95
A 1 Anna 97
B 2 Anna 97
C 3 Anna 97
D 5 Anna 97
E 4 Anna 97
A 1 James 92
B 2 James 92
C 4 James 92
D 3 James 92
E 5 James 92
这是
purrr::reduce
的迭代解决方案。这种方法还可以显示每个人选择的项目的等级,例如詹姆斯选择了项目D
,这是他排名第三的选项。
library(dplyr)
library(purrr)
df %>%
arrange(desc(Performance), Rank) %>%
split(.$Person) %>% # group_split(Person)
reduce(~ bind_rows(.x, head(anti_join(.y, .x, by = 'Project'), n = 1)),
.init = tibble(Project = character(0)))
# # A tibble: 3 × 4
# Project Rank Person Performance
# <chr> <int> <chr> <int>
# 1 A 1 Anna 97
# 2 B 2 Billy 95
# 3 D 3 James 92
按
Performance
和Rank
排列数据,使Performance
高的人优先选择项目。
按
Person
.将数据框拆分为列表
从高
Performance
到低迭代:
Anna
,她有最高优先权选择她的第一个项目,A
.Billy
,他的排名第一的项目A
已被Anna
选中,因此从他的项目列表中删除该选项。 Billy
原来选择了他排名第二的项目,B
.James
的回合中,他的前两个排名项目A
和B
已被其他人选择,因此从他的项目列表中删除这些选项。 James
不能不选择他的排名第三的项目,D
.base 中的一种方式可能看起来像:
Reduce
的最高绩效的人开始。#Get Persons ordered by Performance
P <- which(!duplicated(DF$Person))
P <- DF$Person[P[order(DF$Performance[P], decreasing = TRUE)]]
#Split by person and order by Rank
. <- lapply(split(DF[c("Project", "Rank")], DF$Person), \(x) x[[1]][x[[2]]])
#Get highest ranked (remaining) project per Person
setNames(Reduce(\(x, y) {c(x, y[!y %in% x][1])}, .[P[-1]], .[[P[1]]][1]), P)
# Anna Billy James
# "A" "B" "D"
资料:
DF <- read.table(header=TRUE, text="Project Rank Person Performance
A 1 Billy 95
B 2 Billy 95
C 3 Billy 95
D 4 Billy 95
E 5 Billy 95
A 1 Anna 97
B 2 Anna 97
C 3 Anna 97
D 5 Anna 97
E 4 Anna 97
A 1 James 92
B 2 James 92
C 4 James 92
D 3 James 92
E 5 James 92")
基准
library(dplyr)
library(purrr)
library(data.table)
bench::mark(check=FALSE,
Maël = {l <- DF %>%
mutate(perf_rank = dense_rank(-Performance)) %>%
group_split(perf_rank, Person)
choice = setNames(character(length(l)), unique(DF$Person[order(-DF$Performance)]))
for(i in seq_along(l)){
tmp <- l[[i]]
choice[i] <- tmp$Project[which.min(tmp$Rank)]
l <- lapply(l, \(x) subset(x, x$Project != choice[i]))
}
choice},
"Darren Tsai" = {DF %>%
arrange(desc(Performance), Rank) %>%
split(.$Person) %>% # group_split(Person)
reduce(~ bind_rows(.x, head(anti_join(.y, .x, by = 'Project'), n = 1)),
.init = tibble(Project = character(0)))},
IceCreamToucan = {df <- as.data.table(DF)
setkey(df, Person, Rank)
person <- df[order(-Performance), unique(Person)]
project <- Reduce(
\(project, person) c(project, df[person, setdiff(Project, project)[1]]),
person, init = character())
df[data.table(Person = person, Project = project), on = .(Person, Project)]},
"Nobody/alexis" = {mat = xtabs(Performance / Rank ~ Person + Project, DF); mrg = matchingR::galeShapley.marriageMarket(mat, t(mat))$engagements; data.frame(pers = rownames(mat), proj = colnames(mat)[mrg])},
"alexis_laz" = {mat = xtabs(Performance * Rank ~ Person + Project, DF)
x = clue::solve_LSAP(mat, maximum = FALSE)
data.frame(pers = rownames(mat), proj = colnames(mat)[x])},
GKi = {P <- which(!duplicated(DF$Person))
P <- DF$Person[P[order(DF$Performance[P], decreasing = TRUE)]]
. <- lapply(split(DF[c("Project", "Rank")], DF$Person), \(x) x[[1]][x[[2]]])
setNames(Reduce(\(A, B) {c(A, B[!B %in% A][1])}, .[P[-1]], .[[P[1]]][1]), P)}
)
结果
expression min median itr/se…¹ mem_al…² gc/se…³ n_itr n_gc total…⁴
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:by> <dbl> <int> <dbl> <bch:t>
1 Maël 7.45ms 7.62ms 129. 55.91KB 22.4 52 9 402ms
2 Darren Tsai 5.5ms 5.61ms 176. 18.98KB 18.6 76 8 431ms
3 IceCreamToucan 1.98ms 2.05ms 478. 385.87KB 15.4 218 7 456ms
4 Nobody/alexis 460.87µs 478.88µs 2026. 4.98KB 19.0 960 9 474ms
5 alexis_laz 425.96µs 457.64µs 2075. 496B 19.1 980 9 472ms
6 GKi 146.92µs 156.23µs 6075. 23.35KB 24.4 2491 10 410ms
在这种情况下,GKi1 是最快的,大约比第二个 alexis_laz 快 3 倍。 alexis_laz 分配最少的内存。
这里有一个
for
循环的方法。首先,根据性能(和 Person,如果几个具有相同的性能)分组。第一个应该是性能最好的。
然后,在
for
循环中,迭代选择排名最低的项目,并为其他人移除那个。
l <- split(df, list(-df$Performance, df$Person), drop = TRUE)
choice = setNames(character(length(l)), unique(df$Person[order(-df$Performance)]))
for(i in seq_along(l)){
tmp <- l[[i]]
choice[i] <- tmp$Project[which.min(tmp$Rank)]
l <- lapply(l, \(x) subset(x, x$Project != choice[i]))
}
choice
# Anna Billy James
# "A" "B" "D"
使用数据表
library(data.table)
setDT(df)
setkey(df, Person, Rank)
person <- df[order(-Performance), unique(Person)]
project <- Reduce(
\(project, person) c(project, df[person, setdiff(Project, project)[1]]),
person, init = character())
df[data.table(Person = person, Project = project), on = .(Person, Project)]
#> Key: <Person, Rank>
#> Project Rank Person Performance
#> <char> <int> <char> <int>
#> 1: A 1 Anna 97
#> 2: B 2 Billy 95
#> 3: D 3 James 92
经过一番搜索,似乎
clue::solve_LSAP
可以在这里使用。
首先,我们在人员和项目之间制作一个“分数”/“成本”指标矩阵;在这里我们可以使用一个分数集作为'performance * rank'(
DF
从GKi的答案中复制):
mat = xtabs(Performance * Rank ~ Person + Project, DF)
mat
# Project
#Person A B C D E
# Anna 97 194 291 485 388
# Billy 95 190 285 380 475
# James 92 184 368 276 460
然后,我们将人员分配给项目,以使总“成本”最小化:
library(clue)
x = solve_LSAP(mat, maximum = FALSE)
x
#Optimal assignment:
#1 => 1, 2 => 2, 3 => 4
data.frame(pers = rownames(mat), proj = colnames(mat)[x])
# pers proj
#1 Anna A
#2 Billy B
#3 James D