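I have a dataset of people's skills and the years in which they acquired them. I have a function that interpolates these skills under the midpoint assumption (nicely answered here: https://stackoverflow.com/questions/77008576/accumulate-strings-between-years-using-midpoint-assumption).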
library(dplyr)
library(tidyr)

have <- data.frame(
c("A", "A","A","B","B","C","C"),
c("S1", "S4", "S2", "S1,S2","S3","S1,S3","S1,S2" ),
c(2015,2015, 2018,2016,2018,2016,2024)
)
colnames(have) <- c("person", "skill", "year")
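# Build an interpolator for one person/skill:
# x is the 0/1 presence flag at each observed year y.
skill_interp <- function(x, y) {
  x <- cumany(x) + 0  # once a skill has been seen it stays present (numeric 0/1)
  function(z) {
    i <- findInterval(z, y)  # index of the latest observed year <= z
    # At the last observed year, use it as-is. Otherwise take the nearer
    # neighbour; an exact midpoint tie counts as present if either side has it.
    ifelse(i == length(y), x[i],
           ifelse(z - y[i] == y[i + 1] - z, pmax(x[i], x[i + 1]),
                  ifelse(z - y[i] < y[i + 1] - z, x[i], x[i + 1])))
  }
}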
have %>%
  separate_longer_delim(skill, ",") %>%
  mutate(present = 1) %>%
  group_by(person) %>%
  complete(skill, year, fill = list(present = 0)) %>%
  ungroup() %>%
  reframe(
    present = skill_interp(present, year)(first(year):last(year)),
    year = first(year):last(year),
    .by = c(person, skill)
  ) %>%
  filter(present == 1) %>%
  summarize(skill = paste(skill, collapse = ","), .by = c(person, year))
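To make the midpoint rule concrete, here is a small base-R restatement of it for a single person/skill pair. It is only a sketch: `midpoint_status()` is a hypothetical name, it uses `cummax()` in place of `dplyr::cumany()`, and it assumes the queried years lie inside the observed range.

```r
# Hypothetical base-R restatement of the midpoint rule (not the dplyr version above).
midpoint_status <- function(status, obs_years) {
  status <- cummax(status)  # once held, a skill stays held
  function(z) {
    i <- findInterval(z, obs_years)        # latest observed year <= z
    nxt <- pmin(i + 1, length(obs_years))  # next observed year, capped at the last one
    # Strictly closer to the earlier observation: use it.
    # Otherwise (tie, or closer to the later one): present if either side has it.
    ifelse(z - obs_years[i] < obs_years[nxt] - z,
           status[i],
           pmax(status[i], status[nxt]))
  }
}

# Person C and skill S2: absent in 2016, present in 2024.
years <- 2016:2024
s2_by_year <- midpoint_status(c(0, 1), c(2016, 2024))(years)
setNames(s2_by_year, years)
# S2 switches on at 2020, the exact midpoint, and stays on through 2024.
```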
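People with long careers accumulate many skills, and the older ones may no longer be relevant if they are not being used. I would like to "stop interpolating" those irrelevant skills after X years (say 5 years in this case), so that if I know a person is not using a skill, it "drops out" of the resulting accumulation.
The data frame I want is: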
want <- data.frame(
c("A","A","A","A","B","B","B","C","C","C","C","C","C","C","C","C"),
c("S1,S4","S1,S4","S1,S2,S4","S1,S2,S4","S1,S2","S1,S2,S3","S1,S2,S3","S1,S3","S1,S3","S1,S3","S1,S3","S1,S2,S3","S1,S2","S1,S2","S1,S2","S1,S2"),
c(2015,2016,2017,2018,2016,2017,2018,2016,2017,2018, 2019, 2020, 2021, 2022, 2023, 2024)
)
colnames(want) <- c("person", "skills", "year")
I'm not sure why you want to add skills before they have actually been acquired, nor where the S2 in C:2020 comes from. But this gets pretty close:
## helper: split a comma-separated string, de-duplicate, sort, and re-join
ncb <- \(x, ...) paste(sort(unique(el(strsplit(x, ',')))), collapse=',')

## collapse to one row per person/year, then accumulate the skills over time
aggregate(skill ~ person + year, have, \(x) paste(x, collapse=',')) |>
  transform(skill=ave(skill, person, FUN=\(x)
    Reduce(\(...) ncb(paste(..., sep=',')), x, accumulate=TRUE))) |>
  {\(.) .[with(., order(person, year)), ]}() |>
  ## add the missing years per person, then carry the last skill set forward
  merge(
    do.call(rbind, by(have, have$person, \(x) expand.grid(
      person=el(x$person),
      year=do.call(seq.int, c(as.list(range(x$year)), 1))
    ))),
    all=TRUE) |>
  transform(skill=ave(skill, person, FUN=zoo::na.locf))
#    person year    skill
# 1       A 2015    S1,S4
# 2       A 2016    S1,S4
# 3       A 2017    S1,S4
# 4       A 2018 S1,S2,S4
# 5       B 2016    S1,S2
# 6       B 2017    S1,S2
# 7       B 2018 S1,S2,S3
# 8       C 2016    S1,S3
# 9       C 2017    S1,S3
# 10      C 2018    S1,S3
# 11      C 2019    S1,S3
# 12      C 2020    S1,S3
# 13      C 2021    S1,S3
# 14      C 2022    S1,S3
# 15      C 2023    S1,S3
# 16      C 2024 S1,S2,S3
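If the intent really is the midpoint rule plus an expiry window, one possible reading of `want` can be sketched in base R as below. The assumptions here are mine, not the asker's: `skills_with_dropout()` and its `keep` argument are hypothetical names; a skill is assumed to count for `keep` calendar years starting with the year it was recorded; and a skill recorded at a later observation is assumed to be back-filled to the midpoint with that person's previous observation year, ties included.

```r
have <- data.frame(
  person = c("A", "A", "A", "B", "B", "C", "C"),
  skill  = c("S1", "S4", "S2", "S1,S2", "S3", "S1,S3", "S1,S2"),
  year   = c(2015, 2015, 2018, 2016, 2018, 2016, 2024)
)

# Hypothetical helper: skills in force per person and year, with expiry.
skills_with_dropout <- function(d, keep = 5) {
  # One row per person/year/skill.
  parts <- strsplit(d$skill, ",", fixed = TRUE)
  long <- data.frame(
    person = rep(d$person, lengths(parts)),
    year   = rep(d$year, lengths(parts)),
    skill  = unlist(parts)
  )
  per_person <- lapply(split(long, long$person), function(p) {
    obs  <- sort(unique(p$year))             # years with any record
    grid <- seq(min(obs), max(obs))          # every year in the observed range
    prev <- c(NA, obs)[match(p$year, obs)]   # previous observation year (NA for the first)
    active <- vapply(grid, function(z) {
      # Forward: a recorded skill counts for `keep` years, recording year included.
      forward  <- z >= p$year & z - p$year < keep
      # Backward: a later record applies from the midpoint onwards (ties included).
      backward <- !is.na(prev) & z < p$year & p$year - z <= z - prev
      paste(sort(unique(p$skill[forward | backward])), collapse = ",")
    }, character(1))
    data.frame(person = p$person[1], skills = active, year = grid)
  })
  out <- do.call(rbind, per_person)
  out <- out[nzchar(out$skills), ]           # drop years in which nothing is in force
  rownames(out) <- NULL
  out
}

result <- skills_with_dropout(have, keep = 5)
result
```

On the sample data this reproduces the `want` frame from the question, including S2 appearing for C in 2020 (the midpoint between 2016 and 2024) and S3 dropping out for C from 2021. Whether the expiry should instead be counted from the last year a skill was known to be in use is a choice the question leaves open.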