Interpolate relevant strings between years, dropping them after X years


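I have a dataset of people's skills and the years in which they acquired them. I have a function that interpolates these skills between years under a midpoint assumption (nicely answered here: https://stackoverflow.com/questions/77008576/accumulate-strings-between-years-using-midpoint-assumption).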

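library(dplyr)  # cumany(), mutate(), group_by(), reframe(), summarize()
library(tidyr)  # separate_longer_delim(), complete()

have <- data.frame(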
  c("A", "A","A","B","B","C","C"),
  c("S1", "S4", "S2", "S1,S2","S3","S1,S3","S1,S2" ),
  c(2015,2015, 2018,2016,2018,2016,2024)
)
colnames(have) <- c("person", "skill", "year")

skill_interp <- function(x, y) {
  x <- cumany(x) + 0
  function(z) {
    i <- findInterval(z, y)
    ifelse(i==length(y), x[i], ifelse(z-y[i] == y[i+1]-z, pmax(x[i], x[i+1]),
                                      ifelse(z-y[i] < y[i+1]-z, x[i], x[i+1])))
  }
}

have %>% 
  separate_longer_delim(skill, ",") %>% 
  mutate(present=1) %>% 
  group_by(person) %>% 
  complete(skill, year, fill=list(present=0)) %>%
  ungroup() %>% 
  reframe(present=skill_interp(present, year)(first(year):last(year)), year=first(year):last(year), .by = c(person, skill)) %>% 
  filter(present==1) %>% 
  summarize(skill=paste(skill, collapse=","), .by=c(person, year))
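To make the midpoint rule concrete, here is a small base-R sketch of the same comparison that `skill_interp()` performs, applied by hand to person C and skill S2 (not recorded in 2016, recorded in 2024). The variable names `before`, `after`, `j` and `s2` are illustrative only, and the next-observation index is clamped at the end, which is an assumption made here to avoid indexing past the last year.

```r
# Observation years for person C and whether S2 was recorded in each
y <- c(2016, 2024)
x <- c(0, 1)

z <- 2016:2024
i <- findInterval(z, y)        # index of the observation at or before z
j <- pmin(i + 1, length(y))    # following observation, clamped at the end
before <- z - y[i]             # distance to the earlier observation
after  <- y[j] - z             # distance to the later observation

# Nearest observation wins; an exact tie takes the larger of the two values
s2 <- ifelse(i == length(y), x[i],
      ifelse(before == after, pmax(x[i], x[j]),
      ifelse(before < after, x[i], x[j])))

setNames(s2, z)
```

The year 2020 is exactly halfway between 2016 and 2024, so the tie branch applies and S2 counts as present from 2020 onward.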

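People with long careers accumulate many skills, and the older ones may no longer be relevant if they are not being used. I would like to "stop interpolating" those irrelevant skills after X years (say 5 years in this case), so that if I know a person is not using a skill, it "drops out" of the resulting accumulation.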

The data frame I want is:

want <- data.frame(
  c("A","A","A","A","B","B","B","C","C","C","C","C","C","C","C","C"),
  c("S1,S4","S1,S4","S1,S2,S4","S1,S2,S4","S1,S2","S1,S2,S3","S1,S2,S3","S1,S3","S1,S3","S1,S3","S1,S3","S1,S2,S3","S1,S2","S1,S2","S1,S2","S1,S2"),
  c(2015,2016,2017,2018,2016,2017,2018,2016,2017,2018, 2019, 2020, 2021, 2022, 2023, 2024)
)
colnames(want) <- c("person", "skills", "year")
r dataframe data-cleaning
1 answer

I'm not sure why you want to add skills before they have been acquired, nor where the S2 in C:2020 comes from. But this comes close:

## helper
ncb <- \(x, ...) paste(sort(unique(el(strsplit(x, ',')))), collapse=',')

aggregate(skill ~ person + year, have, \(x) paste(x, collapse=',')) |>
  transform(skill=ave(skill, person, FUN=\(x) 
                      Reduce(\(...) ncb(paste(..., sep=',')), x, accumulate=TRUE))) |>
  {\(.) .[with(., order(person, year)), ]}() |>
  merge(
    do.call(rbind, by(have, have$person, \(x) expand.grid(
      person=el(x$person),
      year=do.call(seq.int, c(as.list(range(x$year)), 1))
    ))), 
    all=TRUE) |>
  transform(skill=ave(skill, person, FUN=zoo::na.locf))

#    person year    skill
# 1       A 2015    S1,S4
# 2       A 2016    S1,S4
# 3       A 2017    S1,S4
# 4       A 2018 S1,S2,S4
# 5       B 2016    S1,S2
# 6       B 2017    S1,S2
# 7       B 2018 S1,S2,S3
# 8       C 2016    S1,S3
# 9       C 2017    S1,S3
# 10      C 2018    S1,S3
# 11      C 2019    S1,S3
# 12      C 2020    S1,S3
# 13      C 2021    S1,S3
# 14      C 2022    S1,S3
# 15      C 2023    S1,S3
# 16      C 2024 S1,S2,S3
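
The output above carries every skill forward indefinitely, so it differs from `want` in the interpolated years and in C's later years. One possible reading of the drop-out rule, sketched below in base R, is to replace the unlimited accumulation with a windowed one: at each observation year a skill counts as held only if it was recorded less than `window` years earlier, and the midpoint rule is then applied to that windowed presence. The names `skill_interp_window()` and `interpolate_skills()` and the `window` argument are hypothetical, and the rule itself is an assumption inferred from `want`, not something the question states outright.

```r
have <- data.frame(
  person = c("A", "A", "A", "B", "B", "C", "C"),
  skill  = c("S1", "S4", "S2", "S1,S2", "S3", "S1,S3", "S1,S2"),
  year   = c(2015, 2015, 2018, 2016, 2018, 2016, 2024)
)

# x: 0/1 flags, one per observation year; y: observation years, ascending.
# `window` is the assumed retention period in years.
skill_interp_window <- function(x, y, window = 5) {
  # Held at y[i] if recorded at some y[j] <= y[i] less than `window` years back
  held <- vapply(seq_along(y), function(i) {
    as.numeric(any(x == 1 & y <= y[i] & y[i] - y < window))
  }, numeric(1))
  function(z) {
    i <- findInterval(z, y)
    j <- pmin(i + 1, length(y))   # next observation, clamped at the end
    before <- z - y[i]
    after  <- y[j] - z
    ifelse(i == length(y), held[i],
           ifelse(before == after, pmax(held[i], held[j]),
                  ifelse(before < after, held[i], held[j])))
  }
}

interpolate_skills <- function(df, window = 5) {
  # Long format: one row per person, skill and observation year
  parts <- strsplit(df$skill, ",", fixed = TRUE)
  long <- data.frame(
    person = rep(df$person, lengths(parts)),
    skill  = unlist(parts),
    year   = rep(df$year, lengths(parts))
  )
  per_person <- lapply(split(long, long$person), function(p) {
    obs_years <- sort(unique(p$year))
    all_years <- seq(min(obs_years), max(obs_years))
    rows <- lapply(sort(unique(p$skill)), function(s) {
      x <- as.numeric(obs_years %in% p$year[p$skill == s])
      keep <- skill_interp_window(x, obs_years, window)(all_years) == 1
      data.frame(person = rep(p$person[1], sum(keep)),
                 skill  = rep(s, sum(keep)),
                 year   = all_years[keep])
    })
    do.call(rbind, rows)
  })
  res <- aggregate(skill ~ person + year, do.call(rbind, per_person),
                   function(s) paste(sort(s), collapse = ","))
  res <- res[order(res$person, res$year), ]
  rownames(res) <- NULL
  names(res)[names(res) == "skill"] <- "skills"
  res[, c("person", "skills", "year")]
}

result <- interpolate_skills(have, window = 5)
result
```

On the sample data this gives the same person, skills and year values as `want`: C keeps S3 through 2020 and loses it from 2021, while A still holds S1 and S4 in 2018 because only three years have passed. Whether a skill should also expire between two distant observations that both record it is left open here; under this sketch it does not.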