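I have a dataset of people's skills and the years in which they used those skills. I'd like to fill in the skills for the years between those uses, because I assume they acquired the skills in between "uses". I'm agnostic about base R, tidyverse, or data.table.
What I have: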
have <- data.frame(
person = c("A","A","B","B","C","C"),
skill = c("S1","S2", "S1,S2","S3","S1,S2","S1,S3" ),
year = c(2015,2018,2016,2018,2016,2018)
)
What I want:
want <- data.frame(
person = c("A","A","A","A","B","B","B","C","C","C"),
skill = c("S1","S1","S1,S2","S1,S2","S1,S2","S1,S2,S3","S1,S2,S3","S1,S3","S1,S2,S3","S1,S2,S3"),
year = c(2015,2016,2017,2018,2016,2017,2018,2016,2017,2018)
)
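I can use fill(.direction = "down"), but I'm not sure how to fill the skills backwards, and I also don't see a way to deduplicate when, in a given year, someone shows several skills they already have.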
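For the deduplication part, one option is to split each comma-separated string, keep the unique skills, and paste them back together. A minimal base-R sketch; merge_skills is a hypothetical helper name, and sorting the result is an assumption made so that the same set of skills always yields the same string.

```r
# Hypothetical helper: merge one or more comma-separated skill strings
# into a single de-duplicated, sorted, comma-separated string.
merge_skills <- function(x) {
  parts <- unlist(strsplit(x, ",", fixed = TRUE))  # split every string into single skills
  paste(sort(unique(parts)), collapse = ",")       # drop duplicates and re-join
}

merge_skills(c("S1,S2", "S1,S3"))
#> [1] "S1,S2,S3"
```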
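An approach using two unnest calls: the first one fills in the missing years, and the second one expands each element of skill into its own row. distinct then condenses the output.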
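library(dplyr)
library(tidyr)

have %>%
  group_by(person) %>%
  arrange(person, year) %>%
  # every year from the person's first to last use, plus the skills of the first year
  mutate(year = list(seq(first(year), last(year))),
         first_skill = first(skill)) %>%
  unnest(year) %>%                              # one row per original row and year
  mutate(skill = strsplit(skill, ",")) %>%
  unnest(skill) %>%                             # one row per individual skill
  # union of all the person's skills; the first row keeps its original skills
  mutate(skill = paste(unique(skill), collapse = ","),
         first_skill = if_else(row_number() == 1, first_skill, skill)) %>%
  distinct(year, skill, .keep_all = TRUE) %>%   # one row per person and year
  mutate(skill = first_skill, first_skill = NULL) %>%
  ungroup()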
Note: if the data are always sorted by year, you can skip the arrange step.
Output
# A tibble: 10 × 3
person skill year
<chr> <chr> <int>
1 A S1 2015
2 A S1,S2 2016
3 A S1,S2 2017
4 A S1,S2 2018
5 B S1,S2 2016
6 B S1,S2,S3 2017
7 B S1,S2,S3 2018
8 C S1,S2 2016
9 C S1,S2,S3 2017
10 C S1,S2,S3 2018
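Since the question is open to base R, here is a minimal base-R sketch of the same rule the dplyr pipeline applies: each person's first year keeps its recorded skills, and every later year up to the last recorded year gets the de-duplicated union of all skills that person ever shows. fill_skills is a hypothetical helper name, and the sketch assumes one row per person and year, as in have.

```r
have <- data.frame(
  person = c("A","A","B","B","C","C"),
  skill  = c("S1","S2", "S1,S2","S3","S1,S2","S1,S3"),
  year   = c(2015,2018,2016,2018,2016,2018),
  stringsAsFactors = FALSE  # keep skill as character for strsplit() on older R
)

# Hypothetical helper: expand one person's rows to a full run of years.
fill_skills <- function(d) {
  d <- d[order(d$year), ]                       # earliest year first
  yrs <- seq(min(d$year), max(d$year))          # fill in the missing years
  # union of every skill the person ever shows, de-duplicated
  all_sk <- paste(unique(unlist(strsplit(d$skill, ",", fixed = TRUE))),
                  collapse = ",")
  data.frame(
    person = d$person[1],
    skill  = c(d$skill[1], rep(all_sk, length(yrs) - 1)),  # first year unchanged
    year   = yrs,
    stringsAsFactors = FALSE
  )
}

res <- do.call(rbind, lapply(split(have, have$person), fill_skills))
rownames(res) <- NULL
res
```

Splitting by person and binding the pieces back together keeps each person's logic isolated, which mirrors the group_by(person) step in the pipeline.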