Accumulating strings between years using a midpoint assumption

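I have a dataset of people's skills and the years in which they used them. I want to fill in those skills for the years between uses, because I assume people acquired a skill somewhere between the years in which it was observed. I'm agnostic as to base R, tidyverse, or data.table.

What I have: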

have <- data.frame(
  person = c("A","A","B","B","C","C"),
  skill = c("S1","S2", "S1,S2","S3","S1,S2","S1,S3" ),
  year = c(2015,2018,2016,2018,2016,2018)
)

What I want:

want <- data.frame(
  person = c("A","A","A","A","B","B","B","C","C","C"),
  skill = c("S1","S1","S1,S2","S1,S2","S1,S2","S1,S2,S3","S1,S2,S3","S1,S3","S1,S2,S3","S1,S2,S3"),
  year = c(2015,2016,2017,2018,2016,2017,2018,2016,2017,2018)
)

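I could use fill(.direction = "down"), but I'm not sure how to fill the skills backwards, and I also don't see a route to deduplication when, in a given year, a person shows several skills they already have.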
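
For reference, here is a minimal sketch of what that fill-down attempt looks like, assuming dplyr and tidyr are loaded and using complete() with full_seq() to create the missing years (that step is my assumption; it is not in the original attempt). It shows the limitation: the previous string is carried forward as-is, so skills never accumulate and nothing is filled backwards.

```r
library(dplyr)
library(tidyr)

have <- data.frame(
  person = c("A","A","B","B","C","C"),
  skill = c("S1","S2", "S1,S2","S3","S1,S2","S1,S3" ),
  year = c(2015,2018,2016,2018,2016,2018)
)

filled <- have %>%
  group_by(person) %>%
  # add a row (with NA skill) for every missing year within each person
  complete(year = full_seq(year, 1)) %>%
  # carry the last observed skill string down into the NA rows
  fill(skill, .direction = "down") %>%
  ungroup()

filled
```

For person A this gives S1 for 2015 to 2017 and only S2 in 2018, rather than the accumulated S1,S2.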

r dataframe dplyr data.table data-cleaning
1 Answer

Use two unnest steps: the first produces the missing years, the second produces one row per skill element. Then use distinct to collapse the output.

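library(dplyr)
library(tidyr)

have %>%
  group_by(person) %>%
  arrange(person, year) %>%
  # list-column holding every year from the first to the last observation
  mutate(year = list(seq(first(year), last(year))),
         first_skill = first(skill)) %>%
  unnest(year) %>%                              # one row per year
  mutate(skill = strsplit(skill, ",")) %>%
  unnest(skill) %>%                             # one row per skill element
  # all of the person's skills; only the very first row keeps the first skill set
  mutate(skill = paste(unique(skill), collapse = ","),
         first_skill = if_else(row_number() == 1, first_skill, skill)) %>%
  distinct(year, skill, .keep_all = TRUE) %>%   # one row per person and year
  mutate(skill = first_skill, first_skill = NULL) %>%
  ungroup()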

Note: if year is always sorted within each person, you can skip arrange.

Output

# A tibble: 10 × 3
   person skill     year
   <chr>  <chr>    <int>
 1 A      S1        2015
 2 A      S1,S2     2016
 3 A      S1,S2     2017
 4 A      S1,S2     2018
 5 B      S1,S2     2016
 6 B      S1,S2,S3  2017
 7 B      S1,S2,S3  2018
 8 C      S1,S2     2016
 9 C      S1,S2,S3  2017
10 C      S1,S2,S3  2018
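
Note that the output above gives a person every skill from the second year on, so A already has S2 in 2016, while the question's want has S2 starting in 2017. Below is a minimal base R sketch of the midpoint idea, in case that matters. It assumes a skill seen for the first time counts from the midpoint, rounded up, between the previous observed year and the year it first appears. The helper name fill_midpoint and the rounding rule are my assumptions, not something stated in the question.

```r
have <- data.frame(
  person = c("A","A","B","B","C","C"),
  skill  = c("S1","S2","S1,S2","S3","S1,S2","S1,S3"),
  year   = c(2015,2018,2016,2018,2016,2018),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand one person's rows to every year in their range.
# Assumption: a newly seen skill starts at ceiling(midpoint) between the
# previous observed year and the year in which it first appears.
fill_midpoint <- function(d) {
  d <- d[order(d$year), ]
  skills_by_obs <- strsplit(d$skill, ",", fixed = TRUE)
  start <- numeric(0)  # named vector: skill -> first year it counts
  for (i in seq_along(skills_by_obs)) {
    s <- skills_by_obs[[i]]
    new <- unique(s[!s %in% names(start)])
    if (length(new) == 0) next
    # skills in the first observation start in that year; later ones at the midpoint
    from <- if (i == 1) d$year[1] else ceiling((d$year[i - 1] + d$year[i]) / 2)
    start[new] <- from
  }
  years <- seq(min(d$year), max(d$year))
  data.frame(
    person = d$person[1],
    skill  = vapply(years,
                    function(y) paste(sort(names(start)[start <= y]), collapse = ","),
                    character(1)),
    year   = years,
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(split(have, have$person), fill_midpoint))
rownames(result) <- NULL
result
```

With this rule A has S1 in 2015 and 2016 and S1,S2 from 2017, and B gains S3 in 2017. For C in 2016 it returns S1,S2 (the skills observed that year), which differs from the S1,S3 listed in want.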