Create a column based on a date filter within groups

Question

I have the following data:

> dput(my_data)
structure(list(id = c(1, 1, 1, 2, 2, 3, 3), begin = c("2017-01-01", 
"2017-08-01", "2022-05-01", "2017-01-01", "2017-09-01", "2017-01-01", 
"2017-09-01"), end = c("2017-07-01", "2022-04-01", "2023-06-01", 
"2017-08-01", "2023-06-01", "2017-08-01", "2023-06-01"), position = c("position_1", 
"position_2", "position_3", "position_1", "position_2", "position_1", 
"position_1")), row.names = c(NA, -7L), class = "data.frame")


 id      begin        end   position
1  1 2017-01-01 2017-07-01 position_1
2  1 2017-08-01 2022-04-01 position_2
3  1 2022-05-01 2023-06-01 position_3
4  2 2017-01-01 2017-08-01 position_1
5  2 2017-09-01 2023-06-01 position_2
6  3 2017-01-01 2017-08-01 position_1
7  3 2017-09-01 2023-06-01 position_1

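What I want: I am trying to create a new column (e.g. first_position_in_2023) that contains, for each id, the first position held on 1 January 2023 (that is, the first position whose begin-end interval includes 2023-01-01).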

For my data:

id == 1: position_3
id == 2: position_2
id == 3: position_1

Ideally, I would like to be able to create several columns, for example:

first position in 2017
first position in 2018
first position in 2019 ...

What I have tried:


my_data %>%
  group_by(id) %>%
  summarise(first_position_in_2023 = position[begin <= (as.Date("2023-01-01"))])


# A tibble: 7 x 2
# Groups:   id [3]
     id first_position_in_2023
  <dbl> <chr>                 
1     1 position_1            
2     1 position_2            
3     1 position_3            
4     2 position_1            
5     2 position_2            
6     3 position_1            
7     3 position_1

This seems to give me a result for every row, rather than the first result for each group.

Answer 1

Since you included all of these tags, you could try this dplyr / tidyr / lubridate approach to achieve the broader goal of doing this for every year:

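library(dplyr)
library(tidyr)

# Label each position with the year in which its interval ends,
# then spread those years into one column per year.
my_data %>%
  mutate(year = lubridate::year(end)) %>%
  select(-c(begin, end)) %>%
  pivot_wider(names_from = year, values_from = position, 
              names_glue = "first_position_in_{year}")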

Output:

    id first_position_in_2017 first_position_in_2022 first_position_in_2023
  <dbl> <chr>                  <chr>                  <chr>                 
1     1 position_1             position_2             position_3            
2     2 position_1             NA                     position_2            
3     3 position_1             NA                     position_1
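The pivot above labels each position by the year in which its interval ends, so years that fall strictly inside an interval (2018 to 2021 here) get no column. For the single date asked about in the question, a minimal sketch could instead test whether the date lies inside each interval and keep the first match per id. The names target and result are made up for this illustration, and the boundaries are assumed to be inclusive:

```r
library(dplyr)

# Sample data from the question
my_data <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3),
  begin = c("2017-01-01", "2017-08-01", "2022-05-01", "2017-01-01",
            "2017-09-01", "2017-01-01", "2017-09-01"),
  end = c("2017-07-01", "2022-04-01", "2023-06-01", "2017-08-01",
          "2023-06-01", "2017-08-01", "2023-06-01"),
  position = c("position_1", "position_2", "position_3", "position_1",
               "position_2", "position_1", "position_1")
)

# Hypothetical name: the date we want to look up
target <- as.Date("2023-01-01")

result <- my_data %>%
  mutate(begin = as.Date(begin), end = as.Date(end)) %>%
  group_by(id) %>%
  summarise(
    # Keep positions whose interval contains the target date (inclusive),
    # then take the first one; first() gives NA when nothing matches
    first_position_in_2023 = first(position[begin <= target & target <= end]),
    .groups = "drop"
  )

print(result)
```

Wrapping the subset in first() is what turns the per-row result from the question's attempt into one value per id. For the sample data this should give position_3, position_2 and position_1 for ids 1, 2 and 3.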

Answer 2

You can write a function to accomplish the same task:

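library(dplyr)
library(purrr)

# Return the position whose interval contains 1 January of `year`, or NA if none does.
# The comparisons are strict, so an interval that begins or ends exactly on
# 1 January does not match; use <= to include the boundaries.
fn <- function(year, begin, end, pos){
  yr <- as.Date(paste0(year, '-01-01'))
  res <- pos[begin < yr & yr < end]
  if(length(res)) res else NA
}

yrs <- 2017:2023
names(yrs) <- paste0('first_pos_in_', yrs)

# Within each id, add one column per year
my_data %>%
  group_by(id) %>%
  mutate(map_dfc(yrs, fn, begin, end, position))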



    id begin  end   position first_pos_in_2017 first_pos_in_2018 first_pos_in_2019
  <dbl> <chr>  <chr> <chr>    <lgl>             <chr>             <chr>            
1     1 2017-… 2017… positio… NA                position_2        position_2       
2     1 2017-… 2022… positio… NA                position_2        position_2       
3     1 2022-… 2023… positio… NA                position_2        position_2       
4     2 2017-… 2017… positio… NA                position_2        position_2       
5     2 2017-… 2023… positio… NA                position_2        position_2       
6     3 2017-… 2017… positio… NA                position_1        position_1       
7     3 2017-… 2023… positio… NA                position_1        position_1       
# ℹ 4 more variables: first_pos_in_2020 <chr>, first_pos_in_2021 <chr>,
#   first_pos_in_2022 <chr>, first_pos_in_2023 <chr>
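If one row per id is preferred, and intervals that start exactly on 1 January should count (which is why first_pos_in_2017 is NA above), a variant could pair every row with every year, keep the rows whose interval contains 1 January of that year, and pivot to wide form. This is only a sketch under those assumptions; years, day and wide are names made up for this illustration:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
my_data <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3),
  begin = c("2017-01-01", "2017-08-01", "2022-05-01", "2017-01-01",
            "2017-09-01", "2017-01-01", "2017-09-01"),
  end = c("2017-07-01", "2022-04-01", "2023-06-01", "2017-08-01",
          "2023-06-01", "2017-08-01", "2023-06-01"),
  position = c("position_1", "position_2", "position_3", "position_1",
               "position_2", "position_1", "position_1")
)

years <- 2017:2023

# merge() with no shared column names returns the Cartesian product,
# i.e. every row of my_data paired with every year
wide <- merge(my_data, data.frame(year = years)) %>%
  mutate(day = as.Date(paste0(year, "-01-01"))) %>%
  # Keep rows whose interval contains 1 January of that year (inclusive)
  filter(as.Date(begin) <= day, day <= as.Date(end)) %>%
  arrange(id, year, begin) %>%
  group_by(id, year) %>%
  summarise(position = first(position), .groups = "drop") %>%
  pivot_wider(names_from = year, values_from = position,
              names_glue = "first_position_in_{year}")

print(wide)
```

With the sample data every id has a matching interval for each year, so no column should contain NA; a year with no match for some id would simply come out as NA after the pivot.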
