How to filter all observations from the last measured year before a cut-off year, plus all observations from the cut-off year onward


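Within each group, I want to keep all observations from the last measured year before the cut-off year, together with all observations from the cut-off year and every later year.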

Here is an example:

d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))

> d
   group cut_off year value
1      1    2017 2000    10
2      1    2017 2010    20
3      1    2017 2010    30
4      1    2017 2015    40
5      1    2017 2015    50
6      1    2017 2017    60
7      1    2017 2017    70
8      1    2017 2020    80
9      1    2017 2024    90
10     1    2016 2001   100
11     2    2016 2009   110
12     2    2016 2016   120
13     2    2016 2017   130
14     2    2016 2017   140
15     2    2016 2021   150
16     2    2016 2023   160

This is my desired output:

desired <- data.frame(group = c(1,1,1,1,1,1,2,2,2,2,2,2),
            cut_off = c(2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016),
            year = c(2015,2015,2017,2017,2020,2024,2009,2016,2017,2017,2021,2023),
            value = c(40,50,60,70,80,90,110,120,130,140,150,160))

> desired
   group cut_off year value
1      1    2017 2015    40
2      1    2017 2015    50
3      1    2017 2017    60
4      1    2017 2017    70
5      1    2017 2020    80
6      1    2017 2024    90
7      2    2016 2009   110
8      2    2016 2016   120
9      2    2016 2017   130
10     2    2016 2017   140
11     2    2016 2021   150
12     2    2016 2023   160

Selecting all years from the cut-off onward (including the cut-off year itself) is easy:

library(dplyr)
d %>%
  filter(year >= cut_off)
  group cut_off year value
1     1    2017 2017    60
2     1    2017 2017    70
3     1    2017 2020    80
4     1    2017 2024    90
5     2    2016 2016   120
6     2    2016 2017   130
7     2    2016 2017   140
8     2    2016 2021   150
9     2    2016 2023   160

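But I don't know how to get the last year before the cut-off year. I tried combinations of lag() with and without group_by(), but could not get it to work. For example, this does not work: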

d %>%
  filter(year >= cut_off | lag(year) < cut_off)
r dplyr
2 Answers

0 votes
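library(dplyr)  # the .by argument needs dplyr 1.1.0 or later

d |> 
  # years left until the cut-off; positive means the row lies before it
  mutate(diff = cut_off - year) |>
  # which.min() returns a position within diff[diff > 0], so index year
  # with the same subset to keep the positions aligned
  filter(year >= cut_off | year == year[diff > 0][which.min(diff[diff > 0])], .by = group) |> 
  select(-diff)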
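
The same result can also be sketched without a helper column, by comparing each year to the largest year strictly below the cut-off within its group. This is only a sketch under some assumptions: it needs dplyr 1.1.0 or later for the .by argument, the name result is introduced here for illustration, and a group with no year before its cut-off would trigger a warning from max() (the comparison against -Inf then keeps nothing extra).

```r
library(dplyr)

# Example data from the question
d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))

# Keep rows at or after the cut-off, plus every row whose year equals the
# latest year strictly before the cut-off, evaluated separately per group
result <- d |>
  filter(year >= cut_off | year == max(year[year < cut_off]), .by = group)

result
```

Because the comparison is by value rather than by position, tied years (such as the two 2015 rows in group 1) are all kept, and the row order within a group does not matter.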


0 votes
d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))
  
library(tidyverse)

d |> 
  mutate(year_prior_to_cutoff = year[year < cut_off] |> sort() |> last(), .by = group) |> 
  filter(year >= cut_off | year == year_prior_to_cutoff)
#>    group cut_off year value year_prior_to_cutoff
#> 1      1    2017 2015    40                 2015
#> 2      1    2017 2015    50                 2015
#> 3      1    2017 2017    60                 2015
#> 4      1    2017 2017    70                 2015
#> 5      1    2017 2020    80                 2015
#> 6      1    2017 2024    90                 2015
#> 7      2    2016 2009   110                 2009
#> 8      2    2016 2016   120                 2009
#> 9      2    2016 2017   130                 2009
#> 10     2    2016 2017   140                 2009
#> 11     2    2016 2021   150                 2009
#> 12     2    2016 2023   160                 2009
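
The output above still carries the helper column year_prior_to_cutoff, which the desired data frame in the question does not have. A minimal sketch of dropping it and checking the result against desired might look like the following; it assumes dplyr 1.1.0 or later for .by, and the names result and matches are introduced here for illustration only.

```r
library(dplyr)

# Example data and desired output from the question
d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))

desired <- data.frame(group = c(1,1,1,1,1,1,2,2,2,2,2,2),
                      cut_off = c(2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016),
                      year = c(2015,2015,2017,2017,2020,2024,2009,2016,2017,2017,2021,2023),
                      value = c(40,50,60,70,80,90,110,120,130,140,150,160))

result <- d |>
  # latest year strictly before the cut-off, computed per group
  mutate(year_prior_to_cutoff = year[year < cut_off] |> sort() |> last(), .by = group) |>
  filter(year >= cut_off | year == year_prior_to_cutoff) |>
  # drop the helper column so the columns match the desired output
  select(-year_prior_to_cutoff)

# Compare column values only, ignoring row names and other attributes
matches <- isTRUE(all.equal(result, desired, check.attributes = FALSE))
matches
```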