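I want to keep all observations from the last measurement year before the cut-off year, plus all observations from the cut-off year onward.
Here is an example: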
d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))
> d
   group cut_off year value
1      1    2017 2000    10
2      1    2017 2010    20
3      1    2017 2010    30
4      1    2017 2015    40
5      1    2017 2015    50
6      1    2017 2017    60
7      1    2017 2017    70
8      1    2017 2020    80
9      1    2017 2024    90
10     1    2016 2001   100
11     2    2016 2009   110
12     2    2016 2016   120
13     2    2016 2017   130
14     2    2016 2017   140
15     2    2016 2021   150
16     2    2016 2023   160
This is the output I want:
desired <- data.frame(group = c(1,1,1,1,1,1,2,2,2,2,2,2),
                      cut_off = c(2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016),
                      year = c(2015,2015,2017,2017,2020,2024,2009,2016,2017,2017,2021,2023),
                      value = c(40,50,60,70,80,90,110,120,130,140,150,160))
> desired
   group cut_off year value
1      1    2017 2015    40
2      1    2017 2015    50
3      1    2017 2017    60
4      1    2017 2017    70
5      1    2017 2020    80
6      1    2017 2024    90
7      2    2016 2009   110
8      2    2016 2016   120
9      2    2016 2017   130
10     2    2016 2017   140
11     2    2016 2021   150
12     2    2016 2023   160
Selecting all years from the cut-off onward (including the cut-off year itself) is easy:
require(dplyr)
d %>%
+ filter(year >= cut_off)
  group cut_off year value
1     1    2017 2017    60
2     1    2017 2017    70
3     1    2017 2020    80
4     1    2017 2024    90
5     2    2016 2016   120
6     2    2016 2017   130
7     2    2016 2017   140
8     2    2016 2021   150
9     2    2016 2023   160
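But I can't figure out how to get the last year before the cut-off year. I tried combinations of lag(), with and without group_by(), but couldn't get it to work. For example, this does not work: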
d %>%
  filter(year >= cut_off | lag(year) < cut_off)
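library(dplyr)  # the .by argument needs dplyr >= 1.1.0
d |>
  # positive diff means the row's year is before its cut-off
  mutate(diff = cut_off - year) |>
  # keep rows at or after the cut-off, plus the closest earlier year per group;
  # which.min() indexes into the subset diff[diff > 0], so this only matches when no
  # row at or after the cut-off precedes that year within the group (true for this data)
  filter(year >= cut_off | year == year[which.min(diff[diff > 0])], .by = group) |>
  select(-diff)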
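A variant that does not depend on row order is sketched below. It is not part of the answer above; it assumes that "the last year before the cut-off" can be computed per group as `max(year[year < cut_off])`. The extra `-Inf` argument is an added guard so that a group with no earlier year yields no match instead of a warning.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))

res <- d |>
  filter(
    # rows at or after the cut-off
    year >= cut_off |
      # rows in the latest year that lies before the cut-off, per group;
      # -Inf is a fallback for groups without any earlier year
      year == max(year[year < cut_off], -Inf),
    .by = group
  )

res
```

Because the comparison is on the year value rather than on a position, ties such as the two 2015 rows are kept together and the input does not need to be sorted.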
d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))
library(tidyverse)
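d |>
  # per group, the latest year that is strictly before the cut-off
  mutate(year_prior_to_cutoff = year[year < cut_off] |> sort() |> last(), .by = group) |>
  # keep rows at or after the cut-off, plus all rows from that prior year
  filter(year >= cut_off | year == year_prior_to_cutoff)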
#>    group cut_off year value year_prior_to_cutoff
#> 1      1    2017 2015    40                 2015
#> 2      1    2017 2015    50                 2015
#> 3      1    2017 2017    60                 2015
#> 4      1    2017 2017    70                 2015
#> 5      1    2017 2020    80                 2015
#> 6      1    2017 2024    90                 2015
#> 7      2    2016 2009   110                 2009
#> 8      2    2016 2016   120                 2009
#> 9      2    2016 2017   130                 2009
#> 10     2    2016 2017   140                 2009
#> 11     2    2016 2021   150                 2009
#> 12     2    2016 2023   160                 2009
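The result above still carries the helper column. As a small sketch (the names `result` and `desired` are only used for this check), the column can be dropped with `select()` and the outcome compared against the desired output from the question. Attributes are ignored in the comparison so that differences in row names do not matter.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0

d <- data.frame(group = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2),
                cut_off = c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016),
                year = c(2000,2010,2010,2015,2015,2017,2017,2020,2024,2001,2009,2016,2017,2017,2021,2023),
                value = c(10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160))

desired <- data.frame(group = c(1,1,1,1,1,1,2,2,2,2,2,2),
                      cut_off = c(2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016),
                      year = c(2015,2015,2017,2017,2020,2024,2009,2016,2017,2017,2021,2023),
                      value = c(40,50,60,70,80,90,110,120,130,140,150,160))

result <- d |>
  mutate(year_prior_to_cutoff = year[year < cut_off] |> sort() |> last(), .by = group) |>
  filter(year >= cut_off | year == year_prior_to_cutoff) |>
  # drop the helper column so the columns match `desired`
  select(-year_prior_to_cutoff)

# compare values only, ignoring row names and other attributes
stopifnot(isTRUE(all.equal(result, desired, check.attributes = FALSE)))
```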