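I'm trying to create a new column in my table so I can see, for each quarter, when an ID was flagged with a particular status. (Note: the dataset ends in October 2023.) So I think I need to change every NA in End_date to 2023-10-31 and the corresponding quarter to 2023.3.
My table currently looks like this: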
ID Start_date End_date Status Quarter_Start Quarter_End
1 2019-03-28 2020-03-26 A 2019.1 2020.1
2 2011-02-28 2011-04-12 C 2011.1 2011.2
3 2019-03-28 NA F 2019.1 NA
3 2005-02-28 2007-06-20 A 2005.1 2007.2
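To make the example reproducible, the input table can be built as a data frame roughly like this. This is a sketch under assumptions: the post does not show how `dat` was created, so parsing the dates as `Date` and deriving Quarter_Start/Quarter_End with `lubridate::quarter(type = "year.quarter")` (year + quarter / 10) are choices made here, not taken from the original.

```r
library(lubridate)

# Input table from the question: dates parsed as Date, open-ended End_date as NA
dat <- data.frame(
  ID = c(1L, 2L, 3L, 3L),
  Start_date = as.Date(c("2019-03-28", "2011-02-28", "2019-03-28", "2005-02-28")),
  End_date = as.Date(c("2020-03-26", "2011-04-12", NA, "2007-06-20")),
  Status = c("A", "C", "F", "A")
)

# Assumption: derive the quarter columns from the dates (e.g. 2019.1 = 2019 Q1);
# a missing End_date gives a missing Quarter_End
dat$Quarter_Start <- quarter(dat$Start_date, type = "year.quarter")
dat$Quarter_End <- quarter(dat$End_date, type = "year.quarter")
```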
I'd like it to look like this:
ID Start_date End_date Status Quarter
1 2019-03-28 2020-03-26 A 2019.1
1 2019-03-28 2020-03-26 A 2019.2
1 2019-03-28 2020-03-26 A 2019.3
1 2019-03-28 2020-03-26 A 2019.4
1 2019-03-28 2020-03-26 A 2020.1
2 2011-02-28 2011-04-12 C 2011.1
2 2011-02-28 2011-04-12 C 2011.2
3 2023-03-28 2023-10-31 F 2023.1
3 2023-03-28 2023-10-31 F 2023.2
3 2023-03-28 2023-10-31 F 2023.3
3 2005-02-28 2007-06-20 A 2005.1
3 2005-02-28 2007-06-20 A 2005.2
3 2005-02-28 2007-06-20 A 2005.3
3 2005-02-28 2007-06-20 A 2005.4
3 2005-02-28 2007-06-20 A 2006.1
3 2005-02-28 2007-06-20 A 2006.2
3 2005-02-28 2007-06-20 A 2006.3
3 2005-02-28 2007-06-20 A 2006.4
3 2005-02-28 2007-06-20 A 2007.1
3 2005-02-28 2007-06-20 A 2007.2
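I've tried most of the options in the post "Expand rows by date range using start and end date", but none of them has worked for me so far.
You can create a list column holding the sequence of quarters for each pair of start and end dates, then expand it into one row per quarter with `tidyr::unnest_longer()`: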
library(lubridate)
library(dplyr)
library(tidyr)
library(purrr)
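dat %>%
  mutate(
    # Treat open-ended rows as running until the end of the data
    End_date = replace_na(End_date, ymd("2023-10-31")),
    # For each row, build the vector of year.quarter values it spans.
    # Flooring Start_date to the first day of its quarter keeps the final
    # quarter from being skipped when End_date falls early in that quarter.
    # (The \(x) lambda syntax needs R >= 4.1.)
    Quarter = map2(
      floor_date(Start_date, unit = "quarter"),
      End_date,
      \(start, end) quarter(seq(start, end, by = "quarter"), type = "year.quarter")
    )
  ) %>%
  # One row per element of the Quarter list column
  unnest_longer(Quarter) %>%
  # Drop the original start/end quarter columns
  select(!Quarter_Start:Quarter_End)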
# A tibble: 62 × 5
ID Start_date End_date Status Quarter
<int> <date> <date> <chr> <dbl>
1 1 2019-03-28 2020-03-26 A 2019.1
2 1 2019-03-28 2020-03-26 A 2019.2
3 1 2019-03-28 2020-03-26 A 2019.3
4 1 2019-03-28 2020-03-26 A 2019.4
5 1 2019-03-28 2020-03-26 A 2020.1
6 2 2011-02-28 2011-04-12 C 2011.1
7 2 2011-02-28 2011-04-12 C 2011.2
8 3 2019-03-28 2023-10-31 F 2019.1
9 3 2019-03-28 2023-10-31 F 2019.2
10 3 2019-03-28 2023-10-31 F 2019.3
# ℹ 52 more rows
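The `floor_date()` step is what makes the quarter sequence complete: `seq()` with `by = "quarter"` steps forward three months from whatever day it starts on, so starting from the raw Start_date can overshoot End_date and drop the last quarter. A small sketch using ID 1's dates (only lubridate is assumed) shows the difference:

```r
library(lubridate)

start <- as.Date("2019-03-28")
end <- as.Date("2020-03-26")

# Without flooring: steps land on 2019-03-28, 2019-06-28, 2019-09-28, 2019-12-28;
# the next step, 2020-03-28, is past End_date, so 2020.1 is missing
unfloored <- quarter(seq(start, end, by = "quarter"), type = "year.quarter")

# With flooring: steps land on the 1st of each quarter (2019-01-01, ..., 2020-01-01),
# and 2020-01-01 <= End_date, so 2020.1 is kept
floored <- quarter(
  seq(floor_date(start, unit = "quarter"), end, by = "quarter"),
  type = "year.quarter"
)

print(unfloored)
print(floored)
```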