I have a dataset with two datetime columns, Begin and End:
library(tidyverse)
df <- tibble(
Group = c("A", "B", "C"),
Begin = as_datetime(c("2023-07-15 01:40:11", "2023-07-22 05:54:44", "2023-08-05 16:43:09")),
End = as_datetime(c("2023-07-15 13:43:15", "2023-07-25 10:50:45", "2023-08-06 10:42:12"))
)
df
# A tibble: 3 × 3
  Group Begin               End
  <chr> <dttm>              <dttm>
1 A     2023-07-15 01:40:11 2023-07-15 13:43:15
2 B     2023-07-22 05:54:44 2023-07-25 10:50:45
3 C     2023-08-05 16:43:09 2023-08-06 10:42:12
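The interval between the two timestamps may span several days. On the other hand, Begin and End can also fall on the same day.
I now want to split each interval into separate days.
The days between Begin and End are "full" days, so they start at 00:00:00 and end at 23:59:59.
I think this desired output shows exactly what I want:
(I added the Group column only for illustration; it is not relevant to the code for now.)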
# A tibble: 7 × 3
  Group Begin               End
  <chr> <dttm>              <dttm>
1 A     2023-07-15 01:40:11 2023-07-15 13:43:15
2 B     2023-07-22 05:54:44 2023-07-22 23:59:59
3 B     2023-07-23 00:00:00 2023-07-23 23:59:59
4 B     2023-07-24 00:00:00 2023-07-24 23:59:59
5 B     2023-07-25 00:00:00 2023-07-25 10:50:45
6 C     2023-08-05 16:43:09 2023-08-05 23:59:59
7 C     2023-08-06 00:00:00 2023-08-06 10:42:12
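Can anyone help me find a solution?
I think the difficulty lies in preserving the Begin timestamp of the first day and the End timestamp of the last day.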
df |>
  # create a date column for Begin and End, for ease of use
  mutate(b = as.Date(Begin), e = as.Date(End),
         # create a sequence of dates between begin and end
         days = map2(b, e, ~ seq.Date(.x, .y, by = "1 day"))) |>
  # unnest the days column into one row per day
  unnest(days) |>
  # if the beginning date equals the date in `days`, keep the original Begin;
  # otherwise use midnight of `days` as a datetime
  mutate(Begin = if_else(b == days, Begin, as_datetime(days)),
         # same with End, but using the last second of `days` (23:59:59)
         End = if_else(e == days, End, as_datetime(days + 1) - seconds(1)), .keep = "unused")
Output:
  Group Begin               End
  <chr> <dttm>              <dttm>
1 A     2023-07-15 01:40:11 2023-07-15 13:43:15
2 B     2023-07-22 05:54:44 2023-07-22 23:59:59
3 B     2023-07-23 00:00:00 2023-07-23 23:59:59
4 B     2023-07-24 00:00:00 2023-07-24 23:59:59
5 B     2023-07-25 00:00:00 2023-07-25 10:50:45
6 C     2023-08-05 16:43:09 2023-08-05 23:59:59
7 C     2023-08-06 00:00:00 2023-08-06 10:42:12
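The split can also be read as clamping: each calendar day covers 00:00:00 to 23:59:59, and the row for that day is the overlap of that window with the original Begin/End interval. Below is a minimal sketch of that idea, not a definitive implementation. It assumes all timestamps are in UTC (the default for as_datetime()) and that lubridate is attached by library(tidyverse), which holds for tidyverse 2.0.0 and later. The names split_days and day are illustrative only.

```r
library(tidyverse)  # assumes tidyverse >= 2.0.0, which attaches lubridate

df <- tibble(
  Group = c("A", "B", "C"),
  Begin = as_datetime(c("2023-07-15 01:40:11", "2023-07-22 05:54:44", "2023-08-05 16:43:09")),
  End   = as_datetime(c("2023-07-15 13:43:15", "2023-07-25 10:50:45", "2023-08-06 10:42:12"))
)

split_days <- df |>
  # one list element per row: every calendar day touched by the interval
  mutate(day = map2(as_date(Begin), as_date(End), ~ seq(.x, .y, by = "day"))) |>
  # expand to one row per day
  unnest(day) |>
  mutate(
    # later of the original Begin and midnight of this day
    Begin = pmax(Begin, as_datetime(day)),
    # earlier of the original End and the last second of this day
    End   = pmin(End, as_datetime(day + 1) - seconds(1)),
    .keep = "unused"  # drops the helper column `day`
  )

split_days
```

Because pmax() and pmin() are vectorised, no explicit comparison against the first and last day is needed: only the first day has a Begin later than midnight, and only the last day has an End earlier than 23:59:59.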