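I am trying to solve the following problem: group consecutive start and end times together so that I can compute the total travel cost for each day over the combined duration. Sample data and the desired output are below.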
rm(list =ls())
library(tidyverse)
library(data.table)
library(lubridate)
df <- data.frame(CountryID = c('101', '101', '101', '101', '101', '102', '102', '102', '102'),
AreaID = c('1', '1', '1', '1', '1', '2', '2', '2', '2'),
Period = c('01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '01/01/2023', '02/01/2023', '02/01/2023', '02/01/2023', '02/01/2023'),
Day = c('Sunday', 'Sunday', 'Sunday', 'Sunday', 'Sunday', 'Monday', 'Monday', 'Monday', 'Monday'),
StartTime = c('7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM'),
EndTime = c('7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '9:30:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM'),
TravelCost = c('10', '12', '11', '13', '14', '12', '10', '9', '8'))
Output <- data.frame(CountryID = c(101, 102),
AreaID = c(1, 2),
Period = c('01/01/2023', '02/01/2023'),
Day = c('Sunday', 'Monday'),
StartTime = c('7:00:00 AM', '7:00:00 AM'),
EndTime = c('9:30:00 AM', '9:00:00 AM'),
TotalTravelCost = c('60', '39'))
Can anyone help me work out what I am missing in my code? Thanks in advance.
Output <- df %>%
group_by(CountryID, AreaID, Period, Day, StartTime, EndTime) %>%
summarise(TotalTravelCost = sum(TravelCost))
Perhaps something like the following:
library(dplyr)
Output <- df %>%
group_by(CountryID, AreaID, Period, Day) %>%
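# Parse the 12-hour clock times including AM/PM (%p is locale-dependent); POSIXct columns are safer than POSIXlt inside mutate()
mutate(across(ends_with('Time'), ~ as.POSIXct(., format = '%I:%M:%S %p'))) %>%
# Start a new run whenever a trip begins more than one second after the previous trip ended
mutate(idx = cumsum(coalesce(+(difftime(StartTime, lag(EndTime), units = 'secs') > 1L), 0L))) %>%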
group_by(CountryID, AreaID, Period, Day, idx) %>%
summarise(
StartTime = format(min(StartTime), '%H:%M:%S'),
EndTime = format(max(EndTime), '%H:%M:%S'),
TravelCost = sum(as.numeric(TravelCost), na.rm = TRUE)
) %>%
select(-idx)
Output:
> Output
# A tibble: 2 × 7
# Groups: CountryID, AreaID, Period, Day [2]
  CountryID AreaID Period     Day    StartTime EndTime  TravelCost
  <chr>     <chr>  <chr>      <chr>  <chr>     <chr>         <dbl>
1 101       1      01/01/2023 Sunday 07:00:00  09:30:00         60
2 102       2      02/01/2023 Monday 07:00:00  09:00:00         39
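The sample data has no gaps, so every day collapses to a single row and the role of idx is easy to miss. Here is a minimal base R sketch of the same run-numbering idea on hypothetical toy data (the 10:00 trip and its cost are made up for illustration, and 24-hour times are used to keep parsing locale-independent):

```r
# Hypothetical toy data: three back-to-back trips, then a gap, then one more trip
start <- as.POSIXct(c("07:00:00", "07:30:00", "08:00:00", "10:00:00"),
                    format = "%H:%M:%S", tz = "UTC")
end   <- as.POSIXct(c("07:30:00", "08:00:00", "08:30:00", "10:30:00"),
                    format = "%H:%M:%S", tz = "UTC")
cost  <- c(10, 12, 11, 13)

# Seconds between each start and the previous row's end (NA for the first row)
gap <- as.numeric(start) - c(NA, head(as.numeric(end), -1))

# A row opens a new run when it starts after the previous trip ended
new_run <- !is.na(gap) & gap > 0

# The running count of "new run" flags labels each block of consecutive trips
idx <- cumsum(new_run)

# Total cost per run
run_cost <- tapply(cost, idx, sum)
```

The first three trips share idx 0 and the last one gets idx 1, so grouping by idx sums the contiguous trips together and keeps the later trip on its own row.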
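Note that I have edited your data.frame to correct (assumed) typos. If you really do have oddly formatted times (e.g. 08:0:00), please revert to the initial version and explain.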
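As for why the attempt in the question does not collapse the rows, two things seem to be going on, sketched below in base R with the Sunday values from the sample data. First, TravelCost is stored as character, and sum() does not accept character input. Second, grouping by StartTime and EndTime puts every row in its own group, because each start/end pair is unique. The variable names here are only for illustration.

```r
# Sunday rows from the sample data
cost  <- c('10', '12', '11', '13', '14')
start <- c('7:00:00 AM', '7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM')
end   <- c('7:30:00 AM', '8:00:00 AM', '8:30:00 AM', '9:00:00 AM', '9:30:00 AM')

# sum() on a character vector raises an error; capture its message instead of stopping
res <- tryCatch(sum(cost), error = function(e) conditionMessage(e))

# Converting to numeric first gives the expected total
total <- sum(as.numeric(cost))

# Every (start, end) pair is distinct, so grouping by both yields one group per row
n_groups <- nrow(unique(data.frame(start, end)))
```

This is why the answer drops StartTime and EndTime from group_by() and converts TravelCost with as.numeric() before summing.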