Calculating how much of a duration falls within each hour of the day

Question  Votes: 4  Answers: 4

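I have a data frame with start times and end times. How can I process the data to get the total number of minutes that fall within each hour of the day? For example, if an event starts at 9:45 and ends at 10:15, I want 15 minutes counted toward the 9:00 hour and 15 minutes counted toward the 10:00 hour.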

I couldn't manage this with lubridate, so I looked up an older question here. I tried using POSIXct, but the output is correct for some hours and wrong for others. What am I missing?

df %>% 
  mutate(minutes = difftime(end_time,start_time),
         hourOfDay = format(as.POSIXct(start_time), "%H"),
         Day = format(as.POSIXct(start_time),"%Y-%m-%d")) %>% 
  group_by(hourOfDay, Day) %>% 
  summarize(totalMinutes = sum(minutes))

Output:

  hourOfDay Day        totalMinutes
  <chr>     <chr>      <drtn>      
1 03        2018-09-02  34 mins    
2 06        2018-09-02 163 mins    
3 07        2018-09-02  84 mins    
4 08        2018-09-02  39 mins    
5 11        2018-09-02  41 mins    
6 14        2018-09-02   3 mins

Expected result:

  hourOfDay Day        totalMinutes
  <chr>     <chr>      <drtn>      
1 03        2018-09-02  34 mins    
2 06        2018-09-02  69 mins    
3 07        2018-09-02 124 mins    
4 08        2018-09-02  93 mins    
5 11        2018-09-02  41 mins    
6 14        2018-09-02   3 mins

Here is the sample data:

df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  start_time = c("2018-09-02 11:13:00", "2018-09-02 14:34:00",
                 "2018-09-02 03:00:00", "2018-09-02 03:49:00",
                 "2018-09-02 07:05:00", "2018-09-02 06:44:00", "2018-09-02 06:04:00",
                 "2018-09-02 07:51:00", "2018-09-02 08:16:00"),
  end_time = c("2018-09-02 11:54:00", "2018-09-02 14:37:00",
               "2018-09-02 03:30:00", "2018-09-02 03:53:00",
               "2018-09-02 08:05:00", "2018-09-02 06:57:00", "2018-09-02 08:34:00",
               "2018-09-02 08:15:00", "2018-09-02 08:55:00"))
r datetime lubridate posixct
4 Answers
2
votes

Not the best solution, because it expands the data, but I think it works:

library(dplyr)
library(lubridate)

df %>%
  mutate_at(-1, ymd_hms) %>%
  mutate(time = purrr::map2(start_time, end_time, seq, by = 'min')) %>%
  tidyr::unnest(time) %>%
  mutate(hour = hour(time), date = as.Date(time)) %>%
  count(date, hour)

# A tibble: 6 x 3
#  date        hour     n
#  <date>     <int> <int>
#1 2018-09-02     3    36
#2 2018-09-02     6    70
#3 2018-09-02     7   124
#4 2018-09-02     8    97
#5 2018-09-02    11    42
#6 2018-09-02    14     4

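We create a sequence from start_time to end_time at 1-minute intervals, extract the date and the hour from each timestamp, and count the occurrences of each date and hour combination.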
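Note that seq() includes both endpoints, so each interval contributes one timestamp more than its length in minutes. That is why the counts above (36, 70, 97, 42, 4) are higher than the expected totals (34, 69, 93, 41, 3). One possible adjustment, shown here as a sketch rather than as part of the original answer, is to drop the last timestamp of every sequence so that each remaining timestamp stands for the minute that starts at it. The `toy` data frame and the `per_hour` name are made up for illustration and reuse the 9:45 to 10:15 example from the question.

```r
library(dplyr)
library(lubridate)

# Hypothetical one-row data set: the 9:45 to 10:15 example from the question.
toy <- tibble(
  id = 1,
  start_time = ymd_hms("2018-09-02 09:45:00"),
  end_time   = ymd_hms("2018-09-02 10:15:00")
)

per_hour <- toy %>%
  # One timestamp per minute; head(., -1) removes the closing endpoint,
  # so a 30-minute interval yields exactly 30 timestamps.
  mutate(time = purrr::map2(start_time, end_time,
                            ~ head(seq(.x, .y, by = "min"), -1))) %>%
  tidyr::unnest(time) %>%
  mutate(hour = hour(time), date = as.Date(time)) %>%
  count(date, hour)

per_hour
```

With this change the 30-minute interval is split into 15 minutes for hour 9 and 15 minutes for hour 10. The data is still expanded to one row per minute, so the memory cost of the approach stays the same.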


1
votes

An alternative solution that does not expand the data but needs a helper function:

library(dplyr)
library(lubridate)

count_minutes <- function(start_time, end_time) {
  time_interval <- interval(start_time, end_time)

  start_hour <- floor_date(start_time, unit = "hour")
  end_hour <- ceiling_date(end_time, unit = "hour")
  diff_hours <- as.double(difftime(end_hour, start_hour, units = "hours"))

  hours <- start_hour + hours(0:diff_hours)
  hour_intervals <- int_diff(hours)
  minutes_per_hour <- as.double(intersect(time_interval, hour_intervals), units = "minutes")

  hours <- hours[1:(length(hours)-1)]
  tibble(Day = date(hours),
         hourOfDay = hour(hours),
         totalMinutes = minutes_per_hour)
}


df %>% 
  mutate(start_time = as_datetime(start_time),
         end_time = as_datetime(end_time)) %>% 
  as_tibble() %>% 
  mutate(minutes_per_hour = purrr::map2(start_time, end_time, count_minutes)) %>% 
  tidyr::unnest(minutes_per_hour) %>% 
  group_by(Day, hourOfDay) %>% 
  summarise(totalMinutes = sum(totalMinutes)) %>%
  ungroup()

# A tibble: 6 x 3
#   Day        hourOfDay totalMinutes
#   <date>         <int>        <dbl>
# 1 2018-09-02         3           34
# 2 2018-09-02         6           69
# 3 2018-09-02         7          124
# 4 2018-09-02         8           93
# 5 2018-09-02        11           41
# 6 2018-09-02        14            3

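The helper function takes a single start_time, end_time pair, counts how many minutes of it fall in each hour, and returns the result as a tibble. It can then be applied to every such pair in the data; the results are unnested and summarised to compute the totals.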
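The per-hour arithmetic behind such a helper can also be written without interval objects. The sketch below is a base R illustration, not the answer's code: it clamps the interval to each clock hour and takes the difference. All variable names are made up for the example, and the interval is the 9:45 to 10:15 case from the question.

```r
# One interval, taken from the example in the question.
start <- as.POSIXct("2018-09-02 09:45:00", tz = "UTC")
end   <- as.POSIXct("2018-09-02 10:15:00", tz = "UTC")

# Start of every clock hour touched by the interval (09:00 and 10:00 here).
first_hour <- as.POSIXct(trunc(start, "hours"))
hour_start <- seq(first_hour, end, by = "hour")
hour_end   <- hour_start + 3600

# Overlap with each hour, in minutes:
# min(end, hour_end) - max(start, hour_start), computed on seconds since the epoch.
overlap_min <- (pmin(as.numeric(end), as.numeric(hour_end)) -
                pmax(as.numeric(start), as.numeric(hour_start))) / 60

data.frame(hour = format(hour_start, "%H"), minutes = overlap_min)
```

Both hours receive 15 minutes. If the interval ends exactly on an hour boundary, the last row simply gets 0 minutes and can be filtered out.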


1
votes

Here is an alternative solution, similar to Ronak's, but without creating a data frame with one row per minute.

library(dplyr)
library(lubridate)

df %>%
  mutate(hour = purrr::map2(hour(start_time), hour(end_time), seq, by = 1)) %>%
  tidyr::unnest(hour) %>%
  mutate(minu = case_when(
    hour(start_time) != hour & hour(end_time) == hour ~ 1 * minute(end_time),  # last hour of the interval
    hour(start_time) == hour & hour(end_time) != hour ~ 60 - minute(start_time),  # first hour of the interval
    hour(start_time) == hour & hour(end_time) == hour ~ 1 * minute(end_time) - 1 * minute(start_time),  # starts and ends in this hour
    TRUE ~ 60  # full hour in between
  )) %>%
  group_by(hour) %>%
  summarise(sum(minu))

# A tibble: 6 x 2
   hour `sum(minu)`
  <dbl>       <dbl>
1     3          34
2     6          69
3     7         124
4     8          93
5    11          41
6    14           3

0
votes

A data.table / lubridate alternative.

library(data.table)
library(lubridate)

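setDT(df)

# Assumption: start_time and end_time are still character strings, as in the
# sample data, so they are parsed to date-times before rounding.
time_cols <- c("start_time", "end_time")
df[ , (time_cols) := lapply(.SD, ymd_hms), .SDcols = time_cols]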

df[ , ceil_start := ceiling_date(start_time, "hour")]

d = df[ , {
  if(ceil_start > end_time){
    .SD[ , .(start_time, dur = as.double(end_time - start_time, units = "mins"))]
  } else {
    time <- c(start_time,
              seq(from = ceil_start, to = floor_date(end_time, "hour"), by = "hour"),
              end_time)
    .(start = head(time, -1), dur = `units<-`(diff(time), "mins"))
  }
},
by = id]

setorder(d, start_time)
d[ , .(n_min = sum(dur)), by = .(date = as.Date(start_time), hour(start_time))]

#          date hour n_min
# 1: 2018-09-02    3    34
# 2: 2018-09-02    6    69
# 3: 2018-09-02    7   124
# 4: 2018-09-02    8    93
# 5: 2018-09-02   11    41
# 6: 2018-09-02   14     3

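Explanation

Convert the data frame to a data.table (setDT). Round the start time up to the nearest hour (ceiling_date(start_time, "hour")).

Check whether the rounded-up start time is later than the end time (if(ceil_start > end_time)). If so, the interval lies within a single hour: select the start time and the duration (as.double(end_time - start_time, units = "mins")).

In all other cases (else), create a sequence from the rounded-up start time to the rounded-down end time in hourly steps (seq(from = ceil_start, to = floor_date(end_time, "hour"), by = "hour")). Concatenate it with the start time and the end time. Return all times except the last (head(time, -1)) and compute the time difference between consecutive steps in minutes (`units<-`(diff(time), "mins")).

Order the data by start time (setorder(d, start_time)). Sum the durations by date and hour with d[ , .(n_min = sum(dur)), by = .(date = as.Date(start_time), hour(start_time))].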
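The breakpoint idea from the else branch can be tried on a single interval in base R. This is a minimal sketch under the assumption that the interval crosses at least one hour boundary and does not start exactly on the hour. It uses row 7 of the sample data (06:04 to 08:34), and the variable names are made up for illustration.

```r
# Row 7 of the sample data: an interval spanning three clock hours.
start <- as.POSIXct("2018-09-02 06:04:00", tz = "UTC")
end   <- as.POSIXct("2018-09-02 08:34:00", tz = "UTC")

# Whole-hour breakpoints inside the interval: 07:00 and 08:00.
first_break <- as.POSIXct(trunc(start, "hours")) + 3600
last_break  <- as.POSIXct(trunc(end, "hours"))
inner <- seq(first_break, last_break, by = "hour")

# Start time, inner breakpoints, end time; consecutive differences are the
# minutes spent in each clock hour.
breaks <- c(start, inner, end)
dur <- as.numeric(diff(breaks), units = "mins")

data.frame(hour = format(head(breaks, -1), "%H", tz = "UTC"), minutes = dur)
```

The interval contributes 56 minutes to hour 06, 60 minutes to hour 07 and 34 minutes to hour 08. An interval that stays within one hour has no inner breakpoints, which is the case the if branch of the answer handles separately.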
