Extracting the overlapping parts of time intervals to calculate availability


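I have a dataset of malfunction times. The start of each malfunction is in the Begin column and the end is in the End column.
In addition, there are 3 malfunction categories, and these can also overlap in time.
My actual goal is to calculate, for each day, the total length of the periods without any malfunction. (The categories do not really matter; I don't need a breakdown by category.)
To illustrate, here is an example dataset of malfunction times over a few days: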

> df_time
# A tibble: 8 × 3
  Category Begin               End                
  <chr>    <dttm>              <dttm>             
1 A        2023-07-15 01:40:11 2023-07-15 13:43:15
2 A        2023-07-16 05:54:44 2023-07-16 10:50:45
3 B        2023-08-16 07:43:09 2023-08-16 16:42:12
4 C        2023-08-16 12:00:00 2023-08-16 13:11:13
5 A        2023-08-16 18:00:00 2023-08-16 19:30:00
6 A        2023-08-17 08:00:00 2023-08-17 13:00:00
7 C        2023-08-17 11:12:45 2023-08-17 19:58:22
8 A        2023-08-17 19:01:45 2023-08-17 23:59:59

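I have tried to visualise the malfunction times graphically (plot not reproduced here):

Now I would like to calculate a summary for each calendar day showing how many minutes (or hours, or seconds) passed without any malfunction.
That is the sum of all green intervals per day (marked in a plot that is not reproduced here):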

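If there were only one malfunction category, I could simply compute 24 hours minus the malfunction intervals.
However, I have not managed to handle the overlapping malfunction intervals of the different categories.
Can anyone help me calculate the malfunction-free periods and/or the malfunction periods per day?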
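
To make the difficulty concrete, here is a small base R sketch (not part of the original post) using two overlapping rows of the example data. Simply summing the interval lengths counts the overlap twice; the variable names are illustrative assumptions, and the shortcut used for the union only works because these two intervals overlap.

```r
# Two overlapping malfunctions on 2023-08-16 (rows 3 and 4 of the example data)
begin <- as.POSIXct(c("2023-08-16 07:43:09", "2023-08-16 12:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2023-08-16 16:42:12", "2023-08-16 13:11:13"), tz = "UTC")

# Naive approach: add up the individual durations (double-counts the overlap)
naive_hours <- sum(as.numeric(difftime(end, begin, units = "hours")))

# Union of the two intervals; valid here only because they overlap
union_hours <- as.numeric(difftime(max(end), min(begin), units = "hours"))

naive_hours > union_hours  # the naive sum overstates the malfunction time
```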

I imagine a result somewhat like this (the values do not necessarily match the example dataset above):

  Date       OK_times malfunction_times
  <date>     <chr>    <chr>
1 2023-07-15 17 hours 7 hours
2 2023-07-16 5 hours  19 hours
3 2023-07-17 3 hours  21 hours

Here is the code to create the example data frame:

library(tidyverse)

df_time <- tibble(
  Category = c(
    "A",
    "A",
    "B",
    "C",
    "A",
    "A",
    "C",
    "A"
  ),
  Begin = as_datetime(c(
    "2023-07-15 01:40:11",
    "2023-07-16 05:54:44",
    "2023-08-16 07:43:09",
    "2023-08-16 12:00:00",
    "2023-08-16 18:00:00",
    "2023-08-17 08:00:00",
    "2023-08-17 11:12:45",
    "2023-08-17 19:01:45"
    )),
  End = as_datetime(c(
    "2023-07-15 13:43:15",
    "2023-07-16 10:50:45",
    "2023-08-16 16:42:12",
    "2023-08-16 13:11:13",
    "2023-08-16 19:30:00",
    "2023-08-17 13:00:00",
    "2023-08-17 19:58:22",
    "2023-08-17 23:59:59"
    ))
)
Tags: r, timestamp, intervals, lubridate, period
2 Answers

Answer 1 (1 vote)

interval_intersects <- function(i1, i2) { 
  # check if there's an overlap between the two intervals
  between(i1$Begin, i2$Begin, i2$End) | between(i1$End, i2$Begin, i2$End) | between(i2$Begin, i1$Begin, i1$End) | between(i2$End, i1$Begin, i1$End)
}

tidy_intervals <- function(df) {
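  out <- df[0,] # empty tibble with the columns of the input df
  # sort by Begin; otherwise a row that bridges two already-stored intervals
  # would be merged into only one of them and the other would be counted twice
  df <- df[order(df$Begin), ]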
  
  while (nrow(df) > 0) {
    matched = FALSE
    if (nrow(out) != 0) { # guard against an empty `out`: 1:nrow(out) would otherwise iterate over c(1, 0)
      
      # for each row in out, check if it intersects with the current row
      for (j in 1:nrow(out)) {

        # if it does, update the current output row to be the minimum of the two begin times and the maximum of the two end times
        if (interval_intersects(df[1, ], out[j, ])) {
          matched = TRUE
          out[j, ] <- tibble(
            Begin = min(df[1,]$Begin, out[j,]$Begin),
            End = max(df[1,]$End, out[j,]$End)
          )
          break
        }
      }
    }
    # if the current row didn't intersect with any of the output rows, append it to the output
    if (!matched) {
      out <- out |> add_row(
        Begin = df[1,]$Begin,
        End = df[1,]$End)
    }

    # remove the current row from the input
    df <- df[-1, ]
  }
  return(out)
}

tidy_intervals(df_time |> select(-Category)) |> 
  ##### everything from here until later is from the old answer https://stackoverflow.com/a/76905774/4145280 #####
  mutate(b = as.Date(Begin), e = as.Date(End),
        # create a sequence of dates between begin and end
        days = map2(b, e, ~ seq.Date(.x, .y, by = "1 day"))) |>
  # unnest the days column into many rows
  unnest(days) |>
  # if the beginning date is the same as the date in `days`, then use the original Begin column
  # else, use `days` as a datetime
  mutate(Begin = if_else(b == days, Begin, as_datetime(days)),
         # same with End, but ending one second before the next midnight
         End = if_else(e == days, End, as_datetime(days) + days(1) - seconds(1)), .keep = "unused") |> 
 #### new stuff starts here ####
  mutate(Date = as.Date(Begin), 
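         # fix the unit to hours; `End - Begin` picks its unit automatically,
         # which would break the `24 - malfunction_times` step below
         malfunction_times = difftime(End, Begin, units = "hours")) |>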
  reframe(malfunction_times = round(sum(malfunction_times)),
          OK_times = 24 - malfunction_times, .by = Date)

Output:

# A tibble: 4 × 3
  Date       malfunction_times OK_times
  <date>     <drtn>            <drtn>  
1 2023-07-15 12 hours          12 hours
2 2023-07-16  5 hours          19 hours
3 2023-08-16 10 hours          14 hours
4 2023-08-17 16 hours           8 hours
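
The merging step can also be sketched without an explicit loop over rows. The following is a minimal base R sketch, not part of the answer above: `merge_intervals` is a hypothetical helper name, and it assumes the timestamps are POSIXct values in UTC. After sorting by start time, an interval opens a new group only if it begins after every earlier interval has ended.

```r
# Hypothetical helper (an assumption, not from the answer): merge overlapping
# intervals using a running maximum of the end times.
merge_intervals <- function(begin, end) {
  o <- order(begin)                      # sort by start time
  begin <- begin[o]
  b <- as.numeric(begin)
  e <- as.numeric(end[o])
  # latest end time among all earlier intervals (-Inf for the first one)
  prev_max_end <- c(-Inf, head(cummax(e), -1))
  # a new group starts when an interval begins after everything before it ended
  group <- cumsum(b > prev_max_end)
  data.frame(
    Begin = begin[!duplicated(group)],   # earliest start in each group
    End = as.POSIXct(as.vector(tapply(e, group, max)),
                     origin = "1970-01-01", tz = "UTC")  # assumes UTC input
  )
}

# The three malfunctions of 2023-08-16 from the example data
begin <- as.POSIXct(c("2023-08-16 07:43:09", "2023-08-16 12:00:00",
                      "2023-08-16 18:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2023-08-16 16:42:12", "2023-08-16 13:11:13",
                      "2023-08-16 19:30:00"), tz = "UTC")
merged <- merge_intervals(begin, end)
```

Here the first two intervals collapse into one, so `merged` has two rows. Intervals that merely touch are also merged, which matches the inclusive behaviour of `between()` in `interval_intersects`. The result could then be passed to the per-day splitting step of the pipeline above.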

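Answer 2 (0 votes)

May I suggest a more intuitive but less efficient alternative? The main idea is to expand every interval into a vector of seconds, drop the duplicates (the overlaps), and finally count, for each day, how many malfunction seconds fall on that day: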

# get days as numerics (only days on which a malfunction begins are considered)
begin_day <- as.numeric(as_datetime(date(df_time$Begin)))
days <- unique(begin_day)

# get times as numerics
begin_time <- as.numeric(df_time$Begin)
end_time <- as.numeric(df_time$End)

# create intervals as vectors of seconds and make them unique
# in other words: all overlaps are merged
intervals <- Map(function(x, y) x:y, begin_time, end_time)
intervals_unique <- unique(unlist(intervals))

# now we simply check how many seconds of a day overlap with the intervals
res <- sapply(days, function(x) sum(intervals_unique >= x & intervals_unique < x + 24 * 60 * 60))

data.frame(date = as_datetime(days), malfunction = res / (60 * 60))

This produces:

        date malfunction
1 2023-07-15   12.051389
2 2023-07-16    4.933889
3 2023-08-16   10.484722
4 2023-08-17   16.000000

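Efficiency could be improved by filtering the data frame for a specific day and then applying the procedure. This would make the intervals_unique vector smaller. Here I only wanted to show the general idea. The results agree with Mark's solution.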
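
The per-day variant mentioned above could look roughly like this. It is a sketch under assumptions, not the answerer's code: `malfunction_seconds` is a hypothetical helper, timestamps are assumed to be whole seconds in UTC, and each interval is clipped to the day before it is expanded, so the vector of seconds never exceeds one day per interval.

```r
# Hypothetical helper (an assumption): count the distinct malfunction seconds
# that fall on one calendar day, given as a "YYYY-MM-DD" string.
malfunction_seconds <- function(begin, end, day) {
  day_start <- as.numeric(as.POSIXct(day, tz = "UTC"))
  day_end <- day_start + 24 * 60 * 60 - 1      # last second of that day
  b <- pmax(as.numeric(begin), day_start)      # clip intervals to the day
  e <- pmin(as.numeric(end), day_end)
  keep <- b <= e                               # drop intervals outside the day
  if (!any(keep)) return(0)
  secs <- unlist(Map(function(x, y) x:y, b[keep], e[keep]))
  length(unique(secs))                         # overlaps are counted once
}

# The three malfunctions of 2023-08-16 from the example data
begin <- as.POSIXct(c("2023-08-16 07:43:09", "2023-08-16 12:00:00",
                      "2023-08-16 18:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2023-08-16 16:42:12", "2023-08-16 13:11:13",
                      "2023-08-16 19:30:00"), tz = "UTC")
hours_0816 <- malfunction_seconds(begin, end, "2023-08-16") / 3600
```

As in the answer's code, both endpoints of an interval are counted, so each interval contributes one second more than its duration. Because intervals are clipped to the day, this version would also handle a malfunction that runs past midnight.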
