Find the time intervals between two different tags in time-tagged data in an R data.table


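I have a tabular log file with the (simplified) structure

<time>, <event_tag>

and want to find the intervals between two different event_tags, "problem" and "all fine". The difficulty is that both "all fine" and "problem" are often repeated. A manual algorithm would therefore find the first "problem", look for the next "all fine", and continue in this way up to the last "problem" and its subsequent "all fine".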

Sample dataset:

library(data.table)
set.seed(156125)
DT <- data.table(time = seq(as.POSIXct(tz = "UTC", "2024-01-01"), 
                            as.POSIXct(tz = "UTC", "2024-01-10"), 
                            by = "12 hours"),
                 event_tag = c("problem", "all fine")[round(runif(19, 1.2, 2.49))])

# time                    event_tag
# <POSc>                     <char>
#  1: 2024-01-01 00:00:00  all fine
#  2: 2024-01-01 12:00:00  all fine
#  3: 2024-01-02 00:00:00   problem
#  4: 2024-01-02 12:00:00  all fine
#  5: 2024-01-03 00:00:00  all fine
#  6: 2024-01-03 12:00:00   problem
#  7: 2024-01-04 00:00:00   problem
#  8: 2024-01-04 12:00:00  all fine
#  9: 2024-01-05 00:00:00  all fine
# 10: 2024-01-05 12:00:00  all fine
# 11: 2024-01-06 00:00:00  all fine
# 12: 2024-01-06 12:00:00  all fine
# 13: 2024-01-07 00:00:00   problem
# 14: 2024-01-07 12:00:00  all fine
# 15: 2024-01-08 00:00:00  all fine
# 16: 2024-01-08 12:00:00   problem
# 17: 2024-01-09 00:00:00  all fine
# 18: 2024-01-09 12:00:00  all fine
# 19: 2024-01-10 00:00:00  all fine

Desired result:

data.table(problem_start = DT$time[c(3, 6, 13, 16)],
           problem_end = DT$time[c(4, 8, 14, 17)])

#          problem_start         problem_end
#                 <POSc>              <POSc>
# 1: 2024-01-02 00:00:00 2024-01-02 12:00:00
# 2: 2024-01-03 12:00:00 2024-01-04 12:00:00
# 3: 2024-01-07 00:00:00 2024-01-07 12:00:00
# 4: 2024-01-08 12:00:00 2024-01-09 00:00:00

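I thought about a solution that turns the two tags into a boolean and uses cumsum, but could not quite figure it out. Maybe there is a neat data.table way of doing this that I currently do not see. However, even though I prefer dplyr, I would also be happy with a data.table solution.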

DT[ , bool := ifelse(event_tag == "all fine", 0, 1)]
DT[ , cumsum(bool)]
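
The boolean-plus-cumsum idea can be carried through. The following is a minimal sketch, not a definitive implementation: it assumes the rows are already sorted by time, hard-codes the tags from the printed sample table so it does not depend on the random number generator, and uses illustrative names (DT2, is_problem, run_start, episode) that do not appear in the question.

```r
library(data.table)

# Tags copied from the printed sample table above (19 rows)
tags <- c("all fine", "all fine", "problem", "all fine", "all fine",
          "problem", "problem", "all fine", "all fine", "all fine",
          "all fine", "all fine", "problem", "all fine", "all fine",
          "problem", "all fine", "all fine", "all fine")
DT2 <- data.table(
  time = seq(as.POSIXct("2024-01-01", tz = "UTC"),
             as.POSIXct("2024-01-10", tz = "UTC"), by = "12 hours"),
  event_tag = tags
)

# Boolean flag for "problem" rows
DT2[, is_problem := event_tag == "problem"]
# TRUE only where a run of "problem" rows begins (previous row was not a problem)
DT2[, run_start := is_problem & !shift(is_problem, fill = FALSE)]
# cumsum numbers the episodes: each run start opens a new id,
# and rows before the first problem get id 0
DT2[, episode := cumsum(run_start)]

# Per episode: the first row is the start, the first non-problem row is the end
# (NA if the log ends while the problem is still open)
res <- DT2[episode > 0,
           .(problem_start = time[1], problem_end = time[!is_problem][1]),
           by = episode][, episode := NULL][]
print(res)
```

Consecutive "problem" rows fall into the same episode, so only the first of them becomes a problem_start.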
1 answer

You can do a rolling join of the end times onto the start times, then take the earliest start time for each end time:

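# Start candidates: every "problem" row
starts <- DT[event_tag == "problem", .(problem_start = time)]
# End candidates: every "all fine" row; problem_end keeps a copy of the end time
ends <- DT[event_tag == "all fine", .(time, problem_end = time)]
# roll = -Inf matches each start to the next "all fine" after it;
# consecutive "problem" rows share an end, so keep the earliest start per end
result <- ends[starts, on = .(time == problem_start), roll = -Inf][
  , .(problem_start = first(time)), by = problem_end]
setorder(result, problem_start)
setcolorder(result, c("problem_start", "problem_end"))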
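
To see what the rolling join does in isolation, here is a small sketch on made-up numeric times. The tables toy_starts and toy_ends and their values are hypothetical and serve only as illustration. With roll = -Inf, a start that has no exact match among the end times is paired with the next later end (next observation carried backward), and a start later than every end gets NA.

```r
library(data.table)

# Hypothetical toy data: three start times and three candidate end times
toy_starts <- data.table(problem_start = c(1, 6, 10))
toy_ends   <- data.table(time = c(2, 4, 9), problem_end = c(2, 4, 9))

# For each start, look up the next end time at or after it.
# In the result, the join column `time` holds the start value.
rolled <- toy_ends[toy_starts, on = .(time = problem_start), roll = -Inf]
print(rolled)
```

The start at 10 has no later end, which corresponds to a log that finishes while a problem is still open.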