I have the following data table:
library(data.table)
DT1<-data.table(
id=c(1,2,3,4,3,2),
in_time=c("2017-11-01 08:37:35","2017-11-01 09:07:44","2017-11-01 09:46:16","2017-11-01 10:32:29","2017-11-01 10:59:25","2017-11-01 13:24:12"),
out_time=c("2017-11-01 08:45:35","2017-11-01 09:15:30","2017-11-01 10:11:16","2017-11-01 10:37:05","2017-11-01 11:45:25","2017-11-01 14:10:09")
)
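It records when each person entered and left the shop.
Now I want the number of people in the shop for every 5-minute interval (standard 5-minute marks, i.e. minutes 0, 5, 10, 15 ... 60). If nobody is in the shop, I need a value of 0.
So I tried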
library(lubridate)
DT1[,time:=ymd_hms(in_time)]
DT1[,time:=ceiling_date(time,"5mins")]
DT1[,.N,by=list(time)]
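This only gives the number of people per entry time, and I am struggling with how to take out_time into account as well. For example, id 1 enters at 2017-11-01 08:37:35 and leaves at 2017-11-01 08:45:35. So he should be counted in the shop for the 5-minute intervals from 2017-11-01 08:40:00 to 2017-11-01 08:45:00, but not 2017-11-01 08:50:00, and so on.
Thanks for any help.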
Here is an option using data.table::foverlaps:
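# fmt and the POSIXct-converted DT1 come from the data section at the end of this answer
times <- seq(as.POSIXct("2017-11-01 00:00:00", format=fmt),
             as.POSIXct("2017-11-02 00:00:00", format=fmt),
             by="5 min")
# one row per 5-minute interval, keyed on its start and end
DT2 <- data.table(in_time=times[-length(times)], out_time=times[-1L], key=c("in_time","out_time"))
setkey(DT1, in_time, out_time)
# overlap join, then count distinct ids in each interval that has at least one visitor
foverlaps(DT2, DT1)[!is.na(id), uniqueN(id), .(i.in_time, i.out_time)]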
The first eight rows of the output:
             i.in_time          i.out_time V1
1: 2017-11-01 08:35:00 2017-11-01 08:40:00  1
2: 2017-11-01 08:40:00 2017-11-01 08:45:00  1
3: 2017-11-01 08:45:00 2017-11-01 08:50:00  1
4: 2017-11-01 09:05:00 2017-11-01 09:10:00  1
5: 2017-11-01 09:10:00 2017-11-01 09:15:00  1
6: 2017-11-01 09:15:00 2017-11-01 09:20:00  1
7: 2017-11-01 09:45:00 2017-11-01 09:50:00  1
8: 2017-11-01 09:50:00 2017-11-01 09:55:00  1
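The question also asks for a 0 when the shop is empty, whereas the !is.na(id) filter drops those intervals. One possible variation, offered as a sketch rather than a definitive solution, keeps every interval and counts only the non-missing ids. It relies on foverlaps keeping unmatched rows of its first argument (the default nomatch behaviour). The tz = "UTC" setting and the names counts and n are assumptions added here for reproducibility; they are not part of the original answer.

```r
library(data.table)

fmt <- "%Y-%m-%d %T"
DT1 <- data.table(
  id = c(1, 2, 3, 4, 3, 2),
  in_time = c("2017-11-01 08:37:35", "2017-11-01 09:07:44", "2017-11-01 09:46:16",
              "2017-11-01 10:32:29", "2017-11-01 10:59:25", "2017-11-01 13:24:12"),
  out_time = c("2017-11-01 08:45:35", "2017-11-01 09:15:30", "2017-11-01 10:11:16",
               "2017-11-01 10:37:05", "2017-11-01 11:45:25", "2017-11-01 14:10:09")
)
cols <- c("in_time", "out_time")
# Assumption: a fixed time zone (UTC) so the result does not depend on the local machine
DT1[, (cols) := lapply(.SD, as.POSIXct, format = fmt, tz = "UTC"), .SDcols = cols]
setkey(DT1, in_time, out_time)

# 5-minute grid covering the whole day
times <- seq(as.POSIXct("2017-11-01 00:00:00", tz = "UTC"),
             as.POSIXct("2017-11-02 00:00:00", tz = "UTC"),
             by = "5 min")
DT2 <- data.table(in_time = times[-length(times)], out_time = times[-1L],
                  key = c("in_time", "out_time"))

# No !is.na(id) filter: intervals without any visitor survive the join with id = NA,
# and counting only the non-missing ids turns them into 0
counts <- foverlaps(DT2, DT1)[
  , .(n = length(unique(id[!is.na(id)]))),
  by = .(i.in_time, i.out_time)
]

head(counts)
```

Note that foverlaps treats both interval ends as closed, so a visit that starts or ends exactly on a 5-minute mark would be counted in both neighbouring intervals; whether that is desired depends on how the counts are meant to be read.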
Data:
library(data.table)
DT1 <- data.table(
id=c(1,2,3,4,3,2),
in_time=c("2017-11-01 08:37:35","2017-11-01 09:07:44","2017-11-01 09:46:16","2017-11-01 10:32:29","2017-11-01 10:59:25","2017-11-01 13:24:12"),
out_time=c("2017-11-01 08:45:35","2017-11-01 09:15:30","2017-11-01 10:11:16","2017-11-01 10:37:05","2017-11-01 11:45:25","2017-11-01 14:10:09")
)
cols <- c("in_time", "out_time")
fmt <- "%Y-%m-%d %T"
DT1[, (cols) := lapply(.SD, as.POSIXct, format=fmt), .SDcols=cols]