I have the following problem. I have a data frame with the following structure:
        startdatetime         enddatetime  type amount
1 2019-02-01 03:35:00 2019-02-03 06:35:00 prod1  1e+03
2 2019-02-03 06:35:00 2019-02-05 09:35:00 prod1  5e+03
3 2019-02-05 09:35:00 2019-02-06 01:35:00 prod2  3e+07
4 2019-02-06 01:35:00 2019-02-06 03:35:00 prod1  1e+02
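Each row represents the amount produced within a certain time range (startdatetime to enddatetime). Now I want to aggregate this data per day. Let's ignore the incomplete day 2019-02-01 and start with 2019-02-02. The first record says that product 1 was produced between 2019-02-01 03:35:00 and 2019-02-03 06:35:00, and that 1000 kg were produced in total. Product 1 therefore ran for roughly 21 h + 24 h + 6 h = 51 h, of which 24 h fall on 2019-02-02, so on 2019-02-02 the amount produced is 24/51*1000 ≈ 470.59 kg. So far I have a solution based on for and while loops, but I would like a faster one, perhaps based on the "lubridate" package; I could not find one. Any suggestions? My code is below: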
# create test data set
mydata <- data.frame(
  startdatetime = c(as.POSIXct("2019-02-01 03:35:00"), as.POSIXct("2019-02-03 06:35:00"),
                    as.POSIXct("2019-02-05 09:35:00"), as.POSIXct("2019-02-06 01:35:00")),
  enddatetime = c(as.POSIXct("2019-02-03 06:35:00"), as.POSIXct("2019-02-05 09:35:00"),
                  as.POSIXct("2019-02-06 01:35:00"), as.POSIXct("2019-02-06 03:35:00")),
  type = c("prod1", "prod1", "prod2", "prod1"),
  amount = c(1000, 5000, 30000000, 100))

# take only full days into account and ignore the first and the last day
minstartday = min(mydata$startdatetime) + 24*60*60
maxendday = max(mydata$enddatetime) - 24*60*60

# create a day index
timesindex <- seq(from = as.Date(format(minstartday, format = "%Y/%m/%d")),
                  to = as.Date(format(maxendday, format = "%Y/%m/%d")), by = "day")

# create an empty dataframe which will be filled with the production data for each day
prodperday <- data.frame(Date = as.Date(timesindex),
                         prod1 = replicate(length(timesindex), 0),
                         prod2 = replicate(length(timesindex), 0),
                         stringsAsFactors = FALSE)

# loop over all entries and separate them into produced fractions per day
for (irow in 1:dim(mydata)[1]) {
  timestart = mydata[irow, "startdatetime"]
  datestart = as.Date(format(timestart, format = "%Y/%m/%d"))
  timeend = timestart
  # total run time in hours; the *24 assumes the difference is reported in days,
  # which only holds for records that span at least 24 h
  tota_run_time_in_h = (as.numeric((mydata[irow, "enddatetime"] - mydata[irow, "startdatetime"])))*24.
  while (timeend < mydata[irow, "enddatetime"]) {
    # last second of the current day, or the end of the record if that comes first;
    # as.POSIXct(<Date>) is midnight UTC, so the 23 h offset presumably fits a UTC+1 local time zone
    timeend = min(as.POSIXct(datestart, format = "%Y/%m/%d %H:%M:%S") + 23*60*60 - 1,
                  mydata[irow, "enddatetime"])
    # time spent on this day; relies on the difference being reported in hours
    tdiff = as.numeric(timeend - timestart)
    fraction_prod = (tdiff/tota_run_time_in_h)*mydata[irow, "amount"]
    # only days inside the day index are accumulated
    if (datestart %in% prodperday$Date) {
      prodperday[prodperday$Date == datestart, as.character(mydata[irow, "type"])] =
        prodperday[prodperday$Date == datestart, as.character(mydata[irow, "type"])] + fraction_prod
    }
    # continue with the first second of the next day
    timestart = timeend + 1
    datestart = as.Date(format(timestart, format = "%Y/%m/%d"))
    timeend = timestart
  }
}
And the result:
        Date     prod1   prod2
1 2019-02-02  470.5828       0
2 2019-02-03 1836.5741       0
3 2019-02-04 2352.9139       0
4 2019-02-05  939.5425 1126280
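One detail in the loop is worth checking. Subtracting two POSIXct values returns a difftime whose unit is picked automatically: days for spans of at least 24 h, hours below that. Multiplying the difference by 24 is therefore only right for the long records, which would explain why prod2 on 2019-02-05 comes out far below its 30,000,000 total. The short sketch below illustrates this with the first and third records; `tz = "UTC"` is an assumption added here only to make it reproducible.

```r
# First and third records of the sample data (tz = "UTC" is an assumption).
start <- as.POSIXct(c("2019-02-01 03:35:00", "2019-02-05 09:35:00"), tz = "UTC")
end   <- as.POSIXct(c("2019-02-03 06:35:00", "2019-02-06 01:35:00"), tz = "UTC")

d_long  <- end[1] - start[1]  # 51 h span: the unit is chosen automatically
d_short <- end[2] - start[2]  # 16 h span: the unit is chosen automatically
units(d_long)                 # "days"
units(d_short)                # "hours"

# Requesting hours explicitly removes the dependence on the span length.
run_h <- as.numeric(difftime(end, start, units = "hours"))
run_h                         # 51 16
```

Using `difftime(..., units = "hours")` for both the total run time and the per-day share would keep the two quantities in the same unit.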
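Here is what I would do:
My proposed solution is not perfect, because there are problems at the day boundaries, but the idea of converting the production data to an hourly basis and then aggregating it by day may be a good one.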
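A minimal base-R sketch of the hourly idea might look like the following. It assumes that every record spans a whole number of hours (true for the sample data) and fixes `tz = "UTC"` for reproducibility; the helper names `n_hours`, `idx`, `offset_h`, `hourly`, and `wide` are made up for this illustration. Each record is cut into one-hour slices that carry an equal share of the amount, and every slice is assigned to the day in which it starts.

```r
# Sample data from the question; tz = "UTC" is an assumption made here.
mydata <- data.frame(
  startdatetime = as.POSIXct(c("2019-02-01 03:35:00", "2019-02-03 06:35:00",
                               "2019-02-05 09:35:00", "2019-02-06 01:35:00"), tz = "UTC"),
  enddatetime   = as.POSIXct(c("2019-02-03 06:35:00", "2019-02-05 09:35:00",
                               "2019-02-06 01:35:00", "2019-02-06 03:35:00"), tz = "UTC"),
  type   = c("prod1", "prod1", "prod2", "prod1"),
  amount = c(1000, 5000, 30000000, 100),
  stringsAsFactors = FALSE
)

# Number of one-hour slices per record (assumes whole-hour durations).
n_hours <- as.integer(round(as.numeric(
  difftime(mydata$enddatetime, mydata$startdatetime, units = "hours"))))

# One row per slice: repeat each record n_hours times and shift its start hour by hour.
idx      <- rep(seq_len(nrow(mydata)), times = n_hours)
offset_h <- sequence(n_hours) - 1
hourly <- data.frame(
  Date   = as.Date(mydata$startdatetime[idx] + 3600 * offset_h, tz = "UTC"),
  type   = mydata$type[idx],
  amount = mydata$amount[idx] / n_hours[idx]
)

# Sum the slices per day and product; empty combinations become 0.
wide <- as.data.frame.matrix(xtabs(amount ~ Date + type, data = hourly))
prodperday <- data.frame(Date = as.Date(rownames(wide)), wide, row.names = NULL)

# Keep only the full days, as in the question.
first_day <- as.Date(min(mydata$startdatetime), tz = "UTC") + 1
last_day  <- as.Date(max(mydata$enddatetime), tz = "UTC") - 1
prodperday <- prodperday[prodperday$Date >= first_day & prodperday$Date <= last_day, ]
rownames(prodperday) <- NULL
prodperday
```

For the sample data this gives 24/51*1000 ≈ 470.5882 for prod1 on 2019-02-02. The boundary problem shows up on the other days: a slice that starts at 23:35 is counted entirely on the day it starts, so days on which a record begins or ends can be off by up to one hour's share.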