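I have a dataset that is used for OEE monitoring of factory machines. Among other data, the machines automatically report their operating status. The factory runs three shifts:
- night: 23:00 to 06:00
- morning: 06:00 to 14:30
- evening: 14:30 to 23:00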
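The rows of the dataframe are not split at shift changes. Below is a sample of the available data. In this sample, the first row starts at 2023-03-09 23:00:00 and ends at 2023-03-10 15:30:00. This row should be split into three parts:
- 2023-03-09 23:00:00 to 2023-03-10 06:00:00 (night)
- 2023-03-10 06:00:00 to 2023-03-10 14:30:00 (morning)
- 2023-03-10 14:30:00 to 2023-03-10 15:30:00 (evening)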
This may not be relevant to the solution, but note that the report day runs from 23:00:00 to 23:00:00, which causes an apparent misalignment between "report_date" and "starttime_report_day".
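The 23:00 cutoff can be expressed compactly by moving each timestamp one hour forward and taking the calendar date. The following is a minimal base R sketch of that idea; the helper name `report_date_of` is hypothetical, and the example uses UTC only so that the result does not depend on the session's time zone.

```r
# Hypothetical helper (not part of the original code): a report day runs
# from 23:00 to 23:00, so moving a timestamp one hour forward puts it on
# the calendar date of its report day.
report_date_of <- function(x) {
  as.Date(format(x + 3600, "%Y-%m-%d"))
}

# One second before the cutoff, and exactly on the cutoff.
x <- as.POSIXct(c("2023-03-09 22:59:59", "2023-03-09 23:00:00"), tz = "UTC")
report_date_of(x)
```

The first timestamp stays on 2023-03-09, while the second one already belongs to report date 2023-03-10.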
Code block that builds the sample dataframe from dummy data:
library(dplyr)      # %>%, mutate(), case_when()
library(lubridate)  # hour(), minute(), days(), hours(), minutes()

df <-
  data.frame(
    starttime = c(
      as.POSIXct("2023-03-09 23:00:00"),
      as.POSIXct("2023-04-02 07:00:00"),
      as.POSIXct("2023-04-10 15:00:00")
    ),
    endtime = c(
      as.POSIXct("2023-03-10 15:30:00"),
      as.POSIXct("2023-04-02 15:00:00"),
      as.POSIXct("2023-04-10 16:00:00")
    ),
    operation_code = c("machine setup", "operational", "crash")
  ) %>%
  mutate(
    # The report day starts at 23:00, so a start time from 23:00 onwards
    # belongs to the next calendar date.
    report_date = as.Date(ifelse(
      hour(starttime) < 23,
      as.Date(starttime),
      as.Date(starttime) + days(1)
    ), origin = "1970-01-01"),
    # Shift boundaries in decimal hours: 6 = 06:00, 14.5 = 14:30, 23 = 23:00.
    shift = case_when(
      hour(starttime) + minute(starttime) / 60 >= 23 |
        hour(starttime) + minute(starttime) / 60 < 6 ~ "night",
      hour(starttime) + minute(starttime) / 60 >= 6 &
        hour(starttime) + minute(starttime) / 60 < 14.5 ~ "morning",
      hour(starttime) + minute(starttime) / 60 >= 14.5 &
        hour(starttime) + minute(starttime) / 60 < 23 ~ "evening"
    ),
    # 23:00 on the day before report_date.
    starttime_report_day = as.POSIXct(report_date - days(1) + hours(23)),
    starttime_shift = case_when(
      shift == "night" ~ starttime_report_day,
      shift == "morning" ~ starttime_report_day + hours(7),
      shift == "evening" ~ starttime_report_day + hours(15) + minutes(30)
    ),
    endtime_shift = case_when(
      shift == "night" ~ starttime_report_day + hours(7),
      shift == "morning" ~ starttime_report_day + hours(15) + minutes(30),
      shift == "evening" ~ starttime_report_day + hours(24)
    )
  )
df
Output of df:
| starttime | endtime | operation_code | report_date | shift | starttime_report_day | starttime_shift | endtime_shift |
| ------------------- | ------------------- | ---------------- | ------------ | ------- | -------------------- | ------------------- | ------------------- |
| 2023-03-09 23:00:00 | 2023-03-10 15:30:00 | machine setup | 2023-03-10 | night | 2023-03-09 23:00:00 | 2023-03-09 23:00:00 | 2023-03-10 06:00:00 |
| 2023-04-02 07:00:00 | 2023-04-02 15:00:00 | operational | 2023-04-02 | morning | 2023-04-01 23:00:00 | 2023-04-02 06:00:00 | 2023-04-02 14:30:00 |
| 2023-04-10 15:00:00 | 2023-04-10 16:00:00 | crash | 2023-04-10 | evening | 2023-04-09 23:00:00 | 2023-04-10 14:30:00 | 2023-04-10 23:00:00 |
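I found this answer to a somewhat similar question, in which rows had to be split at date changes, and as an exercise I managed to get it working on my dataset.
Code block based on the solution found in the link above: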
library(purrr)  # reduce(); dplyr and lubridate are needed as well

# Helper Functions ------------------------------------------------------------------------------------------------
# TRUE when a row starts and ends on different calendar dates.
endsOnOtherDay <- function(df) {
  as_date(df$starttime) != as_date(df$endtime)
}
# Split one row into two: up to the end of the start date, and from the next date on.
split1rowInto2Days <- function(df) {
  df1 <- df
  df2 <- df
  df1$endtime <-
    as_date(df1$starttime) + days(1) - milliseconds(1)
  # Note: the value assigned here is a Date, not a POSIXct.
  df2$starttime <- as_date(df2$starttime) + days(1)
  rbind(df1, df2)
}
# Recursively split every row until no row spans more than one date.
splitDates <- function(df) {
  if (nrow(df) > 1) {
    return(df %>%
      split(f = seq_len(nrow(df))) %>%
      lapply(splitDates) %>%
      reduce(rbind))
  }
  if (df %>% endsOnOtherDay()) {
    return(df %>%
      split1rowInto2Days() %>%
      splitDates())
  }
  df
}
# The actual Calculation ------------------------------------------------------------------------------------------
df_split <- df %>%
  splitDates() %>%
  mutate(
    # endtime + 1 adds one second to compensate for the millisecond removed above.
    duration_hours = round(difftime(endtime + 1, starttime, units = "hours"), 3),
    date_calc = as_date(starttime)
  ) %>%
  group_by(starttime, operation_code) %>%
  summarise(duration_hours = sum(duration_hours)) %>%
  mutate(
    endtime_calc = starttime + duration_hours,
    actual_date = as.Date(starttime),
    report_date = as.Date(ifelse(
      hour(starttime) < 23,
      as.Date(starttime),
      as.Date(starttime) + days(1)
    ), origin = "1970-01-01")
  )
df_split <- df_split %>%
  mutate(shift = case_when(
    hour(starttime) + minute(starttime) / 60 >= 23 |
      hour(starttime) + minute(starttime) / 60 < 6 ~ "night",
    hour(starttime) + minute(starttime) / 60 >= 6 &
      hour(starttime) + minute(starttime) / 60 < 14.5 ~ "morning",
    hour(starttime) + minute(starttime) / 60 >= 14.5 &
      hour(starttime) + minute(starttime) / 60 < 23 ~ "evening"
  ))
df_split
Output of df_split:
| starttime | operation_code | duration_hours | endtime_calc | actual_date | report_date | shift |
| ------------------- | -------------- | -------------- | ------------------- | ----------- | ----------- | ------- |
| 2023-03-09 23:00:00 | machine setup | 2.0 hours | 2023-03-10 01:00:00 | 2023-03-09 | 2023-03-10 | night |
| 2023-03-10 01:00:00 | machine setup | 14.5 hours | 2023-03-10 15:30:00 | 2023-03-10 | 2023-03-10 | night |
| 2023-04-02 07:00:00 | operational | 8.0 hours | 2023-04-02 15:00:00 | 2023-04-02 | 2023-04-02 | morning |
| 2023-04-10 15:00:00 | crash | 1.0 hours | 2023-04-10 16:00:00 | 2023-04-10 | 2023-04-10 | evening |
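For some reason, the split in df_split now happens at 01:00:00 rather than the expected 00:00:00. Setting that small bug aside, this at least shows that splitting is entirely possible. However, I don't want the rows of the dataframe to be split at date changes but at shift changes (that is, whenever endtime exceeds endtime_shift), which seems much harder to do. I would really appreciate some help.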
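One plausible explanation for the 01:00:00 offset, offered as an assumption because the session's time zone is not shown: the split point is computed as a Date, and a Date converted to POSIXct is treated as midnight UTC. In a UTC+1 zone that instant prints as 01:00. The base R sketch below illustrates the effect; the zone "Europe/Amsterdam" is only an assumed example.

```r
# Assumed time zone for illustration; the question does not state one.
tz_local <- "Europe/Amsterdam"

start <- as.POSIXct("2023-03-09 23:00:00", tz = tz_local)

# The next calendar day as a Date (a Date carries no time zone).
next_day <- as.Date(format(start, "%Y-%m-%d")) + 1

# Converting a Date to POSIXct yields midnight UTC ...
utc_midnight <- as.POSIXct(next_day)
# ... which is 01:00 on the local clock in a UTC+1 zone.
format(utc_midnight, "%Y-%m-%d %H:%M:%S", tz = tz_local)

# Building the boundary from a string in the local zone gives local midnight.
local_midnight <- as.POSIXct(format(next_day), tz = tz_local)
format(local_midnight, "%Y-%m-%d %H:%M:%S", tz = tz_local)
```

If this is indeed the cause, constructing the split points as POSIXct values in the data's own time zone, rather than as Dates, should move the split back to 00:00:00.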
Desired output after splitting the rows:
| starttime | endtime | operation_code | report_date | shift | starttime_report_day | starttime_shift | endtime_shift |
| ------------------- | ------------------- | ---------------- | ------------ | ------- | -------------------- | ------------------- | ------------------- |
| 2023-03-09 23:00:00 | 2023-03-10 06:00:00 | machine setup | 2023-03-10 | night | 2023-03-09 23:00:00 | 2023-03-09 23:00:00 | 2023-03-10 06:00:00 |
| 2023-03-10 06:00:00 | 2023-03-10 14:30:00 | machine setup    | 2023-03-10   | morning | 2023-03-09 23:00:00  | 2023-03-10 06:00:00 | 2023-03-10 14:30:00 |
| 2023-03-10 14:30:00 | 2023-03-10 15:30:00 | machine setup    | 2023-03-10   | evening | 2023-03-09 23:00:00  | 2023-03-10 14:30:00 | 2023-03-10 23:00:00 |
| 2023-04-02 07:00:00 | 2023-04-02 14:30:00 | operational | 2023-04-02 | morning | 2023-04-01 23:00:00 | 2023-04-02 06:00:00 | 2023-04-02 14:30:00 |
| 2023-04-02 14:30:00 | 2023-04-02 15:00:00 | operational | 2023-04-02 | evening | 2023-04-01 23:00:00 | 2023-04-02 14:30:00 | 2023-04-02 23:00:00 |
| 2023-04-10 15:00:00 | 2023-04-10 16:00:00 | crash            | 2023-04-10   | evening | 2023-04-09 23:00:00  | 2023-04-10 14:30:00 | 2023-04-10 23:00:00 |
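One possible way to get this kind of split, sketched in base R under stated assumptions: for each row, list every shift boundary (06:00, 14:30, 23:00) on the calendar days the row touches, keep the boundaries that fall strictly inside the row's interval, and cut the row there. The helpers `shift_of` and `split_at_shifts` are hypothetical names, the demo data uses UTC, and the sketch assumes starttime is earlier than endtime on every row and that both columns are in the time zone passed as `tz`.

```r
# Sketch only: shift_of() and split_at_shifts() are hypothetical helpers.
shift_times <- c("06:00:00", "14:30:00", "23:00:00")  # shift boundaries

# Label a timestamp with the shift it falls in (same rules as the question).
shift_of <- function(x, tz) {
  lt <- as.POSIXlt(x, tz = tz)
  h <- lt$hour + lt$min / 60
  ifelse(h >= 23 | h < 6, "night", ifelse(h < 14.5, "morning", "evening"))
}

split_at_shifts <- function(df, tz = "UTC") {
  pieces <- lapply(seq_len(nrow(df)), function(i) {
    start <- df$starttime[i]
    end <- df$endtime[i]
    # Every shift boundary on each calendar day the row touches.
    days <- seq(as.Date(format(start, "%Y-%m-%d", tz = tz)),
                as.Date(format(end, "%Y-%m-%d", tz = tz)), by = "day")
    candidates <- as.POSIXct(
      paste(rep(format(days), each = length(shift_times)), shift_times),
      tz = tz
    )
    # Keep only the boundaries strictly inside the row's interval.
    cuts <- candidates[candidates > start & candidates < end]
    edges <- c(start, cuts, end)
    attr(edges, "tzone") <- tz
    # Consecutive edges become the start and end of each piece.
    data.frame(
      starttime = head(edges, -1),
      endtime = tail(edges, -1),
      operation_code = df$operation_code[i]
    )
  })
  out <- do.call(rbind, pieces)
  out$shift <- shift_of(out$starttime, tz)
  out
}

# Demo data mirroring the three sample rows, with an explicit time zone.
demo <- data.frame(
  starttime = as.POSIXct(c("2023-03-09 23:00:00", "2023-04-02 07:00:00",
                           "2023-04-10 15:00:00"), tz = "UTC"),
  endtime = as.POSIXct(c("2023-03-10 15:30:00", "2023-04-02 15:00:00",
                         "2023-04-10 16:00:00"), tz = "UTC"),
  operation_code = c("machine setup", "operational", "crash")
)
result <- split_at_shifts(demo)
result
```

On the demo data this yields six rows, three for "machine setup", two for "operational" and one for "crash". The remaining columns (report_date, starttime_report_day, starttime_shift, endtime_shift) could then be recomputed from the new starttime with the same mutate() logic used to build df.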