Split rows into multiple rows based on time of day (shift changes)


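I have a dataset used for OEE monitoring of factory machines. Among other data, the machines automatically report their operating status. The factory runs three shifts:

  • Night: from 23:00:00 to 06:00:00 (7 hours)
  • Morning: from 06:00:00 to 14:30:00 (8.5 hours)
  • Evening: from 14:30:00 to 23:00:00 (8.5 hours)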
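
The shift schedule above can be written as a small classifier. This is only a sketch: `shift_of` is a hypothetical helper name that does not appear elsewhere in this post, and the UTC time zone is an assumption made to keep the example reproducible.

```r
library(lubridate)

# Hypothetical helper: map timestamps to the three shifts listed above.
shift_of <- function(x) {
  # Decimal hour of day, e.g. 14:30:00 becomes 14.5
  h <- hour(x) + minute(x) / 60 + second(x) / 3600
  ifelse(h >= 23 | h < 6, "night",
         ifelse(h < 14.5, "morning", "evening"))
}

# Assumption: UTC is used only so the example behaves the same everywhere.
stamps <- as.POSIXct(
  c("2023-03-09 23:00:00", "2023-04-02 07:00:00", "2023-04-10 15:00:00"),
  tz = "UTC"
)
shifts <- shift_of(stamps)
print(shifts)
```

Each boundary belongs to the shift that starts there, so 06:00:00 counts as morning and 14:30:00 as evening.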

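The rows in the dataframe are not split at shift changes. Below is an example of the available data. In this example, the first row starts at 2023-03-09 23:00:00 and ends at 2023-03-10 15:30:00. This row should be split into three parts:

  • Night: from 23:00:00 to 06:00:00
  • Morning: from 06:00:00 to 14:30:00
  • Evening: from 14:30:00 to 15:30:00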

This may not be relevant to the solution, but note that the report day runs from 23:00:00 to 23:00:00, which causes an apparent offset between "report_date" and "starttime_report_day".

Code block that builds the example dataframe from dummy data:

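library(dplyr)      # %>%, mutate(), case_when(), group_by(), summarise()
library(lubridate)  # hour(), minute(), days(), hours(), as_date()
library(purrr)      # reduce(), used by splitDates() further down

df <-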
  data.frame(
    starttime = c(
      as.POSIXct("2023-03-09 23:00:00"),
      as.POSIXct("2023-04-02 07:00:00"),
      as.POSIXct("2023-04-10 15:00:00")
    ),
    endtime = c(
      as.POSIXct("2023-03-10 15:30:00"),
      as.POSIXct("2023-04-02 15:00:00"),
      as.POSIXct("2023-04-10 16:00:00")
    ),
    operation_code = c("machine setup", "operational", "crash")
  ) %>%
  mutate(
    report_date = as.Date(ifelse(
      hour(starttime) < 23,
      as.Date(starttime),
      as.Date(starttime) + days(1)
    ), origin = "1970-01-01"),
    shift = case_when(
      hour(starttime) + minute(starttime) / 60 >= 23 |
        hour(starttime) + minute(starttime) / 60 < 6 ~ "night",
      hour(starttime) + minute(starttime) / 60 >= 6 &
        hour(starttime) + minute(starttime) / 60 < 14.5 ~ "morning",
      hour(starttime) + minute(starttime) / 60 >= 14.5 &
        hour(starttime) + minute(starttime) / 60 < 23 ~ "evening"
    ),
    starttime_report_day = as.POSIXct(report_date - days(1) + hours(23)),
    starttime_shift = case_when(
      shift == "night" ~ starttime_report_day,
      shift == "morning" ~ starttime_report_day + hours(7),
      shift == "evening" ~ starttime_report_day + hours(15) + minutes(30)
    ),
    endtime_shift = case_when(
      shift == "night" ~ starttime_report_day + hours(7),
      shift == "morning" ~ starttime_report_day + hours(15) + minutes(30),
      shift == "evening" ~ starttime_report_day + hours(24)
    )
  )
df

Output of df:

| starttime           | endtime             |  operation_code  | report_date  | shift   | starttime_report_day | starttime_shift     | endtime_shift       |
| ------------------- | ------------------- | ---------------- | ------------ | ------- | -------------------- | ------------------- | ------------------- |
| 2023-03-09 23:00:00 | 2023-03-10 15:30:00 | machine setup    | 2023-03-10   | night   | 2023-03-09 23:00:00  | 2023-03-09 23:00:00 | 2023-03-10 06:00:00 |
| 2023-04-02 07:00:00 | 2023-04-02 15:00:00 | operational      | 2023-04-02   | morning | 2023-04-01 23:00:00  | 2023-04-02 06:00:00 | 2023-04-02 14:30:00 | 
| 2023-04-10 15:00:00 | 2023-04-10 16:00:00 | crash            | 2023-04-10   | evening | 2023-04-09 23:00:00  | 2023-04-10 14:30:00 | 2023-04-10 23:00:00 | 

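I found this answer to a somewhat similar question, in which rows had to be split at date changes. As an exercise, I managed to get it working on my dataset.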

Code block based on the solution found in the link above:

# Helper Functions ------------------------------------------------------------------------------------------------

endsOnOtherDay <- function(df) {
  as_date(df$starttime) != as_date(df$endtime)
}

split1rowInto2Days <- function(df) {
  df1 <- df
  df2 <- df
  df1$endtime <-
    as_date(df1$starttime) + days(1) - milliseconds(1)
  df2$starttime <- as_date(df2$starttime) + days(1)
  rbind(df1, df2)
}


splitDates <- function(df) {
  if (nrow(df) > 1) {
    return(df %>%
             split(f = 1:nrow(df)) %>%
             lapply(splitDates) %>%
             reduce(rbind))
  }
  
  if (df %>% endsOnOtherDay()) {
    return(df %>%
             split1rowInto2Days() %>%
             splitDates())
  }
  
  df
}

# The actual Calculation ------------------------------------------------------------------------------------------

df_split <- df %>%
  splitDates() %>%
  mutate(duration_hours = round(difftime(endtime + 1,
                               starttime,
                               units = "hours"), 3)
         ,
         date_calc = as_date(starttime)) %>%
  group_by(starttime,
           operation_code) %>%
  summarise(duration_hours = sum(duration_hours)) %>%
  mutate(
    endtime_calc = starttime + duration_hours,
    actual_date = as.Date(starttime),
    report_date = as.Date(ifelse(
      hour(starttime) < 23,
      as.Date(starttime),
      as.Date(starttime) + days(1)
    ), origin = "1970-01-01")
  )

df_split <- df_split %>% 
  mutate(shift = case_when(
      hour(starttime) + minute(starttime) / 60 >= 23 |
        hour(starttime) + minute(starttime) / 60 < 6 ~ "night",
      hour(starttime) + minute(starttime) / 60 >= 6 &
        hour(starttime) + minute(starttime) / 60 < 14.5 ~ "morning",
      hour(starttime) + minute(starttime) / 60 >= 14.5 &
        hour(starttime) + minute(starttime) / 60 < 23 ~ "evening"
    ))
df_split

Output of df_split:

| starttime           | operation_code | duration_hours | endtime_calc        | actual_date | report_date | shift   |
| ------------------- | -------------- | -------------- | ------------------- | ----------- | ----------- | ------- |
| 2023-03-09 23:00:00 | machine setup  | 2.0 hours      | 2023-03-10 01:00:00 | 2023-03-09  | 2023-03-10  | night   |
| 2023-03-10 01:00:00 | machine setup  | 14.5 hours     | 2023-03-10 15:30:00 | 2023-03-10  | 2023-03-10  | night   |
| 2023-04-02 07:00:00 | operational    | 8.0 hours      | 2023-04-02 15:00:00 | 2023-04-02  | 2023-04-02  | morning |
| 2023-04-10 15:00:00 | crash          | 1.0 hours      | 2023-04-10 16:00:00 | 2023-04-10  | 2023-04-10  | evening |

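For some reason, the split in df_split now happens at 01:00:00 instead of the expected 00:00:00. Setting that small bug aside, this at least shows that splitting is entirely possible. However, I don't want the rows of the dataframe to be split at date changes but at shift changes (that is, whenever endtime exceeds endtime_shift), and that seems hard to do. I would really appreciate some help.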
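
A possible explanation for the 01:00:00 offset, sketched under an assumption: the question does not state the session time zone, so `Europe/Amsterdam` (UTC+1 on that date) is used here purely for illustration. `split1rowInto2Days()` builds its boundary from `as_date(...)`, and a Date that ends up in a POSIXct column is interpreted as midnight UTC, which displays as 01:00:00 in a UTC+1 zone. `floor_date()` keeps the original time zone instead.

```r
library(lubridate)

# Assumed time zone (not stated in the question); it is UTC+1 on this date.
tz_assumed <- "Europe/Amsterdam"
start <- as.POSIXct("2023-03-09 23:00:00", tz = tz_assumed)

# What split1rowInto2Days() appears to do in effect: build a Date, which
# becomes midnight UTC once it is converted to POSIXct.
next_day_utc <- as.POSIXct(as_date(start) + days(1))
shown <- format(next_day_utc, "%Y-%m-%d %H:%M:%S", tz = tz_assumed)

# Alternative that stays in the original time zone.
next_day_local <- floor_date(start, "day") + days(1)
kept <- format(next_day_local, "%Y-%m-%d %H:%M:%S", tz = tz_assumed)

print(c(date_based = shown, floor_based = kept))
```

If the session actually runs in a different zone, the size of the offset would differ, but the mechanism would be the same.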

Desired output after splitting the rows:

| starttime           | endtime             |  operation_code  | report_date  | shift   | starttime_report_day | starttime_shift     | endtime_shift       |
| ------------------- | ------------------- | ---------------- | ------------ | ------- | -------------------- | ------------------- | ------------------- |
| 2023-03-09 23:00:00 | 2023-03-10 06:00:00 | machine setup    | 2023-03-10   | night   | 2023-03-09 23:00:00  | 2023-03-09 23:00:00 | 2023-03-10 06:00:00 |
| 2023-03-10 06:00:00 | 2023-03-10 14:30:00 | machine setup    | 2023-03-10   | morning | 2023-03-09 23:00:00  | 2023-03-10 06:00:00 | 2023-03-10 14:30:00 |
| 2023-03-10 14:30:00 | 2023-03-10 15:30:00 | machine setup    | 2023-03-10   | evening | 2023-03-09 23:00:00  | 2023-03-10 14:30:00 | 2023-03-10 23:00:00 |
| 2023-04-02 07:00:00 | 2023-04-02 14:30:00 | operational      | 2023-04-02   | morning | 2023-04-01 23:00:00  | 2023-04-02 06:00:00 | 2023-04-02 14:30:00 |
| 2023-04-02 14:30:00 | 2023-04-02 15:00:00 | operational      | 2023-04-02   | evening | 2023-04-01 23:00:00  | 2023-04-02 14:30:00 | 2023-04-02 23:00:00 |
| 2023-04-10 15:00:00 | 2023-04-10 16:00:00 | crash            | 2023-04-10   | evening | 2023-04-09 23:00:00  | 2023-04-10 14:30:00 | 2023-04-10 23:00:00 |
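
One way the table above might be produced is to cut each row at every shift-change instant that falls strictly between its starttime and endtime, then derive the shift columns for each piece. The following is a minimal sketch, not a tested production solution: `shift_breaks`, `split_row`, `tz_used` and `hour_dec` are hypothetical names introduced here, and UTC is assumed for all timestamps so that daylight-saving changes do not interfere.

```r
library(dplyr)
library(lubridate)

# Assumption: one time zone for all timestamps; UTC sidesteps DST effects.
tz_used <- "UTC"

df <- data.frame(
  starttime = as.POSIXct(c("2023-03-09 23:00:00", "2023-04-02 07:00:00",
                           "2023-04-10 15:00:00"), tz = tz_used),
  endtime = as.POSIXct(c("2023-03-10 15:30:00", "2023-04-02 15:00:00",
                         "2023-04-10 16:00:00"), tz = tz_used),
  operation_code = c("machine setup", "operational", "crash")
)

# Hypothetical helper: shift-change instants strictly between start and end.
shift_breaks <- function(start, end) {
  day_starts <- seq(floor_date(start, "day"), floor_date(end, "day"), by = "day")
  candidates <- c(day_starts + hours(6),
                  day_starts + hours(14) + minutes(30),
                  day_starts + hours(23))
  sort(candidates[candidates > start & candidates < end])
}

# Hypothetical helper: cut row i of df into one piece per shift it touches.
split_row <- function(i) {
  cuts <- c(df$starttime[i],
            shift_breaks(df$starttime[i], df$endtime[i]),
            df$endtime[i])
  attr(cuts, "tzone") <- tz_used  # c() can drop the time zone on older R versions
  data.frame(
    starttime = cuts[-length(cuts)],
    endtime = cuts[-1],
    operation_code = df$operation_code[i]
  )
}

df_shift <- bind_rows(lapply(seq_len(nrow(df)), split_row)) %>%
  mutate(
    hour_dec = hour(starttime) + minute(starttime) / 60,
    shift = case_when(
      hour_dec >= 23 | hour_dec < 6 ~ "night",
      hour_dec < 14.5 ~ "morning",
      TRUE ~ "evening"
    ),
    # The report day starts at 23:00:00 of the previous calendar day
    starttime_report_day = floor_date(starttime + hours(1), "day") - hours(1),
    report_date = as_date(starttime_report_day + hours(1)),
    starttime_shift = case_when(
      shift == "night" ~ starttime_report_day,
      shift == "morning" ~ starttime_report_day + hours(7),
      shift == "evening" ~ starttime_report_day + hours(15) + minutes(30)
    ),
    endtime_shift = case_when(
      shift == "night" ~ starttime_report_day + hours(7),
      shift == "morning" ~ starttime_report_day + hours(15) + minutes(30),
      shift == "evening" ~ starttime_report_day + hours(24)
    )
  ) %>%
  select(-hour_dec)

print(df_shift)
```

Because the cuts are computed from the shift boundaries directly, a row that spans several days is split at every boundary it crosses without any recursion. With local time zones that observe daylight saving, the fixed hour offsets used for the shift bounds would need extra care.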