Calculating a duration in R excluding non-working hours/weekends


To evaluate how long it takes to answer customer queries, I have the following data frame (df1) with the columns ID, Task and Date_time, where:

ID = customer number
Task = Task done by the employee related to the query
Date_time = POSIXct column to specify when the task was conducted

For each customer, I want to find the duration (in minutes) between Task == "New" and Task == "Closed".

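To calculate the duration in minutes, I have to take into account:

  1. Task == "Closed" can occur more than once, so only the last Task == "Closed" should be used in the calculation.

  2. Working hours are Monday to Friday, 8 am to 5 pm Central European Time (8:00 to 17:00). The duration must exclude non-working hours (5 pm to 8 am) and weekends (Saturday and Sunday).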

Taking the above points into account, can anyone suggest how to calculate the duration? Thanks!

The data frame looks like this:

           ID             Task           Date_time
1   customer1              New 2022-11-09 15:33:32
2   customer1             Edit 2022-11-09 15:38:40
4   customer1         Answered 2022-11-09 15:44:44
5   customer1 FeedbackRequired 2022-11-11 08:02:51
6   customer1           Closed 2022-11-17 15:04:23
8   customer2              New 2022-04-11 13:55:22
9   customer2             Edit 2022-04-11 13:59:53
11  customer2         Answered 2022-05-11 11:17:15
12  customer2 FeedbackRequired 2022-05-11 11:17:41
13  customer2           Closed 2022-08-17 13:23:29
15  customer2           Closed 2022-08-17 13:24:24
17  customer2           Closed 2022-08-17 13:32:41

Here is a sample data frame:

df1 <- structure(list(ID = c("customer1", "customer1", "customer1", 
"customer1", "customer1", "customer2", "customer2", "customer2", 
"customer2", "customer2", "customer2", "customer2", "customer5", 
"customer5", "customer5", "customer5", "customer5", "customer3", 
"customer3", "customer3", "customer3", "customer3", "customer3", 
"customer3", "customer3", "customer3", "customer3", "customer4", 
"customer4", "customer4", "customer4", "customer4"), Task = c("New", 
"Edit", "Answered", "FeedbackRequired", "Closed", "New", "Edit", 
"Answered", "FeedbackRequired", "Closed", "Closed", "Closed", 
"New", "Edit", "Answered", "FeedbackRequired", "Closed", "New", 
"Edit", "HubAdded", "Answered", "FeedbackRequired", "Closed", 
"Closed", "Closed", "Closed", "Closed", "New", "Edit", "Answered", 
"FeedbackRequired", "Closed"), Date_time = structure(c(1668008012.93733, 
1668008320.29733, 1668008684.57472, 1668153771.45687, 1668697463.01071, 
1649685322.67473, 1649685593.46752, 1652267835.13924, 1652267861.07935, 
1660742609.41271, 1660742664.11297, 1660743161.80927, 1678295469.58648, 
1678295749.33997, 1678359922.0184, 1678787443.43049, 1680703787.10976, 
1661514257.02831, 1661514383.23061, 1661526698.41032, 1661527095.83771, 
1661527117.512, 1662457363.51916, 1662457378.0676, 1662457519.11092, 
1663232439.58358, 1663246649.3237, 1680252406.63738, 1680253548.17636, 
1680254179.34628, 1680254196.74463, 1680257109.1508), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), row.names = c(1L, 2L, 4L, 5L, 6L, 
8L, 9L, 11L, 12L, 13L, 15L, 17L, 18L, 19L, 21L, 22L, 23L, 65L, 
66L, 68L, 69L, 70L, 71L, 73L, 75L, 77L, 79L, 994L, 995L, 997L, 
998L, 999L), class = "data.frame")
Answer

If Task == "New" occurs exactly once per ID, we can do the following data manipulation:

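# requires R >= 4.2.0 for the `_` placeholder of the native pipe
# all "Closed" rows
df2 = df1[df1$Task %in% "Closed", ]
xyzzy = 
  # keep only the last "Closed" row per ID
  lapply(split(df2, df2$ID), \(x) x[which.max(x$Date_time), ]) |>
  do.call(what = "rbind", args = _) |>
  # ugly line: stack the "New" rows on top of the last "Closed" rows
  { \(.) rbind(... = df1[df1$Task %in% "New", ], ... = .) }() |>
  `rownames<-`(NULL) |> # cosmetics 
  # one row per ID with columns Date_time.New and Date_time.Closed
  reshape(idvar = "ID", timevar = "Task",  direction = "wide")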

which gives

> xyzzy
         ID       Date_time.New    Date_time.Closed
1 customer1 2022-11-09 15:33:32 2022-11-17 15:04:23
2 customer2 2022-04-11 13:55:22 2022-08-17 13:32:41
3 customer5 2023-03-08 17:11:09 2023-04-05 14:09:47
4 customer3 2022-08-26 11:44:17 2022-09-15 12:57:29
5 customer4 2023-03-31 08:46:46 2023-03-31 10:05:09
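The same table can also be built without the rbind() step inside the pipe. A minimal base-R sketch, under the same assumption that "New" occurs once per ID: sort the "Closed" rows so that the latest one comes first within each ID, keep the first row per ID with !duplicated(), and merge() the result with the "New" rows. The helper name new_and_last_closed() and the small df_sub extract (rows copied from the printed df1 above) are illustrative only.

```r
# Hypothetical helper: one row per ID with the "New" timestamp and the
# *last* "Closed" timestamp, using only base R.
new_and_last_closed <- function(df) {
  new    <- df[df$Task == "New",    c("ID", "Date_time")]
  closed <- df[df$Task == "Closed", c("ID", "Date_time")]
  # sort so that, within each ID, the latest "Closed" row comes first ...
  closed <- closed[order(closed$ID, closed$Date_time, decreasing = TRUE), ]
  # ... then keep only that first row per ID
  last_closed <- closed[!duplicated(closed$ID), ]
  # the suffixes reproduce the column names of the reshape() output
  out <- merge(new, last_closed, by = "ID", suffixes = c(".New", ".Closed"))
  rownames(out) <- NULL
  out
}

# a few rows copied from the printed df1 in the question
df_sub <- data.frame(
  ID   = c("customer1", "customer1", "customer2", "customer2", "customer2"),
  Task = c("New", "Closed", "New", "Closed", "Closed"),
  Date_time = as.POSIXct(c("2022-11-09 15:33:32", "2022-11-17 15:04:23",
                           "2022-04-11 13:55:22", "2022-08-17 13:24:24",
                           "2022-08-17 13:32:41"), tz = "UTC")
)

new_and_last_closed(df_sub)
```

On the full sample data, new_and_last_closed(df1) should give the same New/Closed pairs as xyzzy, except that merge() sorts the rows by ID.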

The rbind line in the pipe is particularly ugly. To compute the difftime with respect to business hours, businessDuration() from {BusinessDuration} seems to be an option:

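library(BusinessDuration)
# businessDuration() is not vectorised, so call it once per row
xyzzy$wh = 
  vapply(X = seq_len(nrow(xyzzy)), 
         FUN = \(i) businessDuration(startdate = xyzzy$Date_time.New[[i]], 
                                     enddate = xyzzy$Date_time.Closed[[i]], 
                                     # the question asks for 08:00; "07:00:00" produced the output below
                                     starttime = "07:00:00", 
                                     endtime = "17:00:00", 
                                     unit = "hour"), # result in hours, not minutes
         FUN.VALUE = numeric(1L))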

which results in

> xyzzy
         ID       Date_time.New    Date_time.Closed         wh
1 customer1 2022-11-09 15:33:32 2022-11-17 15:04:23  59.514167
2 customer2 2022-04-11 13:55:22 2022-08-17 13:32:41 919.621944
3 customer5 2023-03-08 17:11:09 2023-04-05 14:09:47 199.163056
4 customer3 2022-08-26 11:44:17 2022-09-15 12:57:29 141.220000
5 customer4 2023-03-31 08:46:46 2023-03-31 10:05:09   1.306389
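Two details are worth checking before using wh. First, the sample timestamps are stored with tzone = "UTC" (see the dput() output), while the business hours in the question are Central European Time, which is UTC+1 in winter and UTC+2 in summer. Whether businessDuration() reads the clock time in the object's own time zone is not verified here; re-expressing the timestamps in CET/CEST first removes that ambiguity. Second, unit = "hour" returns hours, whereas the question asks for minutes. A minimal sketch, assuming the Olson zone "Europe/Berlin" is an acceptable stand-in for CET; to_cet() is a hypothetical helper.

```r
# Hypothetical helper: re-express POSIXct timestamps in Central European
# (Summer) Time. Only the tzone attribute changes; the underlying instant
# (seconds since 1970-01-01 UTC) stays the same.
to_cet <- function(x, tz = "Europe/Berlin") {
  attr(x, "tzone") <- tz
  x
}

utc <- as.POSIXct(c("2022-11-09 15:33:32", "2022-08-17 13:32:41"), tz = "UTC")
cet <- to_cet(utc)

# winter date shifts by +1 h (CET), summer date by +2 h (CEST)
format(cet, "%Y-%m-%d %H:%M:%S")
# "2022-11-09 16:33:32" "2022-08-17 15:32:41"

# hours -> minutes, e.g. xyzzy$wm <- xyzzy$wh * 60
```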

It looks like businessDuration() is not vectorised, so we loop over the rows with vapply. Also have a look at the help file:

weekendlist
Customised weekend list. Default is "Saturday" and "Sunday".

holidaylist
Customised holiday list. Default is NULL.
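If adding a package dependency is not an option, or the time-zone handling should be explicit, the two rules from the question (Monday to Friday only, 08:00 to 17:00 CET) plus an optional holiday list can be sketched in base R. The function business_minutes() below is a hypothetical helper, not part of BusinessDuration: it walks over the calendar days of the interval in Europe/Berlin time, skips Saturdays, Sundays and any dates in holidays, and sums the overlap with each day's working window in minutes. It assumes one start and one end per call and the same working window on every weekday.

```r
# Hypothetical helper: minutes between `start` and `end` (POSIXct, length 1)
# that fall on working days (Mon-Fri, not in `holidays`) between
# `open_time` and `close_time`, with clock times interpreted in `tz`.
business_minutes <- function(start, end,
                             tz = "Europe/Berlin",
                             open_time = "08:00:00",
                             close_time = "17:00:00",
                             holidays = as.Date(character(0))) {
  stopifnot(length(start) == 1L, length(end) == 1L)
  # work with seconds since the epoch, so mixed tzone attributes
  # (e.g. UTC input vs. Europe/Berlin windows) do not matter
  s0 <- as.numeric(start)
  e0 <- as.numeric(end)
  if (e0 <= s0) return(0)

  # calendar days touched by the interval, as seen in the business time zone
  days <- seq(as.Date(start, tz = tz), as.Date(end, tz = tz), by = "day")

  total <- 0
  for (i in seq_along(days)) {
    day <- days[i]
    # %u gives the weekday as 1 (Monday) ... 7 (Sunday)
    if (as.integer(format(day, "%u")) >= 6L || day %in% holidays) next
    open  <- as.numeric(as.POSIXct(paste(day, open_time),  tz = tz))
    close <- as.numeric(as.POSIXct(paste(day, close_time), tz = tz))
    # overlap of [start, end] with this day's working window, in seconds
    overlap <- min(e0, close) - max(s0, open)
    if (overlap > 0) total <- total + overlap / 60
  }
  total
}

# Example: Wednesday 15:33:32 to the following Thursday 15:04:23 (CET)
business_minutes(as.POSIXct("2022-11-09 15:33:32", tz = "Europe/Berlin"),
                 as.POSIXct("2022-11-17 15:04:23", tz = "Europe/Berlin"))
# 3210.85 minutes
```

To apply it to xyzzy, loop over the rows as with businessDuration(), e.g. vapply(seq_len(nrow(xyzzy)), \(i) business_minutes(xyzzy$Date_time.New[i], xyzzy$Date_time.Closed[i]), numeric(1L)). Walking day by day keeps the logic simple and handles daylight-saving changes, at the cost of speed for very long intervals.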
