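I have a table that stores working times and task information, and I want to get the total working time for each user. However, a user can work on several tasks at the same time, so the task intervals overlap. This is my table: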
UserID | Task_id | Start_datetime | End_datetime | Total_time |
---|---|---|---|---|
User1 | Task1 | 2023-08-09 08:00:00 | 2023-08-09 09:00:00 | 01:00:00 |
User1 | Task2 | 2023-08-09 08:15:00 | 2023-08-09 10:00:00 | 01:45:00 |
User2 | Task1 | 2023-08-09 08:30:00 | 2023-08-09 10:00:00 | 01:30:00 |
User2 | Task2 | 2023-08-09 09:00:00 | 2023-08-09 11:30:00 | 02:30:00 |
User1 | Task3 | 2023-08-09 11:15:00 | 2023-08-09 13:00:00 | 02:45:00 |
User2 | Task3 | 2023-08-09 15:15:00 | 2023-08-09 16:00:00 | 00:45:00 |
User2 | Task1 | 2023-08-09 15:20:00 | 2023-08-09 16:00:00 | 00:40:00 |
If I simply sum the working time per user, I get the following, which is not the real figure because the users are multitasking:
UserID | Total_working_time_per_day |
---|---|
User1 | 05:30:00 |
User2 | 05:25:00 |
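The inflated totals above are just the per-task durations added together. A minimal Python sketch of that arithmetic, using the Total_time values copied from the table (the variable names are illustrative and not part of the original schema):

```python
from collections import defaultdict
from datetime import timedelta

# (user, hours, minutes) of the stored Total_time column, copied from the table
rows = [
    ("User1", 1, 0), ("User1", 1, 45), ("User2", 1, 30), ("User2", 2, 30),
    ("User1", 2, 45), ("User2", 0, 45), ("User2", 0, 40),
]

naive_totals = defaultdict(timedelta)
for user, hours, minutes in rows:
    # Every task is counted in full, even when it runs in parallel with another
    naive_totals[user] += timedelta(hours=hours, minutes=minutes)

for user, total in sorted(naive_totals.items()):
    print(user, total)
```

Any minute during which a user has two tasks open is counted twice, which is why these sums exceed the time the user was actually busy.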
What I want is the total time each user was busy, so the result would be:
UserID | Total_time |
---|---|
User1 | 04:45:00 |
User2 | 03:45:00 |
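I tried some queries and read about CTEs and overlaps, but I could not get the correct result.
The total time for User1, Task3 in the table is wrong; it should be 01:45:00.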
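One way to read the requirement is as a union of intervals per user: sort each user's intervals by start time, merge the ones that overlap, and sum the lengths of the merged blocks. The following is a minimal Python sketch of that idea rather than a SQL answer; the function name `busy_time` and the row layout are assumptions made for illustration. With the timestamps as listed in the question, both users come out at 3 hours 45 minutes.

```python
from collections import defaultdict
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (user, start, end) taken from the table in the question
rows = [
    ("User1", "2023-08-09 08:00:00", "2023-08-09 09:00:00"),
    ("User1", "2023-08-09 08:15:00", "2023-08-09 10:00:00"),
    ("User2", "2023-08-09 08:30:00", "2023-08-09 10:00:00"),
    ("User2", "2023-08-09 09:00:00", "2023-08-09 11:30:00"),
    ("User1", "2023-08-09 11:15:00", "2023-08-09 13:00:00"),
    ("User2", "2023-08-09 15:15:00", "2023-08-09 16:00:00"),
    ("User2", "2023-08-09 15:20:00", "2023-08-09 16:00:00"),
]


def busy_time(rows):
    """Return {user: timedelta} with overlapping intervals counted once."""
    by_user = defaultdict(list)
    for user, start, end in rows:
        by_user[user].append(
            (datetime.strptime(start, FMT), datetime.strptime(end, FMT))
        )

    totals = {}
    for user, intervals in by_user.items():
        intervals.sort()  # by start, then by end
        total = timedelta()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps (or touches) the current block: extend it
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current block and start a new one
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        totals[user] = total + (cur_end - cur_start)
    return totals


result = busy_time(rows)
for user in sorted(result):
    print(user, result[user])
```

The sketch treats intervals that merely touch (one ends exactly when the next starts) as a single block, which does not change the total.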
Here is a solution:
    with table1 (Userid, Task_id, Start, End, Total_Time) as (
        values
            ('User1', 'Task1', timestamp '2023-08-09 08:00:00', timestamp '2023-08-09 09:00:00', time '01:00:00'),
            ('User1', 'Task2', '2023-08-09 08:15:00', '2023-08-09 10:00:00', '01:45:00'),
            ('User2', 'Task1', '2023-08-09 08:30:00', '2023-08-09 10:00:00', '01:30:00'),
            ('User2', 'Task2', '2023-08-09 09:00:00', '2023-08-09 11:30:00', '02:30:00'),
            ('User1', 'Task3', '2023-08-09 11:15:00', '2023-08-09 13:00:00', '02:45:00'),
            ('User2', 'Task3', '2023-08-09 15:15:00', '2023-08-09 16:00:00', '00:45:00'),
            ('User2', 'Task1', '2023-08-09 15:20:00', '2023-08-09 16:00:00', '00:40:00')
    ),
    -- in case two tasks share the exact same period
    distinct_periods as (
        select distinct userid, start, end from table1
    ),
    -- extend periods
    periods (n, userid, start, end) as (
        -- start with periods that are not preceded by an overlapping period
        select 1, userid, start, end
        from distinct_periods t1
        where not exists (
            select * from distinct_periods t2
            where t2.userid = t1.userid
              and (t1.start > t2.start and t1.start <= t2.end
                   or t1.start = t2.start and t1.end > t2.end)
        )
        -- extend with those that overlap
        union all
        select n+1, periods.userid, periods.start, distinct_periods.end
        from periods, distinct_periods
        where distinct_periods.userid = periods.userid
          and distinct_periods.start between periods.start and periods.end
          and distinct_periods.end > periods.end
          -- and n < 10
    ),
    -- add a rank by end date descending so that the widest period has rank 1
    with_rank as (
        select periods.*,
               rank() over (partition by userid, start order by end desc) rank
        from periods
        order by userid, start
    )
    -- sum the lengths of periods of rank 1
    select
        userid,
        time '00:00:00' + sum(minutes_between(end, start)) minutes as time_busy
    from with_rank
    where rank = 1
    group by userid
USERID | TIME_BUSY |
---|---|
User1 | 03:45:00 |
User2 | 03:45:00 |
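For comparison, the same merge can also be written without recursion by using window functions (often called a "gaps and islands" approach). This is an alternative sketch and not the query above: it runs on SQLite through Python's `sqlite3` module (window functions require SQLite 3.25 or later), and the table name `tasks` and the columns `start_ts` and `end_ts` are illustrative names chosen to avoid reserved words.

```python
import sqlite3

rows = [
    ("User1", "Task1", "2023-08-09 08:00:00", "2023-08-09 09:00:00"),
    ("User1", "Task2", "2023-08-09 08:15:00", "2023-08-09 10:00:00"),
    ("User2", "Task1", "2023-08-09 08:30:00", "2023-08-09 10:00:00"),
    ("User2", "Task2", "2023-08-09 09:00:00", "2023-08-09 11:30:00"),
    ("User1", "Task3", "2023-08-09 11:15:00", "2023-08-09 13:00:00"),
    ("User2", "Task3", "2023-08-09 15:15:00", "2023-08-09 16:00:00"),
    ("User2", "Task1", "2023-08-09 15:20:00", "2023-08-09 16:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (userid TEXT, task_id TEXT, start_ts TEXT, end_ts TEXT)"
)
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", rows)

QUERY = """
WITH flagged AS (
    -- A row opens a new block when it starts after every earlier row has ended
    SELECT userid, start_ts, end_ts,
           CASE WHEN start_ts <= MAX(end_ts) OVER (
                    PARTITION BY userid ORDER BY start_ts, end_ts
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS new_block
    FROM tasks
),
blocks AS (
    -- A running sum of the flags gives each merged block its own id
    SELECT userid, start_ts, end_ts,
           SUM(new_block) OVER (
               PARTITION BY userid ORDER BY start_ts, end_ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS block_id
    FROM flagged
),
merged AS (
    SELECT userid, MIN(start_ts) AS block_start, MAX(end_ts) AS block_end
    FROM blocks
    GROUP BY userid, block_id
)
SELECT userid,
       SUM(CAST(strftime('%s', block_end) AS INTEGER)
         - CAST(strftime('%s', block_start) AS INTEGER)) / 60 AS busy_minutes
FROM merged
GROUP BY userid
ORDER BY userid
"""

busy = {user: minutes for user, minutes in conn.execute(QUERY)}
for user, minutes in busy.items():
    print(f"{user} {minutes // 60:02d}:{minutes % 60:02d}:00")
```

The timestamps are stored as ISO-formatted text, so plain string comparison orders them correctly. Other databases would need their own date arithmetic in place of `strftime('%s', ...)`.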