I have a table trips in PostgreSQL 10.5:
id start_date end_date
----------------------------
1 02/01/2019 02/03/2019
2 02/02/2019 02/03/2019
3 02/06/2019 02/07/2019
4 02/06/2019 02/14/2019
5 02/06/2019 02/06/2019
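I want to count the number of trip days that fall within each given week. The trips in the table have inclusive bounds. Weeks start on Monday and end on Sunday. The expected result would be: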
week_of days_utilized
------------------------
01/28/19 5
02/04/19 8
02/11/19 4
For calendar reference:
Monday 01/28/19 - Sunday 02/03/19
Monday 02/04/19 - Sunday 02/10/19
Monday 02/11/19 - Sunday 02/17/19
I know how to write this in the programming language I use, but I would prefer to do it in Postgres, and I'm not sure where to start...
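To reproduce the question, here is a minimal setup sketch. It assumes a temporary table named trips with date columns holding the sample rows above. As a brute-force cross-check of the expected output, the view (hypothetical name days_per_week) expands every trip into one row per day with generate_series() and buckets the days with date_trunc('week', ...), which truncates to the Monday of the ISO week.

```sql
-- Sample data from the question (dates in ISO format).
CREATE TEMP TABLE trips (
  id         int  PRIMARY KEY,
  start_date date NOT NULL,
  end_date   date NOT NULL
);

INSERT INTO trips (id, start_date, end_date) VALUES
  (1, '2019-02-01', '2019-02-03'),
  (2, '2019-02-02', '2019-02-03'),
  (3, '2019-02-06', '2019-02-07'),
  (4, '2019-02-06', '2019-02-14'),
  (5, '2019-02-06', '2019-02-06');

-- Brute force: one row per trip day (both bounds inclusive),
-- then count the days per Monday-based week.
CREATE TEMP VIEW days_per_week AS
SELECT date_trunc('week', d)::date AS week_of
     , count(*)                    AS days_utilized
FROM   trips t
CROSS  JOIN LATERAL generate_series(t.start_date::timestamp
                                  , t.end_date::timestamp
                                  , interval '1 day') d
GROUP  BY 1;

-- Expected: 5, 8 and 4 days for the weeks of 2019-01-28, 2019-02-04, 2019-02-11.
SELECT * FROM days_per_week ORDER BY week_of;
```

This approach is simple but produces one row per trip day, so it scales with trip length. The answers below compute the overlap per week directly.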
You seem to want generate_series(), a join, and group by. To count the trips that overlap each week:
select gs.wk, count(t.id) as num_trips
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk) left join
trips t
on gs.wk <= t.end_date and
gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;
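The join condition is the standard overlap test for two closed intervals. With $w$ the Monday of a week and $[s, e]$ the inclusive bounds of a trip, the week and the trip share at least one day exactly when:

$$
[w,\ w+6] \cap [s,\ e] \neq \varnothing \iff w \le e \;\wedge\; s \le w + 6
$$

This is what gs.wk <= t.end_date and gs.wk + interval '6 day' >= t.start_date expresses.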
Edit:
I see you want the days. That takes a little more work in the aggregation:
select gs.wk, count(t.id) as num_trips,
sum( 1 +
extract(day from (least(gs.wk + interval '6 day', t.end_date) - greatest(gs.wk, t.start_date)))
) as days_utilized
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk) left join
trips t
on gs.wk <= t.end_date and
gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;
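As a variant sketch, everything can stay in date arithmetic. Casting each generated week start to date makes least(...) - greatest(...) an integer day count, so no extract() is needed. The number of shared days is least(w + 6, e) - greatest(w, s) + 1, and coalesce() turns the NULL sum of a week without trips into 0. The CTE named trips stands in for the real table, and the view name weekly_days is hypothetical.

```sql
CREATE TEMP VIEW weekly_days AS
WITH trips(id, start_date, end_date) AS (   -- stand-in for the real table
  VALUES (1, DATE '2019-02-01', DATE '2019-02-03')
       , (2, DATE '2019-02-02', DATE '2019-02-03')
       , (3, DATE '2019-02-06', DATE '2019-02-07')
       , (4, DATE '2019-02-06', DATE '2019-02-14')
       , (5, DATE '2019-02-06', DATE '2019-02-06')
)
, weeks AS (
  -- Mondays of the weeks of interest, cast back to date
  SELECT gs::date AS wk
  FROM   generate_series(timestamp '2019-01-28', timestamp '2019-02-11', interval '1 week') gs
)
SELECT w.wk
     , count(t.id) AS num_trips
       -- date - date yields an integer number of days
     , coalesce(sum(least(w.wk + 6, t.end_date) - greatest(w.wk, t.start_date) + 1), 0) AS days_utilized
FROM   weeks w
LEFT   JOIN trips t ON w.wk <= t.end_date AND w.wk + 6 >= t.start_date
GROUP  BY w.wk;

SELECT * FROM weekly_days ORDER BY wk;
```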
Note: This does not return exactly your results. I think these are correct.
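I would consider range types. Range operators make the calculation simpler and clearer; below I use the overlap operator && and the intersection operator *. If the table is big, a functional GiST or SP-GiST index makes the query fast. Like: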
CREATE INDEX trip_range_idx ON trips
USING gist (daterange(start_date, end_date, '[]'));
Then your query can use this index:
SELECT week
, count(overlap) AS ct_trips
, sum(upper(overlap) - lower(overlap)) AS days_utilized
FROM (
SELECT week, trip * week AS overlap
FROM (
SELECT daterange(mon::date, mon::date + 7) AS week
FROM generate_series(timestamp '2019-01-28'
, timestamp '2019-02-11'
, interval '1 week') mon
) w
LEFT JOIN (SELECT daterange(start_date, end_date, '[]') FROM trips) t(trip) ON trip && week
) sub
GROUP BY 1
ORDER BY 1;
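To make the two operators concrete, here is a small sketch applied to trip 4 and the week starting 2019-02-04 (the view name range_demo is hypothetical). && tests whether the ranges overlap, and * returns their intersection.

```sql
CREATE TEMP VIEW range_demo AS
SELECT trip && week                             AS is_overlap  -- true
     , trip * week                              AS overlap     -- 2019-02-06 up to, excluding, 2019-02-11
     , upper(trip * week) - lower(trip * week)  AS days        -- 5
FROM  (SELECT daterange(DATE '2019-02-06', DATE '2019-02-14', '[]') AS trip   -- trip 4, inclusive
            , daterange(DATE '2019-02-04', DATE '2019-02-11')       AS week   -- Mon to Sun, '[)'
      ) r;

SELECT * FROM range_demo;
```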
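Note that, by default, daterange has an inclusive lower bound and an exclusive upper bound. Your ranges include both bounds, so create them with daterange(start_date, end_date, '[]'). The function upper() still returns the exclusive upper bound, so the expression upper(overlap) - lower(overlap) computes the number of days correctly.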
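A quick check of that behavior (the view name bounds_demo is hypothetical): daterange is a discrete range type, so an inclusive upper bound is stored in canonical form as the following day with an exclusive bound.

```sql
CREATE TEMP VIEW bounds_demo AS
SELECT r
     , upper_inc(r)         AS upper_inclusive  -- false: canonical form is '[)'
     , upper(r)             AS upper_bound      -- 2019-02-04, the day after end_date
     , upper(r) - lower(r)  AS days             -- 3 (Feb 1, 2 and 3)
FROM  (SELECT daterange(DATE '2019-02-01', DATE '2019-02-03', '[]') AS r) s;  -- trip 1

SELECT * FROM bounds_demo;
```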
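There is a reason I use generate_series() with timestamp input: there is no date variant of generate_series(), so date arguments are resolved to the timestamp with time zone variant, whose results depend on the TimeZone setting of the session. With timestamp input no time zone is involved, and mon::date casts each result back to a date.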
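The pattern in isolation, as a small sketch (the view name week_starts is hypothetical): generate the Mondays from timestamp bounds, then cast each value to date.

```sql
CREATE TEMP VIEW week_starts AS
SELECT mon::date AS week_start            -- cast each generated timestamp back to date
FROM   generate_series(timestamp '2019-01-28'
                     , timestamp '2019-02-11'
                     , interval '1 week') mon;  -- timestamp variant: no time zone involved

SELECT * FROM week_starts ORDER BY week_start;
```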
Alternatively, if you don't want to use range types, consider the OVERLAPS operator:
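The original code for this alternative is missing, so what follows is only a sketch of the same calculation with OVERLAPS. OVERLAPS treats each period as half-open (start <= time < end), so the inclusive end_date is passed as end_date + 1 and the week as (w, w + 7). The day count then uses plain date arithmetic. The CTE named trips stands in for the real table, and the view name overlaps_usage is hypothetical. Unlike the range query, this join condition does not match the expression index on daterange(start_date, end_date, '[]').

```sql
CREATE TEMP VIEW overlaps_usage AS
WITH trips(id, start_date, end_date) AS (   -- stand-in for the real table
  VALUES (1, DATE '2019-02-01', DATE '2019-02-03')
       , (2, DATE '2019-02-02', DATE '2019-02-03')
       , (3, DATE '2019-02-06', DATE '2019-02-07')
       , (4, DATE '2019-02-06', DATE '2019-02-14')
       , (5, DATE '2019-02-06', DATE '2019-02-06')
)
, weeks AS (
  SELECT mon::date AS wk
  FROM   generate_series(timestamp '2019-01-28', timestamp '2019-02-11', interval '1 week') mon
)
SELECT w.wk        AS week_of
     , count(t.id) AS ct_trips
       -- half-open bounds on both sides: no "+ 1" needed
     , coalesce(sum(least(w.wk + 7, t.end_date + 1) - greatest(w.wk, t.start_date)), 0) AS days_utilized
FROM   weeks w
LEFT   JOIN trips t
       ON (t.start_date, t.end_date + 1) OVERLAPS (w.wk, w.wk + 7)
GROUP  BY w.wk;

SELECT * FROM overlaps_usage ORDER BY week_of;
```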