I have the following PostgreSQL query:
SELECT DISTINCT ON (reference) reference, reference_url
FROM vehicles v
WHERE NOT EXISTS
(select reference
from daily_run_vehicle rv
WHERE ((
handled = False
AND retries >= 5 )
OR rv.timestamp::timestamp::date = now()::date)
AND v.reference=reference);
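The vehicles table has about 400k records and the daily_run_vehicle table has about 50 million records.
So I need every vehicle that has no row in daily_run_vehicle added today and no row where the handled column is False and the retries column is >= 5.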
The problem is that the query takes far too long to execute.
Is there a better way to write it so that it runs faster?
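I have a theory that it may be related to calling the now() function millions of times. You can check this by running the query with a hard-coded date: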
SELECT DISTINCT ON (reference) reference, reference_url
FROM vehicles v
WHERE NOT EXISTS
(select reference
from daily_run_vehicle rv
WHERE ((
handled = False
AND retries >= 5 )
OR rv.timestamp::timestamp::date = '2019-03-06')
AND v.reference=reference);
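If performance improves, set today's date as a variable and use that variable in the query, so that now() only has to be called once. Also, when you use EXISTS, the convention is to write the subquery as SELECT ... FROM ... without caring about the selected values: all that matters is whether at least one row exists.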
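One way to pass the date in once is a prepared statement with a date parameter. This is only a sketch: the statement name vehicles_to_run is hypothetical, and the predicate is copied from the question unchanged. Note that PostgreSQL's now() returns the start time of the current transaction, so the gain from this change alone may be small; the cast applied to rv.timestamp on every row may matter more.

```sql
-- Hypothetical prepared statement: $1 is today's date, supplied once by the caller.
PREPARE vehicles_to_run(date) AS
SELECT DISTINCT ON (v.reference) v.reference, v.reference_url
FROM vehicles v
WHERE NOT EXISTS (
    SELECT 1
    FROM daily_run_vehicle rv
    WHERE ((rv.handled = false AND rv.retries >= 5)
           OR rv.timestamp::timestamp::date = $1)
      AND rv.reference = v.reference
);

-- The date expression is evaluated once, when the statement is executed.
EXECUTE vehicles_to_run(current_date);
```

A prepared statement lives only for the current session, so it has to be prepared again after reconnecting.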
Hmm. I'm thinking of something like this:
SELECT DISTINCT ON (v.reference) v.reference, v.reference_url
FROM vehicles v
WHERE NOT EXISTS (select 1
from daily_run_vehicle rv
where rv.reference = v.reference and
rv.handled = False and
rv.retries >= 5
) and
NOT EXISTS (select 1
from daily_run_vehicle rv
where rv.reference = v.reference and
rv.timestamp >= current_date::timestamp and
rv.timestamp < (current_date + interval '1 day')::timestamp
)
ORDER BY v.reference;
For this query, you want the following indexes:
daily_run_vehicle(reference, handled, retries)
daily_run_vehicle(reference, timestamp)
vehicles(reference, reference_url)
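As a sketch, the indexes could be created and the plan inspected as follows. The index names are hypothetical, the third index is assumed to belong on the vehicles table (reference_url is a column there), and rv.timestamp is assumed to be of type timestamp so that the range comparison can use the index without a cast.

```sql
-- Hypothetical index names; the column lists follow the answer above.
CREATE INDEX IF NOT EXISTS daily_run_vehicle_ref_handled_retries_idx
    ON daily_run_vehicle (reference, handled, retries);

CREATE INDEX IF NOT EXISTS daily_run_vehicle_ref_ts_idx
    ON daily_run_vehicle (reference, "timestamp");

CREATE INDEX IF NOT EXISTS vehicles_ref_url_idx
    ON vehicles (reference, reference_url);

-- Run the rewritten query under EXPLAIN to see whether the indexes are used.
-- ANALYZE actually executes the query, so expect it to take the full run time.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (v.reference) v.reference, v.reference_url
FROM vehicles v
WHERE NOT EXISTS (
          SELECT 1
          FROM daily_run_vehicle rv
          WHERE rv.reference = v.reference
            AND rv.handled = false
            AND rv.retries >= 5
      )
  AND NOT EXISTS (
          SELECT 1
          FROM daily_run_vehicle rv
          WHERE rv.reference = v.reference
            AND rv."timestamp" >= current_date::timestamp
            AND rv."timestamp" < (current_date + interval '1 day')::timestamp
      )
ORDER BY v.reference;
```

On a table with about 50 million rows, a plain CREATE INDEX blocks writes while it builds. CREATE INDEX CONCURRENTLY avoids that, at the cost of a slower build, and it cannot be run inside a transaction block.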