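I have a table "analytics_event" (9.7 million rows) and I need to query the records "created" after a given time. The table has various indexes, but they are ignored in favor of a seq scan on the table. I have boiled the problem down to the following queries.
This query does not use the index (it takes about 40 seconds):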
explain with batch_params as (
select
(now() - '1 minute'::interval)::timestamptz as created
) select * from private.analytics_event
where
created > (select created from batch_params);
-- output --
Seq Scan on analytics_event (cost=0.01..1620760.63 rows=3251510 width=1025)
Filter: (created > $0)
InitPlan 1 (returns $0)
-> Limit (cost=0.00..0.01 rows=1 width=8)
-> Result (cost=0.00..0.01 rows=1 width=8)
Whereas this query does use the index (the CTE is kept, although it is unused) and takes less than a millisecond:
explain with batch_params as (
select
(now() - '1 minute'::interval)::timestamptz as created
) select * from private.analytics_event
where
created > (now() - '1 minute'::interval)::timestamptz;
-- output --
Index Scan using analytics_event_payload_event on analytics_event (cost=0.44..7.33 rows=1 width=1025)
Index Cond: (created > (now() - '00:01:00'::interval))
My table and index are as follows:
CREATE TABLE private.analytics_event (
id uuid NOT NULL,
environment_id varchar NOT NULL,
payload jsonb NULL,
created timestamptz(3) DEFAULT now() NOT NULL,
CONSTRAINT analytics_event_pkey PRIMARY KEY (environment_id, id)
);
CREATE INDEX analytics_event_payload_event ON private.analytics_event USING btree (created, ((payload ->> 'foobar'::text)));
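In the original, larger query, batch_params reads these values from a dedicated cursor table.
I have tried:
Adding limit 1 to the CTE expression (no difference).
Wrapping batch_params in a function (no difference), e.g.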
CREATE OR REPLACE FUNCTION get_created_date() RETURNS timestamptz AS $BODY$
select created from cursor_table;
$BODY$ LANGUAGE SQL stable
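Wrapping the whole query in a function with a declared variable created. The function first selects the value into the variable before querying the analytics table. This does use the index, but it is far less maintainable and, annoyingly, I can no longer explain analyze the query (because it is wrapped in a function).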
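For reference, the function-based workaround described above might look roughly like the sketch below. The function name get_recent_events and the variable name v_created are hypothetical (the variable is renamed from created so it does not clash with the column of the same name); cursor_table and its created column are taken from the earlier snippet. As far as I understand, PL/pgSQL passes the variable to the query as a parameter, so the planner can see its actual value when it builds a custom plan.

```sql
-- Hypothetical sketch of the "wrap everything in a function" workaround.
-- get_recent_events and v_created are illustrative names, not from the original query.
CREATE OR REPLACE FUNCTION get_recent_events()
RETURNS SETOF private.analytics_event
LANGUAGE plpgsql
STABLE
AS $BODY$
DECLARE
    v_created timestamptz;
BEGIN
    -- Step 1: read the cutoff from the cursor table into a variable.
    SELECT c.created INTO v_created
    FROM cursor_table AS c
    LIMIT 1;

    -- Step 2: query the analytics table, comparing against the variable.
    RETURN QUERY
    SELECT e.*
    FROM private.analytics_event AS e
    WHERE e.created > v_created;
END;
$BODY$;

-- Usage:
SELECT * FROM get_recent_events();
```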
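One workaround I have considered is using a transaction in my host application to run several intermediate queries, but I would like to avoid that if possible.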
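A minimal sketch of that two-step approach, written in plain SQL for illustration: the application would normally use its driver's bind parameters, and PREPARE/EXECUTE stands in for that here. The statement name recent_events and the literal timestamp are placeholders, not values from my system.

```sql
BEGIN;

-- Step 1: read the cutoff; the application keeps the returned value.
SELECT created FROM cursor_table;

-- Step 2: send the value back as a parameter of the main query.
PREPARE recent_events(timestamptz) AS
    SELECT *
    FROM private.analytics_event
    WHERE created > $1;

-- Placeholder value; in practice this is the result of step 1.
EXECUTE recent_events('2024-01-01 00:00:00+00');

COMMIT;
```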
How can I restructure this so that it uses my index while still keeping my parameters dynamic?
Postgres version: 15.6
Edit, with explain output:
explain (analyze, verbose, buffers, settings) ... <the slow query with sub-query>.
-- output --
Seq Scan on private.analytics_event (cost=0.01..1620760.63 rows=3251510 width=1025) (actual time=52571.409..52571.410 rows=0 loops=1)
Output: analytics_event.id, analytics_event.environment_id, analytics_event.payload, analytics_event.created
Filter: (analytics_event.created > $0)
Rows Removed by Filter: 9798348
Buffers: shared hit=473037 read=1025792
I/O Timings: shared read=49404.088
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
Output: (now() - '00:01:00'::interval)
Settings: effective_cache_size = '7959688kB', jit = 'off', search_path = 'public, public, "$user"'
Query Identifier: -2998777079014550499
Planning Time: 0.087 ms
Execution Time: 52571.433 ms
explain (analyze, verbose, buffers, settings) ... <the fast/indexed query with inline condition>
-- output --
Index Scan using analytics_event_payload_event on private.analytics_event (cost=0.44..7.33 rows=1 width=1025) (actual time=0.006..0.006 rows=0 loops=1)
Output: id, environment_id, payload, created
Index Cond: (analytics_event.created > (now() - '00:01:00'::interval))
Buffers: shared hit=3
Settings: effective_cache_size = '7959688kB', jit = 'off', search_path = 'public, public, "$user"'
Query Identifier: -1698698247486258523
Planning:
Buffers: shared read=5
I/O Timings: shared read=3.082
Planning Time: 3.208 ms
Execution Time: 0.023 ms
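214M. last_autoanalyze (the data was added over a long period of time).

The PostgreSQL optimizer is not smart enough to pull up the subquery, so it plans the filter against a parameter ($0) whose value it does not know and falls back on a default estimate (rows=3251510, roughly a third of the table). You should pull up the subquery yourself: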
WITH batch_params AS (
SELECT current_timestamp - '1 minute'::interval AS created
)
SELECT *
FROM private.analytics_event
CROSS JOIN batch_params
WHERE analytics_event.created > batch_params.created;
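To check whether the rewrite actually produces an index scan on your data, it may help to run it under EXPLAIN. The sketch below is the same query with one small change that is my own suggestion: it selects analytics_event.* instead of *, because with the cross join a plain SELECT * would also return batch_params.created as an extra column. No plan output is shown here, since it depends on your data and statistics.

```sql
EXPLAIN (ANALYZE, BUFFERS)
WITH batch_params AS (
    SELECT current_timestamp - '1 minute'::interval AS created
)
SELECT analytics_event.*      -- event columns only, without batch_params.created
FROM private.analytics_event
CROSS JOIN batch_params
WHERE analytics_event.created > batch_params.created;
```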