I am using PostgreSQL v15.4. I have the following query:
explain (analyze, costs, verbose, buffers)
with vimages as (
SELECT
vim.view_id,
array_to_json(array_agg(row_to_json( row(
vim.created_at,
igim.id,
igim.name,
vim.sequence_no
)) order by vim.sequence_no desc)) as json
FROM view_images vim
JOIN xx_images igim ON vim.xx_image_id = igim.id
GROUP BY vim.view_id
)
select vimages.*
from xxs ig
left join vimages on vimages.view_id = ig.id
WHERE ig.alias = '257_belmont_cir_brunswick_ga'
If I hard-code the id of the resulting row, e.g.
and ig.id = 682430783638437250
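then the query takes only 6 ms.
Is it possible to rewrite my query so that, when I query the ig table by the alias column, the vim rows are filtered first and only then aggregated? Just like what happens when I hard-code the value of the id.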
Sample dataset (10x smaller):
CREATE UNLOGGED TABLE views( view_id INTEGER NOT NULL, alias INTEGER NOT NULL );
INSERT INTO views SELECT n,n/3 FROM generate_series(1,10000) n;
ALTER TABLE views ADD PRIMARY KEY (view_id);
CREATE INDEX ON views(alias);
CREATE UNLOGGED TABLE images( image_id INTEGER NOT NULL, image_foo INT NOT NULL );
INSERT INTO images SELECT n,n FROM generate_series(1,400000) n;
ALTER TABLE images ADD PRIMARY KEY (image_id);
CREATE UNLOGGED TABLE view_images( view_id INTEGER NOT NULL, image_id INTEGER NOT NULL, vim_foo INT NOT NULL );
INSERT INTO view_images SELECT (1+random()*10000)::INTEGER view_id, (1+random()*400000)::INTEGER image_id, n FROM generate_series(1,1500000) n;
CREATE INDEX ON view_images( view_id );
CREATE INDEX ON view_images( image_id );
VACUUM ANALYZE;
EXPLAIN ANALYZE
with vimages as (
SELECT
vim.view_id,
array_to_json(array_agg(row_to_json( row(
vim.vim_foo,
im.image_id,
im.image_foo
)) order by vim.vim_foo desc)) as json
FROM view_images vim
JOIN images im USING (image_id)
GROUP BY vim.view_id
)
select vimages.*
from views
left join vimages USING (view_id)
WHERE views.alias = 1234;
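It reproduces your slow query plan exactly.
Moving the CTE into the query: no change.
Simplifying the query (below): since I removed the ORDER BY inside the aggregate, the plan no longer involves a sort. I did not paste the plan, but the problem is still there: it still does seq scans and a hash join on images and view_images: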
EXPLAIN SELECT * FROM views LEFT JOIN (
SELECT vim.view_id, array_agg( im.image_id ) as agg
FROM view_images vim
JOIN images im USING (image_id)
GROUP BY vim.view_id
) vimages USING (view_id)
WHERE views.alias = 1234;
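In my sample data, alias=1234 corresponds to WHERE view_id IN (3702,3703,3704). If I put that condition at the end of the query, nothing changes. If I put it inside the subquery, I get the fast plan.
So the problem seems to be that the planner does not propagate the join condition on view_id into the subquery.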
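For reference, the fast variant mentioned above, with the predicate written by hand inside the subquery, looks roughly like this on the sample tables. The literal ids are the ones alias=1234 maps to in this dataset; in a real query they would not be known in advance, so this is only a diagnostic, not a fix:

```sql
EXPLAIN SELECT * FROM views LEFT JOIN (
    SELECT vim.view_id, array_agg( im.image_id ) AS agg
    FROM view_images vim
    JOIN images im USING (image_id)
    -- Filter placed by hand inside the subquery, before the aggregation
    WHERE vim.view_id IN (3702,3703,3704)
    GROUP BY vim.view_id
) vimages USING (view_id)
WHERE views.alias = 1234;
```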
Solution #1: move the GROUP BY
EXPLAIN SELECT views.view_id, array_agg( im.image_id ) as agg
FROM views
LEFT JOIN view_images vim USING (view_id)
LEFT JOIN images im USING (image_id)
WHERE views.alias = 1234
GROUP BY views.view_id;
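One difference from the subquery form is worth keeping in mind: with the LEFT JOINs outside the aggregate, a view that has no images still contributes one all-NULL row, so array_agg returns {NULL} for it instead of NULL. If that matters, a FILTER clause on the aggregate is one way to get the old behaviour back. A sketch on the sample tables:

```sql
EXPLAIN SELECT views.view_id,
       -- Ignore the NULL row produced by the LEFT JOINs when nothing matches,
       -- so views without images get NULL rather than {NULL}
       array_agg( im.image_id ) FILTER (WHERE im.image_id IS NOT NULL) AS agg
FROM views
LEFT JOIN view_images vim USING (view_id)
LEFT JOIN images im USING (image_id)
WHERE views.alias = 1234
GROUP BY views.view_id;
```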
Solution #2: use LATERAL so the problematic join predicate can be moved explicitly
EXPLAIN SELECT * FROM views LEFT JOIN LATERAL (
SELECT vim.view_id, array_agg( im.image_id ) as agg
FROM view_images vim
JOIN images im USING (image_id)
WHERE vim.view_id=views.view_id
GROUP BY vim.view_id
) vimages USING (view_id)
WHERE views.alias = 1234;
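Both get a fast plan, with index scans on images and view_images that hit only the rows they should.
If the condition on views.alias is removed and the whole table actually needs to be scanned, #1 will revert to seq scans, which work better in that case, but #2 will keep running a nested loop and will therefore be slower. So #1 would be my preferred option, unless you cut out some other part of the query before pasting it into the question and that part gets in the way of the GROUP BY.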
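Applied to the query from the question, solution #1 might look like the sketch below. It has not been run against the real schema. It assumes that xxs.id is unique (a primary key), and it takes the table and column names from the question as posted. The FILTER clause is my addition: it keeps the json column NULL for rows of xxs that have no images, as in the original query. Unlike the original, the view_id column is always filled in, because it now comes from ig.id.

```sql
SELECT
    ig.id AS view_id,
    array_to_json(
        array_agg(
            row_to_json(row(vim.created_at, igim.id, igim.name, vim.sequence_no))
            ORDER BY vim.sequence_no DESC
        ) FILTER (WHERE igim.id IS NOT NULL)  -- skip the all-NULL row of unmatched LEFT JOINs
    ) AS json
FROM xxs ig
LEFT JOIN view_images vim ON vim.view_id = ig.id
LEFT JOIN xx_images igim ON igim.id = vim.xx_image_id
WHERE ig.alias = '257_belmont_cir_brunswick_ga'
GROUP BY ig.id;  -- assumes ig.id is unique, so each group is exactly one row of xxs
```

Prefixing it with explain (analyze, costs, verbose, buffers), as in the question, should show whether view_images is now reached through its index on view_id.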