Postgres: optimizing a join over a many-to-many relationship table


I have a query that I'm trying to optimize, but I'm getting some surprising and confusing results.

The tables I'm working with are features and areas; each row in both has its own id and geometry.

                            Table "features"
   Column    |  Type    | Collation | Nullable | Default 
-------------+----------+-----------+----------+---------
 id          | bigint   |           | not null | 
 category    | text     |           | not null | 
 geom        | geometry |           | not null | 

Indexes:
    "features_pkey" PRIMARY KEY, btree (id)
    "features_category_idx" btree (category)

                            Table "areas"
   Column    |  Type    | Collation | Nullable | Default 
-------------+----------+-----------+----------+---------
 id          | bigint   |           | not null | 
 geom        | geometry |           | not null | 

Indexes:
    "areas_pkey" PRIMARY KEY, btree (id)

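The next table stores the many-to-many relationship between features and areas, with foreign-key constraints. Each feature may lie in zero, one, or many areas (a feature that is in no area has no rows in feature_area), and each area contains many features.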


                        Table "feature_area"
   Column     |    Type  | Collation | Nullable | Default 
--------------+----------+-----------+----------+---------
 feature_id   | bigint   |           | not null | 
 area_id      | bigint   |           | not null | 
 category     | text     |           |          | 

Indexes:
    "feature_area_pkey" PRIMARY KEY, btree (feature_id, area_id)
    "feature_area_category_idx" btree (category)

Foreign-key constraints:
    "feature_area_feature_id_fkey" FOREIGN KEY (feature_id) REFERENCES features(id)
    "feature_area_area_id_fkey" FOREIGN KEY (area_id) REFERENCES areas(id)

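What I want is a result like this: every feature of category type_x that belongs to at least one area, together with the ids of those areas: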

  feature_id   |    areas    |    geom    
---------------+-------------+-------------
  1            | {45,123}    | xxxxxx
  3            | {8}         | xxxxxx

This is the query I'm working with. It is very slow (~35 seconds).

WITH area_type_x AS (
  SELECT 
    feature_id,
    array_agg(area_id) AS areas
  FROM feature_area
  WHERE category = 'long name for type x'
  GROUP BY feature_id
)
SELECT
  features.id feature_id,
  features.geom,
  area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;

By chance I tried this variant, and it is much faster (<3 seconds).

WITH area_type_x AS (
  SELECT 
    feature_id,
    array_agg(area_id) AS areas
  FROM feature_area
  WHERE short_name(category) = 'type_x' -- this line is the only difference
  GROUP BY feature_id
)
SELECT
  features.id feature_id,
  features.geom,
  area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;

I ran each query with EXPLAIN ANALYZE and can share the output if that would help, but I haven't been able to make sense of it myself.
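For reference, a minimal sketch of how the two plans could be captured for comparison. The BUFFERS option is an addition not mentioned in the question; it reports how many pages each plan node touched. Note that EXPLAIN ANALYZE actually executes the statement, so the slow variant takes its full running time.

```sql
-- Capture the plan of the slow variant; repeat with the
-- short_name(category) = 'type_x' predicate to compare the two.
EXPLAIN (ANALYZE, BUFFERS)
WITH area_type_x AS (
  SELECT
    feature_id,
    array_agg(area_id) AS areas
  FROM feature_area
  WHERE category = 'long name for type x'
  GROUP BY feature_id
)
SELECT
  features.id AS feature_id,
  features.geom,
  area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;
```

In the output, the things to compare between the two variants are the estimated versus actual row counts on the feature_area scan and the join strategy the planner picks.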

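Any idea what is going on? I'd like to understand it, because I suspect that if I could skip converting category to its short version but keep whatever improvement that conversion gives me, I might be able to do better than 3 seconds.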

postgresql postgis postgresql-15
1 Answer

Don't use a CTE; join and aggregate in a single query:

SELECT
  features.id AS feature_id,
  features.geom,
  array_agg(feature_area.area_id) AS areas
FROM feature_area
JOIN features ON features.id = feature_area.feature_id
-- category exists in both tables, so it must be qualified
WHERE feature_area.category = 'long name for type x'
GROUP BY 1, 2;
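One way to check whether the CTE itself is what costs the time, sketched under the assumption of PostgreSQL 12 or later (the question is tagged postgresql-15): from version 12 on, a side-effect-free, non-recursive CTE that is referenced only once is normally inlined by the planner, and the MATERIALIZED / NOT MATERIALIZED keywords force one behavior or the other. This is an illustration for comparing plans, not part of the original answer.

```sql
-- Force the CTE to be inlined; swap NOT MATERIALIZED for MATERIALIZED
-- to force the opposite, then compare the two plans and timings.
EXPLAIN (ANALYZE, BUFFERS)
WITH area_type_x AS NOT MATERIALIZED (
  SELECT
    feature_id,
    array_agg(area_id) AS areas
  FROM feature_area
  WHERE category = 'long name for type x'
  GROUP BY feature_id
)
SELECT
  features.id AS feature_id,
  features.geom,
  area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;
```

If both forms produce the same plan and timing, the CTE is probably not the bottleneck, and the row estimates for the category filter are the next thing to look at.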