Postgres: optimizing a join on a many-to-many relationship table

Question

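I have a query that I'm trying to optimize, but I've run into some surprising and confusing results.

The tables I'm working with are features and areas. Each has its own id and geometry.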

                            Table "features"
   Column    |  Type    | Collation | Nullable | Default 
-------------+----------+-----------+----------+---------
 id          | bigint   |           | not null | 
 category    | text     |           | not null | 
 geom        | geometry |           | not null | 

Indexes:
    "features_pkey" PRIMARY KEY, btree (id)
    "features_category_idx" btree (category)

                            Table "areas"
   Column    |  Type    | Collation | Nullable | Default 
-------------+----------+-----------+----------+---------
 id          | bigint   |           | not null | 
 geom        | geometry |           | not null | 

Indexes:
    "areas_pkey" PRIMARY KEY, btree (id)

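The next table stores the many-to-many relationship between features and areas, with foreign-key constraints. Each feature may be in zero, one, or many areas (a feature that is in no area has no rows in the feature_area table), and each area contains many features.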


                        Table "feature_area"
   Column     |    Type  | Collation | Nullable | Default 
--------------+----------+-----------+----------+---------
 feature_id   | bigint   |           | not null | 
 area_id      | bigint   |           | not null | 
 category     | text     |           |          | 

Indexes:
    "feature_area_pkey" PRIMARY KEY, btree (feature_id, area_id)
    "feature_area_category_idx" btree (category)

Foreign-key constraints:
    "feature_area_feature_id_fkey" FOREIGN KEY (feature_id) REFERENCES features(id)
    "feature_area_area_id_fkey" FOREIGN KEY (area_id) REFERENCES areas(id)

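For anyone who wants to reproduce the setup, the three tables above correspond roughly to the DDL below. This is only a sketch: it assumes the PostGIS extension supplies the geometry type and that both foreign keys reference the id primary keys. The constraint and index names are the ones PostgreSQL would generate by default or that appear in the listings above.

```sql
-- Assumption: PostGIS provides the geometry type.
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE features (
    id       bigint   PRIMARY KEY,   -- creates features_pkey
    category text     NOT NULL,
    geom     geometry NOT NULL
);
CREATE INDEX features_category_idx ON features (category);

CREATE TABLE areas (
    id   bigint   PRIMARY KEY,       -- creates areas_pkey
    geom geometry NOT NULL
);

-- Link table: one row per (feature, area) pair.
CREATE TABLE feature_area (
    feature_id bigint NOT NULL REFERENCES features (id),
    area_id    bigint NOT NULL REFERENCES areas (id),
    category   text,
    PRIMARY KEY (feature_id, area_id)
);
CREATE INDEX feature_area_category_idx ON feature_area (category);
```
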
What I want is a result like the following: all features in category type_x that are in at least one area.

  feature_id   |    areas    |    geom    
---------------+-------------+-------------
  1            | {45,123}    | xxxxxx
  3            | {8}         | xxxxxx

This is the query I've been working with. It is very slow (~35 seconds).

WITH area_type_x AS (
  SELECT 
    feature_id,
    array_agg(area_id) AS areas
  FROM feature_area
  WHERE category = 'long name for type x'
  GROUP BY feature_id
)
SELECT
  features.id feature_id,
  features.geom,
  area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;

By chance I tried the following instead, and it was much faster (<3 seconds).

WITH area_type_x AS (
  SELECT 
    feature_id,
    array_agg(area_id) AS areas
  FROM feature_area
  WHERE short_name(category) = 'type_x' -- this line is the only difference
  GROUP BY feature_id
)
SELECT
  features.id feature_id,
  features.geom,
  area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;

I ran each query with EXPLAIN ANALYZE and can share the output if it would help, but I haven't been able to make sense of it myself.

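Any idea what is going on? I'd like to understand it, because I suspect that if I can skip converting category to its short version while keeping whatever improvement that gave me, I might be able to do better than 3 seconds.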

Answer 1

Don't use a CTE:

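SELECT
  features.id AS feature_id,
  features.geom,
  array_agg(feature_area.area_id) AS areas
FROM feature_area
JOIN features ON features.id = feature_area.feature_id
-- category exists in both tables, so it has to be qualified here
WHERE feature_area.category = 'long name for type x'
GROUP BY 1, 2;  -- group by feature_id and geom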
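
Some background on why dropping the CTE can matter, offered as a sketch rather than a diagnosis of this particular plan. Before PostgreSQL 12, a WITH query was always computed on its own, and the planner could not merge it into the outer query. From PostgreSQL 12 on, a non-recursive, side-effect-free CTE that is referenced only once is inlined automatically, and NOT MATERIALIZED requests inlining explicitly. Assuming PostgreSQL 12 or later, the original query can be kept as written and its plan compared with the join version:

```sql
-- Assumption: PostgreSQL 12+, where NOT MATERIALIZED is accepted.
-- EXPLAIN (ANALYZE, BUFFERS) executes the query and reports the plan
-- actually used, with row counts, timings, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
WITH area_type_x AS NOT MATERIALIZED (
  SELECT
    feature_id,
    array_agg(area_id) AS areas
  FROM feature_area
  WHERE category = 'long name for type x'
  GROUP BY feature_id
)
SELECT
  features.id AS feature_id,
  features.geom,
  area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;
```

Comparing the estimated and actual row counts on the feature_area scan in each plan is usually the quickest way to see where the two versions diverge.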
