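I have a query that I'm trying to optimize, but I've run into some surprising and confusing results.
The tables I'm working with are features and areas; each has its own id and geometry.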
Table "features"
  Column  |   Type   | Collation | Nullable | Default
----------+----------+-----------+----------+---------
 id       | bigint   |           | not null |
 category | text     |           | not null |
 geom     | geometry |           | not null |
Indexes:
    "features_pkey" PRIMARY KEY, btree (id)
    "features_category_idx" btree (category)
Table "areas"
 Column |   Type   | Collation | Nullable | Default
--------+----------+-----------+----------+---------
 id     | bigint   |           | not null |
 geom   | geometry |           | not null |
Indexes:
    "areas_pkey" PRIMARY KEY, btree (id)
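The next table stores the many-to-many relationship between features and areas, with foreign-key constraints. Each feature may be in zero, one, or many areas (a feature that is in no area has no rows in the feature_area table), and each area contains many features.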
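For anyone who wants to reproduce the setup, the schema described here (including the feature_area table shown next) could be created roughly as follows. This is a sketch, not the actual DDL: it assumes the geometry type comes from the PostGIS extension, and that the foreign keys reference the id columns of features and areas.

```sql
-- Assumption: geometry is the PostGIS type.
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE features (
    id       bigint   PRIMARY KEY,   -- index is named features_pkey by default
    category text     NOT NULL,
    geom     geometry NOT NULL
);
CREATE INDEX features_category_idx ON features (category);

CREATE TABLE areas (
    id   bigint   PRIMARY KEY,
    geom geometry NOT NULL
);

-- Junction table for the many-to-many relationship.
CREATE TABLE feature_area (
    feature_id bigint NOT NULL REFERENCES features (id),
    area_id    bigint NOT NULL REFERENCES areas (id),
    category   text,
    PRIMARY KEY (feature_id, area_id)
);
CREATE INDEX feature_area_category_idx ON feature_area (category);
```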
Table "feature_area"
   Column   |  Type  | Collation | Nullable | Default
------------+--------+-----------+----------+---------
 feature_id | bigint |           | not null |
 area_id    | bigint |           | not null |
 category   | text   |           |          |
Indexes:
    "feature_area_pkey" PRIMARY KEY, btree (feature_id, area_id)
    "feature_area_category_idx" btree (category)
Foreign-key constraints:
    "feature_area_feature_id_fkey" FOREIGN KEY (feature_id) REFERENCES features(id)
    "feature_area_area_id_fkey" FOREIGN KEY (area_id) REFERENCES areas(id)
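What I'm trying to get is a result like this: every feature of category type_x that is in at least one area, together with the areas it belongs to: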
 feature_id |  areas   |  geom
------------+----------+--------
          1 | {45,123} | xxxxxx
          3 | {8}      | xxxxxx
This is the query I'm working with. It is very slow (~35 seconds).
WITH area_type_x AS (
SELECT
feature_id,
array_agg(area_id) AS areas
FROM feature_area
WHERE category = 'long name for type x'
GROUP BY feature_id
)
SELECT
features.id feature_id,
features.geom,
area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;
By chance, I tried this version, and it was much faster (<3 seconds).
WITH area_type_x AS (
SELECT
feature_id,
array_agg(area_id) AS areas
FROM feature_area
WHERE short_name(category) = 'type_x' -- this line is the only difference
GROUP BY feature_id
)
SELECT
features.id feature_id,
features.geom,
area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;
I ran each query with EXPLAIN ANALYZE and can share the output if that would help, but I haven't been able to make sense of it myself.
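The plans could be captured with something like the sketch below, shown for the slow variant. The BUFFERS option is an addition not mentioned in the post; it is optional and only adds I/O counters to the output.

```sql
-- Runs the query for real and prints the plan with actual timings.
EXPLAIN (ANALYZE, BUFFERS)
WITH area_type_x AS (
    SELECT
        feature_id,
        array_agg(area_id) AS areas
    FROM feature_area
    WHERE category = 'long name for type x'
    GROUP BY feature_id
)
SELECT
    features.id AS feature_id,
    features.geom,
    area_type_x.areas
FROM area_type_x
JOIN features ON features.id = area_type_x.feature_id;
```

When comparing the two outputs, the usual things to look at are the estimated versus actual row counts on the feature_area scan and which join method the planner picked in each case.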
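Any idea what's going on? I'd like to understand it, because I suspect that if I can skip converting category to its short version but keep whatever improvement that gave me, I might be able to do better than 3 seconds.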
Don't use a CTE:
SELECT
features.id feature_id,
features.geom,
array_agg(feature_area.area_id) AS areas
FROM feature_area
JOIN features ON features.id = feature_area.feature_id
WHERE feature_area.category = 'long name for type x' -- qualified: both tables have a category column
GROUP BY 1, 2;
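A possible variant, assuming features.id is the primary key as the table definition shows: PostgreSQL allows columns that are functionally dependent on a grouped primary key to appear ungrouped in the select list, so grouping by features.id alone avoids having to group on the geometry values. This is a sketch and has not been timed against the data in the question.

```sql
SELECT
    features.id AS feature_id,
    features.geom,                        -- allowed ungrouped: depends on the primary key
    array_agg(feature_area.area_id) AS areas
FROM feature_area
JOIN features ON features.id = feature_area.feature_id
WHERE feature_area.category = 'long name for type x'
GROUP BY features.id;
```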