Fast query
select ...
from table1 t1
join table2 t2 on t2.org_id = t1.org_id
where t1.org_id = 1
Slow query
select ...
from table1 t1
join table2 t2 on t2.org_id = t1.org_id
where t1.org_id = (select org_id from table3 where org_name = 'abc' limit 1)
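The only difference between the two queries is that the literal is replaced by a subquery. I have tried this on PostgreSQL 12.2 and 11.6, both on AWS RDS. table1 and table2 are both partitioned on the org_id column. table3 has a primary key on org_id and a unique index on org_name. The "limit 1" was added to the subquery in the slow query to try to help the optimizer.
For most organizations the fast query returns in under 10 seconds, while the slow query takes 30-100 seconds.
I have tried partition counts of 128, 256, 384, 512, 1024, 2048, and 4096; 384 performed best.
The EXPLAIN ANALYZE plan for the fast query is 15 lines long and touches only 1 partition. The plan for the slow query, with 384 partitions, is 2,388 lines long. It appears to actually scan only 1 partition, but it considers all of them.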
You could try creating a STABLE SQL function to replace the subquery. Here is what I get on PostgreSQL 12.2:
EXPLAIN ANALYZE
select *
from table1 t1
join table2 t2 on t2.org_id = t1.org_id
where t1.org_id = 1;
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..67.87 rows=78 width=44) (actual time=0.017..0.017 rows=0 loops=1)
   ->  Seq Scan on table2 t2  (cost=0.00..41.88 rows=13 width=4) (actual time=0.011..0.012 rows=1 loops=1)
         Filter: (org_id = 1)
   ->  Materialize  (cost=0.00..25.03 rows=6 width=40) (actual time=0.003..0.003 rows=0 loops=1)
         ->  Seq Scan on part1 t1  (cost=0.00..25.00 rows=6 width=40) (actual time=0.001..0.002 rows=0 loops=1)
               Filter: (org_id = 1)
 Planning Time: 0.432 ms
 Execution Time: 0.046 ms
(8 rows)
EXPLAIN ANALYZE
select *
from table1 t1
join table2 t2 on t2.org_id = t1.org_id
where t1.org_id = (select org_id from table3 where org_name = 'abc' limit 1);
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.31..176.25 rows=390 width=44) (actual time=0.023..0.023 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..4.31 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=1)
           ->  Seq Scan on table3  (cost=0.00..25.88 rows=6 width=4) (actual time=0.010..0.010 rows=1 loops=1)
                 Filter: (org_name = 'abc'::text)
   ->  Append  (cost=0.00..125.15 rows=30 width=40) (actual time=0.022..0.023 rows=0 loops=1)
         ->  Seq Scan on part1 t1  (cost=0.00..25.00 rows=6 width=40) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (org_id = $0)
         ->  Seq Scan on part2 t1_1  (cost=0.00..25.00 rows=6 width=40) (never executed)
               Filter: (org_id = $0)
         ->  Seq Scan on part3 t1_2  (cost=0.00..25.00 rows=6 width=40) (never executed)
               Filter: (org_id = $0)
         ->  Seq Scan on part4 t1_3  (cost=0.00..25.00 rows=6 width=40) (never executed)
               Filter: (org_id = $0)
         ->  Seq Scan on part5 t1_4  (cost=0.00..25.00 rows=6 width=40) (never executed)
               Filter: (org_id = $0)
   ->  Materialize  (cost=0.00..41.94 rows=13 width=4) (never executed)
         ->  Seq Scan on table2 t2  (cost=0.00..41.88 rows=13 width=4) (never executed)
               Filter: (org_id = $0)
 Planning Time: 0.397 ms
 Execution Time: 0.129 ms
(21 rows)
create function f_get_org_id() returns int
language sql
stable
as
$$
select org_id from table3 where org_name = 'abc' limit 1
$$
;
CREATE FUNCTION
EXPLAIN ANALYZE
select *
from table1 t1
join table2 t2 on t2.org_id = t1.org_id
where t1.org_id = f_get_org_id();
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2309.43 rows=390 width=44) (actual time=0.003..0.003 rows=0 loops=1)
   ->  Append  (cost=0.00..1625.15 rows=30 width=40) (actual time=0.003..0.003 rows=0 loops=1)
         Subplans Removed: 4
         ->  Seq Scan on part1 t1  (cost=0.00..325.00 rows=6 width=40) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (org_id = f_get_org_id())
   ->  Materialize  (cost=0.00..679.44 rows=13 width=4) (never executed)
         ->  Seq Scan on table2 t2  (cost=0.00..679.38 rows=13 width=4) (never executed)
               Filter: (org_id = f_get_org_id())
 Planning Time: 0.655 ms
 Execution Time: 0.091 ms
(10 rows)
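In the last plan, "Subplans Removed: 4" means the other four partitions were pruned when the executor started, so they no longer appear in the plan at all. Since f_get_org_id() hard-codes 'abc', you would probably want a version that takes the organization name as an argument. The following is an untested sketch: the function name f_get_org_id_by_name and the parameter p_org_name are my own, and I am assuming org_name is text and org_id is int, as in the example above. I would expect the same pruning when the argument is a constant, but check the plan for "Subplans Removed" on your own version (I have only run the example above on 12.2).

```sql
-- Hypothetical variant of f_get_org_id() that takes the organization
-- name as an argument instead of hard-coding 'abc'.
-- Assumes table3.org_name is text and table3.org_id is int.
create function f_get_org_id_by_name(p_org_name text) returns int
language sql
stable   -- STABLE (not VOLATILE), as in f_get_org_id() above
as
$$
    select org_id from table3 where org_name = p_org_name limit 1
$$;

-- Same join as before; look for "Subplans Removed" under the Append node.
explain analyze
select *
from table1 t1
join table2 t2 on t2.org_id = t1.org_id
where t1.org_id = f_get_org_id_by_name('abc');
```

If the plan still lists every partition, another option is to look up org_id in a separate query first and pass it to the main query as a literal, which is the fast query from the question.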