为什么 LATERAL JOIN 比 Postgres 中的相关子查询更快?

问题描述 投票:0回答:1

我已将以下查询从使用

correlated subquery
重写为使用
LATERAL JOIN
。现在速度更快了,我想了解为什么?

这个SO答案说没有黄金法则,这实际上取决于情况。但是,我想知道这是否有某种直觉?

如果相关,此查询将用作

VIEW

带有

correlated subquery
的旧查询(技术上是 2):

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
JOIN table_version AS tv ON ts.id = tv.id
  AND tv.effective_time = (
    SELECT MAX(tv1.effective_time)
    FROM table_version AS tv1
    WHERE
      tv1.id = ts.id
      AND
      tv1.effective_time <= CLOCK_TIMESTAMP()
  )
  AND tv.create_time = (
    SELECT MAX(tv2.create_time)
    FROM table_version AS tv2
    WHERE
      tv2.id = tv.id
      AND
      tv2.effective_time = tv.effective_time
      AND
      tv2.create_time <= CLOCK_TIMESTAMP()
  )
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;

查询计划(相关子查询):

Limit  (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.078..10.788 rows=1000 loops=1)
  ->  Nested Loop  (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.077..10.641 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        ->  Nested Loop  (cost=4.96..13876190.47 rows=176 width=64) (actual time=0.065..10.169 rows=1000 loops=1)
              ->  Seq Scan on table_static ts  (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.010..0.176 rows=1000 loops=1)
              ->  Index Scan using table_version_pkey on table_version tv  (cost=4.96..13.85 rows=1 width=40) (actual time=0.005..0.006 rows=1 loops=1000)
                    Index Cond: ((id = ts.id) AND (effective_time = (SubPlan 2)))
                    Filter: (create_time = (SubPlan 4))
                    SubPlan 4
                      ->  Result  (cost=8.46..8.47 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 3 (returns $4)
                              ->  Limit  (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                    ->  Index Only Scan Backward using table_version_pkey on table_version tv2  (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = tv.id) AND (effective_time = tv.effective_time) AND (create_time IS NOT NULL))
                                          Filter: (create_time <= clock_timestamp())
                                          Heap Fetches: 0
                    SubPlan 2
                      ->  Result  (cost=4.52..4.53 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 1 (returns $1)
                              ->  Limit  (cost=0.43..4.52 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                                    ->  Index Only Scan Backward using table_version_pkey on table_version tv1  (cost=0.43..8.61 rows=2 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = ts.id) AND (effective_time IS NOT NULL))
                                          Filter: (effective_time <= clock_timestamp())
                                          Heap Fetches: 0
        ->  Materialize  (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              ->  Seq Scan on table_status t_status  (cost=0.00..1.06 rows=4 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 0.827 ms
Execution Time: 10.936 ms

新查询带有

LATERAL JOIN
:

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
JOIN LATERAL (
  SELECT *
  FROM table_version AS tv
  WHERE
    ts.id = tv.id
    AND
    tv.effective_time <= CLOCK_TIMESTAMP()
    AND
    tv.create_time <= CLOCK_TIMESTAMP()
  ORDER BY
    tv.effective_time DESC,
    tv.create_time DESC
  LIMIT 1
) AS tv ON TRUE
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;

查询计划(

LATERAL JOIN
):

Limit  (cost=0.43..40694.36 rows=1000 width=64) (actual time=0.218..4.431 rows=1000 loops=1)
  ->  Nested Loop  (cost=0.43..32555183.83 rows=800001 width=64) (actual time=0.217..4.280 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        ->  Nested Loop  (cost=0.43..32502382.70 rows=1000001 width=64) (actual time=0.189..3.815 rows=1000 loops=1)
              ->  Seq Scan on table_static ts  (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.059..0.297 rows=1000 loops=1)
              ->  Limit  (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                    ->  Index Scan Backward using table_version_pkey on table_version tv  (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                          Index Cond: (id = ts.id)
                          Filter: ((effective_time <= clock_timestamp()) AND (create_time <= clock_timestamp()))
        ->  Materialize  (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              ->  Seq Scan on table_status t_status  (cost=0.00..1.06 rows=4 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 1.315 ms
Execution Time: 4.746 ms

对于这两种情况,都存在以下索引(主键):

ALTER TABLE ONLY table_static
    ADD CONSTRAINT table_static_pkey PRIMARY KEY (id);
ALTER TABLE ONLY table_version
    ADD CONSTRAINT table_version_pkey PRIMARY KEY (id, effective_time, create_time);
ALTER TABLE ONLY table_status
    ADD CONSTRAINT table_status_pkey PRIMARY KEY (id);

也许答案只是因为少了一个“子查询”?据我了解,这两个查询都可以使用索引。

如果有任何其他方法可以优化此查询,我很想听听。

谢谢!

sql database postgresql performance query-optimization
1个回答
0
投票

更好的版本可能是使用窗口函数。

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
LEFT JOIN (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY tv.id ORDER BY tv.effective_time DESC, tv.create_time DESC) AS rn
  FROM table_version AS tv
  WHERE
    tv.effective_time <= CLOCK_TIMESTAMP()
    AND
    tv.create_time <= CLOCK_TIMESTAMP()
) AS tv
   ON tv.id = ts.id
  AND tv.rn = 1
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;  -- LIMIT without ORDER BY is non-deterministic
© www.soinside.com 2019 - 2024. All rights reserved.