Why is this PostgreSQL UPDATE statement so slow even when it updates no rows?


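I can see that the plan does a full table scan, but the update itself never seems to execute (the Update node reports rows=0), and the UPDATE takes a long time anyway. Why?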

Here is the EXPLAIN output:

Update on public.hone_cohortuser  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=309.154..309.156 rows=0 loops=1)
  ->  Hash Join  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=33.922..52.839 rows=42329 loops=1)
        Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, 'COMPLETED'::character varying(254), hone_cohortuser.ctid, u0.ctid
        Inner Unique: true
        Hash Cond: ((hone_cohortuser.cohort_id = u0.cohort_id) AND (hone_cohortuser.user_id = u0.user_id))
        ->  Seq Scan on public.hone_cohortuser  (cost=0.00..4309.98 rows=83498 width=42) (actual time=0.009..6.899 rows=83498 loops=1)
              Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, hone_cohortuser.ctid
        ->  Hash  (cost=2792.57..2792.57 rows=25850 width=14) (actual time=32.784..32.785 rows=47630 loops=1)
              Output: u0.ctid, u0.cohort_id, u0.user_id
              Buckets: 65536 (originally 32768)  Batches: 1 (originally 1)  Memory Usage: 2745kB
              ->  HashAggregate  (cost=2534.07..2792.57 rows=25850 width=14) (actual time=24.829..28.675 rows=47645 loops=1)
                    Output: u0.ctid, u0.cohort_id, u0.user_id
                    Group Key: u0.cohort_id, u0.user_id
                    Batches: 1  Memory Usage: 3857kB
                    ->  Seq Scan on public.hone_programparticipant u0  (cost=0.00..2295.03 rows=47808 width=14) (actual time=0.006..14.322 rows=48036 loops=1)
                          Output: u0.ctid, u0.cohort_id, u0.user_id
                          Filter: ((u0.learner_group_status)::text = 'COMPLETED'::text)
                          Rows Removed by Filter: 41086
Planning Time: 0.768 ms
Execution Time: 309.481 ms

Here is the query:

UPDATE
  "hone_cohortuser"
SET
  "learner_program_status" = 'COMPLETED'
WHERE
  EXISTS(
    SELECT
      1 AS "a"
    FROM
      "hone_programparticipant" U0
    WHERE
      (
        U0."cohort_id" = ("hone_cohortuser"."cohort_id")
        AND U0."learner_group_status" = 'COMPLETED'
        AND U0."user_id" = ("hone_cohortuser"."user_id")
      )
    LIMIT
      1
  )

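The way I ended up optimizing it was to first run a SELECT id... and then run an UPDATE with the IDs placed directly in the WHERE clause. When no update is needed, this cuts the total time to about 10% of the earlier benchmark.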

*If anything about the data model makes you uneasy, it should. This query maps a new data model onto a legacy table for backward compatibility.

sql postgresql performance query-optimization sql-execution-plan
1 Answer

After running more tests, I realized that the UPDATE was in fact being executed on 40k+ rows, as @Laurenz Albe pointed out.
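This matches the plan in the question: the Update node shows rows=0 only because an UPDATE without RETURNING emits no rows, while the Hash Join feeding it shows rows=42329, which is the number of rows actually rewritten. One way to confirm this without modifying anything is to count how many rows the WHERE clause matches and how many of those would really change. The following is a sketch that reuses the table and column names from the question; the alias cu and the output names rows_matched and rows_changing are made up for illustration.

```sql
-- Read-only check: how many rows does the UPDATE's WHERE clause match,
-- and how many of them do not already hold the target value?
SELECT
  count(*) AS rows_matched,
  count(*) FILTER (
    -- IS DISTINCT FROM also counts rows where the status is NULL
    WHERE cu."learner_program_status" IS DISTINCT FROM 'COMPLETED'
  ) AS rows_changing
FROM "hone_cohortuser" cu
WHERE EXISTS (
  SELECT 1
  FROM "hone_programparticipant" u0
  WHERE u0."cohort_id" = cu."cohort_id"
    AND u0."user_id" = cu."user_id"
    AND u0."learner_group_status" = 'COMPLETED'
);
```

If rows_matched is large and rows_changing is zero, the UPDATE is spending its time writing new row versions that are identical to the old ones.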

The direct fix for the query is to add the condition "learner_program_status" != 'COMPLETED' to the WHERE clause, because the redundant overhead came from rows being updated from 'COMPLETED' to 'COMPLETED'.
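Applied to the query from the question, the fix might look like the sketch below. It assumes "learner_program_status" is never NULL. If the column is nullable, != skips NULL rows (the comparison yields NULL, not true), and IS DISTINCT FROM 'COMPLETED' would be the NULL-safe alternative. The LIMIT 1 from the original subquery is left out here because EXISTS only tests whether at least one row is returned.

```sql
-- Same UPDATE as in the question, restricted to rows whose value would change.
UPDATE "hone_cohortuser"
SET "learner_program_status" = 'COMPLETED'
WHERE "learner_program_status" != 'COMPLETED'  -- skip rows that are already 'COMPLETED'
  AND EXISTS (
    SELECT 1
    FROM "hone_programparticipant" U0
    WHERE U0."cohort_id" = "hone_cohortuser"."cohort_id"
      AND U0."user_id" = "hone_cohortuser"."user_id"
      AND U0."learner_group_status" = 'COMPLETED'
  );
```

With this filter in place, a run where every matching row is already 'COMPLETED' still pays for the scans and the join, but no longer writes any new row versions.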
