I can see the plan doing a full table scan, but it seems it is never actually executed, and the UPDATE takes a long time regardless. Why??
Here is the EXPLAIN output:
Update on public.hone_cohortuser  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=309.154..309.156 rows=0 loops=1)
  ->  Hash Join  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=33.922..52.839 rows=42329 loops=1)
        Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, 'COMPLETED'::character varying(254), hone_cohortuser.ctid, u0.ctid
        Inner Unique: true
        Hash Cond: ((hone_cohortuser.cohort_id = u0.cohort_id) AND (hone_cohortuser.user_id = u0.user_id))
        ->  Seq Scan on public.hone_cohortuser  (cost=0.00..4309.98 rows=83498 width=42) (actual time=0.009..6.899 rows=83498 loops=1)
              Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, hone_cohortuser.ctid
        ->  Hash  (cost=2792.57..2792.57 rows=25850 width=14) (actual time=32.784..32.785 rows=47630 loops=1)
              Output: u0.ctid, u0.cohort_id, u0.user_id
              Buckets: 65536 (originally 32768)  Batches: 1 (originally 1)  Memory Usage: 2745kB
              ->  HashAggregate  (cost=2534.07..2792.57 rows=25850 width=14) (actual time=24.829..28.675 rows=47645 loops=1)
                    Output: u0.ctid, u0.cohort_id, u0.user_id
                    Group Key: u0.cohort_id, u0.user_id
                    Batches: 1  Memory Usage: 3857kB
                    ->  Seq Scan on public.hone_programparticipant u0  (cost=0.00..2295.03 rows=47808 width=14) (actual time=0.006..14.322 rows=48036 loops=1)
                          Output: u0.ctid, u0.cohort_id, u0.user_id
                          Filter: ((u0.learner_group_status)::text = 'COMPLETED'::text)
                          Rows Removed by Filter: 41086
Planning Time: 0.768 ms
Execution Time: 309.481 ms
Here is the query:
UPDATE "hone_cohortuser"
SET "learner_program_status" = 'COMPLETED'
WHERE EXISTS (
    SELECT 1 AS "a"
    FROM "hone_programparticipant" U0
    WHERE U0."cohort_id" = "hone_cohortuser"."cohort_id"
      AND U0."learner_group_status" = 'COMPLETED'
      AND U0."user_id" = "hone_cohortuser"."user_id"
    LIMIT 1
)
The way I eventually optimized it was to first run a
SELECT id...
query, and then run the UPDATE
with those IDs placed directly in the WHERE clause. When no rows needed updating, this cut the total time to about 10% of the earlier benchmark.
*If anything about the data model feels off to you, it is. This query translates the new data model back into an old table for backward compatibility.
After running more tests, I realized the UPDATE was actually being executed on 40k+ rows, as pointed out by @Laurenz Albe.
The direct fix for the query was to add a
"learner_program_status" != 'COMPLETED'
condition to the WHERE clause, since the redundant overhead came from rows that were being "updated" from 'COMPLETED' to 'COMPLETED'.
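Applied to the original statement, the fix looks like this:

```sql
UPDATE "hone_cohortuser"
SET "learner_program_status" = 'COMPLETED'
WHERE "learner_program_status" != 'COMPLETED'
  -- Note: != never matches NULL. If the column is nullable and NULL rows
  -- should be updated too, use
  --   "learner_program_status" IS DISTINCT FROM 'COMPLETED'
  AND EXISTS (
      SELECT 1
      FROM "hone_programparticipant" U0
      WHERE U0."cohort_id" = "hone_cohortuser"."cohort_id"
        AND U0."learner_group_status" = 'COMPLETED'
        AND U0."user_id" = "hone_cohortuser"."user_id"
  );
```

The extra condition lets the planner filter out rows that already hold the target value before they ever reach the Update node, so only rows that genuinely change are written.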