SELECT FOR UPDATE slows down over time


We have a table with about 1 billion rows, and four processes work on it concurrently. Each process claims rows with its session ID, 1,000 rows at a time, and updates the table after processing 10,000 rows. The query used to claim rows is:

EXPLAIN (ANALYZE, BUFFERS)
WITH b AS (
   SELECT userid, address
   FROM UserEvents
   WHERE deliveryId = 2108625
     AND (tsExpire > GetDate() OR tsExpire IS NULL)
     AND sendTime <= GetDate()
     AND session_id = 0
     AND level IN ('default')
   ORDER BY sendTime
   FOR UPDATE SKIP LOCKED
   LIMIT 1000
)
UPDATE UserEvents e
SET session_id = 1
FROM b
WHERE e.userid = b.userid
RETURNING b.userid, b.address

This query typically runs within 500 ms when all four processes run concurrently. In recent runs it has suddenly become significantly slower over time. Here is the explain plan:

"Update on UserEvents e  (cost=5753.03..8567.46 rows=1000 width=1962) (actual time=1373.284..1422.244 rows=1000 loops=1)"
"  Buffers: shared hit=1146920 read=59 dirtied=194"
"  I/O Timings: read=13.916"
"  CTE b"
"    ->  Limit  (cost=0.56..5752.46 rows=1000 width=82) (actual time=1373.094..1380.853 rows=1000 loops=1)"
"          Buffers: shared hit=1121721 read=27 dirtied=23"
"          I/O Timings: read=3.440"
"          ->  LockRows  (cost=0.56..179683.91 rows=31239 width=82) (actual time=1373.093..1380.775 rows=1000 loops=1)"
"                Buffers: shared hit=1121721 read=27 dirtied=23"
"                I/O Timings: read=3.440"
"                ->  Index Scan using UserEvents_nextpass2 on UserEvents  (cost=0.56..179371.52 rows=31239 width=82) (actual time=1366.046..1373.339 rows=4186 loops=1)"
"                      Index Cond: ((deliveryId = 2108625) AND (sendTime <= '2020-04-15 08:33:57.372282+00'::timestamp with time zone))"
"                      Filter: (((tsexpire > '2020-04-15 08:33:57.372282+00'::timestamp with time zone) OR (tsexpire IS NULL)) AND (session_id = 0) AND ((level)::text = 'default'::text))"
"                      Rows Removed by Filter: 29614"
"                      Buffers: shared hit=1113493 read=27"
"                      I/O Timings: read=3.440"
"  ->  Nested Loop  (cost=0.58..2815.00 rows=1000 width=1962) (actual time=1373.218..1389.995 rows=1000 loops=1)"
"        Buffers: shared hit=1126728 read=27 dirtied=23"
"        I/O Timings: read=3.440"
"        ->  CTE Scan on b  (cost=0.00..20.00 rows=1000 width=1692) (actual time=1373.106..1382.263 rows=1000 loops=1)"
"              Buffers: shared hit=1121721 read=27 dirtied=23"
"              I/O Timings: read=3.440"
"        ->  Index Scan using UserEvents_id on UserEvents e  (cost=0.58..2.79 rows=1 width=268) (actual time=0.007..0.007 rows=1 loops=1000)"
"              Index Cond: (userid = b.userid)"
"              Buffers: shared hit=5007"
"Planning Time: 0.331 ms"
"Execution Time: 1422.457 ms"

Surprisingly, the index scan on UserEvents_nextpass2 slows down noticeably after the query has been called a few thousand times. This is a partial index on non-null sendTime values. sendTime is updated after each process updates its rows and clears its session ID. But that has also been the case for the last billion events, so what could be causing this slowness? Any help would be appreciated.

Explain plan for a relatively faster run at about 700 ms:

"Update on UserEvents e  (cost=5707.45..8521.87 rows=1000 width=1962) (actual time=695.897..751.557 rows=1000 loops=1)"
"  Buffers: shared hit=605921 read=68 dirtied=64"
"  I/O Timings: read=27.139"
"  CTE b"
"    ->  Limit  (cost=0.56..5706.87 rows=1000 width=82) (actual time=695.616..707.835 rows=1000 loops=1)"
"          Buffers: shared hit=580158 read=33 dirtied=29"
"          I/O Timings: read=10.491"
"          ->  LockRows  (cost=0.56..179686.41 rows=31489 width=82) (actual time=695.615..707.770 rows=1000 loops=1)"
"                Buffers: shared hit=580158 read=33 dirtied=29"
"                I/O Timings: read=10.491"
"                ->  Index Scan using UserEvents_nextpass2 on UserEvents  (cost=0.56..179371.52 rows=31489 width=82) (actual time=691.529..704.076 rows=3000 loops=1)"
"                      Index Cond: ((deliveryId = 2108625) AND (sendTime <= '2020-04-15 07:42:42.856859+00'::timestamp with time zone))"
"                      Filter: (((tsexpire > '2020-04-15 07:42:42.856859+00'::timestamp with time zone) OR (tsexpire IS NULL)) AND (session_id = 0) AND ((level)::text = 'default'::text))"
"                      Rows Removed by Filter: 29722"
"                      Buffers: shared hit=573158 read=33"
"                      I/O Timings: read=10.491"
"  ->  Nested Loop  (cost=0.58..2815.00 rows=1000 width=1962) (actual time=695.658..716.356 rows=1000 loops=1)"
"        Buffers: shared hit=585165 read=33 dirtied=29"
"        I/O Timings: read=10.491"
"        ->  CTE Scan on b  (cost=0.00..20.00 rows=1000 width=1692) (actual time=695.628..709.116 rows=1000 loops=1)"
"              Buffers: shared hit=580158 read=33 dirtied=29"
"              I/O Timings: read=10.491"
"        ->  Index Scan using UserEvents_id on UserEvents e  (cost=0.58..2.79 rows=1 width=268) (actual time=0.007..0.007 rows=1 loops=1000)"
"              Index Cond: (userid = b.userid)"
"              Buffers: shared hit=5007"
"Planning Time: 0.584 ms"
"Execution Time: 751.713 ms"

My index on this table is:

CREATE INDEX UserEvents_nextpass2 ON public.UserEvents USING btree (deliveryid ASC NULLS LAST, sendTime ASC NULLS LAST) WHERE sendTime IS NOT NULL;
2 Answers
Index Scan using UserEvents_nextpass2 on UserEvents  (cost=0.56..179371.52 rows=31239 width=82) (actual time=1366.046..1373.339 rows=4186 loops=1)"
      Buffers: shared hit=1113493 read=27"

It looks like there is a lot of stale data in the "UserEvents_nextpass2" index. Visiting 266 pages for every row returned (1,113,493 shared buffer hits / 4,186 rows) is a bit absurd. Do you have any long-open transactions that prevent VACUUM, and the btree-specific microvacuum, from working effectively?
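One way to quantify how much stale data the index is carrying is the pgstattuple extension; a minimal sketch, assuming the extension is installed or installable (the index name is taken from the post and lowercased, since it was created unquoted):

```sql
-- pgstatindex() reports btree health; a low avg_leaf_density and many
-- deleted_pages suggest dead entries that VACUUM has not reclaimed yet.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT leaf_pages, deleted_pages, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('userevents_nextpass2');
```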

Look at pg_stat_activity. Also, is hot_standby_feedback turned on? Is vacuum_defer_cleanup_age nonzero?
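These checks can be run directly; a sketch (the 10-minute threshold is an arbitrary choice for illustration):

```sql
-- Transactions held open long enough to hold back VACUUM cleanup:
SELECT pid, state, xact_start, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start < now() - interval '10 minutes'
ORDER BY xact_start;

-- The two settings mentioned above:
SHOW hot_standby_feedback;
SHOW vacuum_defer_cleanup_age;
```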


Answer from the question author:
There is no easy way to reduce the pages visited per row, since all of my index columns are updated concurrently. Because my filter was discarding about 80% of the rows, I decided to add the filter columns to the multi-column index. So my index went from:

CREATE INDEX UserEvents_nextpass2 
ON public.UserEvents USING btree (deliveryid ASC NULLS LAST, sendTime ASC NULLS LAST) 
WHERE sendTime IS NOT NULL;

to:

CREATE INDEX UserEvents_nextpass2 
ON public.UserEvents USING btree (deliveryid ASC NULLS LAST, sendTime ASC NULLS LAST, tsexpire, session_id, level) 
WHERE sendTime IS NOT NULL;

This reduced "Rows Removed by Filter" to 0, so I only visit the rows I actually need. My shared buffer hits dropped from 1,121,721 to under 100,000, and query time dropped from 1.5 s to 200 ms.

Lesson learned: always prefer a multi-column index over filtering.
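On a table this large, the index swap itself can be done without blocking writers; a sketch, assuming enough disk for both indexes to coexist briefly (the name UserEvents_nextpass3 is made up for illustration):

```sql
-- Build the wider index online, then retire the old one.
CREATE INDEX CONCURRENTLY UserEvents_nextpass3
ON public.UserEvents USING btree (deliveryid, sendTime, tsexpire, session_id, level)
WHERE sendTime IS NOT NULL;

DROP INDEX CONCURRENTLY UserEvents_nextpass2;
ALTER INDEX UserEvents_nextpass3 RENAME TO UserEvents_nextpass2;
```

Rebuilding from scratch also discards any accumulated bloat, which may account for part of the improvement independently of the extra key columns.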
