Why is the Postgres query planner affected by LIMIT?

Question

EXPLAIN ANALYZE SELECT     "alerts"."id", 
            "alerts"."created_at", 
            't1'::text AS src_table 
 FROM       "alerts" 
 INNER JOIN "devices" 
 ON         "devices"."id" = "alerts"."device_id" 
 INNER JOIN "sites" 
 ON         "sites"."id" = "devices"."site_id" 
 WHERE      "sites"."cloud_id" = 111
 AND        "alerts"."created_at" >= '2019-08-30'
 ORDER BY   "created_at" DESC limit 9;

 Limit  (cost=1.15..36021.60 rows=9 width=16) (actual time=30.505..29495.765 rows=9 loops=1)
  ->  Nested Loop  (cost=1.15..232132.92 rows=58 width=16) (actual time=30.504..29495.755 rows=9 loops=1)
        ->  Nested Loop  (cost=0.86..213766.42 rows=57231 width=24) (actual time=0.029..29086.323 rows=88858 loops=1)
              ->  Index Scan Backward using alerts_created_at_index on alerts  (cost=0.43..85542.16 rows=57231 width=24) (actual time=0.014..88.137 rows=88858 loops=1)
                    Index Cond: (created_at >= '2019-08-30 00:00:00'::timestamp without time zone)
              ->  Index Scan using devices_pkey on devices  (cost=0.43..2.23 rows=1 width=16) (actual time=0.016..0.325 rows=1 loops=88858)
                    Index Cond: (id = alerts.device_id)
        ->  Index Scan using sites_pkey on sites  (cost=0.29..0.31 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=88858)
              Index Cond: (id = devices.site_id)
              Filter: (cloud_id = 7231)
              Rows Removed by Filter: 1
Total runtime: 29495.816 ms

Now we change it to LIMIT 10:

 EXPLAIN ANALYZE SELECT     "alerts"."id", 
            "alerts"."created_at", 
            't1'::text AS src_table 
 FROM       "alerts" 
 INNER JOIN "devices" 
 ON         "devices"."id" = "alerts"."device_id" 
 INNER JOIN "sites" 
 ON         "sites"."id" = "devices"."site_id" 
 WHERE      "sites"."cloud_id" = 111
 AND        "alerts"."created_at" >= '2019-08-30'
 ORDER BY   "created_at" DESC limit 10;

Limit  (cost=39521.79..39521.81 rows=10 width=16) (actual time=1.557..1.559 rows=10 loops=1)
  ->  Sort  (cost=39521.79..39521.93 rows=58 width=16) (actual time=1.555..1.555 rows=10 loops=1)
        Sort Key: alerts.created_at
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop  (cost=5.24..39520.53 rows=58 width=16) (actual time=0.150..1.543 rows=11 loops=1)
              ->  Nested Loop  (cost=4.81..16030.12 rows=2212 width=8) (actual time=0.137..0.643 rows=31 loops=1)
                    ->  Index Scan using sites_cloud_id_index on sites  (cost=0.29..64.53 rows=31 width=8) (actual time=0.014..0.057 rows=23 loops=1)
                          Index Cond: (cloud_id = 7231)
                    ->  Bitmap Heap Scan on devices  (cost=4.52..512.32 rows=270 width=16) (actual time=0.020..0.025 rows=1 loops=23)
                          Recheck Cond: (site_id = sites.id)
                          ->  Bitmap Index Scan on devices_site_id_index  (cost=0.00..4.46 rows=270 width=0) (actual time=0.006..0.006 rows=9 loops=23)
                                Index Cond: (site_id = sites.id)
              ->  Index Scan using alerts_device_id_index on alerts  (cost=0.43..10.59 rows=3 width=24) (actual time=0.024..0.028 rows=0 loops=31)
                    Index Cond: (device_id = devices.id)
                    Filter: (created_at >= '2019-08-30 00:00:00'::timestamp without time zone)
                    Rows Removed by Filter: 12
Total runtime: 1.603 ms

The alerts table has millions of records; the other tables have thousands.

I can already optimize the query by simply not using a LIMIT < 10. What I don't understand is why the LIMIT affects the performance. Perhaps there's a better way than hardcoding this magic number "10".

Answer 1

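The number of result rows affects the PostgreSQL optimizer, because a plan that returns the first few rows quickly is not necessarily the plan that returns the whole result as fast as possible.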

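In your case, PostgreSQL thinks that for small values of LIMIT it will be faster to scan the alerts table in the order of the ORDER BY clause using an index, and then join the other tables with nested loops until it has found 9 rows.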

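The benefit of this strategy is that it does not have to calculate the complete result of the join, sort it, and then throw away all but the first few result rows. The danger is that finding 9 matching rows takes longer than expected, and that is what hits you here: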

Index Scan Backward using alerts_created_at_index on alerts  (cost=0.43..85542.16 rows=57231 width=24) (actual time=0.014..88.137 rows=88858 loops=1)

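So PostgreSQL has to process 88858 rows and use nested loop joins (which are inefficient if they have to loop often) until it finds 9 result rows. This may be because it underestimates the selectivity of the conditions, or because the matching rows all happen to have a low created_at.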

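The number 10 happens to be the cut-off point where PostgreSQL thinks that this strategy is no longer the more efficient one. It is a value that will change as the data in the database change.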
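The cut-off can be reproduced from the numbers in the first plan. As far as I know, the planner estimates the cost of a Limit node by interpolating linearly between the startup cost and the total cost of its child, in proportion to the fraction of the estimated rows that the LIMIT asks for. The query below is only a sketch of that arithmetic; the column aliases are made up for illustration, and the inputs are copied from the outer Nested Loop of the LIMIT 9 plan.

```sql
-- Limit cost estimate: startup + (total - startup) * limit / estimated_rows
-- Inputs come from the outer Nested Loop of the first plan:
--   cost=1.15..232132.92 rows=58
SELECT n AS limit_rows,
       round(1.15 + (232132.92 - 1.15) * n / 58, 2) AS index_order_plan_cost
FROM generate_series(9, 10) AS n;
```

For 9 rows this gives 36021.60, which is exactly the total cost shown on the Limit node of the first plan. For 10 rows it gives 40023.87, which is higher than the 39521.81 estimated for the sort-based plan, so the planner switches plans at that point. This assumes the sort-based plan is estimated at about the same cost for LIMIT 9 as for LIMIT 10.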

You can avoid that plan altogether by using an ORDER BY clause that does not match the index:

ORDER BY created_at DESC NULLS LAST
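
Applied to the query from the question, the change might look like the sketch below. Because the WHERE condition on created_at already excludes NULL values, adding NULLS LAST should not change the result; it only keeps the sort order from matching a backward scan of alerts_created_at_index. Which plan the optimizer chooses instead still depends on your data and statistics, so it is worth checking with EXPLAIN ANALYZE.

```sql
-- Same query as in the question; only the ORDER BY clause is changed.
SELECT     "alerts"."id",
           "alerts"."created_at",
           't1'::text AS src_table
FROM       "alerts"
INNER JOIN "devices"
ON         "devices"."id" = "alerts"."device_id"
INNER JOIN "sites"
ON         "sites"."id" = "devices"."site_id"
WHERE      "sites"."cloud_id" = 111
AND        "alerts"."created_at" >= '2019-08-30'
ORDER BY   "alerts"."created_at" DESC NULLS LAST  -- does not match the index order
LIMIT      9;
```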
