Why does a Postgres query stop using the index after some time?

Question

I have this query in Postgres 12.0:

SELECT "articles"."id"
FROM "articles"
WHERE ((jsonfields ->> 'etat') = '0' OR (jsonfields ->> 'etat') = '1' OR (jsonfields ->> 'etat') = '2')
ORDER BY ordre ASC;

At first:

    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1274.09..1274.97 rows=354 width=8) (actual time=13.000..13.608 rows=10435 loops=1)
   Sort Key: ordre
   Sort Method: quicksort  Memory: 874kB
   ->  Bitmap Heap Scan on articles  (cost=15.81..1259.10 rows=354 width=8) (actual time=1.957..10.807 rows=10435 loops=1)
         Recheck Cond: (((jsonfields ->> 'etat'::text) = '1'::text) OR ((jsonfields ->> 'etat'::text) = '2'::text) OR ((jsonfields ->> 'etat'::text) = '0'::text))
         Heap Blocks: exact=6839
         ->  BitmapOr  (cost=15.81..15.81 rows=356 width=0) (actual time=1.171..1.171 rows=0 loops=1)
               ->  Bitmap Index Scan on myidx  (cost=0.00..5.18 rows=119 width=0) (actual time=0.226..0.227 rows=2110 loops=1)
                     Index Cond: ((jsonfields ->> 'etat'::text) = '1'::text)
               ->  Bitmap Index Scan on myidx  (cost=0.00..5.18 rows=119 width=0) (actual time=0.045..0.045 rows=259 loops=1)
                     Index Cond: ((jsonfields ->> 'etat'::text) = '2'::text)
               ->  Bitmap Index Scan on myidx  (cost=0.00..5.18 rows=119 width=0) (actual time=0.899..0.899 rows=8066 loops=1)
                     Index Cond: ((jsonfields ->> 'etat'::text) = '0'::text)
 Planning Time: 0.382 ms
 Execution Time: 14.234 ms
(15 lignes)

After some time:

    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=7044.04..7079.35 rows=14127 width=8) (actual time=613.445..614.679 rows=15442 loops=1)
   Sort Key: ordre
   Sort Method: quicksort  Memory: 1108kB
   ->  Seq Scan on articles  (cost=0.00..6070.25 rows=14127 width=8) (actual time=0.060..609.477 rows=15442 loops=1)
         Filter: (((jsonfields ->> 'etat'::text) = '1'::text) OR ((jsonfields ->> 'etat'::text) = '2'::text) OR ((jsonfields ->> 'etat'::text) = '3'::text))
         Rows Removed by Filter: 8288
 Planning Time: 0.173 ms
 Execution Time: 615.744 ms
(8 lignes)

I have to recreate the index:

DROP INDEX myidx;
CREATE INDEX myidx ON articles ( (jsonfields->>'etat') );

Why? How can I fix this?

I tried decreasing the memory to disable seqscan. It didn't work. I tried running select pg_stat_reset(); and that didn't work either.

Answer 1

pg_stat_reset() does not reset table statistics. It only resets counters (such as how often an index has been used); it has no effect on query plans.

To update table statistics, use ANALYZE (or VACUUM ANALYZE while you are at it). autovacuum should normally take care of this automatically.
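
A minimal sketch of refreshing and then checking the statistics for the table from the question (articles). The pg_stat_user_tables columns used here exist in PostgreSQL 12; treat the query as an illustration of where to look, not as a full diagnosis:

```sql
-- Collect fresh planner statistics for the table
-- (this also covers the expressions of its expression indexes).
ANALYZE articles;

-- When were statistics last gathered, manually or by autovacuum,
-- and how many rows have been modified since then?
SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_mod_since_analyze
FROM   pg_stat_user_tables
WHERE  relname = 'articles';
```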

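Your first query finds rows=10435, your second query finds rows=15442. Postgres expects to find rows=354 (!) in the first, but rows=14127 in the second. It grossly underestimates the number of result rows in the first one, which favors the index. So your first query was only fast by accident.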

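Table statistics have changed, and there may be table and index bloat. Most importantly, your cost settings are probably misleading. Consider a lower setting for random_page_cost (and possibly cpu_index_tuple_cost and others).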

Related:
Keep PostgreSQL from sometimes choosing a bad query plan
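
To test whether the cost settings are to blame, you can lower random_page_cost for the current session only and compare the plans. The value 1.1 below is an assumption (a value often used for SSD storage), not a recommendation for this particular server:

```sql
-- Current values (PostgreSQL defaults: random_page_cost = 4, cpu_index_tuple_cost = 0.005).
SHOW random_page_cost;
SHOW cpu_index_tuple_cost;

-- Session-only experiment; 1.1 is an assumed value, adjust to your storage.
SET random_page_cost = 1.1;

EXPLAIN (ANALYZE, BUFFERS)
SELECT "articles"."id"
FROM "articles"
WHERE ((jsonfields ->> 'etat') = '0' OR (jsonfields ->> 'etat') = '1' OR (jsonfields ->> 'etat') = '2')
ORDER BY ordre ASC;

-- Return to the server-wide setting for this session.
RESET random_page_cost;
```

If a lower value consistently produces better plans, it can be made permanent, for example with ALTER SYSTEM SET random_page_cost = 1.1; followed by SELECT pg_reload_conf();.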
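
If recreating the index leads to a different query plan, the index may have been bloated. (A bloated index also discourages Postgres from using it.) More aggressive autovacuum settings, in general, just for this table, or even just for the index, may help.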
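
A sketch of both ideas for the table from the question. PostgreSQL 12 can rebuild an index without the manual DROP/CREATE, and per-table autovacuum thresholds can be tightened with storage parameters. The scale-factor values are illustrative assumptions, not tuned numbers:

```sql
-- Compare index size with table size to spot obvious bloat.
SELECT pg_size_pretty(pg_relation_size('myidx'))    AS index_size,
       pg_size_pretty(pg_relation_size('articles')) AS table_size;

-- Rebuild the index without blocking writes (new in PostgreSQL 12;
-- cannot run inside a transaction block).
REINDEX INDEX CONCURRENTLY myidx;

-- Refresh statistics afterwards, in case the rebuild discarded
-- the statistics on the index expression.
ANALYZE articles;

-- Let autovacuum / autoanalyze kick in sooner for this table only
-- (server defaults are 0.2 and 0.1 of the table's rows, respectively).
ALTER TABLE articles SET (
    autovacuum_vacuum_scale_factor  = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```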
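
Also, expression indexes introduce additional statistics (in your case, the essential ones on jsonfields->>'etat'). Dropping the index drops those, too. And the new expression index starts out with empty statistics, which are filled in by the next manual ANALYZE or by autovacuum. So, typically, you should run ANALYZE on the table after creating an expression index. Except that in your case you currently only seem to get the fast query when it is based on misleading statistics, so fix that first.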
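
A minimal sketch of recreating the expression index and gathering its statistics immediately. Statistics on an index expression should be listed in pg_stats under the index's name. Keep the caveat above in mind: with accurate statistics, the planner may well choose the sequential scan until the cost settings are fixed.

```sql
-- Recreate the expression index from the question.
DROP INDEX IF EXISTS myidx;
CREATE INDEX myidx ON articles ((jsonfields ->> 'etat'));

-- Gather statistics for the new index expression right away
-- instead of waiting for autovacuum's next autoanalyze.
ANALYZE articles;

-- Statistics on an index expression are stored under the index's name.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'myidx';
```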

Maybe revisit your database design. Does the etat value really have to be nested in a JSON column? Having it as a separate column might be a lot cheaper overall.
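
One way to try this in PostgreSQL 12 is a stored generated column. It keeps the JSON as is but gives etat its own column, with ordinary column statistics. The column and index names below are hypothetical, and adding a stored generated column rewrites the table:

```sql
-- Hypothetical: materialize the JSON key as a regular column (PostgreSQL 12+).
ALTER TABLE articles
    ADD COLUMN etat text GENERATED ALWAYS AS (jsonfields ->> 'etat') STORED;

-- Plain B-tree index on the new column (hypothetical name).
CREATE INDEX articles_etat_idx ON articles (etat);
ANALYZE articles;

-- Queries can then filter on the plain column.
SELECT id
FROM   articles
WHERE  etat IN ('0', '1', '2')
ORDER  BY ordre;
```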
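
Be that as it may, the most expensive part of your first (fast) query plan is the Bitmap Heap Scan, where Postgres reads the actual data pages to return the id values. Since Postgres 11, a shortcut with a "covering" index is possible: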

CREATE INDEX myidx ON articles ((jsonfields->>'etat')) INCLUDE (ordre, id);

But this relies even more on autovacuum doing its job in a timely fashion, since it requires the visibility map to be up to date.
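
Because an index-only scan can skip the heap only for pages marked all-visible, it may be worth checking how much of the table the visibility map covers after a vacuum. A sketch using the counters in pg_class, which are estimates refreshed by VACUUM and ANALYZE:

```sql
-- VACUUM updates the visibility map; ANALYZE refreshes statistics at the same time.
VACUUM (ANALYZE) articles;

-- Share of the table's pages currently marked all-visible.
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS pct_all_visible
FROM   pg_class
WHERE  oid = 'articles'::regclass;
```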

Or, if your WHERE clause is constant (always filtering for the etat values '0', '1' and '2'), a partial index would win.
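
A sketch of such a partial index, assuming the filter really is always the same three values. The index name is hypothetical. Keying the index on (ordre, id) lets the planner read rows already sorted by ordre. Writing the query's WHERE clause exactly like the index predicate makes it easiest for the planner to prove that the index applies:

```sql
-- Hypothetical partial index: only rows with etat '0', '1' or '2' are indexed.
CREATE INDEX articles_etat_012_idx ON articles (ordre, id)
WHERE (jsonfields ->> 'etat') IN ('0', '1', '2');

ANALYZE articles;

-- Same predicate as the index, so the index qualifies for this query.
SELECT id
FROM   articles
WHERE  (jsonfields ->> 'etat') IN ('0', '1', '2')
ORDER  BY ordre;
```

Since only matching rows are indexed, the index stays small. With an up-to-date visibility map, this query can potentially be answered by an index-only scan with no separate sort step.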

Answer 2

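Right after a functional (expression) index is created, no statistics have been collected for it yet, so PostgreSQL has to fall back on some generic assumptions. Once autoanalyze gets a chance to run, real statistics are used. Unfortunately, it turns out that in this case the more accurate estimates actually lead to a worse plan.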
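
The estimates in the first (fast) plan are consistent with this explanation. Assume the planner had no statistics for the expression and used its default equality selectivity of 1/200 = 0.005 per condition. Also assume a table estimate N of about 23,730 rows, which matches the 15,442 + 8,288 rows seen in the later sequential scan. Under these assumptions, the estimates are reproduced exactly:

```latex
\begin{aligned}
s_{=} &= \tfrac{1}{200} = 0.005, \qquad N \approx 23\,730 \\
\text{rows per Bitmap Index Scan} &\approx s_{=} N = 118.65 \approx 119 \\
\text{rows of BitmapOr} &\approx 3\, s_{=} N = 355.95 \approx 356 \\
s_{\mathrm{OR}} &= 1 - (1 - s_{=})^{3} = 0.014925\ldots \\
\text{rows of Bitmap Heap Scan} &\approx s_{\mathrm{OR}}\, N \approx 354.2 \approx 354
\end{aligned}
```

The query actually returned 10,435 rows. With real statistics (second plan: 14,127 estimated vs. 15,442 actual), the estimate is close, and the sequential scan wins on cost.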
