I'm trying to optimize a query that runs on PostgreSQL 13 or 14.
Suppose we have the following table with a matching index:
CREATE TABLE journal
(
account_id TEXT NOT NULL,
event_at TEXT NOT NULL,
id INTEGER PRIMARY KEY
);
CREATE INDEX journal_account_id_event_at_id_idx
ON journal (account_id, event_at, id);
INSERT INTO journal VALUES ('A', '2023-01-01', 1);
INSERT INTO journal VALUES ('A', '2023-01-10', 50);
INSERT INTO journal VALUES ('A', '2023-01-30', 15);
INSERT INTO journal VALUES ('A', '2023-03-02', 28);
INSERT INTO journal VALUES ('A', '2023-03-05', 16);
INSERT INTO journal VALUES ('B', '2023-01-01', 101);
INSERT INTO journal VALUES ('B', '2023-01-01', 102);
INSERT INTO journal VALUES ('B', '2023-01-01', 103);
INSERT INTO journal VALUES ('C', '2022-12-01', 2);
INSERT INTO journal VALUES ('C', '2023-01-02', 10);
INSERT INTO journal VALUES ('C', '2023-01-30', 6);
INSERT INTO journal VALUES ('C', '2023-01-30', 20);
INSERT INTO journal VALUES ('C', '2023-02-02', 29);
INSERT INTO journal VALUES ('C', '2023-03-03', 31);
I need to select from this table with pagination. If we know in advance which account we need journal entries for, we can do it like this:
Query #1:
SELECT *
FROM journal
WHERE
account_id = 'C' -- account condition
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2; -- page size for pagination
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.14..8.15 rows=1 width=68) (actual time=0.029..0.036 rows=2 loops=1)
Output: account_id, event_at, id
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal (cost=0.14..8.15 rows=1 width=68) (actual time=0.026..0.029 rows=2 loops=1)
Output: account_id, event_at, id
Index Cond: ((journal.account_id = 'C'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 2
Planning Time: 0.204 ms
Execution Time: 0.085 ms
(8 rows)
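This works well: Postgres uses the index and, as you can see from the query plan, only 2 journal entries are read, in line with the LIMIT (see rows=2 on the Index Only Scan node). However, when journal entries have to be selected for several accounts, and we don't know the exact accounts in advance, Postgres seems to fetch all matching entries and sort them in order to apply the limit correctly:
Query #2: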
WITH accounts AS
(
-- let's pretend here is some select from 'accounts' table
SELECT *
FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
)
SELECT j.*
FROM journal j JOIN accounts a ON a.id = j.account_id -- account condition
WHERE (event_at, j.id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, j.id
LIMIT 2; -- page size for pagination
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=12.36..12.37 rows=1 width=68) (actual time=0.120..0.134 rows=2 loops=1)
Output: j.account_id, j.event_at, j.id
-> Sort (cost=12.36..12.37 rows=1 width=68) (actual time=0.117..0.125 rows=2 loops=1)
Output: j.account_id, j.event_at, j.id
Sort Key: j.event_at, j.id
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.14..12.35 rows=1 width=68) (actual time=0.049..0.091 rows=10 loops=1)
Output: j.account_id, j.event_at, j.id
-> Function Scan on pg_catalog.unnest account (cost=0.00..0.02 rows=2 width=32) (actual time=0.008..0.011 rows=2 loops=1)
Output: account.id
Function Call: unnest('{A,C}'::text[])
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal j (cost=0.14..6.15 rows=1 width=68) (actual time=0.020..0.025 rows=5 loops=2)
Output: j.account_id, j.event_at, j.id
Index Cond: ((j.account_id = account.id) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 10
Planning Time: 0.307 ms
Execution Time: 0.188 ms
(17 rows)
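We now see from the query plan that the Nested Loop node has rows=10, which means Postgres reads every entry that satisfies the pagination condition, only to discard most of them later at the Limit node. My thinking is that, since we have the composite index (account_id, event_at, id) on the journal table, Postgres could read LIMIT journal entries for each account from the accounts CTE and then merge them without even sorting, because within each account the entries are already ordered by (event_at, id). Essentially, it would do this:
Query #3: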
SELECT *
FROM (
SELECT *
FROM journal
WHERE (account_id = 'A')
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2
) j1
UNION ALL
SELECT *
FROM (
SELECT *
FROM journal
WHERE (account_id = 'C')
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2
) j2
ORDER BY event_at, id
LIMIT 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.28..16.36 rows=2 width=68) (actual time=0.053..0.070 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
-> Merge Append (cost=0.28..16.36 rows=2 width=68) (actual time=0.050..0.063 rows=2 loops=1)
Sort Key: journal.event_at, journal.id
-> Limit (cost=0.14..8.15 rows=1 width=68) (actual time=0.033..0.038 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal (cost=0.14..8.15 rows=1 width=68) (actual time=0.031..0.033 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
Index Cond: ((journal.account_id = 'A'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 2
-> Limit (cost=0.14..8.15 rows=1 width=68) (actual time=0.014..0.016 rows=1 loops=1)
Output: journal_1.account_id, journal_1.event_at, journal_1.id
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal journal_1 (cost=0.14..8.15 rows=1 width=68) (actual time=0.012..0.013 rows=1 loops=1)
Output: journal_1.account_id, journal_1.event_at, journal_1.id
Index Cond: ((journal_1.account_id = 'C'::text) AND (ROW(journal_1.event_at, journal_1.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 1
Planning Time: 0.362 ms
Execution Time: 0.132 ms
(18 rows)
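Please help me understand how to write a query like query #2 (for multiple accounts) so that it gets a query plan like query #3 (one that takes advantage of the already sorted data and reads less data than query #2).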
The index scan can only work that way if account_id is compared with =. Since PostgreSQL has no index "skip scan", you have to rewrite the query the way you did.
The PostgreSQL optimizer does not implement everything that is possible.
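If writing one UNION ALL branch per account by hand is impractical because the account list is dynamic, one possible way to get the same per-account equality condition is a LATERAL subquery. This is only a sketch and not part of the answer above: the aliases a and j are illustrative, and the unnest call stands in for the real accounts query, as in query #2. Each account gets its own subquery with account_id = a.id, so the index can be used with an equality on the leading column and at most LIMIT rows are read per account.

```sql
-- Sketch: per-account LIMIT via LATERAL, then a final sort over the few rows kept.
SELECT j.*
FROM unnest(ARRAY['A', 'C']::text[]) AS a(id)   -- stand-in for the accounts query
CROSS JOIN LATERAL (
    SELECT *
    FROM journal
    WHERE account_id = a.id                      -- equality on the leading index column
      AND (event_at, id) > ('2022-12-01', 2)     -- pagination condition
    ORDER BY event_at, id
    LIMIT 2                                      -- page size, applied per account
) AS j
ORDER BY j.event_at, j.id
LIMIT 2;                                         -- page size for the overall result
```

Unlike query #3, I would not expect a Merge Append here. The likely plan is a Nested Loop with a Limit over an Index Only Scan per account, followed by a top-N sort over at most (page size × number of accounts) rows. It is worth confirming with EXPLAIN (ANALYZE) on real data, since the planner's choice depends on table statistics.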