Pagination by compound key

I am trying to optimize a query running on PostgreSQL 13 or 14.

Suppose we have the following table with a corresponding index:

CREATE TABLE journal 
(
    account_id TEXT NOT NULL,
    event_at TEXT NOT NULL,
    id INTEGER PRIMARY KEY
);

CREATE INDEX journal_account_id_event_at_id_idx 
    ON journal (account_id, event_at, id);

INSERT INTO journal VALUES ('A', '2023-01-01', 1);
INSERT INTO journal VALUES ('A', '2023-01-10', 50);
INSERT INTO journal VALUES ('A', '2023-01-30', 15);
INSERT INTO journal VALUES ('A', '2023-03-02', 28);
INSERT INTO journal VALUES ('A', '2023-03-05', 16);
INSERT INTO journal VALUES ('B', '2023-01-01', 101);
INSERT INTO journal VALUES ('B', '2023-01-01', 102);
INSERT INTO journal VALUES ('B', '2023-01-01', 103);
INSERT INTO journal VALUES ('C', '2022-12-01', 2);
INSERT INTO journal VALUES ('C', '2023-01-02', 10);
INSERT INTO journal VALUES ('C', '2023-01-30', 6);
INSERT INTO journal VALUES ('C', '2023-01-30', 20);
INSERT INTO journal VALUES ('C', '2023-02-02', 29);
INSERT INTO journal VALUES ('C', '2023-03-03', 31);

I need to select from this table with pagination. If we know the account whose journal entries we need, we can do it like this:

Query #1:

SELECT *
FROM journal
WHERE
    account_id = 'C' -- account condition
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2;  -- page size for pagination

                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.029..0.036 rows=2 loops=1)
   Output: account_id, event_at, id
   ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.026..0.029 rows=2 loops=1)
         Output: account_id, event_at, id
         Index Cond: ((journal.account_id = 'C'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
         Heap Fetches: 2
 Planning Time: 0.204 ms
 Execution Time: 0.085 ms
(8 rows)

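This works well: Postgres uses the index and, as the query plan shows, reads only 2 journal entries in accordance with the LIMIT (see rows=2 on the "Index Only Scan" node). However, if journal entries must be selected for several accounts, and we do not know the exact accounts in advance, Postgres seems to select all matching entries and sort them in order to apply the limit correctly: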

Query #2:

WITH accounts AS 
(
    -- let's pretend here is some select from 'accounts' table
    SELECT *
    FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
)
SELECT j.*
FROM journal j JOIN accounts a ON a.id = j.account_id -- account condition
WHERE (event_at, j.id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, j.id
LIMIT 2;  -- page size for pagination
                                                                                 QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12.36..12.37 rows=1 width=68) (actual time=0.120..0.134 rows=2 loops=1)
   Output: j.account_id, j.event_at, j.id
   ->  Sort  (cost=12.36..12.37 rows=1 width=68) (actual time=0.117..0.125 rows=2 loops=1)
         Output: j.account_id, j.event_at, j.id
         Sort Key: j.event_at, j.id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=0.14..12.35 rows=1 width=68) (actual time=0.049..0.091 rows=10 loops=1)
               Output: j.account_id, j.event_at, j.id
               ->  Function Scan on pg_catalog.unnest account  (cost=0.00..0.02 rows=2 width=32) (actual time=0.008..0.011 rows=2 loops=1)
                     Output: account.id
                     Function Call: unnest('{A,C}'::text[])
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal j  (cost=0.14..6.15 rows=1 width=68) (actual time=0.020..0.025 rows=5 loops=2)
                     Output: j.account_id, j.event_at, j.id
                     Index Cond: ((j.account_id = account.id) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 10
 Planning Time: 0.307 ms
 Execution Time: 0.188 ms
(17 rows)

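The query plan now shows rows=10 on the "Nested Loop" node, which means Postgres reads every entry that satisfies the pagination condition, only to discard most of them later in the "Limit" node. My thinking is that, since the journal table has a composite index on (account_id, event_at, id), Postgres could read LIMIT journal entries for each account from the "accounts" CTE and then merge them without even sorting, because the entries for each account are already ordered by (event_at, id). Essentially, it would do this: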

Query #3:

SELECT *
FROM (
  SELECT *
  FROM journal
  WHERE (account_id = 'A')
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
  ORDER BY event_at, id
  LIMIT 2
) j1
UNION ALL
SELECT *
FROM (
  SELECT *
  FROM journal
  WHERE (account_id = 'C')
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
  ORDER BY event_at, id
  LIMIT 2
) j2
ORDER BY event_at, id
LIMIT 2;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..16.36 rows=2 width=68) (actual time=0.053..0.070 rows=2 loops=1)
   Output: journal.account_id, journal.event_at, journal.id
   ->  Merge Append  (cost=0.28..16.36 rows=2 width=68) (actual time=0.050..0.063 rows=2 loops=1)
         Sort Key: journal.event_at, journal.id
         ->  Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.033..0.038 rows=2 loops=1)
               Output: journal.account_id, journal.event_at, journal.id
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.031..0.033 rows=2 loops=1)
                     Output: journal.account_id, journal.event_at, journal.id
                     Index Cond: ((journal.account_id = 'A'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 2
         ->  Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.014..0.016 rows=1 loops=1)
               Output: journal_1.account_id, journal_1.event_at, journal_1.id
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal journal_1  (cost=0.14..8.15 rows=1 width=68) (actual time=0.012..0.013 rows=1 loops=1)
                     Output: journal_1.account_id, journal_1.event_at, journal_1.id
                     Index Cond: ((journal_1.account_id = 'C'::text) AND (ROW(journal_1.event_at, journal_1.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 1
 Planning Time: 0.362 ms
 Execution Time: 0.132 ms
(18 rows)

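Please help me understand how to write a query like Query #2 (for multiple accounts) so that it gets a query plan like that of Query #3 (one that takes advantage of the already sorted data and reads less data than Query #2).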

sql postgresql query-optimization postgresql-13 postgresql-14
1 Answer

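The index scan can only be effective if account_id is compared with =. Since PostgreSQL has no index "skip scan", you will have to rewrite the query the way you already did.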

The PostgreSQL optimizer does not implement everything that would be possible.
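
When the list of accounts is not known in advance, one way to keep the per-account shape of Query #3 is a LATERAL subquery. This is a minimal sketch under the question's schema, not a verified plan. It reuses the "accounts" CTE from Query #2 as a stand-in for the real accounts table, and the boundary values and page size are the ones from the question. Each account contributes at most LIMIT rows, read through an equality condition on account_id, so the outer sort only sees (page size x number of accounts) rows instead of every row past the pagination boundary.

```sql
WITH accounts AS
(
    -- stand-in for a select from the real 'accounts' table (assumption, as in Query #2)
    SELECT *
    FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
)
SELECT j.*
FROM accounts a
CROSS JOIN LATERAL
(
    -- evaluated once per account: same shape as Query #1
    SELECT *
    FROM journal
    WHERE account_id = a.id                    -- account condition (equality)
      AND (event_at, id) > ('2022-12-01', 2)   -- pagination condition
    ORDER BY event_at, id
    LIMIT 2                                    -- per-account cap = page size
) j
ORDER BY j.event_at, j.id
LIMIT 2;  -- page size for pagination
```

With the sample data above, this returns the same two rows as Query #2 and Query #3: (A, 2023-01-01, 1) and (C, 2023-01-02, 10). The planner will probably put a Sort on top instead of the Merge Append seen in Query #3, but its input is bounded by the per-account LIMIT. Running the statement under EXPLAIN (ANALYZE, VERBOSE) shows whether each inner index scan really stops after 2 rows.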
