Postgres: Optimising query "WHERE id IN (...)"

Question    Votes: 1    Answers: 3

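I have a table (2M+ records) that keeps track of a ledger. Some entries add points, while others subtract points (there are only two kinds of entries). Entries that subtract points always reference the (adding) entry they subtract from via referenceentryid. Adding entries always have NULL in referenceentryid.

The table has a dead column that a worker sets to true when an addition is depleted or expired, or when a subtraction points to a "dead" addition. Since the table has a partial index on dead=false, SELECTs on live rows are very fast.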

My problem is the performance of the worker that sets dead to true.

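The flow is:
1. Get one entry for each addition, showing the amount added, the amount subtracted, and whether it has expired.
2. Filter out the entries that are neither expired nor depleted (addition still greater than subtraction).
3. Update dead=true on every row whose id or referenceentryid is in the filtered set of entries.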

WITH entries AS 
(
    SELECT 
        additions.id AS id,
        SUM(subtractions.amount) AS subtraction,
        additions.amount AS addition,
        additions.expirydate <= now() AS expired
    FROM 
        loyalty_ledger AS subtractions
    INNER JOIN 
        loyalty_ledger AS additions
    ON 
        additions.id = subtractions.referenceentryid
    WHERE
        subtractions.dead = FALSE
        AND subtractions.referenceentryid IS NOT NULL
    GROUP BY 
        subtractions.referenceentryid, additions.id
), dead_entries AS (
    SELECT
        id
    FROM
        entries
    WHERE
        subtraction >= addition OR expired = TRUE
)
-- THE SLOW BIT:
SELECT
    *
FROM 
    loyalty_ledger AS ledger
WHERE
    ledger.dead = FALSE AND
    (ledger.id IN (SELECT id FROM dead_entries) OR ledger.referenceentryid IN (SELECT id FROM dead_entries));

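In the query above, the inner part runs quite fast (a few seconds), while the last part runs forever.

The table and its indexes are defined as follows: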

CREATE TABLE IF NOT EXISTS loyalty_ledger (
        id SERIAL PRIMARY KEY,
        programid bigint NOT NULL,   
        FOREIGN KEY (programid) REFERENCES loyalty_programs(id) ON DELETE CASCADE,
        referenceentryid    bigint,
        FOREIGN KEY (referenceentryid) REFERENCES loyalty_ledger(id) ON DELETE CASCADE,
        customerprofileid bigint NOT NULL,
        FOREIGN KEY (customerprofileid) REFERENCES customer_profiles(id) ON DELETE CASCADE,
        amount int NOT NULL,
        expirydate TIMESTAMPTZ,
        dead boolean DEFAULT false,
        expired boolean DEFAULT false
);

CREATE index loyalty_ledger_referenceentryid_idx ON loyalty_ledger (referenceentryid) WHERE dead = false;
CREATE index loyalty_ledger_customer_program_idx ON loyalty_ledger (customerprofileid, programid) WHERE dead = false;

I am trying to optimise the last part of the query. EXPLAIN gives me the following:

"Index Scan using loyalty_ledger_referenceentryid_idx on loyalty_ledger ledger  (cost=103412.24..4976040812.22 rows=986583 width=67)"
"  Filter: ((SubPlan 3) OR (SubPlan 4))"
"  CTE entries"
"    ->  GroupAggregate  (cost=1.47..97737.83 rows=252177 width=25)"
"          Group Key: subtractions.referenceentryid, additions.id"
"          ->  Merge Join  (cost=1.47..91390.72 rows=341928 width=28)"
"                Merge Cond: (subtractions.referenceentryid = additions.id)"
"                ->  Index Scan using loyalty_ledger_referenceentryid_idx on loyalty_ledger subtractions  (cost=0.43..22392.56 rows=341928 width=12)"
"                      Index Cond: (referenceentryid IS NOT NULL)"
"                ->  Index Scan using loyalty_ledger_pkey on loyalty_ledger additions  (cost=0.43..80251.72 rows=1683086 width=16)"
"  CTE dead_entries"
"    ->  CTE Scan on entries  (cost=0.00..5673.98 rows=168118 width=4)"
"          Filter: ((subtraction >= addition) OR expired)"
"  SubPlan 3"
"    ->  CTE Scan on dead_entries  (cost=0.00..3362.36 rows=168118 width=4)"
"  SubPlan 4"
"    ->  CTE Scan on dead_entries dead_entries_1  (cost=0.00..3362.36 rows=168118 width=4)"

It looks like the last part of my query is very inefficient. Any ideas on how to speed it up?

postgresql query-optimization postgresql-10
3 Answers
1 vote

For large datasets, I have found that semi-joins perform better than query in-lists:

-- replaces the final SELECT of the original query; the WITH clause stays as it is
select
  *
from
  loyalty_ledger as ledger
WHERE
    ledger.dead = FALSE AND (
    exists (
      select null
      from dead_entries d
      where d.id = ledger.id
      ) or
    exists (
      select null
      from dead_entries d
      where d.id = ledger.referenceentryid
      )
    )

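Honestly, I don't know, but I think each of the following is also worth a try. They are less code and more intuitive, but there is no guarantee they will work better: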

ledger.dead = FALSE AND
exists (
  select null
  from dead_entries d
  where d.id = ledger.id or d.id = ledger.referenceentryid 
)

or

ledger.dead = FALSE AND
exists (
  select null
  from dead_entries d
  where d.id in (ledger.id, ledger.referenceentryid) 
)
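
Since none of these variants is guaranteed to win, it is probably worth measuring each one rather than guessing. A minimal sketch of how that could be done, assuming the two CTEs from the question are kept unchanged: prefix the whole statement with EXPLAIN (ANALYZE, BUFFERS), which really executes the query and reports actual timings, then swap in the WHERE clause of each variant in turn.

```sql
-- Sketch: time the first EXISTS variant against the real data.
-- EXPLAIN ANALYZE executes the statement, so it takes as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
WITH entries AS (
    SELECT
        additions.id AS id,
        SUM(subtractions.amount) AS subtraction,
        additions.amount AS addition,
        additions.expirydate <= now() AS expired
    FROM loyalty_ledger AS subtractions
    INNER JOIN loyalty_ledger AS additions
        ON additions.id = subtractions.referenceentryid
    WHERE subtractions.dead = FALSE
      AND subtractions.referenceentryid IS NOT NULL
    GROUP BY subtractions.referenceentryid, additions.id
), dead_entries AS (
    SELECT id
    FROM entries
    WHERE subtraction >= addition OR expired = TRUE
)
SELECT ledger.*
FROM loyalty_ledger AS ledger
WHERE ledger.dead = FALSE
  AND (
      -- swap this block for the other variants to compare them
      EXISTS (SELECT NULL FROM dead_entries d WHERE d.id = ledger.id)
      OR
      EXISTS (SELECT NULL FROM dead_entries d WHERE d.id = ledger.referenceentryid)
  );
```

Comparing the "actual time" figures of the top plan node across the variants should show which form the planner handles best on this data.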

0 votes

What helped me in the end was doing the id IN filtering in the second WITH step, replacing the IN syntax with ANY:

   WITH entries AS 
        (
            SELECT 
                additions.id AS id,
                additions.amount - coalesce(SUM(subtractions.amount),0) AS balance,
                additions.expirydate <= now() AS passed_expiration
            FROM 
                loyalty_ledger AS additions
            LEFT JOIN 
                loyalty_ledger AS subtractions
            ON 
                subtractions.dead = FALSE AND
                additions.id = subtractions.referenceentryid
            WHERE
                additions.dead = FALSE AND additions.referenceentryid IS NULL
            GROUP BY 
                subtractions.referenceentryid, additions.id
        ), dead_rows AS (
            SELECT
                l.id AS id,
                -- only additions that still have usable points can expire
                l.referenceentryid IS NULL AND e.balance > 0 AND e.passed_expiration AS expired
            FROM
                loyalty_ledger AS l
            INNER JOIN
                entries AS e
            ON
                (l.id = e.id OR l.referenceentryid = e.id)
            WHERE
                l.dead = FALSE AND
                (e.balance <= 0 OR e.passed_expiration)
           ORDER BY e.balance DESC
        )
        UPDATE
            loyalty_ledger AS l
        SET 
            (dead, expired) = (TRUE, d.expired)
        FROM 
            dead_rows AS d
        WHERE
            l.id = d.id AND
            l.dead = FALSE;
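
The statement above expresses the filter as a join rather than a literal ANY. For readers wondering what the IN to ANY replacement looks like in isolation, here is a minimal sketch applied to the final SELECT from the question. It assumes the entries and dead_entries CTEs exactly as defined there. ARRAY(subquery) collapses the CTE result into a single array that is computed once, and the planner may then be able to treat = ANY(array) as an index condition. Whether this beats the join form on a given data set is not guaranteed.

```sql
WITH entries AS (
    SELECT
        additions.id AS id,
        SUM(subtractions.amount) AS subtraction,
        additions.amount AS addition,
        additions.expirydate <= now() AS expired
    FROM loyalty_ledger AS subtractions
    INNER JOIN loyalty_ledger AS additions
        ON additions.id = subtractions.referenceentryid
    WHERE subtractions.dead = FALSE
      AND subtractions.referenceentryid IS NOT NULL
    GROUP BY subtractions.referenceentryid, additions.id
), dead_entries AS (
    SELECT id
    FROM entries
    WHERE subtraction >= addition OR expired = TRUE
)
SELECT ledger.*
FROM loyalty_ledger AS ledger
WHERE ledger.dead = FALSE
  AND (
      -- the uncorrelated ARRAY(...) subquery is evaluated once, not per row
      ledger.id = ANY (ARRAY(SELECT id FROM dead_entries))
      OR ledger.referenceentryid = ANY (ARRAY(SELECT id FROM dead_entries))
  );
```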

0 votes

I also believe that

-- THE SLOW BIT:
SELECT
    *
FROM 
    loyalty_ledger AS ledger
WHERE
    ledger.dead = FALSE AND
    (ledger.id IN (SELECT id FROM dead_entries) OR ledger.referenceentryid IN (SELECT id FROM dead_entries));

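can be rewritten as a JOIN and UNION ALL, which will most likely produce a different execution plan and might be faster. But it is hard to say for sure without the other table structures.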

SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT id FROM dead_entries) AS dead_entries
ON ledger.id = dead_entries.id AND ledger.dead = FALSE

UNION ALL 

SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT id FROM dead_entries) AS dead_entries
ON ledger.referenceentryid = dead_entries.id AND ledger.dead = FALSE

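Also, because CTEs in PostgreSQL are materialized and not indexed (in PostgreSQL 10, the version tagged here, they are always materialized), you are most likely better off removing the dead_entries alias from the CTE and repeating it outside of the CTE.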

 SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT
    id
FROM
    entries
WHERE
    subtraction >= addition OR expired = TRUE) AS dead_entries
ON ledger.id = dead_entries.id AND ledger.dead = FALSE

UNION ALL 

SELECT
    *
FROM 
    loyalty_ledger AS ledger
INNER JOIN (SELECT
    id
FROM
    entries
WHERE
    subtraction >= addition OR expired = TRUE) AS dead_entries
ON ledger.referenceentryid = dead_entries.id AND ledger.dead = FALSE
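
Because a materialized CTE cannot be indexed, another option in the same spirit would be to put the dead ids into a temporary table, which can be. The following is only a sketch under assumptions: the table name tmp_dead_entries is made up for illustration, and the inner SELECT reuses the entries logic from the question unchanged.

```sql
BEGIN;

-- Hypothetical helper table; it is dropped automatically at COMMIT.
CREATE TEMP TABLE tmp_dead_entries ON COMMIT DROP AS
SELECT id
FROM (
    SELECT
        additions.id AS id,
        SUM(subtractions.amount) AS subtraction,
        additions.amount AS addition,
        additions.expirydate <= now() AS expired
    FROM loyalty_ledger AS subtractions
    INNER JOIN loyalty_ledger AS additions
        ON additions.id = subtractions.referenceentryid
    WHERE subtractions.dead = FALSE
      AND subtractions.referenceentryid IS NOT NULL
    GROUP BY subtractions.referenceentryid, additions.id
) AS entries
WHERE subtraction >= addition OR expired = TRUE;

-- Unlike a CTE, the temp table can carry an index and real statistics.
CREATE UNIQUE INDEX ON tmp_dead_entries (id);
ANALYZE tmp_dead_entries;

SELECT ledger.*
FROM loyalty_ledger AS ledger
INNER JOIN tmp_dead_entries AS d
    ON ledger.id = d.id
WHERE ledger.dead = FALSE

UNION ALL

SELECT ledger.*
FROM loyalty_ledger AS ledger
INNER JOIN tmp_dead_entries AS d
    ON ledger.referenceentryid = d.id
WHERE ledger.dead = FALSE;

COMMIT;
```

The ANALYZE step matters here: without it the planner has no row estimates for the freshly created temp table and may pick a poor join strategy.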