Postgres queries run 1000x slower when combined with UNION ALL than when run separately

I have two queries that each take 1-10 seconds to execute when run separately. When I combine their results with UNION ALL, the execution time jumps to 4500 seconds (over 1 hour)!

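I ran the long query with EXPLAIN ANALYZE, and it appears to iterate over man_jour_tracking 1.6B times, even though that table is only ~11k rows long. It looks like man_jour_tracking is being rebuilt for every join, but only when the UNION is present, not when each query runs on its own.

Explain Analyze Output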
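
One way to check this reading of the plan is to run a cut-down join on its own and look at the per-node counters. The following is only a sketch: it reuses the table and column names from the queries in this post, and the reduced SELECT list is an assumption made for brevity. In EXPLAIN ANALYZE output each node reports its actual rows per loop together with a loops count, so a node over the ~11k-row result with a very large loops value would be consistent with the 1.6B figure.

```sql
-- Sketch only: a reduced version of the join against man_jour_tracking,
-- used to inspect how often the aggregated result is scanned.
EXPLAIN (ANALYZE, BUFFERS)
WITH man_jour_tracking AS (
    SELECT
        json_object(array_agg(tracking_category.name), array_agg(option)) AS tracking,
        manual_journal_line_id,
        manual_journal_id
    FROM manual_journal_line_has_tracking_category line_tracking
        LEFT JOIN tracking_category
            ON line_tracking.tracking_category_id = tracking_category.tracking_category_id
    GROUP BY manual_journal_line_id, manual_journal_id
)
SELECT
    manual_line.manual_journal_id,
    man_jour_tracking.tracking
FROM manual_journal_line manual_line
    LEFT JOIN man_jour_tracking
        ON manual_line.manual_journal_id = man_jour_tracking.manual_journal_id
        AND man_jour_tracking.manual_journal_line_id::int = manual_line.line;
```

Comparing the loops value on the man_jour_tracking node here with the one in the combined plan should show whether the repeated scanning appears only in the UNION ALL form.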

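I am not sure whether this has something to do with the json_object data type. Somehow the planner is not handling this correctly, since the queries execute fine on their own and all records are unique.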

(I have removed various select columns to shorten the queries, as they are not relevant.) PostgreSQL 13.7

Running this query on its own (the first SELECT of the union query) takes about 5 seconds.

WITH most_recent_journum as (
    SELECT
        max(journal_number) as journum,
        max(journal_date) as jourdate,
        max(source_type)
    FROM journal
    GROUP BY source_id
),

man_jour_tracking as (
    SELECT
        json_object(array_agg(tracking_category.name), array_agg(option)) as tracking,
        manual_journal_line_id,
        manual_journal_id
    FROM
        manual_journal_line_has_tracking_category line_tracking
        LEFT JOIN tracking_category on line_tracking.tracking_category_id = tracking_category.tracking_category_id
    GROUP BY manual_journal_line_id, manual_journal_id
)

SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    manual_journal.status as status,
    trim('"' FROM (man_jour_tracking.tracking -> 'Tracking')::text) as tracking,
    trim('"' FROM (man_jour_tracking.tracking -> 'Events Tracking')::text) as events_tracking,
    NULL as invoice_number,
    NULL as contact_name,
    manual_journal.narration as source_description
FROM journal
    LEFT JOIN journal_line line on line.journal_id = journal.journal_id
    LEFT JOIN manual_journal ON journal.source_id = manual_journal.manual_journal_id
    LEFT JOIN manual_journal_line manual_line ON
        manual_line.manual_journal_id = manual_journal.manual_journal_id AND
        line.account_code = manual_line.account_code AND
        line.description = manual_line.description AND
        line.gross_amount = manual_line.line_amount
    LEFT JOIN man_jour_tracking ON
        manual_journal.manual_journal_id = man_jour_tracking.manual_journal_id AND
        man_jour_tracking.manual_journal_line_id::int = manual_line.line
WHERE
    journal.source_type = 'MANJOURNAL' AND
    journal.journal_number in (select journum from most_recent_journum) AND
    journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
    line.journal_line_id is not null AND
    line.account_code <> '2650' AND
    manual_journal.status = 'POSTED'

Running this query on its own (the second SELECT of the union query) takes about 5 seconds.

WITH most_recent_journum as (
    SELECT
        max(journal_number) as journum,
        max(journal_date) as jourdate,
        max(source_type)
    FROM journal
    GROUP BY source_id
),

man_jour_tracking as (
    SELECT
        json_object(array_agg(tracking_category.name), array_agg(option)) as tracking,
        manual_journal_line_id,
        manual_journal_id
    FROM
        manual_journal_line_has_tracking_category line_tracking
        LEFT JOIN tracking_category on line_tracking.tracking_category_id = tracking_category.tracking_category_id
    GROUP BY manual_journal_line_id, manual_journal_id
)

SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    NULL::int as source_line,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    invoice.status as status,
    NULL as tracking,
    NULL as events_tracking,
    invoice.invoice_number,
    contact.name as contact_name,
    NULL as source_description
FROM journal
    LEFT JOIN journal_line line on line.journal_id = journal.journal_id 
    LEFT JOIN invoice ON journal.source_id = invoice.invoice_id
    LEFT JOIN contact on invoice.contact_id = contact.contact_id
WHERE
    journal.source_type in ('ACCPAY', 'ACCREC') AND
    journal.journal_number in (select journum from most_recent_journum) AND
    journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
    line.journal_line_id is not null AND
    invoice.status not in ('VOIDED', 'DELETED')

But when the two are combined, the query takes more than 1 hour.

WITH most_recent_journum as (
    SELECT
        max(journal_number) as journum,
        max(journal_date) as jourdate,
        max(source_type)
    FROM journal
    GROUP BY source_id
),

man_jour_tracking as (
    SELECT
        json_object(array_agg(tracking_category.name), array_agg(option)) as tracking,
        manual_journal_line_id,
        manual_journal_id
    FROM
        manual_journal_line_has_tracking_category line_tracking
        LEFT JOIN tracking_category on line_tracking.tracking_category_id = tracking_category.tracking_category_id
    GROUP BY manual_journal_line_id, manual_journal_id
)

SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    manual_journal.status as status,
    trim('"' FROM (man_jour_tracking.tracking -> 'Tracking')::text) as tracking,
    trim('"' FROM (man_jour_tracking.tracking -> 'Events Tracking')::text) as events_tracking,
    NULL as invoice_number,
    NULL as contact_name,
    manual_journal.narration as source_description
FROM journal
    LEFT JOIN journal_line line on line.journal_id = journal.journal_id
    LEFT JOIN manual_journal ON journal.source_id = manual_journal.manual_journal_id
    LEFT JOIN manual_journal_line manual_line ON
        manual_line.manual_journal_id = manual_journal.manual_journal_id AND
        line.account_code = manual_line.account_code AND
        line.description = manual_line.description AND
        line.gross_amount = manual_line.line_amount
    LEFT JOIN man_jour_tracking ON
        manual_journal.manual_journal_id = man_jour_tracking.manual_journal_id AND
        man_jour_tracking.manual_journal_line_id::int = manual_line.line
WHERE
    journal.source_type = 'MANJOURNAL' AND
    journal.journal_number in (select journum from most_recent_journum) AND
    journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
    line.journal_line_id is not null AND
    line.account_code <> '2650' AND
    manual_journal.status = 'POSTED'

UNION ALL

SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    invoice.status as status,
    NULL as tracking,
    NULL as events_tracking,
    invoice.invoice_number,
    contact.name as contact_name,
    NULL as source_description
FROM journal
    LEFT JOIN journal_line line on line.journal_id = journal.journal_id 
    LEFT JOIN invoice ON journal.source_id = invoice.invoice_id
    LEFT JOIN contact on invoice.contact_id = contact.contact_id
WHERE
    journal.source_type in ('ACCPAY', 'ACCREC') AND
    journal.journal_number in (select journum from most_recent_journum) AND
    journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date AND
    line.journal_line_id is not null AND
    invoice.status not in ('VOIDED', 'DELETED')
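
One difference between the combined and the standalone forms may be worth checking. This is an assumption, not something confirmed by the plan: since PostgreSQL 12, a side-effect-free CTE that is referenced once can be folded into the parent query, while a CTE referenced more than once is materialized by default. In the combined query most_recent_journum is referenced by both SELECTs, whereas each standalone query references it only once. The sketch below uses the NOT MATERIALIZED keyword to ask the planner to inline the CTE in both branches; the trimmed SELECT lists and WHERE clauses are for illustration only.

```sql
-- Sketch only: both branches reference the CTE, so it would be
-- materialized by default; NOT MATERIALIZED requests inlining instead.
EXPLAIN (ANALYZE, BUFFERS)
WITH most_recent_journum AS NOT MATERIALIZED (
    SELECT max(journal_number) AS journum
    FROM journal
    GROUP BY source_id
)
SELECT journal.journal_id
FROM journal
WHERE journal.source_type = 'MANJOURNAL'
    AND journal.journal_number IN (SELECT journum FROM most_recent_journum)

UNION ALL

SELECT journal.journal_id
FROM journal
WHERE journal.source_type IN ('ACCPAY', 'ACCREC')
    AND journal.journal_number IN (SELECT journum FROM most_recent_journum);
```

If the plan for this form resembles the two standalone plans, the same keyword could be tried on the CTEs of the full query.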

Any help is greatly appreciated!

Things I have tried:

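  1. Moving man_jour_tracking out of the CTE into a temp table (CREATE TEMP TABLE man_jour_tracking) and creating an index (CREATE INDEX idx_man_jour_tracking ON man_jour_tracking (manual_journal_id, manual_journal_line_id)), hoping to speed up the join or force the planner to cache the joined table. It was no faster.
  2. Switching the order of the SELECT statements in the union.
  3. Replacing UNION ALL with UNION. It did not help.
  4. Removing the man_jour_tracking CTE entirely. Even then, the combined query still takes longer than the two queries run separately.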
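
For reference, the temp-table attempt in item 1 might look like the sketch below. Two details are assumptions added here rather than taken from the original attempt. First, the cast to int is applied while building the table, so that the index covers the same value the join compares against. Second, ANALYZE is run by hand, because autovacuum does not process temporary tables and the planner would otherwise have no statistics for this one.

```sql
-- Sketch only: build the aggregated tracking data once as a temp table.
CREATE TEMP TABLE man_jour_tracking AS
SELECT
    json_object(array_agg(tracking_category.name), array_agg(option)) AS tracking,
    -- Assumption: store the value already cast, matching the join's ::int cast.
    manual_journal_line_id::int AS manual_journal_line_id,
    manual_journal_id
FROM manual_journal_line_has_tracking_category line_tracking
    LEFT JOIN tracking_category
        ON line_tracking.tracking_category_id = tracking_category.tracking_category_id
GROUP BY manual_journal_line_id, manual_journal_id;

-- Same index definition as in item 1.
CREATE INDEX idx_man_jour_tracking
    ON man_jour_tracking (manual_journal_id, manual_journal_line_id);

-- Temp tables are not analyzed automatically; collect statistics explicitly.
ANALYZE man_jour_tracking;
```

With the cast stored in the table, the join condition in the main query would compare man_jour_tracking.manual_journal_line_id to manual_line.line directly, without the ::int cast.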
postgresql performance union-all explain