I have two queries that each take 1-10 seconds to execute when run separately. When I combine their results with UNION ALL, the execution time jumps to 4500 seconds (over 1 hour)!
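I ran the long query with EXPLAIN ANALYZE, and it appears to iterate over man_jour_tracking 1.6B times, even though that table is only ~11k rows long. It looks like man_jour_tracking is being rebuilt for every join, but only when the UNION is present, not when each query runs on its own.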
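As a rough sanity check on those numbers: assuming the 1.6B figure is rows per scan multiplied by the number of loops that EXPLAIN ANALYZE reports for that node (an assumption, since the plan output itself is not reproduced here), the scan of man_jour_tracking would have been repeated on the order of 145k times.

```sql
-- Assumption: 1.6B = (rows per scan) x (loops) for the man_jour_tracking node.
-- Both inputs are the approximate figures quoted above, not exact plan values.
SELECT 1600000000 / 11000 AS approx_loops;  -- integer division, roughly 145k
```

That loop count is what I would expect if the CTE ended up on the inner side of a nested loop, but I have not confirmed this.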
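I don't know whether it has something to do with the json value built by json_object. Somehow the planner isn't handling this correctly, because the two queries execute fine on their own and all records are unique.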
(I have removed various select columns to shorten the queries, as they are not relevant.) PostgreSQL 13.7
Running this query on its own (the first SELECT of the union query) takes about 5 seconds.
WITH most_recent_journum as (
    SELECT
        max(journal_number) as journum,
        max(journal_date) as jourdate,
        max(source_type)
    FROM journal
    GROUP BY source_id
),
man_jour_tracking as (
    SELECT
        json_object(array_agg(tracking_category.name), array_agg(option)) as tracking,
        manual_journal_line_id,
        manual_journal_id
    FROM manual_journal_line_has_tracking_category line_tracking
    LEFT JOIN tracking_category
        ON line_tracking.tracking_category_id = tracking_category.tracking_category_id
    GROUP BY manual_journal_line_id, manual_journal_id
)
SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    manual_journal.status as status,
    trim('"' FROM (man_jour_tracking.tracking -> 'Tracking')::text) as tracking,
    trim('"' FROM (man_jour_tracking.tracking -> 'Events Tracking')::text) as events_tracking,
    NULL as invoice_number,
    NULL as contact_name,
    manual_journal.narration as source_description
FROM journal
LEFT JOIN journal_line line ON line.journal_id = journal.journal_id
LEFT JOIN manual_journal ON journal.source_id = manual_journal.manual_journal_id
LEFT JOIN manual_journal_line manual_line
    ON manual_line.manual_journal_id = manual_journal.manual_journal_id
    AND line.account_code = manual_line.account_code
    AND line.description = manual_line.description
    AND line.gross_amount = manual_line.line_amount
LEFT JOIN man_jour_tracking
    ON manual_journal.manual_journal_id = man_jour_tracking.manual_journal_id
    AND man_jour_tracking.manual_journal_line_id::int = manual_line.line
WHERE
    journal.source_type = 'MANJOURNAL'
    AND journal.journal_number in (select journum from most_recent_journum)
    AND journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date
    AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date
    AND line.journal_line_id is not null
    AND line.account_code <> '2650'
    AND manual_journal.status = 'POSTED'
Running this query on its own (the second SELECT of the union query) takes about 5 seconds.
WITH most_recent_journum as (
    SELECT
        max(journal_number) as journum,
        max(journal_date) as jourdate,
        max(source_type)
    FROM journal
    GROUP BY source_id
),
man_jour_tracking as (
    SELECT
        json_object(array_agg(tracking_category.name), array_agg(option)) as tracking,
        manual_journal_line_id,
        manual_journal_id
    FROM manual_journal_line_has_tracking_category line_tracking
    LEFT JOIN tracking_category
        ON line_tracking.tracking_category_id = tracking_category.tracking_category_id
    GROUP BY manual_journal_line_id, manual_journal_id
)
SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    NULL::int as source_line,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    invoice.status as status,
    NULL as tracking,
    NULL as events_tracking,
    invoice.invoice_number,
    contact.name as contact_name,
    NULL as source_description
FROM journal
LEFT JOIN journal_line line ON line.journal_id = journal.journal_id
LEFT JOIN invoice ON journal.source_id = invoice.invoice_id
LEFT JOIN contact ON invoice.contact_id = contact.contact_id
WHERE
    journal.source_type in ('ACCPAY', 'ACCREC')
    AND journal.journal_number in (select journum from most_recent_journum)
    AND journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date
    AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date
    AND line.journal_line_id is not null
    AND invoice.status not in ('VOIDED', 'DELETED')
But when the two are combined, the query takes over 1 hour.
WITH most_recent_journum as (
    SELECT
        max(journal_number) as journum,
        max(journal_date) as jourdate,
        max(source_type)
    FROM journal
    GROUP BY source_id
),
man_jour_tracking as (
    SELECT
        json_object(array_agg(tracking_category.name), array_agg(option)) as tracking,
        manual_journal_line_id,
        manual_journal_id
    FROM manual_journal_line_has_tracking_category line_tracking
    LEFT JOIN tracking_category
        ON line_tracking.tracking_category_id = tracking_category.tracking_category_id
    GROUP BY manual_journal_line_id, manual_journal_id
)
SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    manual_journal.status as status,
    trim('"' FROM (man_jour_tracking.tracking -> 'Tracking')::text) as tracking,
    trim('"' FROM (man_jour_tracking.tracking -> 'Events Tracking')::text) as events_tracking,
    NULL as invoice_number,
    NULL as contact_name,
    manual_journal.narration as source_description
FROM journal
LEFT JOIN journal_line line ON line.journal_id = journal.journal_id
LEFT JOIN manual_journal ON journal.source_id = manual_journal.manual_journal_id
LEFT JOIN manual_journal_line manual_line
    ON manual_line.manual_journal_id = manual_journal.manual_journal_id
    AND line.account_code = manual_line.account_code
    AND line.description = manual_line.description
    AND line.gross_amount = manual_line.line_amount
LEFT JOIN man_jour_tracking
    ON manual_journal.manual_journal_id = man_jour_tracking.manual_journal_id
    AND man_jour_tracking.manual_journal_line_id::int = manual_line.line
WHERE
    journal.source_type = 'MANJOURNAL'
    AND journal.journal_number in (select journum from most_recent_journum)
    AND journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date
    AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date
    AND line.journal_line_id is not null
    AND line.account_code <> '2650'
    AND manual_journal.status = 'POSTED'
UNION ALL
SELECT
    journal.journal_id as journal_id,
    journal.journal_number as journal_number,
    NULL::int as source_line,
    gross_amount as gross_amount,
    line.account_code as account_code,
    line.account_name as account_name,
    line.account_type as account_type,
    line.description as description,
    invoice.status as status,
    NULL as tracking,
    NULL as events_tracking,
    invoice.invoice_number,
    contact.name as contact_name,
    NULL as source_description
FROM journal
LEFT JOIN journal_line line ON line.journal_id = journal.journal_id
LEFT JOIN invoice ON journal.source_id = invoice.invoice_id
LEFT JOIN contact ON invoice.contact_id = contact.contact_id
WHERE
    journal.source_type in ('ACCPAY', 'ACCREC')
    AND journal.journal_number in (select journum from most_recent_journum)
    AND journal.journal_date AT TIME ZONE 'UTC' >= '2022-10-01'::date
    AND journal_date AT TIME ZONE 'UTC' < '2023-01-01'::date
    AND line.journal_line_id is not null
    AND invoice.status not in ('VOIDED', 'DELETED')
Any help is much appreciated!
Things I have tried:
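- Moved man_jour_tracking out of the CTE into a temp table (CREATE TEMP TABLE man_jour_tracking) and created an index on it (CREATE INDEX idx_man_jour_tracking ON man_jour_tracking (manual_journal_id, manual_journal_line_id)), hoping to speed up the join or force the planner to cache the joined table. It was no faster.
- Replaced UNION ALL with UNION. It didn't help.
- man_jour_tracking CTE: the combined query still takes longer than the two queries run separately.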
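For reference, the temp-table attempt looked roughly like the sketch below. The SELECT body is copied from the man_jour_tracking CTE and the index definition is the one quoted above; the final ANALYZE is an assumption added here so the planner has row estimates for the new table, and is not something I can confirm was part of the original attempt.

```sql
-- Build the tracking aggregation once, instead of defining it as a CTE.
CREATE TEMP TABLE man_jour_tracking AS
SELECT
    json_object(array_agg(tracking_category.name), array_agg(option)) AS tracking,
    manual_journal_line_id,
    manual_journal_id
FROM manual_journal_line_has_tracking_category line_tracking
LEFT JOIN tracking_category
    ON line_tracking.tracking_category_id = tracking_category.tracking_category_id
GROUP BY manual_journal_line_id, manual_journal_id;

-- Index on the two columns used in the join condition.
CREATE INDEX idx_man_jour_tracking
    ON man_jour_tracking (manual_journal_id, manual_journal_line_id);

-- Assumption: refresh planner statistics for the temp table.
ANALYZE man_jour_tracking;
```

With this in place, the man_jour_tracking CTE is dropped from the WITH clause and the rest of the query is left unchanged. Note that the join compares man_jour_tracking.manual_journal_line_id::int, so it is unclear to me whether an index on the plain column can be used for that part of the condition.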