I have a table invoices with a column invoice_number. This is what I get when I run select invoice_number from invoices:
invoice_number
--------------
1
2
3
5
6
10
11
I want a SQL query that returns the following result:
gap_start | gap_end
4 | 4
7 | 9
How can I write SQL to perform this kind of query? I am using PostgreSQL.
This is known as the "gaps and islands" problem. It can be solved in any modern SQL dialect with window functions:
select invoice_number + 1 as gap_start,
       next_nr - 1 as gap_end
from (
  select invoice_number,
         lead(invoice_number) over (order by invoice_number) as next_nr
  from invoices
) nr
where invoice_number + 1 <> next_nr;
SQLFiddle:http://sqlfiddle.com/#!15/1e807/1
Here is a walkthrough example that uses row_number to partition the islands and gaps: Postgres Consecutive Days, Gaps and Islands, Tabibitosan
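As a rough illustration of the row_number idea mentioned above (often called Tabibitosan), here is a minimal sketch that finds the islands rather than the gaps. It is not taken from the linked walkthrough. It assumes SQLite 3.25 or newer (for window functions) as a stand-in for PostgreSQL, driven from Python's sqlite3 module, and the names grp, island_start and island_end are made up for this example.

```python
import sqlite3

# Assumption: an in-memory SQLite database (>= 3.25) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table invoices (invoice_number integer primary key)")
conn.executemany(
    "insert into invoices (invoice_number) values (?)",
    [(n,) for n in (1, 2, 3, 5, 6, 10, 11)],
)

# Within a run of consecutive numbers, invoice_number - row_number() stays
# constant, so grouping by that difference yields one row per island.
islands = conn.execute(
    """
    select min(invoice_number) as island_start,
           max(invoice_number) as island_end
    from (
        select invoice_number,
               invoice_number
                 - row_number() over (order by invoice_number) as grp
        from invoices
    ) numbered
    group by grp
    order by island_start
    """
).fetchall()

print(islands)  # [(1, 3), (5, 6), (10, 11)]
```

The inner select should run unchanged on PostgreSQL, since row_number() over (order by ...) is standard SQL. The gaps are then the spaces between one island's end and the next island's start.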
We can use a simpler technique to first get all the missing values, by left-joining a generated series against the table, like so:
select series
from generate_series(1, 11, 1) series
left join invoices on series = invoices.invoice_number
where invoice_number is null;
This gives us the list of missing numbers, which is useful on its own in some cases.
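Note that the bounds 1 and 11 in the query above are hard-coded to match the sample data. The sketch below derives them from the table's own minimum and maximum instead. It makes two assumptions: SQLite (through Python's sqlite3 module) stands in for PostgreSQL, and because generate_series may not be available in that SQLite build, a recursive CTE named series plays its role. In PostgreSQL itself, passing the min and max as scalar subqueries to generate_series would presumably be the more direct route.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table invoices (invoice_number integer primary key)")
conn.executemany(
    "insert into invoices (invoice_number) values (?)",
    [(n,) for n in (1, 2, 3, 5, 6, 10, 11)],
)

missing = [
    row[0]
    for row in conn.execute(
        """
        -- Recursive CTE standing in for generate_series(min, max, 1).
        with recursive series(n) as (
            select min(invoice_number) from invoices
            union all
            select n + 1 from series
            where n < (select max(invoice_number) from invoices)
        )
        select series.n
        from series
        left join invoices on series.n = invoices.invoice_number
        where invoices.invoice_number is null
        order by series.n
        """
    )
]

print(missing)  # [4, 7, 8, 9]
```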
To get the gap start/end ranges, we can join the source table with itself.
select invoices.invoice_number + 1 as start,
       min(fr.invoice_number) - 1 as stop
from invoices
left join invoices r on invoices.invoice_number = r.invoice_number - 1
left join invoices fr on invoices.invoice_number < fr.invoice_number
where r.invoice_number is null
  and fr.invoice_number is not null
group by invoices.invoice_number,
         r.invoice_number;
dbfiddle:https://dbfiddle.uk/?rdbms=postgres_14&fiddle=32c5f3c021b0f1a876305a2bd3afafc9
This is probably less efficient than the solution above, but it can be useful on database servers that do not support the lead() function.
Full credit goes to this excellent page in the SILOTA documentation: http://www.silota.com/docs/recipes/sql-gap-analysis-missing-values-sequence.html
I highly recommend reading it, as it explains the solution step by step.
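For anyone who wants to check that the self-join and the lead() approach agree, here is a small harness. It is only a sketch: it assumes SQLite 3.25 or newer (via Python's sqlite3 module) in place of PostgreSQL, uses a made-up data set (1, 3, 4, 8) that includes a single-number island, and renames the output columns to gap_start and gap_end in both queries.

```python
import sqlite3

# Assumption: an in-memory SQLite database (>= 3.25) stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table invoices (invoice_number integer primary key)")
conn.executemany(
    "insert into invoices (invoice_number) values (?)",
    [(n,) for n in (1, 3, 4, 8)],  # hypothetical data; gaps are 2..2 and 5..7
)

# Window-function version: compare each number with the next one in order.
lead_gaps = conn.execute(
    """
    select invoice_number + 1 as gap_start,
           next_nr - 1 as gap_end
    from (
        select invoice_number,
               lead(invoice_number) over (order by invoice_number) as next_nr
        from invoices
    ) nr
    where invoice_number + 1 <> next_nr
    order by gap_start
    """
).fetchall()

# Self-join version: keep rows with no direct successor (r is null) and
# take the smallest larger number (fr) as the point where the gap ends.
join_gaps = conn.execute(
    """
    select invoices.invoice_number + 1 as gap_start,
           min(fr.invoice_number) - 1 as gap_end
    from invoices
    left join invoices r on invoices.invoice_number = r.invoice_number - 1
    left join invoices fr on invoices.invoice_number < fr.invoice_number
    where r.invoice_number is null
      and fr.invoice_number is not null
    group by invoices.invoice_number
    order by gap_start
    """
).fetchall()

print(lead_gaps)  # [(2, 2), (5, 7)]
print(join_gaps)  # [(2, 2), (5, 7)]
```

The fr join pairs every row with every larger row, so on a big table the self-join does far more work than the single ordered pass behind lead().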
I found another query:
select invoice_number + 1 as gap_start,
       invoice_number + lead_diff - 1 as gap_end
from (select invoice_number,
             -- distance from this number to the next one in order
             lead(invoice_number) over w - invoice_number as lead_diff
      from invoices
      window w as (order by invoice_number)) x
-- a distance greater than 1 means at least one number is missing after this row
where lead_diff > 1;