Group by columns only when the date ranges are contiguous


My data is in the following format:

DECLARE @WidgetPrice TABLE (WidgetPriceId BIGINT IDENTITY(1,1), WidgitId INT, Price MONEY, 
    StartEffectiveWhen DATE, EndEffectiveWhen DATE)

INSERT INTO @WidgetPrice(WidgitId, Price, StartEffectiveWhen, EndEffectiveWhen)
VALUES
(100,      21.48, '2020-1-1',         '2021-8-5'),
(100,      19.34, '2021-8-6',         '2021-12-31'),
(100,      19.34, '2022-1-1',         '2022-12-31'),
(100,      19.34, '2023-1-1',         '2023-1-31'),
-- There is a date gap here (No price from 2023-1-31 to 2023-3-5)
(100,      19.34, '2023-3-5',         '2023-12-31'),
(100,      12.87, '2024-1-1',         '2024-1-31'),
(100,      12.87, '2024-2-1',         '2100-12-31'),
-- Next Widget's prices          
(200,      728.25, '2020-1-1',         '2021-12-31'),
(200,      728.25, '2022-1-1',         '2022-12-31'),
(200,      861.58, '2023-1-1',         '2024-5-21'),
(200,      601.19, '2024-5-22',        '2100-12-31')

I need to group by WidgitId and Price, but only where the date ranges are contiguous.

So, in my sample data, there is a gap between 2023-1-31 and 2023-3-5. Because of that gap, I need two separate entries for the price 19.34.

Here is an image of the data I am hoping to get:

The key rows in this output are rows 2 and 3. The same price is listed twice because there is a gap between the dates.
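For reference, the rows implied by the sample data and the description above can be written out by hand. This is only a sketch: it assumes the output has the same four columns as the input, since the layout of the original image is not known, and the alias Expected is made up for illustration.

```sql
-- Expected result, derived by hand from the sample data.
-- Column layout is an assumption; "Expected" is a hypothetical alias.
SELECT *
FROM (VALUES
    (100,  21.48, '2020-01-01', '2021-08-05'),
    (100,  19.34, '2021-08-06', '2023-01-31'),  -- row 2: three contiguous ranges merged
    (100,  19.34, '2023-03-05', '2023-12-31'),  -- row 3: same price, but after the gap
    (100,  12.87, '2024-01-01', '2100-12-31'),
    (200, 728.25, '2020-01-01', '2022-12-31'),
    (200, 861.58, '2023-01-01', '2024-05-21'),
    (200, 601.19, '2024-05-22', '2100-12-31')
) AS Expected (WidgitId, Price, StartEffectiveWhen, EndEffectiveWhen);
```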

I considered building a recursive CTE that looks at the LAG of the StartEffectiveWhen and EndEffectiveWhen values, but I could not work it out.
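The LAG comparison mentioned above does not need recursion. A minimal sketch, assuming a three-row excerpt of the widget 100 data (the table variable @Sample and the output column names PreviousEnd and DaysSincePreviousEnd are hypothetical), shows how the previous row's end date can be lined up against the current row's start date:

```sql
-- Hypothetical three-row excerpt of the widget 100 sample data.
DECLARE @Sample TABLE (WidgitId INT, Price MONEY,
    StartEffectiveWhen DATE, EndEffectiveWhen DATE);

INSERT INTO @Sample (WidgitId, Price, StartEffectiveWhen, EndEffectiveWhen)
VALUES
(100, 19.34, '2022-01-01', '2022-12-31'),
(100, 19.34, '2023-01-01', '2023-01-31'),
(100, 19.34, '2023-03-05', '2023-12-31');

SELECT
    WidgitId, Price, StartEffectiveWhen, EndEffectiveWhen,
    -- End date of the previous range for the same widget (NULL on the first row)
    LAG(EndEffectiveWhen) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen) AS PreviousEnd,
    -- Days between the previous end and this start
    DATEDIFF(day,
             LAG(EndEffectiveWhen) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen),
             StartEffectiveWhen) AS DaysSincePreviousEnd
FROM @Sample
ORDER BY WidgitId, StartEffectiveWhen;
```

A DaysSincePreviousEnd of 1 means the range follows directly on from the previous one. NULL (first row) or any other value marks the start of a new run.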

Any ideas on how to build a query that does this?

Note: My actual data has more than 113,000,000 rows and many more columns. I have only provided a simplified version for this question.

Note 2: I am running Microsoft SQL Server 2017.

sql sql-server common-table-expression gaps-and-islands
1 Answer

SQLFiddle: I put together a SQL Fiddle so that you can run this query and tweak it to see different results. It is a very useful site: SQL Fiddle With Answer

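Strategy: Instead of focusing on chaining multiple rows together, you can focus on finding the edges. Then you can group by a running total of the starting edges. I am sure you can shorten this, but it should give you the idea: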

-- Find interesting edges, then group by a running total of the starting edges
SELECT
    g.WidgitId
    ,g.Price
    ,MIN(g.StartEffectiveWhen) AS StartEffectiveWhen
    ,MAX(g.EndEffectiveWhen) AS EndEffectiveWhen
FROM
(
    -- Running total of starting edges: rows in the same contiguous run share a LeadingCount
    SELECT
        SUM(i.StartingEdge) OVER (PARTITION BY i.WidgitId ORDER BY i.StartEffectiveWhen) AS LeadingCount
        ,i.WidgitId, i.Price, i.StartEffectiveWhen, i.EndEffectiveWhen
    FROM
    (
        SELECT
            -- 1 when the price changed or the previous range did not end the day before this one starts
            CONVERT(int, CASE WHEN ISNULL(p.LagPrice, -1) != p.Price OR ISNULL(p.LagDate, '1900-01-01') != DATEADD(day, -1, p.StartEffectiveWhen)
                THEN 1 ELSE 0 END) AS StartingEdge
            -- 1 when the price changes next or the next range does not start the day after this one ends
            -- (not used by the grouping; kept for inspection)
            ,CONVERT(int, CASE WHEN ISNULL(p.LeadPrice, -1) != p.Price OR ISNULL(p.LeadDate, '1900-01-01') != DATEADD(day, 1, p.EndEffectiveWhen)
                THEN 1 ELSE 0 END) AS EndingEdge
            ,p.WidgitId, p.Price, p.StartEffectiveWhen, p.EndEffectiveWhen
        FROM
        (
            -- Pull the neighbouring rows' price and dates alongside each row
            SELECT
                LAG(Price) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen ASC) AS LagPrice
                ,LEAD(Price) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen ASC) AS LeadPrice
                ,LAG(EndEffectiveWhen) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen ASC) AS LagDate
                ,LEAD(StartEffectiveWhen) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen ASC) AS LeadDate
                ,*
            FROM @WidgetPrice
        ) p
    ) i
) g
GROUP BY g.Price, g.WidgitId, g.LeadingCount
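As the answer says, the query can be shortened. One possible condensed form is sketched below; it is not the answerer's own code. The CTE names Flagged and Numbered and the column IslandNumber are made up for illustration. It drops the unused ending-edge columns, relies on a NULL comparison falling through to ELSE for the first row of each widget, and adds an explicit ROWS frame to the running total. The ROWS frame is commonly reported to be cheaper in SQL Server than the default RANGE frame, though that is worth measuring on the real 113-million-row table rather than taking on trust. The sample data is repeated so the batch runs on its own.

```sql
DECLARE @WidgetPrice TABLE (WidgetPriceId BIGINT IDENTITY(1,1), WidgitId INT, Price MONEY,
    StartEffectiveWhen DATE, EndEffectiveWhen DATE);

INSERT INTO @WidgetPrice (WidgitId, Price, StartEffectiveWhen, EndEffectiveWhen)
VALUES
(100,  21.48, '2020-1-1',  '2021-8-5'),
(100,  19.34, '2021-8-6',  '2021-12-31'),
(100,  19.34, '2022-1-1',  '2022-12-31'),
(100,  19.34, '2023-1-1',  '2023-1-31'),
(100,  19.34, '2023-3-5',  '2023-12-31'),   -- starts after the date gap
(100,  12.87, '2024-1-1',  '2024-1-31'),
(100,  12.87, '2024-2-1',  '2100-12-31'),
(200, 728.25, '2020-1-1',  '2021-12-31'),
(200, 728.25, '2022-1-1',  '2022-12-31'),
(200, 861.58, '2023-1-1',  '2024-5-21'),
(200, 601.19, '2024-5-22', '2100-12-31');

WITH Flagged AS (
    SELECT
        WidgitId, Price, StartEffectiveWhen, EndEffectiveWhen,
        -- 0 only when the previous row has the same price AND ended the day before
        -- this row starts; a NULL from LAG (first row per widget) falls through to 1.
        CASE
            WHEN LAG(Price) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen) = Price
             AND LAG(EndEffectiveWhen) OVER (PARTITION BY WidgitId ORDER BY StartEffectiveWhen)
                 = DATEADD(day, -1, StartEffectiveWhen)
            THEN 0 ELSE 1
        END AS StartingEdge
    FROM @WidgetPrice
),
Numbered AS (
    SELECT
        WidgitId, Price, StartEffectiveWhen, EndEffectiveWhen,
        -- Running total of starting edges numbers each contiguous run per widget
        SUM(StartingEdge) OVER (
            PARTITION BY WidgitId
            ORDER BY StartEffectiveWhen
            ROWS UNBOUNDED PRECEDING
        ) AS IslandNumber
    FROM Flagged
)
SELECT
    WidgitId,
    Price,
    MIN(StartEffectiveWhen) AS StartEffectiveWhen,
    MAX(EndEffectiveWhen)   AS EndEffectiveWhen
FROM Numbered
GROUP BY WidgitId, Price, IslandNumber
ORDER BY WidgitId, MIN(StartEffectiveWhen);
```

Both forms assume that the ranges for a given widget do not overlap and that "contiguous" means the next range starts exactly one day after the previous one ends, as in the sample data.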