Eliminating duplicates in date-range rows

Problem description    Votes: 0    Answers: 2

I have a table like the following.

ID    StartDate    EndDate     AttributeA     AttributeB
--    ---------    -------     ----------     ----------
1     1/1/2009     2/1/2009    0              C
1     2/1/2009     3/1/2009    1              C
1     3/1/2009     4/1/2009    1              C
2     1/1/2010     2/1/2010    0              D
2     3/1/2010     4/1/2010    1              D

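The date range tells you the period during which the remaining attributes are valid. My problem is that there are several consecutive date ranges over which the attributes stay the same. What I want is the same data, but without the duplicate rows.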

For the example above, the final result I expect is:

ID    StartDate    EndDate     AttributeA     AttributeB
--    ---------    -------     ----------     ----------
1     1/1/2009     2/1/2009    0              C
1     2/1/2009     4/1/2009    1              C
2     1/1/2010     2/1/2010    0              D
2     3/1/2010     4/1/2010    1              D

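What I did was merge the second and third rows into one (all attributes other than the dates are identical), keeping the StartDate of the second row and the EndDate of the third.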

My first idea was to group by the attribute values and take the MIN and MAX of the dates:

SELECT ID, MIN(StartDate) AS StartDate, MAX(EndDate) AS EndDate, AttributeA, AttributeB
FROM MyTable
GROUP BY ID, AttributeA, AttributeB

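However, when I ran it I realized that when the attributes change several times and then return to their original values, I end up with overlapping intervals. I have been stuck for a while trying to find a way around this.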

Here is an example of what I just described.

When the initial data is as follows:

ID    StartDate    EndDate     AttributeA     AttributeB
--    ---------    -------     ----------     ----------
1     1/1/2009     2/1/2009    0              C
1     2/1/2009     3/1/2009    0              D
1     3/1/2009     4/1/2009    0              D
1     4/1/2009     5/1/2009    1              D
1     6/1/2010     6/1/2009    0              D

the grouped result looks like this:

ID    StartDate    EndDate     AttributeA     AttributeB
--    ---------    -------     ----------     ----------
1     1/1/2009     2/1/2009    0              C
1     2/1/2009     6/1/2009    0              D
1     4/1/2009     5/1/2009    1              D

What I want to get is this:

ID    StartDate    EndDate     AttributeA     AttributeB
--    ---------    -------     ----------     ----------
1     1/1/2009     2/1/2009    0              C
1     2/1/2009     4/1/2009    0              D
1     4/1/2009     5/1/2009    1              D
1     6/1/2010     6/1/2009    0              D
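
To make the merge rule concrete, here is a minimal Python sketch (not a SQL solution, and not part of the original post). It assumes the dates are month/day/year, that two rows belong together when they share the ID and both attributes and the earlier EndDate equals the later StartDate, and it uses a hypothetical helper name `merge_adjacent`. The rows are copied from the table above as written.

```python
from datetime import date

# (ID, StartDate, EndDate, AttributeA, AttributeB), copied from the example.
rows = [
    (1, date(2009, 1, 1), date(2009, 2, 1), 0, "C"),
    (1, date(2009, 2, 1), date(2009, 3, 1), 0, "D"),
    (1, date(2009, 3, 1), date(2009, 4, 1), 0, "D"),
    (1, date(2009, 4, 1), date(2009, 5, 1), 1, "D"),
    (1, date(2010, 6, 1), date(2009, 6, 1), 0, "D"),  # dates kept as written in the post
]


def merge_adjacent(rows):
    """Hypothetical helper: merge rows with equal attributes whose ranges touch."""
    merged = []
    # Walk the rows in timeline order per ID.
    for id_, start, end, a, b in sorted(rows, key=lambda r: (r[0], r[1])):
        if merged:
            p_id, p_start, p_end, p_a, p_b = merged[-1]
            # Same ID and attributes, and this range starts where the last one ended.
            if (p_id, p_a, p_b) == (id_, a, b) and p_end == start:
                merged[-1] = (p_id, p_start, end, p_a, p_b)
                continue
        merged.append((id_, start, end, a, b))
    return merged


result = merge_adjacent(rows)
for row in result:
    print(row)
```

Unlike the GROUP BY attempt, only neighbouring rows are compared, so the last (0, D) row stays separate from the earlier (0, D) range because a (1, D) row sits between them.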

Any help is welcome :)

Edit: I will upload some sample data as soon as possible to make my problem easier to understand.

Edit 2: Here's a script with some of my data. From that sample I expect to get the following rows.

ID        StartDate     EndDate       A     B      C     D     E     F
--        ---------     -------       --    --     --    --    --    --
708513    1980-01-01    2006-07-23    15    ASDB   A     ACT   130   0
708513    2006-07-24    2009-12-08    15    ASDB   A     ACT   130   2
708513    2009-12-09    2010-01-12    0     ASDB   A     ACT   130   2
708513    2010-01-13    2079-05-30    15    ASDB   A     ACT   130   2

sql sql-server sql-server-2005 tsql

2 answers

1 vote

Edited following the comments. Try:

;with cte as (
-- Anchor: rows that start a run, i.e. no row with the same ID and
-- attributes ends on the day before this row's StartDate.
select m1.ID, m1.StartDate, m1.EndDate, m1.a, m1.b, m1.c, m1.d, m1.e, m1.f
from sampledata m1
where not exists
(select null from sampledata m0
 where m1.ID = m0.ID and
       m1.a = m0.a and
       m1.b = m0.b and
       m1.c = m0.c and
       m1.d = m0.d and
       m1.e = m0.e and
       m1.f = m0.f and
       dateadd(day, -1, m1.StartDate) = m0.EndDate)
union all
-- Recursive step: extend each run with the matching row that starts
-- on the day after the run's current EndDate.
select m1.ID, m1.StartDate, m2.EndDate, m1.a, m1.b, m1.c, m1.d, m1.e, m1.f
from cte m1
join sampledata m2
       on m1.ID = m2.ID and
          m1.a = m2.a and
          m1.b = m2.b and
          m1.c = m2.c and
          m1.d = m2.d and
          m1.e = m2.e and
          m1.f = m2.f and
          dateadd(day, 1, m1.EndDate) = m2.StartDate)
-- Every extension adds a row for the same run start; keep the latest EndDate.
select ID, StartDate, max(EndDate) EndDate, a, b, c, d, e, f
from cte
group by ID, StartDate, a, b, c, d, e, f
OPTION (MAXRECURSION 32767)
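
For readers who want to trace the idea outside SQL, here is a minimal Python sketch of the same two steps: find the rows that start a run, then keep extending each run while a matching row starts the day after the current end. The input rows are invented for illustration (they are not the asker's sample script, although they collapse to the ranges listed under Edit 2), only columns a and f are modelled, and `collapse` is a hypothetical helper name.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# Invented rows: (ID, StartDate, EndDate, (a, f)). Ranges are inclusive and
# a new range starts the day after the previous one ends.
rows = [
    (708513, date(1980, 1, 1), date(1999, 12, 31), (15, 0)),
    (708513, date(2000, 1, 1), date(2006, 7, 23), (15, 0)),
    (708513, date(2006, 7, 24), date(2009, 12, 8), (15, 2)),
    (708513, date(2009, 12, 9), date(2010, 1, 12), (0, 2)),
    (708513, date(2010, 1, 13), date(2015, 6, 30), (15, 2)),
    (708513, date(2015, 7, 1), date(2079, 5, 30), (15, 2)),
]


def collapse(rows):
    """Hypothetical helper mirroring the anchor and extension steps."""
    end_by_start = {(i, attrs, s): e for i, s, e, attrs in rows}
    known_ends = {(i, attrs, e) for i, s, e, attrs in rows}
    result = []
    for id_, start, end, attrs in rows:
        # Anchor test: skip rows that continue a matching row ending the day before.
        if (id_, attrs, start - ONE_DAY) in known_ends:
            continue
        # Extension: follow matching rows that start the day after the current end.
        while (id_, attrs, end + ONE_DAY) in end_by_start:
            end = end_by_start[(id_, attrs, end + ONE_DAY)]
        result.append((id_, start, end, attrs))
    return sorted(result, key=lambda r: (r[0], r[1]))


collapsed = collapse(rows)
for row in collapsed:
    print(row)
```

The loop plays the role of the recursion, and taking the last end reached corresponds to the max(EndDate) in the final select.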

0 votes

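In case anyone is interested, I made a version without recursion. I never really worked out how to add the extra columns that are not used in the previous example.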

IF OBJECT_ID('tempdb..#test') IS NOT NULL drop table #test

create table #test (
    id int identity(1, 1)
    , ship nvarchar(64)
    , color nvarchar(16)
    , [length] int
    , height int
    , [type] nvarchar(16)
    , country nvarchar(16)
    , StartDate date
)

insert into #test(ship, color, [length], height, [type], country, StartDate)
values 
    ('Ship 1', 'Blue', 200, 13, 'sailboat', 'sweden', '2019-01-01')
    , ('Ship 1', 'Blue', 200, 13, 'sailboat', 'sweden', '2019-02-01')
    , ('Ship 1', 'Blue', 200, 13, 'sailboat', 'sweden', '2019-03-01')
    , ('Ship 1', 'Red', 200, 13, 'motorboat', 'sweden', '2019-11-01')
    , ('Ship 1', 'Blue', 200, 13, 'sailboat', 'sweden', '2019-12-01')
    , ('Ship 2', 'Green', 400, 27, 'RoRo', 'denmark', '2019-02-01')

;
with step1 as (
    select t.*
        , [EndDate] = dateadd(day, -1, lead(t.StartDate, 1, '9999-12-31') over(partition by t.ship order by t.StartDate))
    from #test t
    where 1 = 1
)
, step2 as (
    select t.*
        -- Check whether the preceding row with the same attributes ends the day before this StartDate
        , [IdenticalPreceeding] = case 
                                    when t.StartDate = dateadd(day, 1, lag(t.EndDate, 1, '1900-01-01') over (partition by t.ship, t.color, t.[length], t.height, t.[type], t.country order by t.Startdate)) then 1
                                    else 0
                                end
    from step1 t
)

select t.*
    , [EndDateFinal] = dateadd(day, -1, lead(t.StartDate, 1, '9999-12-31') over(partition by t.ship order by t.StartDate))
from step2 t
where 1 = 1
-- Remove rows that have an identical preceding row
and t.IdenticalPreceeding = 0
order by t.ship
    , t.StartDate
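
As a cross-check of what the query above is meant to produce, here is a small Python sketch of my reading of its logic: within each ship, drop a row when its attributes equal those of the row just before it, then derive each end date as the day before the next remaining StartDate. It assumes StartDate is unique per ship, and `collapse_by_start` is a hypothetical helper name. The rows are the same ones inserted into #test.

```python
from datetime import date, timedelta
from itertools import groupby

OPEN_END = date(9999, 12, 31)  # same default the query passes to lead()

# (ship, color, length, height, type, country, StartDate)
rows = [
    ("Ship 1", "Blue", 200, 13, "sailboat", "sweden", date(2019, 1, 1)),
    ("Ship 1", "Blue", 200, 13, "sailboat", "sweden", date(2019, 2, 1)),
    ("Ship 1", "Blue", 200, 13, "sailboat", "sweden", date(2019, 3, 1)),
    ("Ship 1", "Red", 200, 13, "motorboat", "sweden", date(2019, 11, 1)),
    ("Ship 1", "Blue", 200, 13, "sailboat", "sweden", date(2019, 12, 1)),
    ("Ship 2", "Green", 400, 27, "RoRo", "denmark", date(2019, 2, 1)),
]


def collapse_by_start(rows):
    """Hypothetical helper: one output row per run of identical attributes."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[-1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        kept = []
        for row in group:
            # Keep a row only if its attributes differ from the previous run.
            if not kept or kept[-1][:-1] != row[:-1]:
                kept.append(row)
        for i, row in enumerate(kept):
            next_start = kept[i + 1][-1] if i + 1 < len(kept) else OPEN_END
            # A run ends the day before the next run of the same ship starts.
            out.append(row + (next_start - timedelta(days=1),))
    return out


runs = collapse_by_start(rows)
for run in runs:
    print(run)
```

Note that the query itself relies on lag() and lead(), which SQL Server only offers from version 2012 onward, so it does not run on the SQL Server 2005 named in the question's tags.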