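I have 2 tables: a latest table and a history table. The history table contains all previously loaded rows, while the latest table contains the most recent data. I would like to build a combined table, based on load_date, that indicates when changes happened.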
Latest table:
id col1 col2 load_date
1001 a g 1/3/2024
1003 q r 1/3/2024
History table:
id col1 col2 load_date
1001 a b 1/1/2024
1002 d e 1/1/2024
1001 a g 1/2/2024
I would like my combined table to look like this:
id col1 col2 load_date change
1001 a b 1/1/2024 new entry
1001 a g 1/2/2024 col2 changed
1001 a g 1/3/2024
1002 d e 1/1/2024 new entry
1002 d e 1/2/2024 no show
1003 q r 1/3/2024 new entry
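I am having trouble tracking changes and updates on specific dates, in particular the date an ID first appears and the date it last appears.
Here is what I tried: https://sqlfiddle.com/sql-server/online-compiler?id=5d406760-006a-4acb-9c96-e3a5236b1209
Code: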
create table latest(id int, col1 varchar, col2 varchar, load_date date);
insert into latest
values(1001,'a','g','1/3/2024'),
(1003,'q','r','1/3/2024');--newly showed up on 1/3
--select * from latest;
create table history(id int, col1 varchar, col2 varchar, load_date date);
insert into history
values
(1001,'a','b','1/1/2024'),
(1002,'d','e','1/1/2024'),
(1001,'a','g','1/2/2024');--col2 changed on 1/2
--(1002,'d','e','1/2/2024')--did not show up
--select * from history;
with combined as
(
    select *,'latest' as source from latest l
    union all
    select *, 'history' as source from history h
),
changes AS (
    SELECT
        ct1.id,
        ct1.col1,
        ct1.col2,
        ct1.load_date,
        CASE
            WHEN ct1.col1 <> ct2.col1 THEN 'col1 changed'
            WHEN ct1.col2 <> ct2.col2 THEN 'col2 changed'
            ELSE NULL
        END AS changes
    FROM
        combined ct1
        LEFT JOIN combined ct2
            ON ct1.id = ct2.id
)
select * from changes
Could I get some help solving this?
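First, union the two tables, then use the window function lag() to get the previous row's values for each id. To get the final change, use a case expression that compares the current row's values with the previous row's values.
To handle "no show", take the most recent row for each id in history and check whether that id exists in latest. This logic is handled in the last part of the union query: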
with cte as
(
    select id, col1, col2, load_date, change = null
    from latest
    union all
    select id, col1, col2, load_date, change = null
    from history
    union all
    select id, col1, col2, load_date = dateadd(day, 1, load_date),
           change = 'no show'
    from (
        select id, col1, col2, load_date,
               rn = row_number() over (partition by id order by load_date desc)
        from history
    ) h
    where h.rn = 1
      and not exists
          (
              select *
              from latest x
              where x.id = h.id
          )
),
cte2 as
(
    select id, col1, col2, load_date, change,
           prev_col1 = lag(col1) over (partition by id order by load_date),
           prev_col2 = lag(col2) over (partition by id order by load_date)
    from cte
)
select *,
       change = isnull(change, '')
              + case when prev_col1 is null and prev_col2 is null
                     then 'new entry'
                     else ''
                end
              + case when prev_col1 <> col1
                     then 'col1 changed '
                     else ''
                end
              + case when prev_col2 <> col2
                     then 'col2 changed '
                     else ''
                end
from cte2
order by id, load_date
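
One thing to watch in the query above: a row counts as a new entry only because both prev_col1 and prev_col2 are NULL, so an id whose earlier row genuinely had NULL in both columns would be labelled a new entry again. It also returns the helper columns and two columns named change. Below is a minimal sketch of a variant, assuming the same latest and history tables, that marks the first row per id with row_number() and returns only the five requested columns. The names unioned, ordered and rn are only illustrative, and the "no show" date is still assumed to be one day after the id's last history load.

```sql
with unioned as
(
    -- every loaded row from both tables; change stays NULL for real loads
    select id, col1, col2, load_date, cast(null as varchar(20)) as change
    from latest
    union all
    select id, col1, col2, load_date, cast(null as varchar(20))
    from history
    union all
    -- ids whose most recent history row has no counterpart in latest
    select h.id, h.col1, h.col2, dateadd(day, 1, h.load_date), 'no show'
    from (
        select id, col1, col2, load_date,
               row_number() over (partition by id order by load_date desc) as rn
        from history
    ) h
    where h.rn = 1
      and not exists (select 1 from latest x where x.id = h.id)
),
ordered as
(
    select id, col1, col2, load_date, change,
           row_number() over (partition by id order by load_date) as rn,
           lag(col1) over (partition by id order by load_date) as prev_col1,
           lag(col2) over (partition by id order by load_date) as prev_col2
    from unioned
)
select id, col1, col2, load_date,
       case
           when change is not null then change   -- the synthetic 'no show' row
           when rn = 1 then 'new entry'          -- first row seen for this id
           else rtrim(
                  case when prev_col1 <> col1 then 'col1 changed ' else '' end
                + case when prev_col2 <> col2 then 'col2 changed ' else '' end)
       end as change
from ordered
order by id, load_date;
```

With the sample data this should return the six rows of the desired table, with an empty change value for 1001 on 1/3/2024. Like the original, the <> comparisons do not flag a value changing to or from NULL; that would need an explicit NULL check.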