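I have a table whose data looks like this:
ID | NAME | DATE | ORDER |
---|---|---|---|
15 | MICHELLE | 2023-01-01 | 2 |
15 | MICHELLE | 2023-01-04 | 3 |
15 | MITCH | 2023-01-15 | 7 |
(This is a small sample of my table's data for one particular ID.)
I want to get a table that returns the ID along with what the value was, or what it changed to, over each period of time.
For example, I would like my table to look like:
ID | NAME | FROM_DATE | TO_DATE |
---|---|---|---|
15 | MICHELLE | 2023-01-01 | 2023-01-15 |
15 | MITCH | 2023-01-15 | 2023-03-10 |
For the last value of a given ID, its TO_DATE would be the current date.
I tried: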
SELECT ID,
LAG(NAME) OVER (PARTITION BY ID ORDER BY ORDER) AS NAME,
LAG(DATE) OVER (PARTITION BY ID ORDER BY ORDER) AS FROM_DATE,
DATE AS TO_DATE
FROM MY_TABLE
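However, this returns:
ID | NAME | FROM_DATE | TO_DATE |
---|---|---|---|
15 | MICHELLE | 2023-01-01 | 2023-01-04 |
15 | MICHELLE | 2023-01-04 | 2023-01-15 |
The name does not change between 2023-01-01 and 2023-01-15, but I can't collapse MICHELLE into a single row, and I can't show that the name changed from MICHELLE to MITCH and that the change holds through today (because no record follows it).
Is there a way to do this? Thanks!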
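If I follow you correctly, you can use lag() to find the rows where the name changes, then lead() to get the end date of each period:
select id, name,
    date as from_date,
    -- date of the next name change for this id; the last period ends today
    lead(date, 1, current_date) over(partition by id order by ord) as to_date
from (
    -- ord stands for the question's ORDER column
    select t.*, lag(name) over(partition by id order by ord) as lag_name
    from mytable t
) t
where name is distinct from lag_name
order by id, ord
The subquery retrieves the "previous" name for the same id; we use it to filter out the rows that do not correspond to a name change, which leaves the first row of each period. lead() then runs over the remaining rows only and retrieves the matching end date. Its third argument supplies the current date for the last period of each id, which has no following row.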
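To try the "keep only the rows where the name changes, then look ahead to the next kept row" idea without a warehouse, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. Everything around the query is an assumption made for illustration: the table name mytable, the column ord (standing in for ORDER), the helper name_periods, and the today parameter that replaces current_date so the result is reproducible. SQLite's null-safe IS NOT is used in place of IS DISTINCT FROM, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question: (id, name, date, ord)
ROWS = [
    (15, "MICHELLE", "2023-01-01", 2),
    (15, "MICHELLE", "2023-01-04", 3),
    (15, "MITCH", "2023-01-15", 7),
]

QUERY = """
select id, name,
       date as from_date,
       -- next kept row's date; falls back to :today for the last period
       lead(date, 1, :today) over (partition by id order by ord) as to_date
from (
    select t.*, lag(name) over (partition by id order by ord) as prev_name
    from mytable t
) t
where name is not prev_name   -- null-safe: also keeps the first row per id
order by id, ord
"""


def name_periods(rows, today):
    """Return (id, name, from_date, to_date) tuples, one per name period."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table mytable (id integer, name text, date text, ord integer)"
    )
    con.executemany("insert into mytable values (?, ?, ?, ?)", rows)
    result = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return result


periods = name_periods(ROWS, "2023-03-10")
for row in periods:
    print(row)
```

With the sample rows this gives one MICHELLE row running from 2023-01-01 to 2023-01-15 and one MITCH row running from 2023-01-15 to the supplied date. Because the filter compares each row only with the row right before it, a name that later returns starts a new period instead of being merged with the earlier one.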
You can use the lead window function on the min(DATE) aggregate like this:
select ID, NAME, min(DATE) as FROM_DATE
,lead(FROM_DATE, 1, current_date)
over (partition by ID order by FROM_DATE) as TO_DATE
from MY_TABLE
group by ID, NAME;
Here is a self-contained example:
with MY_TABLE as
(
select
COLUMN1::int as "ID",
COLUMN2::string as "NAME",
COLUMN3::date as "DATE",
COLUMN4::int as "ORDER"
from (values
('15','MICHELLE','2023-01-01','2'),
('15','MICHELLE','2023-01-04','3'),
('15','MITCH','2023-01-15','7')
)
)
select ID, NAME, min(DATE) as FROM_DATE, lead(FROM_DATE, 1, current_date) over (partition by ID order by FROM_DATE) as TO_DATE
from MY_TABLE
group by ID, NAME;
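Edit: In case the name can change back to one the same ID had before, here is an example with more test data and an updated solution. It uses the conditional_change_event function to form new groups. It sorts the dates in descending order, which makes it easy to tell which rows belong to the latest group and need their end date set to current_date: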
with MY_TABLE as
(
select
COLUMN1::int as "ID",
COLUMN2::string as "NAME",
COLUMN3::date as "DATE",
COLUMN4::int as "ORDER"
from (values
('15','MICHELLE','2023-01-01','2'),
('15','MICHELLE','2023-01-04','3'),
('15','MICHELLE','2023-01-05','4'),
('15','MITCH', '2023-01-15','7'),
('15','MICHELLE','2023-02-04','9'),
('16','BOB','2023-01-01','8')
)
)
select ID
,NAME
,min("DATE") as START_DATE
,iff(NAME_CHANGE = 0, current_date, max("DATE")) as END_DATE
from (
select *
,conditional_change_event("NAME") over (partition by ID order by "DATE" desc) NAME_CHANGE
from MY_TABLE
)
group by ID, NAME, NAME_CHANGE
order by ID, START_DATE
;
ID | NAME | START_DATE | END_DATE |
---|---|---|---|
15 | MICHELLE | 2023-01-01 00:00:00 | 2023-01-05 |
15 | MITCH | 2023-01-15 00:00:00 | 2023-01-15 |
15 | MICHELLE | 2023-02-04 00:00:00 | 2023-03-11 |
16 | BOB | 2023-01-01 00:00:00 | 2023-03-11 |
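conditional_change_event is not available in every SQL engine. As a rough sketch of the same grouping with standard window functions only, you can flag each row whose name differs from the previous row with lag() and take a running sum of those flags as the group number. This is a swapped-in technique, not the query above: it runs in SQLite through Python's sqlite3 module, and the helper name_runs, the lowercase table and column names, and the today parameter (replacing current_date) are assumptions made for illustration.

```python
import sqlite3

# Test data from the edited answer: (id, name, date)
ROWS = [
    (15, "MICHELLE", "2023-01-01"),
    (15, "MICHELLE", "2023-01-04"),
    (15, "MICHELLE", "2023-01-05"),
    (15, "MITCH", "2023-01-15"),
    (15, "MICHELLE", "2023-02-04"),
    (16, "BOB", "2023-01-01"),
]

QUERY = """
with flagged as (
    select id, name, date,
           lag(name) over (partition by id order by date) as prev_name,
           max(date) over (partition by id) as last_date
    from my_table
),
grouped as (
    select id, name, date, last_date,
           -- running count of name changes = group number within the id
           sum(case when name is not prev_name then 1 else 0 end)
               over (partition by id order by date) as grp
    from flagged
)
select id, name,
       min(date) as start_date,
       -- the group holding the id's latest row is still open: end it today
       case when max(date) = last_date then :today else max(date) end as end_date
from grouped
group by id, name, grp, last_date
order by id, start_date
"""


def name_runs(rows, today):
    """Return (id, name, start_date, end_date) tuples, one per run of a name."""
    con = sqlite3.connect(":memory:")
    con.execute("create table my_table (id integer, name text, date text)")
    con.executemany("insert into my_table values (?, ?, ?)", rows)
    result = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return result


runs = name_runs(ROWS, "2023-03-11")
for row in runs:
    print(row)
```

Passing 2023-03-11 as today reproduces the four rows of the result table above, with the two MICHELLE periods for ID 15 kept apart. Dates are stored as ISO-8601 text here, so comparing them as strings orders them correctly.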