Using LAG() to get value changes in Snowflake

Question    Votes: 0  Answers: 2

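I have a table whose data looks like this:

ID  NAME      DATE        ORDER
15  MICHELLE  2023-01-01  2
15  MICHELLE  2023-01-04  3
15  MITCH     2023-01-15  7

(This is a small sample of my table's data for one particular id.)

I want a table that returns the ID together with what the value was at a given point in time and what it changed to.

For example, I would like my table to look like:

ID  NAME      FROM_DATE   TO_DATE
15  MICHELLE  2023-01-01  2023-01-15
15  MITCH     2023-01-15  2023-03-10

For the last value of a given id, its TO_DATE would be the current date.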

I tried:

SELECT ID, 
       LAG(NAME) OVER (PARTITION BY ID ORDER BY "ORDER") AS NAME,
       LAG(DATE) OVER (PARTITION BY ID ORDER BY "ORDER") AS FROM_DATE, 
       DATE AS TO_DATE
FROM MY_TABLE

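However, this returns:

ID  NAME      FROM_DATE   TO_DATE
15  MICHELLE  2023-01-01  2023-01-04
15  MICHELLE  2023-01-04  2023-01-15

Since nothing changed between 2023-01-01 and 2023-01-15, I would like MICHELLE collapsed into a single row, but I can't get that. I also can't show that the name changed from MICHELLE to MITCH and that this change holds through today (since there are no later records).

Is there a way I can do this? Thanks!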

sql snowflake-cloud-data-platform window-functions lag gaps-and-islands
2 Answers
1
vote

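If I understand correctly, you can combine lag() and lead():

select id, name, 
    date as from_date, 
    lead(date, 1, current_date) over(partition by id order by ord) as to_date
from (
    select t.*, lag(name) over(partition by id order by ord) as lag_name
    from mytable t
) t
where name is distinct from lag_name
order by id, ord

The subquery retrieves the "previous" name for the same id; we can then use this information to filter out rows that do not correspond to a name change, and finally use lead() on the remaining rows to retrieve the associated end date. The third argument of lead() makes the last period of each id end at the current date.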
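
To make the intended result concrete, here is a minimal plain-Python sketch of that idea outside SQL. It is an illustration, not Snowflake code: the function name `name_periods`, the tuple layout, and the `today` parameter are assumptions made for this example. It keeps only the rows where the name differs from the previous row of the same id, then uses the date of the next kept row (or `today`) as the end date.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (id, name, date, ord)
ROWS = [
    (15, "MICHELLE", date(2023, 1, 1), 2),
    (15, "MICHELLE", date(2023, 1, 4), 3),
    (15, "MITCH", date(2023, 1, 15), 7),
]

_NO_PREVIOUS = object()  # sentinel: the first row of an id has no previous name


def name_periods(rows, today):
    """Return (id, name, from_date, to_date) for each run of equal names per id."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))  # by id, then ord
    for row_id, id_rows in groupby(ordered, key=lambda r: r[0]):
        # Step 1: keep only rows whose name differs from the previous row.
        starts = []
        prev_name = _NO_PREVIOUS
        for _, name, day, _ in id_rows:
            if name != prev_name:
                starts.append((name, day))
            prev_name = name
        # Step 2: the next kept row's date closes the period; the last
        # period of each id is closed with `today` instead.
        for i, (name, day) in enumerate(starts):
            to_date = starts[i + 1][1] if i + 1 < len(starts) else today
            result.append((row_id, name, day, to_date))
    return result


periods = name_periods(ROWS, today=date(2023, 3, 10))
for row in periods:
    print(row)
```

With `today` set to 2023-03-10, this yields the two rows requested in the question: MICHELLE from 2023-01-01 to 2023-01-15, and MITCH from 2023-01-15 to 2023-03-10.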


0
votes

You can use the lead window function over a min(DATE) aggregate, like this:

select ID, NAME, min(DATE) as FROM_DATE
      ,lead(FROM_DATE, 1, current_date) 
         over (partition by ID order by FROM_DATE) as TO_DATE
from MY_TABLE
group by ID, NAME;

Here is a self-contained example:

with MY_TABLE as
    (
    select 
    COLUMN1::int as "ID",
    COLUMN2::string as "NAME",
    COLUMN3::date as "DATE",
    COLUMN4::int as "ORDER"
    from (values
    ('15','MICHELLE','2023-01-01','2'),
    ('15','MICHELLE','2023-01-04','3'),
    ('15','MITCH','2023-01-15','7')
    )
)
select ID, NAME, min(DATE) as FROM_DATE, lead(FROM_DATE, 1, current_date) over (partition by ID order by FROM_DATE) as TO_DATE
from MY_TABLE
group by ID, NAME;

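Edit: If the name can change back to one that was used earlier for the same ID, grouping by ID and NAME alone would merge those separate periods into one. Here is an example with more test data and an updated solution. It uses the conditional_change_event function to form a new group each time the name changes. It sorts the dates in descending order, so the most recent group of each ID gets NAME_CHANGE = 0, which makes it easy to tell which rows are at the end and need their end date changed to current_date: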

with MY_TABLE as
    (
    select 
    COLUMN1::int as "ID",
    COLUMN2::string as "NAME",
    COLUMN3::date as "DATE",
    COLUMN4::int as "ORDER"
    from (values
    ('15','MICHELLE','2023-01-01','2'),
    ('15','MICHELLE','2023-01-04','3'),
    ('15','MICHELLE','2023-01-05','4'),
    ('15','MITCH',   '2023-01-15','7'),
    ('15','MICHELLE','2023-02-04','9'),
    ('16','BOB','2023-01-01','8')
    )
)
select   ID
        ,NAME
        ,min("DATE") as START_DATE
        ,iff(NAME_CHANGE = 0, current_date, max("DATE")) as END_DATE
from (
    select    *
        ,conditional_change_event("NAME") over (partition by ID order by "DATE" desc) NAME_CHANGE 
    from     MY_TABLE
)
group by ID, NAME, NAME_CHANGE
order by ID, START_DATE
;
ID  NAME      START_DATE           END_DATE
15  MICHELLE  2023-01-01 00:00:00  2023-01-05
15  MITCH     2023-01-15 00:00:00  2023-01-15
15  MICHELLE  2023-02-04 00:00:00  2023-03-11
16  BOB       2023-01-01 00:00:00  2023-03-11
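
For readers who want to check the grouping logic without a Snowflake session, the following plain-Python sketch mirrors the query above. It is an approximation, not a reimplementation of Snowflake: `itertools.groupby` over consecutive names stands in for conditional_change_event, and the function name `name_islands` and the `today` parameter are assumptions made for this example.

```python
from datetime import date
from itertools import groupby

# Test data from the example above: (id, name, date)
ROWS = [
    (15, "MICHELLE", date(2023, 1, 1)),
    (15, "MICHELLE", date(2023, 1, 4)),
    (15, "MICHELLE", date(2023, 1, 5)),
    (15, "MITCH", date(2023, 1, 15)),
    (15, "MICHELLE", date(2023, 2, 4)),
    (16, "BOB", date(2023, 1, 1)),
]


def name_islands(rows, today):
    """Return (id, name, start_date, end_date) per consecutive run of a name."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by id, then date
    result = []
    for row_id, id_rows in groupby(ordered, key=lambda r: r[0]):
        # groupby starts a new group whenever the name changes between
        # consecutive rows, so a name that comes back forms its own run.
        runs = [
            (name, [r[2] for r in run])
            for name, run in groupby(id_rows, key=lambda r: r[1])
        ]
        for i, (name, days) in enumerate(runs):
            is_latest = i == len(runs) - 1  # plays the role of NAME_CHANGE = 0
            end = today if is_latest else max(days)
            result.append((row_id, name, min(days), end))
    return result


islands = name_islands(ROWS, today=date(2023, 3, 11))
for row in islands:
    print(row)
```

Run with `today` set to 2023-03-11, the sketch produces the same four periods as the result table above, with the two MICHELLE periods for ID 15 kept separate.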