I have a table of client activity for the last 5 months. A sample looks like this:
month | id_client | month_number | activity |
---|---|---|---|
2023-10-01 | 1234 | 1 | NULL |
2023-11-01 | 1234 | 2 | NULL |
2023-12-01 | 1234 | 3 | 1 |
2024-01-01 | 1234 | 4 | 0 |
2024-02-01 | 1234 | 5 | 0 |
Here 1 = active, 0 = inactive, and NULL = inactive because the client had not registered yet (I want to keep these NULLs).
What I want to achieve is:
month | id_client | month_number_1 | month_number_2 | month_number_3 | month_number_4 | month_number_5 |
---|---|---|---|---|---|---|
2023-10-01 | 1234 | NULL | NULL | 1 | 0 |
2023-11-01 | 1234 | NULL | 1 | 0 | 0 |
2023-12-01 | 1234 | 1 | 0 | 0 | NULL |
2024-01-01 | 1234 | 0 | 0 | NULL | NULL |
2024-02-01 | 1234 | 0 | NULL | NULL | NULL |
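I think I should use some kind of pivot, but I don't know how.
A pivot turns rows into columns. Here, however, the task is to show the data of later rows in additional columns of the current row. That is done with window functions.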
I changed NULL to -1 so that it is easier to see how the SQL query reacts to missing entries.
The computed yyyymm column is used to look up the following months; the month_number column could perhaps be used instead.
WITH sample as (
  # sample data: one row per month for client 1234
  select month, 1234 as id_client, offset + 1 as month_number,
    case offset when 2 then 1 when 3 then 0 when 4 then 0 else null end as activity
  from unnest(generate_date_array(date "2023-10-01", date "2024-02-01", interval 1 month)) as month with offset
),
tbl1 as (
  select * except(activity),
    ifnull(activity, -1) as activity, # replace null entries with -1
    extract(year from month) * 12 + extract(month from month) as yyyymm, # consecutive month index, so a later month can be addressed by value
  from sample
)
select *,
  any_value(activity) over win1 as activity_after_1month,
  any_value(activity) over win2 as activity_after_2month,
  any_value(activity) over win3 as activity_after_3month,
  any_value(activity) over win4 as activity_after_4month,
  any_value(activity) over win5 as activity_after_5month,
from tbl1
window
  # each frame holds only the row whose yyyymm is exactly N above the current row's
  win1 as (partition by id_client order by yyyymm range between 1 following and 1 following),
  win2 as (partition by id_client order by yyyymm range between 2 following and 2 following),
  win3 as (partition by id_client order by yyyymm range between 3 following and 3 following),
  win4 as (partition by id_client order by yyyymm range between 4 following and 4 following),
  # with only 5 sample months no row is ever 5 months ahead, so this column is always null
  win5 as (partition by id_client order by yyyymm range between 5 following and 5 following)
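
As a rough sketch of the variant hinted at above, the same query can order the windows by month_number directly and leave the NULLs untouched, as the question asks. This assumes month_number increases by exactly 1 per calendar month for each client; that is true for the sample data but is not confirmed for the real table. Note that once the NULLs are kept, a NULL in a result column can mean either "not registered yet" or "no row that many months ahead", which is the ambiguity the -1 placeholder was making visible.

```sql
WITH sample AS (
  -- same sample data as above: one row per month for client 1234
  SELECT month, 1234 AS id_client, offset + 1 AS month_number,
    CASE offset WHEN 2 THEN 1 WHEN 3 THEN 0 WHEN 4 THEN 0 ELSE NULL END AS activity
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE "2023-10-01", DATE "2024-02-01", INTERVAL 1 MONTH)) AS month WITH OFFSET
)
SELECT *,
  -- NULL activity is kept as NULL (no IFNULL replacement)
  ANY_VALUE(activity) OVER win1 AS activity_after_1month,
  ANY_VALUE(activity) OVER win2 AS activity_after_2month,
  ANY_VALUE(activity) OVER win3 AS activity_after_3month,
  ANY_VALUE(activity) OVER win4 AS activity_after_4month
FROM sample
WINDOW
  -- assumption: month_number is a gap-free, per-client month counter
  win1 AS (PARTITION BY id_client ORDER BY month_number RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING),
  win2 AS (PARTITION BY id_client ORDER BY month_number RANGE BETWEEN 2 FOLLOWING AND 2 FOLLOWING),
  win3 AS (PARTITION BY id_client ORDER BY month_number RANGE BETWEEN 3 FOLLOWING AND 3 FOLLOWING),
  win4 AS (PARTITION BY id_client ORDER BY month_number RANGE BETWEEN 4 FOLLOWING AND 4 FOLLOWING)
ORDER BY month
```

If every client is guaranteed to have exactly one row per month with no missing months, LEAD(activity, 1) OVER (PARTITION BY id_client ORDER BY month) and so on should give the same values with less typing. The RANGE frames are the safer choice when months can be missing, because they match on the value of the ordering column and not on row position.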