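I have the first table below and am trying to get each customer's sales for every 7-day period, counted from that customer's first purchase date. The second table shows an example.
Purchase date | Customer ID | Sales units |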
---|---|---|
2018-01-01 | 1 | 10 |
2018-01-02 | 1 | 5 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 4 |
2018-01-21 | 2 | 5 |
Purchase date | Customer ID | Sales units | Cumulative sales per 7 days |
---|---|---|---|
2018-01-01 | 1 | 10 | 10 |
2018-01-02 | 1 | 5 | 15 |
2018-01-15 | 1 | 10 | 10 |
2018-01-05 | 2 | 3 | 3 |
2018-01-20 | 2 | 4 | 9 |
2018-01-21 | 2 | 5 | 9 |
The final table should look like this:
Purchase week | Customer ID | 7-day sales units |
---|---|---|
2018-01-01 | 1 | 15 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 4 |
Then I can calculate the average sales per customer:
Customer ID | Average sales units per 7 days | Calculation |
---|---|---|
1 | 12.5 | (15+10) /2 |
2 | 3.5 | (3+4) /2 |
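The difficult parts are:
Each customer's first purchase date is different.
The purchase dates are not consecutive, so I can't use unbounded or 6 rows following, etc.
The whole dataset spans 5 years, so I can't manually subtract 7, 14, and so on.
I tried date_trunc('week',date, min(date) over (partition by customerid)).
I also tried a window frame with rows between 6 preceding and current row, but the dates are not consecutive, so that doesn't work.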
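To make the problem with row-based frames concrete, here is a small Python sketch (only an illustration, not Presto). It uses customer 1's three purchases from the first table; the helper names `sum_last_rows` and `sum_last_days` are made up for this example. The first helper mimics a `rows between 6 preceding and current row` frame, the second sums by calendar days instead.

```python
from datetime import date, timedelta

# Customer 1's purchases from the first table: (purchase_date, sales_unit)
purchases = [
    (date(2018, 1, 1), 10),
    (date(2018, 1, 2), 5),
    (date(2018, 1, 15), 10),
]

def sum_last_rows(rows, i, n_rows=7):
    """Sum row i and up to n_rows - 1 preceding rows (what a ROWS frame does)."""
    start = max(0, i - (n_rows - 1))
    return sum(units for _, units in rows[start:i + 1])

def sum_last_days(rows, i, n_days=7):
    """Sum rows whose date falls within the n_days ending on row i's date."""
    current = rows[i][0]
    earliest = current - timedelta(days=n_days - 1)
    return sum(units for d, units in rows if earliest <= d <= current)

last = len(purchases) - 1
print(sum_last_rows(purchases, last))  # 25: the frame reaches back to 2018-01-01
print(sum_last_days(purchases, last))  # 10: only 2018-01-15 is within 7 days
```

The row-based frame pulls in purchases from two weeks earlier because it counts rows, not days, which is why it gives the wrong total when dates have gaps.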
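You can do this with a CASE expression that checks the dates. I did this in SQL Server, but I believe the approach applies to Presto as well. I think DATEADD would need to become date_add in Presto, with the unit quoted, e.g. date_add('day', -6, t1.purchaseDate).
You also mentioned that you might need 14 days, so I added a column for that. As you can see, it is just a matter of changing the number of days in the DATEADD call.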
SELECT t1.purchaseDate,
       t1.CustomerID,
       t1.SalesUnit,
       SUM(CASE
               WHEN t2.purchaseDate BETWEEN DATEADD(DAY, -6, t1.purchaseDate) AND t1.purchaseDate THEN t2.salesUnit
           END) AS SalesLast7,
       SUM(CASE
               WHEN t2.purchaseDate BETWEEN DATEADD(DAY, -13, t1.purchaseDate) AND t1.purchaseDate THEN t2.salesUnit
           END) AS SalesLast14
FROM temp t1
LEFT JOIN temp t2 ON t1.customerID = t2.customerID AND t2.purchaseDate IS NOT NULL
GROUP BY t1.purchaseDate, t1.customerID, t1.salesUnit
purchaseDate | CustomerID | SalesUnit | SalesLast7 | SalesLast14 |
---|---|---|---|---|
2018-01-01 | 1 | 10 | 10 | 10 |
2018-01-02 | 1 | 5 | 15 | 15 |
2018-01-05 | 2 | 3 | 3 | 3 |
2018-01-15 | 1 | 10 | 10 | 15 |
2018-01-20 | 2 | 4 | 4 | 4 |
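For anyone who wants to sanity-check the SalesLast7 and SalesLast14 columns outside the database, here is a minimal Python sketch of the same trailing-window idea. It is not SQL Server or Presto code; `trailing_sales` is a made-up helper name, and the rows are the six sample rows from the question.

```python
from datetime import date, timedelta

# Sample rows from the question: (purchase_date, customer_id, sales_unit)
rows = [
    (date(2018, 1, 1), 1, 10),
    (date(2018, 1, 2), 1, 5),
    (date(2018, 1, 5), 2, 3),
    (date(2018, 1, 15), 1, 10),
    (date(2018, 1, 20), 2, 4),
    (date(2018, 1, 21), 2, 5),
]

def trailing_sales(rows, n_days):
    """For each row, sum the same customer's sales over the n_days ending on that row's date."""
    result = []
    for purchase_date, customer_id, _ in rows:
        # BETWEEN date - (n_days - 1) AND date, as in the self-join above
        earliest = purchase_date - timedelta(days=n_days - 1)
        total = sum(
            units
            for other_date, other_customer, units in rows
            if other_customer == customer_id and earliest <= other_date <= purchase_date
        )
        result.append(total)
    return result

sales_last_7 = trailing_sales(rows, 7)
sales_last_14 = trailing_sales(rows, 14)
print(sales_last_7)   # [10, 15, 3, 10, 4, 9]
print(sales_last_14)  # [10, 15, 3, 15, 4, 9]
```

Note that this is a rolling window ending on each purchase date, not a fixed 7-day bucket anchored at the customer's first purchase, so it produces one value per purchase row.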
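You can get the result you want with SQL window functions in 2 steps:
Step 1. Apply a window partitioned by customer to get each customer's first_purchase_date. Then use the Presto date_diff() function to compute the number of days between the first purchase date and the current purchase date. Dividing that by 7 (integer division) gives the week_bucket counted from the first purchase date.
Step 2. Group by (customer, customer_sale_week_bucket) and take sum(sales_unit) and min(purchase_date) within each group.
Here is the query: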
with orders_with_customer_week_bucket AS
(
    select
        purchase_date,
        customer_id,
        sales_unit,
        -- days since the customer's first purchase, integer-divided by 7
        date_diff('day', min(purchase_date) over (partition by customer_id), purchase_date) / 7 as customer_sale_week_bucket
    from
        orders
)
select
    purchase_week,
    customer_id,
    seven_day_sales_unit
from
    (select
        customer_id,
        customer_sale_week_bucket,
        -- the first purchase date inside the bucket labels the week
        min(purchase_date) as purchase_week,
        sum(sales_unit) as seven_day_sales_unit
    from
        orders_with_customer_week_bucket
    GROUP BY
        customer_id,
        customer_sale_week_bucket
    ) r
purchase_week | customer_id | seven_day_sales_unit |
---|---|---|
2018-01-01 | 1 | 15 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 9 |
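The two steps can also be sketched in plain Python, which may help when checking the bucket arithmetic by hand. This is only an illustration under the assumption that the input is the six sample rows from the question; `seven_day_buckets` and `average_per_customer` are hypothetical helper names, not Presto functions.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (purchase_date, customer_id, sales_unit)
orders = [
    (date(2018, 1, 1), 1, 10),
    (date(2018, 1, 2), 1, 5),
    (date(2018, 1, 5), 2, 3),
    (date(2018, 1, 15), 1, 10),
    (date(2018, 1, 20), 2, 4),
    (date(2018, 1, 21), 2, 5),
]

def seven_day_buckets(orders):
    # Step 1: first purchase date per customer (the window min in the query)
    first_purchase = {}
    for purchase_date, customer_id, _ in orders:
        if customer_id not in first_purchase or purchase_date < first_purchase[customer_id]:
            first_purchase[customer_id] = purchase_date
    # Step 2: group by (customer, bucket), keeping min date and summed units
    buckets = defaultdict(lambda: [None, 0])
    for purchase_date, customer_id, units in orders:
        bucket = (purchase_date - first_purchase[customer_id]).days // 7
        entry = buckets[(customer_id, bucket)]
        entry[0] = purchase_date if entry[0] is None else min(entry[0], purchase_date)
        entry[1] += units
    return sorted((week, cust, total) for (cust, _), (week, total) in buckets.items())

def average_per_customer(bucket_rows):
    # Average of the bucket totals for each customer
    totals = defaultdict(list)
    for _, cust, total in bucket_rows:
        totals[cust].append(total)
    return {cust: sum(values) / len(values) for cust, values in totals.items()}

bucket_rows = seven_day_buckets(orders)
averages = average_per_customer(bucket_rows)
print(averages)  # {1: 12.5, 2: 6.0}
```

On the sample rows, customer 2's purchases on 2018-01-20 and 2018-01-21 land in the same bucket, so that bucket totals 9 as in the result table above, and the average for customer 2 comes out as (3 + 9) / 2 = 6.0. Buckets with no purchases are simply absent and do not count toward the average.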
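This is pseudocode, since I don't know the Presto functions. Until I come back to check, you will need to translate the date arithmetic yourself: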
-- pseudocode: "purchasedate - N" stands for date arithmetic,
-- e.g. date_add('day', -N, purchasedate) in Presto
select distinct customerid,
    (
        case when lag(purchasedate) over (partition by customerid order by purchasedate) >= purchasedate - 6 then sum(salesunit) end +
        case when lag(purchasedate) over (partition by customerid order by purchasedate) >= purchasedate - 5 then sum(salesunit) end +
        case when lag(purchasedate) over (partition by customerid order by purchasedate) >= purchasedate - 4 then sum(salesunit) end +
        case when lag(purchasedate) over (partition by customerid order by purchasedate) >= purchasedate - 3 then sum(salesunit) end +
        case when lag(purchasedate) over (partition by customerid order by purchasedate) >= purchasedate - 2 then sum(salesunit) end +
        case when lag(purchasedate) over (partition by customerid order by purchasedate) >= purchasedate - 1 then sum(salesunit) end +
        sum(salesunit)
    ) / count(*) over (partition by customerid)
from T
group by customerid, purchasedate