I'm trying to get the IDs of customers who placed orders on consecutive days. The table is created as follows:
create table orders(
orderid INT,
orderdate date,
customerid int
);
Values:
insert into orders (orderid, orderdate, customerid)
values(1,'2023-06-20',1),
(2, '2023-06-21', 2),
(3, '2023-06-22', 3),
(4, '2023-06-22', 1),
(5, '2023-06-23', 3),
(6, '2023-06-22', 1),
(7, '2023-06-26', 4),
(8, '2023-06-27', 4),
(9, '2023-06-29', 4),
(10, '2023-06-29', 5),
(11, '2023-06-30', 5),
(12, '2023-06-28', 5),
(13, '2023-06-25', 4),
(14, '2023-06-24', 4),
(15, '2023-06-30', 4);
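The code I wrote returns the ids of customers who ordered on consecutive days, but it leaves out customers whose orders contain a gap, even when they ordered on several consecutive days before the gap occurred. The code I wrote: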
with t1 as(
select customerid, orderdate,
case when lead(orderdate) over (partition by customerid order by orderdate) is null then 1
else abs(orderdate - lead(orderdate) over (partition by customerid order by orderdate)) end as gap
from orders)
select customerid, sum(gap) as consecutive
from t1
where gap>0
group by customerid
having count(*)=sum(gap) and count(*)>1;
Output:
+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
| 3 | 2 |
| 5 | 3 |
+------------+------------------+
Desired output:
+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
| 3 | 2 |
| 4 | 4 |
| 4 | 2 |
| 5 | 3 |
+------------+------------------+
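Customer 4 ordered on every day from 2023-06-24 to 2023-06-27. That customer's next orders are on 2023-06-29 and 2023-06-30, which are not consecutive with the earlier run (there is no order on 2023-06-28), so they should appear as a separate row.
Edit: the orders must be on consecutive days, regardless of how many orders are placed on a single day.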
This is a classic gaps-and-islands problem. Here is one way to get the desired result:
SELECT customerid, COUNT(DISTINCT orderdate) AS consecutive_days
FROM (
SELECT *, orderdate - INTERVAL DENSE_RANK() OVER (PARTITION BY customerid ORDER BY orderdate) DAY AS grp
FROM orders
) AS order_groups
GROUP BY customerid, grp
HAVING consecutive_days > 1;
Output:
customerid | consecutive_days |
---|---|
3 | 2 |
4 | 4 |
4 | 2 |
5 | 3 |
Here's a db<>fiddle.
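To make the logic of the query easier to follow outside MySQL, here is a rough Python sketch of the same gaps-and-islands idea. It is not part of the original answer: the function name consecutive_runs and the ORDERS list are illustrative, and ORDERS is simply the sample INSERT rewritten as tuples. Ranking the sorted distinct dates stands in for DENSE_RANK, and counting the dates in each group stands in for COUNT(DISTINCT orderdate).

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the INSERT statement: (orderid, orderdate, customerid)
ORDERS = [
    (1, date(2023, 6, 20), 1), (2, date(2023, 6, 21), 2),
    (3, date(2023, 6, 22), 3), (4, date(2023, 6, 22), 1),
    (5, date(2023, 6, 23), 3), (6, date(2023, 6, 22), 1),
    (7, date(2023, 6, 26), 4), (8, date(2023, 6, 27), 4),
    (9, date(2023, 6, 29), 4), (10, date(2023, 6, 29), 5),
    (11, date(2023, 6, 30), 5), (12, date(2023, 6, 28), 5),
    (13, date(2023, 6, 25), 4), (14, date(2023, 6, 24), 4),
    (15, date(2023, 6, 30), 4),
]


def consecutive_runs(orders, min_days=2):
    """Return (customerid, consecutive_days) for every island of >= min_days."""
    # Distinct order dates per customer: same-day orders collapse into one date.
    dates_by_customer = defaultdict(set)
    for _, orderdate, customerid in orders:
        dates_by_customer[customerid].add(orderdate)

    result = []
    for customerid in sorted(dates_by_customer):
        islands = defaultdict(set)
        # Numbering the sorted *distinct* dates from 1 mimics DENSE_RANK().
        for rank, d in enumerate(sorted(dates_by_customer[customerid]), start=1):
            grp = d - timedelta(days=rank)  # constant within a run of consecutive days
            islands[grp].add(d)
        for grp in sorted(islands):
            days = len(islands[grp])  # COUNT(DISTINCT orderdate)
            if days >= min_days:      # HAVING consecutive_days > 1
                result.append((customerid, days))
    return result


print(consecutive_runs(ORDERS))
```

With the sample data this yields the same four rows as the SQL output above, with customer 4's two islands listed in date order.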
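If you look at the results of the derived table, you can see that subtracting the DENSE_RANK (as a number of days) from orderdate produces a value, grp, that stays the same within each run of consecutive days and can therefore be used for grouping.
orderid | orderdate | customerid | grp |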
---|---|---|---|
1 | 2023-06-20 | 1 | 2023-06-19 |
4 | 2023-06-22 | 1 | 2023-06-20 |
6 | 2023-06-22 | 1 | 2023-06-20 |
2 | 2023-06-21 | 2 | 2023-06-20 |
3 | 2023-06-22 | 3 | 2023-06-21 |
5 | 2023-06-23 | 3 | 2023-06-21 |
14 | 2023-06-24 | 4 | 2023-06-23 |
13 | 2023-06-25 | 4 | 2023-06-23 |
7 | 2023-06-26 | 4 | 2023-06-23 |
8 | 2023-06-27 | 4 | 2023-06-23 |
9 | 2023-06-29 | 4 | 2023-06-24 |
15 | 2023-06-30 | 4 | 2023-06-24 |
12 | 2023-06-28 | 5 | 2023-06-27 |
10 | 2023-06-29 | 5 | 2023-06-27 |
11 | 2023-06-30 | 5 | 2023-06-27 |
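The grp column can be reproduced with a short Python sketch, which may help when checking the arithmetic by hand. This is an illustration only, not the answer's method: derived_table is a hypothetical helper, ORDERS is the sample data as tuples, and rows are sorted by (customerid, orderdate, orderid) so they come out in the same order as the table.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the INSERT statement: (orderid, orderdate, customerid)
ORDERS = [
    (1, date(2023, 6, 20), 1), (2, date(2023, 6, 21), 2),
    (3, date(2023, 6, 22), 3), (4, date(2023, 6, 22), 1),
    (5, date(2023, 6, 23), 3), (6, date(2023, 6, 22), 1),
    (7, date(2023, 6, 26), 4), (8, date(2023, 6, 27), 4),
    (9, date(2023, 6, 29), 4), (10, date(2023, 6, 29), 5),
    (11, date(2023, 6, 30), 5), (12, date(2023, 6, 28), 5),
    (13, date(2023, 6, 25), 4), (14, date(2023, 6, 24), 4),
    (15, date(2023, 6, 30), 4),
]


def derived_table(orders):
    """Mimic the derived table: each row plus grp = orderdate - DENSE_RANK() days."""
    distinct_dates = defaultdict(set)
    for _, orderdate, customerid in orders:
        distinct_dates[customerid].add(orderdate)

    # DENSE_RANK() OVER (PARTITION BY customerid ORDER BY orderdate):
    # tied dates share a rank and no rank numbers are skipped.
    dense_rank = {
        (cid, d): rank
        for cid, dates in distinct_dates.items()
        for rank, d in enumerate(sorted(dates), start=1)
    }

    rows = []
    for orderid, orderdate, customerid in sorted(orders, key=lambda r: (r[2], r[1], r[0])):
        grp = orderdate - timedelta(days=dense_rank[(customerid, orderdate)])
        rows.append((orderid, orderdate.isoformat(), customerid, grp.isoformat()))
    return rows


for row in derived_table(ORDERS):
    print(*row, sep=" | ")
```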
Note that I used DENSE_RANK rather than the more typical ROW_NUMBER to handle customers who place multiple orders on the same day.
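Customer 1 shows why this matters: two of their orders fall on 2023-06-22. The Python sketch below compares both ranking functions on just those three rows. The helpers row_number, dense_rank and island_sizes are illustrative stand-ins for the SQL window functions, and breaking the tie by orderid is an assumption (in SQL the order among tied rows is arbitrary, but either way one 2023-06-22 row gets row number 3).

```python
from collections import defaultdict
from datetime import date, timedelta

# Customer 1's orders from the sample data: (orderid, orderdate)
CUSTOMER_1 = [(1, date(2023, 6, 20)), (4, date(2023, 6, 22)), (6, date(2023, 6, 22))]


def row_number(sorted_dates):
    # Every row gets the next number, even when its date ties with the previous row.
    return list(range(1, len(sorted_dates) + 1))


def dense_rank(sorted_dates):
    # Tied dates share a rank; the next distinct date gets rank + 1.
    ranks, rank, prev = [], 0, None
    for d in sorted_dates:
        if d != prev:
            rank += 1
            prev = d
        ranks.append(rank)
    return ranks


def island_sizes(rows, ranker):
    """Group rows by orderdate - rank days; return COUNT(DISTINCT orderdate) per grp."""
    rows = sorted(rows, key=lambda r: (r[1], r[0]))  # ORDER BY orderdate, orderid as tie-break
    ranks = ranker([d for _, d in rows])
    groups = defaultdict(set)
    for (_, d), rank in zip(rows, ranks):
        groups[d - timedelta(days=rank)].add(d)
    return {grp.isoformat(): len(ds) for grp, ds in sorted(groups.items())}


print("ROW_NUMBER:", island_sizes(CUSTOMER_1, row_number))
print("DENSE_RANK:", island_sizes(CUSTOMER_1, dense_rank))
```

With ROW_NUMBER, 2023-06-20 and the second 2023-06-22 order both land in grp 2023-06-19, so customer 1 would wrongly be reported with 2 consecutive days. With DENSE_RANK each group holds a single date, so customer 1 is correctly filtered out by the HAVING clause.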