My data:
INSERT INTO martin_test (id, user_id, created_at) VALUES
('1bb20295-fd7b-4918-a496-313e5babd482', 'abc', '2024-01-04 15:54:51')
, ('08565423-3371-4720-abb3-80c7aef8333d', 'abc', '2024-01-11 15:54:51')
, ('b17443fe-5a4f-4b7c-934d-3d2910a65f44', 'abc', '2024-01-18 15:54:51')
, ('3d267dc3-ee86-44e9-b1fe-fd918a64b77c', 'abc', '2024-02-01 15:54:51')
, ('d28d73d3-bc9c-4192-998a-a6bafce604a5', 'abc', '2024-02-08 15:54:51')
, ('401d38f8-d277-4605-af33-b4a9bd2eef25', 'abc', '2024-02-22 15:54:51')
, ('b804fa29-23af-4d93-a5c9-187767fec3c9', 'abc', '2024-02-29 15:54:51')
;
My query:
SELECT *,
       CASE WHEN gap_weeks > 1 THEN 1 END,
       avg(CASE WHEN gap_weeks > 1 THEN 7 END) OVER (PARTITION BY user_id ORDER BY week_number) AS grp_avg,
       sum(CASE WHEN gap_weeks > 1 THEN 6 END) OVER (PARTITION BY user_id ORDER BY week_number) AS grp_sum,
       COUNT(CASE WHEN gap_weeks > 1 THEN 1 END) OVER (PARTITION BY user_id ORDER BY week_number) AS grp
FROM (
   SELECT user_id,
          week_number,
          lag(week_number) OVER (PARTITION BY user_id ORDER BY week_number) AS pre_week_number,
          week_number - lag(week_number) OVER (PARTITION BY user_id ORDER BY week_number) AS gap_weeks
   FROM (
      SELECT DISTINCT EXTRACT(WEEK FROM CAST(created_at AS DATE)) AS week_number, user_id
      FROM martin_test
   ) AS a
) AS subquery1;
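To see what the two inner subqueries produce for the sample rows, here is a small Python sketch that mimics them. It assumes EXTRACT(WEEK ...) yields the ISO 8601 week number (which is what PostgreSQL returns), so Python's date.isocalendar() stands in for it, and None stands in for SQL NULL. The names ROWS, week_numbers and with_gaps are illustrative only.

```python
from datetime import date

# Sample rows from martin_test as (user_id, created_at); the id column is irrelevant here.
ROWS = [
    ("abc", "2024-01-04 15:54:51"),
    ("abc", "2024-01-11 15:54:51"),
    ("abc", "2024-01-18 15:54:51"),
    ("abc", "2024-02-01 15:54:51"),
    ("abc", "2024-02-08 15:54:51"),
    ("abc", "2024-02-22 15:54:51"),
    ("abc", "2024-02-29 15:54:51"),
]


def week_numbers(rows):
    """Mimic SELECT DISTINCT EXTRACT(WEEK FROM CAST(created_at AS DATE)), user_id.

    Returns (user_id, iso_week) pairs sorted by user, then week.
    """
    return sorted({(user, date.fromisoformat(ts[:10]).isocalendar()[1]) for user, ts in rows})


def with_gaps(pairs):
    """Mimic lag(week_number) OVER (PARTITION BY user_id ORDER BY week_number).

    Returns (user_id, week_number, pre_week_number, gap_weeks) tuples.
    """
    out = []
    prev = {}  # last week seen per user, i.e. the "lag" within each partition
    for user, week in pairs:
        pre = prev.get(user)  # None for the first row of a partition, like SQL NULL
        gap = None if pre is None else week - pre  # NULL minus anything stays NULL
        out.append((user, week, pre, gap))
        prev[user] = week
    return out
```

For abc the week numbers come out as 1, 2, 3, 5, 6, 8, 9, and gap_weeks is greater than 1 exactly where a week is skipped.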
Why is grp for abc '0.1.2.3'? I don't understand why grp is incremented by 1 at each row where gap_weeks > 1, and how count / avg / sum(CASE WHEN ...) OVER (PARTITION BY ...) work.
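The mechanism behind the question: an aggregate with OVER (PARTITION BY ... ORDER BY ...) and no explicit frame uses the default frame from the start of the partition up to the current row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), so it behaves as a running aggregate. CASE WHEN gap_weeks > 1 THEN ... END has no ELSE, so it returns NULL on every other row, and count, sum and avg skip NULLs. The hypothetical Python sketch below replays that for abc's gap_weeks values (derived from the sample data; the week numbers are distinct, so there are no peer rows to complicate the RANGE frame).

```python
# gap_weeks for abc ordered by week_number (weeks 1, 2, 3, 5, 6, 8, 9); None plays SQL NULL.
GAPS = [None, 1, 1, 2, 1, 2, 1]


def running_aggregates(gaps):
    """Mimic COUNT/SUM/AVG(CASE WHEN gap_weeks > 1 THEN c END) OVER (... ORDER BY week_number).

    Each output list holds the aggregate over all rows from the partition start
    up to and including the current row.
    """
    count_inputs, sum_inputs, avg_inputs = [], [], []  # non-NULL CASE results seen so far
    grp, grp_sum, grp_avg = [], [], []
    for gap in gaps:
        # NULL > 1 is not true, so the first row (gap None) also falls through to NULL.
        if gap is not None and gap > 1:
            count_inputs.append(1)  # CASE ... THEN 1 END
            sum_inputs.append(6)    # CASE ... THEN 6 END
            avg_inputs.append(7)    # CASE ... THEN 7 END
        # COUNT(expr) counts non-NULL values, so it is 0 (not NULL) before the first gap.
        grp.append(len(count_inputs))
        # SUM and AVG over zero non-NULL values are NULL.
        grp_sum.append(sum(sum_inputs) if sum_inputs else None)
        grp_avg.append(sum(avg_inputs) / len(avg_inputs) if avg_inputs else None)
    return grp, grp_sum, grp_avg
```

For abc this yields grp = 0, 0, 0, 1, 1, 2, 2: the running count only moves on a row whose gap exceeds one week and then stays there, so every row after a gap shares a new grp value until the next gap. grp_sum and grp_avg follow the same running frame (NULL, NULL, NULL, 6, 6, 12, 12 and NULL, NULL, NULL, 7, 7, 7, 7).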
What I actually need is:
"Find the most consecutive weeks!"
I found this related blog post.
I'll focus on your stated goal:
Find the most consecutive weeks!
That can be done like this:
SELECT dense_rank() OVER (ORDER BY count(*) DESC) AS rank
, user_id
, count(*) AS consecutive_weeks
, concat_ws(' - ', min(week_nr), max(week_nr)) AS week_range
FROM (
SELECT *
, week_nr - row_number() OVER (PARTITION BY user_id ORDER BY week_nr) AS grp
FROM (
SELECT DISTINCT user_id, extract(week FROM created_at)::int AS week_nr
FROM martin_test
ORDER BY 1,2
) sub
) sub1
GROUP BY user_id, grp
ORDER BY consecutive_weeks DESC;
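Consecutive week numbers increase in lockstep with row_number() computed over the same partition. If you subtract that row number from the actual week number (week_nr), all weeks in one consecutive run get the same grp number; for example, weeks 1, 2, 3, 5 get row numbers 1, 2, 3, 4 and therefore grp 0, 0, 0, 1. Then you can aggregate per (user_id, grp).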
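The same idea as a plain Python sketch for a single user_id's week numbers: enumerate() plays the role of row_number(), itertools.groupby() the role of GROUP BY user_id, grp, and a small helper mimics dense_rank() OVER (ORDER BY count(*) DESC). The function names are made up for illustration; this is a model of the query's logic, not the SQL itself.

```python
from itertools import groupby


def consecutive_runs(weeks):
    """Gaps-and-islands for one user: week_nr - row_number() is constant within a run.

    Returns (consecutive_weeks, week_range) pairs in week order.
    """
    weeks = sorted(set(weeks))  # like SELECT DISTINCT ... ordered by week_nr
    # enumerate(..., start=1) stands in for row_number(); the difference is grp.
    keyed = [(week - rn, week) for rn, week in enumerate(weeks, start=1)]
    runs = []
    # grp never decreases for sorted distinct weeks, so groupby's adjacent grouping suffices.
    for _grp, items in groupby(keyed, key=lambda kv: kv[0]):
        run = [week for _, week in items]
        # min/max of a sorted run are its ends, mirroring concat_ws(' - ', min, max).
        runs.append((len(run), f"{run[0]} - {run[-1]}"))
    return runs


def rank_runs(runs):
    """Mimic dense_rank() OVER (ORDER BY count(*) DESC): ties share a rank, no gaps."""
    distinct_counts = sorted({n for n, _ in runs}, reverse=True)
    rank_of = {n: i for i, n in enumerate(distinct_counts, start=1)}
    # Sorting by rank ascending equals ORDER BY consecutive_weeks DESC.
    return sorted(((rank_of[n], n, rng) for n, rng in runs), key=lambda r: r[0])
```

With abc's ISO weeks 1, 2, 3, 5, 6, 8, 9 this gives the run 1 - 3 (3 weeks) at rank 1, and the runs 5 - 6 and 8 - 9 (2 weeks each) sharing rank 2.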
Related: