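I want to get the row count for each 24-hour period: starting from the first event, count the rows in the following 24 hours, then in the 24 hours after that, and so on. I've tried many approaches but can't figure out how to do it without a loop. I need to do this without a loop because the SQL engine I'm using doesn't support loop statements. So far I've been able to join the data, and I thought about using LAG or tracking the 24 hours after the first event, but I couldn't get very far.
I doubt whether this is even possible in SQL.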
event_at,id,event_log_id
2024-01-17 10:22:20.000,1,1
2024-01-17 11:40:04.000,2,2
2024-01-18 11:18:50.000,3,3
2024-01-18 11:18:51.000,4,4
2024-01-18 11:18:52.000,5,5
2024-01-18 11:39:03.000,6,6
2024-01-18 14:17:48.000,7,7
My expected result: starting from the first event on the 17th, the count for the first 24 hours is 2, and for the next 24 hours it is 5:
event_at,count
2024-01-17,2
2024-01-18,5
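To pin down the rule behind this expected output, here is a loop-based reference sketch in Python (not SQL, and deliberately using the loop the question wants to avoid, only to make the semantics concrete). It assumes each window is anchored at the first event not yet counted and contains the events strictly less than 24 hours after that anchor; the function name `count_windows` is hypothetical.

```python
from datetime import datetime, timedelta


def count_windows(timestamps, window=timedelta(hours=24)):
    """Group timestamps into windows anchored at each window's first event.

    An event joins the current window if it is strictly less than `window`
    after the window's first event; otherwise it opens a new window.
    Returns a list of (window_start_date, count) pairs.
    """
    result = []
    start = None
    for ts in sorted(timestamps):
        if start is None or ts - start >= window:
            start = ts  # this event opens a new 24-hour window
            result.append([start.date().isoformat(), 0])
        result[-1][1] += 1
    return [tuple(r) for r in result]


sample = [
    "2024-01-17 10:22:20", "2024-01-17 11:40:04",
    "2024-01-18 11:18:50", "2024-01-18 11:18:51", "2024-01-18 11:18:52",
    "2024-01-18 11:39:03", "2024-01-18 14:17:48",
]
events = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in sample]
print(count_windows(events))  # [('2024-01-17', 2), ('2024-01-18', 5)]
```

Note that the window boundaries follow the data, not the calendar: the second window opens at 2024-01-18 11:18:50 because that event is more than 24 hours after 2024-01-17 10:22:20.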
I tried some experiments with LAG, but couldn't get very far:
WITH lagged_data AS (
SELECT
"event_at",
LAG("event_at") OVER (ORDER BY "event_at") AS prev_timestamp
FROM
"table"
where ** and DATE("event_at") BETWEEN DATE '2024-01-01' AND DATE '2024-01-18'
)
SELECT
"event_at",
prev_timestamp AS period_start,
COUNT(*) AS row_count
FROM (
SELECT
"event_at",
LAG("event_at") OVER (ORDER BY "event_at") AS prev_timestamp,
CASE
WHEN LAG(DATE("event_at")) OVER (ORDER BY "event_at") IS NULL
OR DATE("event_at") != LAG(DATE("event_at")) OVER (ORDER BY "event_at") THEN 1
ELSE 0
END AS date_changed
FROM
"table"
where ** is not null and DATE("event_at") BETWEEN DATE '2024-01-01' AND DATE '2024-01-18'
) lagged_with_date_change
WHERE
(date_diff('second', prev_timestamp, "event_at") <= 86400 AND date_changed = 1)
OR date_changed = 1 -- To include the first row of each date
GROUP BY
"event_at", prev_timestamp, date_changed
ORDER BY
"event_at";
For example, this data:
2024-01-17 11:40:04.000 2 2
2024-01-18 11:18:50.000 3 3
2024-01-18 11:18:51.000 4 4
2024-01-18 11:18:52.000 5 5
2024-01-18 11:39:03.000 6 6
2024-01-18 14:17:48.000 7 7
2024-01-18 14:17:48.000 8 8
2024-01-19 14:17:48.000 10 10
2024-01-19 14:17:48.000 11 11
2024-01-19 15:17:48.000 12 12
2024-01-19 15:17:48.000 13 13
should return: 5, 4, 2
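This second sample is sensitive to how the 24-hour boundary is treated: the two rows at 2024-01-19 14:17:48 fall exactly 24 hours after the window that opens at 2024-01-18 14:17:48. The Python sketch below (the helper `window_counts` and its `inclusive` flag are hypothetical) jumps from window to window with `bisect` and shows both readings: an inclusive boundary (at most 24 hours) gives the expected 5, 4, 2, while a strict one (less than 24 hours) gives 5, 2, 4.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def window_counts(timestamps, inclusive, window=timedelta(hours=24)):
    """Count events per window anchored at each window's first event.

    inclusive=True : an event exactly `window` after the anchor stays in the window.
    inclusive=False: such an event opens the next window instead.
    """
    ts = sorted(timestamps)
    find_end = bisect_right if inclusive else bisect_left
    counts, i = [], 0
    while i < len(ts):
        # index of the first event that falls outside the current window
        j = find_end(ts, ts[i] + window, i)
        counts.append(j - i)
        i = j  # the next window starts at the first event not yet counted
    return counts


sample2 = [
    "2024-01-17 11:40:04", "2024-01-18 11:18:50", "2024-01-18 11:18:51",
    "2024-01-18 11:18:52", "2024-01-18 11:39:03", "2024-01-18 14:17:48",
    "2024-01-18 14:17:48", "2024-01-19 14:17:48", "2024-01-19 14:17:48",
    "2024-01-19 15:17:48", "2024-01-19 15:17:48",
]
events = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in sample2]
print(window_counts(events, inclusive=True))   # [5, 4, 2]
print(window_counts(events, inclusive=False))  # [5, 2, 4]
```

Whichever SQL solution is used, its comparison operator (`>=` versus `>` on the 24-hour threshold) decides which of these two results it produces.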
For this, you can use a RECURSIVE CTE.
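Important note: the example below was written for SQL Server, so that is the syntax it uses. You can adapt the query syntax to whatever DBMS you are using.

How it works:
1. Use ROW_NUMBER to assign a unique number to each event in the table (CTE_EVENTS_BASE). This is the preparation for the recursive CTE.
2. In the RECURSIVE CTE, calculate the number of hours between the current row and the previous row (HOURS_SINCE_LAST_EVENT). Add these hours to a running sum of hours (RS_HOURS_SINCE_LAST_EVENT). When the running sum reaches or exceeds 24 hours, reset RS_HOURS_SINCE_LAST_EVENT to zero.
3. Assign a group number to each event (EVENT_GROUP_NUM). Start with EVENT_GROUP_NUM = 1 and increment it each time RS_HOURS_SINCE_LAST_EVENT reaches or exceeds 24 hours.
4. Finally, group by EVENT_GROUP_NUM to get the first event and the row count of each 24-hour window.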
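The running-sum and reset logic in these steps can be mirrored row by row in plain Python, which makes it easy to check which rows land in which group. This is a sketch on the answer's sample data; the variable names loosely follow the CTE columns, and, as an assumption of this sketch, it keeps the running sum in whole seconds rather than DECIMAL(22, 5) hours, which avoids rounding but otherwise applies the same `>= 24 hours` reset rule.

```python
from datetime import datetime

# Sample rows from the answer's INSERT statement (event_at, id), already sorted.
rows = [
    ("2024-01-17 10:22:20", 1), ("2024-01-17 11:40:04", 2),
    ("2024-01-18 11:18:50", 3), ("2024-01-18 11:18:51", 4),
    ("2024-01-18 11:18:52", 5), ("2024-01-18 11:39:03", 6),
    ("2024-01-18 14:17:48", 7), ("2024-01-19 18:01:03", 8),
    ("2024-01-19 20:22:01", 9), ("2024-01-19 22:39:03", 10),
    ("2024-01-20 01:01:01", 11), ("2024-01-21 00:01:01", 12),
]
LIMIT = 24 * 3600  # 24 hours, in seconds


def trace(rows):
    """Yield (row_num, event_at, seconds_since_last, running_sum, group_num)."""
    prev = None
    running_sum, group_num = 0, 1  # like the anchor member: first row, group 1
    for row_num, (event_at, _id) in enumerate(rows, start=1):
        ts = datetime.strptime(event_at, "%Y-%m-%d %H:%M:%S")
        since_last = 0 if prev is None else int((ts - prev).total_seconds())
        if running_sum + since_last >= LIMIT:
            running_sum, group_num = 0, group_num + 1  # reset and open a new group
        else:
            running_sum += since_last
        prev = ts
        yield row_num, event_at, since_last, running_sum, group_num


traced = list(trace(rows))
for t in traced:
    print(t)

# Equivalent of the final SELECT: first event and row count per group.
groups = {}
for _, event_at, _, _, g in traced:
    start, n = groups.get(g, (event_at, 0))
    groups[g] = (start, n + 1)
print(list(groups.values()))
# [('2024-01-17 10:22:20', 2), ('2024-01-18 11:18:50', 5), ('2024-01-19 18:01:03', 4), ('2024-01-21 00:01:01', 1)]
```

Each iteration depends on the previous row's running sum, which is why a plain LAG cannot express this and a recursive CTE (or a loop) is needed.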
SQL Fiddle: working example
Query:
CREATE TABLE events_tbl (
event_at DATETIME,
id INTEGER,
event_log_id INTEGER
);
INSERT INTO events_tbl
(event_at, id, event_log_id)
VALUES
('2024-01-17 10:22:20.000', '1', '1'),
('2024-01-17 11:40:04.000', '2', '2'),
('2024-01-18 11:18:50.000', '3', '3'),
('2024-01-18 11:18:51.000', '4', '4'),
('2024-01-18 11:18:52.000', '5', '5'),
('2024-01-18 11:39:03.000', '6', '6'),
('2024-01-18 14:17:48.000', '7', '7'),
('2024-01-19 18:01:03.000', '8', '8'),
('2024-01-19 20:22:01.000', '9', '9'),
('2024-01-19 22:39:03.000', '10', '10'),
('2024-01-20 01:01:01.000', '11', '11'),
('2024-01-21 00:01:01.000', '12', '12');
WITH CTE_EVENTS_BASE AS
(SELECT A.EVENT_AT,
A.ID,
A.EVENT_LOG_ID,
ROW_NUMBER() OVER (
ORDER BY A.EVENT_AT) AS ROW_NUM
FROM EVENTS_TBL A),
RECUR_CTE_EVENTS AS
(SELECT A.EVENT_AT,
A.ID,
A.EVENT_LOG_ID,
A.ROW_NUM,
CAST(0 AS DECIMAL(22, 5)) AS HOURS_SINCE_LAST_EVENT,
CAST(0 AS DECIMAL(22, 5)) AS RS_HOURS_SINCE_LAST_EVENT,
1 AS EVENT_COUNT,
1 AS EVENT_GROUP_NUM
FROM CTE_EVENTS_BASE A
WHERE A.ROW_NUM = 1
UNION ALL SELECT A.EVENT_AT,
A.ID,
A.EVENT_LOG_ID,
A.ROW_NUM,
/* GET THE HOURS SINCE LAST EVENT */
CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) AS HOURS_SINCE_LAST_EVENT,
/* IF THE RUNNING SUM OF HOURS REACHES 24 HOURS, RESET IT TO ZERO (THIS ROW STARTS A NEW WINDOW) */
CAST(CASE
WHEN CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT >= 24 THEN 0.00000
ELSE CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT
END AS DECIMAL(22, 5)) AS RS_HOURS_SINCE_LAST_EVENT,
/* EVENT COUNT EVERY 24 HOURS */
CASE
WHEN CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT >= 24 THEN 1
ELSE B.EVENT_COUNT + 1
END AS EVENT_COUNT,
/* ASSIGN EVENTS IN THE SAME 24-HOUR WINDOW TO ONE GROUP */
CASE
WHEN CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT >= 24 THEN B.EVENT_GROUP_NUM + 1
ELSE B.EVENT_GROUP_NUM
END AS EVENT_GROUP_NUM
FROM CTE_EVENTS_BASE A
INNER JOIN RECUR_CTE_EVENTS B ON A.ROW_NUM = B.ROW_NUM + 1)
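SELECT MIN(a.EVENT_AT) AS EVENT_AT, COUNT(a.ID) AS EVENT_COUNT
FROM RECUR_CTE_EVENTS a
GROUP BY a.EVENT_GROUP_NUM
ORDER BY a.EVENT_GROUP_NUM
/* ONE RECURSION LEVEL PER ROW: WITHOUT THIS, THE DEFAULT LIMIT OF 100 FAILS ON LARGER TABLES */
OPTION (MAXRECURSION 0);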
Output:
EVENT_AT | EVENT_COUNT |
---|---|
2024-01-17 10:22:20.000 | 2 |
2024-01-18 11:18:50.000 | 5 |
2024-01-19 18:01:03.000 | 4 |
2024-01-21 00:01:01.000 | 1 |
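Since the query syntax has to be adapted to other DBMSs, here is one possible port of the same recursive CTE to SQLite, run from Python's built-in `sqlite3` module. It is a sketch under stated assumptions: timestamps are stored as ISO-8601 TEXT, the SQLite build supports window functions (3.25 or later), and elapsed time is tracked in whole seconds via `strftime('%s', ...)` instead of DECIMAL(22, 5) hours. The names `base`, `recur`, `epoch` and `elapsed` are illustrative.

```python
import sqlite3

# Recursive CTE adapted to SQLite; window boundaries use the same ">= 24 h" rule.
QUERY = """
WITH RECURSIVE
base AS (
    SELECT event_at, id,
           CAST(strftime('%s', event_at) AS INTEGER) AS epoch,
           ROW_NUMBER() OVER (ORDER BY event_at) AS row_num
    FROM events_tbl
),
recur(event_at, id, row_num, epoch, elapsed, group_num) AS (
    -- anchor: the first event opens group 1 with nothing elapsed
    SELECT event_at, id, row_num, epoch, 0, 1
    FROM base
    WHERE row_num = 1
    UNION ALL
    -- step: add the gap to the previous row; at >= 24 h, reset and start a new group
    SELECT a.event_at, a.id, a.row_num, a.epoch,
           CASE WHEN b.elapsed + (a.epoch - b.epoch) >= 86400 THEN 0
                ELSE b.elapsed + (a.epoch - b.epoch) END,
           CASE WHEN b.elapsed + (a.epoch - b.epoch) >= 86400 THEN b.group_num + 1
                ELSE b.group_num END
    FROM base a
    JOIN recur b ON a.row_num = b.row_num + 1
)
SELECT MIN(event_at) AS event_at, COUNT(id) AS event_count
FROM recur
GROUP BY group_num
ORDER BY group_num
"""

rows = [
    ("2024-01-17 10:22:20.000", 1, 1), ("2024-01-17 11:40:04.000", 2, 2),
    ("2024-01-18 11:18:50.000", 3, 3), ("2024-01-18 11:18:51.000", 4, 4),
    ("2024-01-18 11:18:52.000", 5, 5), ("2024-01-18 11:39:03.000", 6, 6),
    ("2024-01-18 14:17:48.000", 7, 7), ("2024-01-19 18:01:03.000", 8, 8),
    ("2024-01-19 20:22:01.000", 9, 9), ("2024-01-19 22:39:03.000", 10, 10),
    ("2024-01-20 01:01:01.000", 11, 11), ("2024-01-21 00:01:01.000", 12, 12),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events_tbl (event_at TEXT, id INTEGER, event_log_id INTEGER)")
con.executemany("INSERT INTO events_tbl VALUES (?, ?, ?)", rows)
result = con.execute(QUERY).fetchall()
print(result)
# [('2024-01-17 10:22:20.000', 2), ('2024-01-18 11:18:50.000', 5), ('2024-01-19 18:01:03.000', 4), ('2024-01-21 00:01:01.000', 1)]
```

Working in integer seconds means the comparison against 86400 is exact, whereas summing hours rounded to five decimal places can drift slightly over long runs of events.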