Count and group rows for every 24 hours in SQL


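I want to count rows per 24-hour period, where each period starts at its first event and covers the following 24 hours. I have tried several approaches but cannot figure it out without a loop, and the SQL engine I am using does not support loop statements. So far I have been able to join the rows, and I have thought about using LAG or tracking the 24 hours after the first event, but I could not get far.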

I doubt whether this is even possible in SQL.

event_at,id,event_log_id
2024-01-17 10:22:20.000,1,1
2024-01-17 11:40:04.000,2,2
2024-01-18 11:18:50.000,3,3
2024-01-18 11:18:51.000,4,4
2024-01-18 11:18:52.000,5,5
2024-01-18 11:39:03.000,6,6
2024-01-18 14:17:48.000,7,7

My expected result: starting from the first event on the 17th, the event count for the first 24 hours is 2, and for the next 24 hours it is 5.

event_at,count
2024-01-17 2
2024-01-18 5

I experimented with LAG but could not get far:

WITH lagged_data AS (
  SELECT
    "event_at",
    LAG("event_at") OVER (ORDER BY "event_at") AS prev_timestamp
  FROM
    "table"
     where ** and DATE("event_at") BETWEEN DATE '2024-01-01' AND DATE '2024-01-18'
)
SELECT
  "event_at",
  prev_timestamp AS period_start,
  COUNT(*) AS row_count
FROM (
  SELECT
    "event_at",
    LAG("event_at") OVER (ORDER BY "event_at") AS prev_timestamp,
    CASE
      WHEN LAG(DATE("event_at")) OVER (ORDER BY "event_at") IS NULL
        OR DATE("event_at") != LAG(DATE("event_at")) OVER (ORDER BY "event_at") THEN 1
      ELSE 0
    END AS date_changed
  FROM
    "table"
     where ** is not null and DATE("event_at") BETWEEN DATE '2024-01-01' AND DATE '2024-01-18'
) lagged_with_date_change
WHERE
  (date_diff('second', prev_timestamp, "event_at") <= 86400 AND date_changed = 1)
  OR date_changed = 1  -- To include the first row of each date
GROUP BY
  "event_at", prev_timestamp, date_changed
ORDER BY
  "event_at";

For example, this data:

2024-01-17 11:40:04.000 2   2
2024-01-18 11:18:50.000 3   3
2024-01-18 11:18:51.000 4   4
2024-01-18 11:18:52.000 5   5
2024-01-18 11:39:03.000 6   6
2024-01-18 14:17:48.000 7   7
2024-01-18 14:17:48.000 8   8
2024-01-19 14:17:48.000 10  10
2024-01-19 14:17:48.000 11  11
2024-01-19 15:17:48.000 12  12
2024-01-19 15:17:48.000 13  13

should return: 5, 4, 2

sql presto
1 Answer

For this, you can use a recursive CTE.

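Important notes:

  • I started working on this before you specified Presto; I initially read your post as SQL Server, so that is how the example was written. You can adapt the query syntax to whatever DBMS you are using.
  • I added some extra dates to test the query further.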

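How it works:

  1. Assign a unique ROW_NUMBER to each event in the table, in CTE_EVENTS_BASE. This is preparation for the recursive CTE.
  2. Create a recursive CTE that calculates the hours between the previous row and the current row (HOURS_SINCE_LAST_EVENT), then adds those hours to a running sum of hours (RS_HOURS_SINCE_LAST_EVENT). When the running sum exceeds 24 hours, reset it to zero.
  3. Assign each event to an EVENT_GROUP_NUM. Start at 1 and increment it each time the running sum exceeds 24 hours.
  4. Finally, return the first EVENT_AT and the event count for each EVENT_GROUP_NUM.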
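The steps above amount to a single ordered pass over the events. As a rough sketch of that logic outside SQL (not part of the original answer), the Python below keeps the start of the current window instead of a running sum of gaps, which is equivalent because the gaps since the window start add up to the distance from that start. The names `count_per_window` and `parse` are made up for this illustration, and the sketch assumes that an event exactly 24 hours after the window start still belongs to that window, which is what the expected 5, 4, 2 in the question implies.

```python
from datetime import datetime, timedelta


def parse(text):
    """Parse timestamps in the 'YYYY-MM-DD HH:MM:SS.fff' form used in the question."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def count_per_window(timestamps, window=timedelta(hours=24)):
    """Greedy grouping: a window opens at the first event not covered by the
    previous window and covers everything up to and including start + window."""
    groups = []  # each entry is [window_start, event_count]
    for ts in sorted(timestamps):
        if not groups or ts - groups[-1][0] > window:
            groups.append([ts, 1])   # event falls outside: open a new window here
        else:
            groups[-1][1] += 1       # event falls inside the current window
    return [(start, count) for start, count in groups]


# Second data set from the question (ids 2 to 13).
sample = [parse(t) for t in [
    "2024-01-17 11:40:04.000", "2024-01-18 11:18:50.000",
    "2024-01-18 11:18:51.000", "2024-01-18 11:18:52.000",
    "2024-01-18 11:39:03.000", "2024-01-18 14:17:48.000",
    "2024-01-18 14:17:48.000", "2024-01-19 14:17:48.000",
    "2024-01-19 14:17:48.000", "2024-01-19 15:17:48.000",
    "2024-01-19 15:17:48.000",
]]

result = count_per_window(sample)
print([count for _, count in result])  # [5, 4, 2]
```

Because each row's group depends on where the previous window started, this cannot be expressed with a plain LAG comparison, which is why the SQL version needs recursion.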

SQL Fiddle: working example

Query:

CREATE TABLE events_tbl (
  event_at DATETIME,
  id INTEGER,
  event_log_id INTEGER
);

INSERT INTO events_tbl
  (event_at, id, event_log_id)
VALUES
  ('2024-01-17 10:22:20.000', '1', '1'),
  ('2024-01-17 11:40:04.000', '2', '2'),
  ('2024-01-18 11:18:50.000', '3', '3'),
  ('2024-01-18 11:18:51.000', '4', '4'),
  ('2024-01-18 11:18:52.000', '5', '5'),
  ('2024-01-18 11:39:03.000', '6', '6'),
  ('2024-01-18 14:17:48.000', '7', '7'),
  ('2024-01-19 18:01:03.000', '8', '8'),
  ('2024-01-19 20:22:01.000', '9', '9'),
  ('2024-01-19 22:39:03.000', '10', '10'),
  ('2024-01-20 01:01:01.000', '11', '11'),
  ('2024-01-21 00:01:01.000', '12', '12');
 
WITH CTE_EVENTS_BASE AS
  (SELECT A.EVENT_AT,
          A.ID,
          A.EVENT_LOG_ID,
          ROW_NUMBER() OVER (
                             ORDER BY A.EVENT_AT) AS ROW_NUM
   FROM EVENTS_TBL A),
     RECUR_CTE_EVENTS AS
  (SELECT A.EVENT_AT,
          A.ID,
          A.EVENT_LOG_ID,
          A.ROW_NUM,
          CAST(0 AS DECIMAL(22, 5)) AS HOURS_SINCE_LAST_EVENT,
          CAST(0 AS DECIMAL(22, 5)) AS RS_HOURS_SINCE_LAST_EVENT,
          1 AS EVENT_COUNT,
   1 AS EVENT_GROUP_NUM
   FROM CTE_EVENTS_BASE A
   WHERE A.ROW_NUM = 1
   UNION ALL SELECT A.EVENT_AT,
                    A.ID,
                    A.EVENT_LOG_ID,
                    A.ROW_NUM,
                    /* GET THE HOURS SINCE LAST EVENT */
                    CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) AS HOURS_SINCE_LAST_EVENT,
                    /* IF THE RUNNING SUM OF HOURS EXCEEDS 24 HOURS THEN RESET THE RUNNING SUM TO ZERO */
                    /* STRICT > SO AN EVENT EXACTLY 24 HOURS AFTER THE WINDOW START STAYS IN THAT WINDOW */
                    CAST(CASE
                        WHEN CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT > 24 THEN 0.00000
                        ELSE CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT
                    END AS DECIMAL(22,5)) AS RS_HOURS_SINCE_LAST_EVENT,
                    /* EVENT COUNT EVERY 24 HOURS */
                    CASE
                        WHEN CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT > 24 THEN 1
                        ELSE b.EVENT_COUNT + 1
                    END AS EVENT_COUNT,
                    /* ASSIGN SAME EVENTS TO A GROUP */
                    CASE
                        WHEN CAST(DATEDIFF(SECOND, B.EVENT_AT, A.EVENT_AT) / 3600.00 AS DECIMAL(22, 5)) + B.RS_HOURS_SINCE_LAST_EVENT > 24 THEN b.EVENT_GROUP_NUM + 1
                        ELSE b.EVENT_GROUP_NUM
                    END AS EVENT_GROUP_NUM
                    
   FROM CTE_EVENTS_BASE A
   INNER JOIN RECUR_CTE_EVENTS B ON A.ROW_NUM = B.ROW_NUM + 1)

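SELECT MIN(a.EVENT_AT) AS EVENT_AT, COUNT(a.ID) AS EVENT_COUNT
FROM RECUR_CTE_EVENTS a
GROUP BY a.EVENT_GROUP_NUM
ORDER BY MIN(a.EVENT_AT)
/* SQL SERVER STOPS AT 100 RECURSION LEVELS BY DEFAULT; 0 REMOVES THE LIMIT FOR LARGER TABLES */
OPTION (MAXRECURSION 0);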

Output:

EVENT_AT EVENT_COUNT
2024-01-17 10:22:20.000 2
2024-01-18 11:18:50.000 5
2024-01-19 18:01:03.000 4
2024-01-21 00:01:01.000 1
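To try the same idea without a SQL Server instance, here is a rough port to SQLite, run through Python's built-in sqlite3 module. It is a sketch rather than a drop-in replacement: the CTE names `base` and `walk` and the function `count_per_window_sqlite` are invented for this example, the elapsed time is kept in whole seconds instead of decimal hours, and the comparison is a strict `>` so that an event exactly 24 hours after the window start stays in that window. It assumes an SQLite build of 3.25 or later, since it relies on ROW_NUMBER().

```python
import sqlite3

# Same sample rows as the CREATE TABLE / INSERT statements in the answer.
ROWS = [
    ("2024-01-17 10:22:20.000", 1, 1), ("2024-01-17 11:40:04.000", 2, 2),
    ("2024-01-18 11:18:50.000", 3, 3), ("2024-01-18 11:18:51.000", 4, 4),
    ("2024-01-18 11:18:52.000", 5, 5), ("2024-01-18 11:39:03.000", 6, 6),
    ("2024-01-18 14:17:48.000", 7, 7), ("2024-01-19 18:01:03.000", 8, 8),
    ("2024-01-19 20:22:01.000", 9, 9), ("2024-01-19 22:39:03.000", 10, 10),
    ("2024-01-20 01:01:01.000", 11, 11), ("2024-01-21 00:01:01.000", 12, 12),
]

QUERY = """
WITH RECURSIVE
base AS (
    -- Number the events in time order; ts is the event time in whole seconds.
    SELECT event_at,
           CAST(strftime('%s', event_at) AS INTEGER) AS ts,
           ROW_NUMBER() OVER (ORDER BY event_at, id) AS row_num
    FROM events_tbl
),
walk AS (
    -- Anchor: the first event opens window 1 with zero elapsed seconds.
    SELECT event_at, ts, row_num, 0 AS elapsed, 1 AS group_num
    FROM base
    WHERE row_num = 1
    UNION ALL
    -- Step: add the gap to the elapsed time; past 24 hours, open a new window.
    SELECT a.event_at, a.ts, a.row_num,
           CASE WHEN b.elapsed + (a.ts - b.ts) > 86400 THEN 0
                ELSE b.elapsed + (a.ts - b.ts) END,
           CASE WHEN b.elapsed + (a.ts - b.ts) > 86400 THEN b.group_num + 1
                ELSE b.group_num END
    FROM base a
    JOIN walk b ON a.row_num = b.row_num + 1
)
SELECT MIN(event_at) AS event_at, COUNT(*) AS event_count
FROM walk
GROUP BY group_num
ORDER BY group_num
"""


def count_per_window_sqlite(rows):
    """Load rows into an in-memory SQLite table and run the recursive query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events_tbl (event_at TEXT, id INTEGER, event_log_id INTEGER)"
        )
        conn.executemany("INSERT INTO events_tbl VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = count_per_window_sqlite(ROWS)
for event_at, event_count in result:
    print(event_at, event_count)
```

On the sample rows this prints the same four groups as the output above. Other engines will need their own date arithmetic in place of strftime, and some limit recursion depth, so the row count that can be processed this way depends on the DBMS.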