Is there a way in SQL to assign a unique group ID to rows based on the time interval between them (in this case, 60 minutes)? Essentially, I need it to start from the first row (August 27, 1 PM), which would be #1, and then go through each row to see whether it is <= 60 min from 1 PM; if it is, it would be #2, then #3, etc. But once it's over 60 min, I need it to start over again at #1 and use the same logic, going through each row to see whether it is <= 60 min. I've included a screenshot of the expected group IDs I would like to see.
I tried some combination of LEAD/LAG, but without success.
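The numbering rule in the question can be pinned down with a minimal procedural sketch in Python before writing any SQL. It assumes the rows are already sorted by time and that "60 minutes" is measured from the first row of the current group, inclusive; the function name number_rows and the sample times (including the year) are hypothetical.

```python
from datetime import datetime, timedelta


def number_rows(timestamps, window=timedelta(minutes=60)):
    """Return a per-row counter that restarts at 1 whenever a row falls
    more than `window` after the first row of the current group.
    Assumes `timestamps` is sorted ascending (hypothetical helper)."""
    result = []
    group_start = None
    counter = 0
    for ts in timestamps:
        if group_start is None or ts - group_start > window:
            # This row opens a new group
            group_start = ts
            counter = 1
        else:
            # Still within the window measured from the group's first row
            counter += 1
        result.append(counter)
    return result


# Hypothetical sample readings on August 27 (year chosen arbitrarily)
rows = [
    datetime(2023, 8, 27, 13, 0),
    datetime(2023, 8, 27, 13, 20),
    datetime(2023, 8, 27, 13, 55),
    datetime(2023, 8, 27, 14, 10),
    datetime(2023, 8, 27, 14, 30),
]
print(number_rows(rows))
```

With these times the numbering is 1, 2, 3, 1, 2. A rule based only on the gap to the previous row, which is what LAG/LEAD give you directly, would keep all five rows in one group, because no consecutive gap exceeds 60 minutes.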
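This sounds like part of the classic "overlapping date ranges" problem, which you can google for more details. Another similar problem with a lot of literature is the "gaps and islands" problem.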
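Basically, you want to perform the following steps:
1. Use LAG and LEAD to attach the previous and next timestamp to every row.
2. Mark a row as the start of an island if there is no previous row or the gap to it is more than 60 minutes.
3. Mark a row as the end of an island if there is no next row or the gap to it is more than 60 minutes.
4. Number the starts and the ends in time order and join them on that number, so the n-th start pairs with the n-th end.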
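A compact way to number the islands is the LAG + running SUM pattern: flag each row that opens an island, then take a cumulative sum of the flags. Note this is a different technique from the start/end join in the template that follows. Below is a minimal sketch run against an in-memory SQLite database from Python; it assumes SQLite 3.25+ (for window functions), a hypothetical table readings with an ISO-8601 text column ts, and strftime('%s', ...) to get whole seconds for the gap.

```python
import sqlite3

# Hypothetical sample data: ISO-8601 text timestamps
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT)")
conn.executemany(
    "INSERT INTO readings (ts) VALUES (?)",
    [("2023-08-27 13:00:00",), ("2023-08-27 13:40:00",),
     ("2023-08-27 15:00:00",), ("2023-08-27 15:30:00",),
     ("2023-08-27 17:00:00",)],
)

query = """
WITH lagged AS (
    -- Attach the previous timestamp to each row
    SELECT ts, LAG(ts) OVER (ORDER BY ts) AS prev_ts
    FROM readings
),
flagged AS (
    -- 1 marks the first row of an island: no previous row, or a gap of more than 3600 s
    SELECT ts,
           CASE WHEN prev_ts IS NULL
                  OR CAST(strftime('%s', ts) AS INTEGER)
                     - CAST(strftime('%s', prev_ts) AS INTEGER) > 3600
                THEN 1 ELSE 0 END AS is_start
    FROM lagged
)
-- Running total of the start flags = island number
SELECT ts,
       SUM(is_start) OVER (ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island_id
FROM flagged
ORDER BY ts
"""
rows = conn.execute(query).fetchall()
for ts, island_id in rows:
    print(ts, island_id)
```

The 80- and 90-minute gaps before 15:00 and 17:00 open islands 2 and 3, so the island numbers come out as 1, 1, 2, 2, 3.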
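I have a very generic template that I run for this problem, and it may contain a lot of things you don't need. It's also written for the Snowflake dialect, so you may have to adapt it to your specific RDBMS. But hopefully this helps: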
-- Gaps-and-islands template (Snowflake syntax).
-- A new island starts whenever the gap to the previous reading exceeds 60 minutes.
WITH CTE_CONDITION AS (
    -- Keep only the rows that take part in the grouping
    SELECT
        BP_DATETIME AS dtm
    FROM
        ExampleTable
    WHERE
        FIRST_BP_READING IS NOT NULL
        AND BP_DATETIME IS NOT NULL
),
CTE_LAGGED AS (
    -- Attach the previous and next timestamps plus a row position to every row
    SELECT
        dtm,
        LAG(dtm) OVER (ORDER BY dtm) AS previous_datetime,
        LEAD(dtm) OVER (ORDER BY dtm) AS next_datetime,
        ROW_NUMBER() OVER (ORDER BY dtm) AS island_location
    FROM
        CTE_CONDITION
),
CTE_ISLAND_START AS (
    -- A row starts an island if it is the first row or follows a gap of more than 60 minutes
    SELECT
        ROW_NUMBER() OVER (ORDER BY dtm) AS island_number,
        dtm AS island_start_datetime,
        island_location AS island_start_location
    FROM
        CTE_LAGGED
    WHERE
        DATEDIFF(MINUTE, previous_datetime, dtm) > 60
        OR previous_datetime IS NULL
),
CTE_ISLAND_END AS (
    -- A row ends an island if it is the last row or precedes a gap of more than 60 minutes
    SELECT
        ROW_NUMBER() OVER (ORDER BY dtm) AS island_number,
        dtm AS island_end_datetime,
        island_location AS island_end_location
    FROM
        CTE_LAGGED
    WHERE
        DATEDIFF(MINUTE, dtm, next_datetime) > 60
        OR next_datetime IS NULL
)
-- The n-th island start pairs with the n-th island end
SELECT
    CTE_ISLAND_START.island_start_datetime,
    CTE_ISLAND_END.island_end_datetime,
    DATEDIFF(
        HOUR,
        CTE_ISLAND_START.island_start_datetime,
        CTE_ISLAND_END.island_end_datetime
    ) AS ISLAND_DURATION_HOUR,
    (
        SELECT COUNT(*)
        FROM CTE_LAGGED
        WHERE CTE_LAGGED.dtm BETWEEN CTE_ISLAND_START.island_start_datetime
                                 AND CTE_ISLAND_END.island_end_datetime
    ) AS island_row_count
FROM
    CTE_ISLAND_START
    INNER JOIN CTE_ISLAND_END
        ON CTE_ISLAND_END.island_number = CTE_ISLAND_START.island_number;
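To sanity-check the island-start / island-end logic with a 60-minute threshold, here is a minimal port of the template to SQLite, run from Python. Assumptions: SQLite 3.25+ for window functions, ISO-8601 text timestamps, and CAST(strftime('%s', ...) AS INTEGER) standing in for Snowflake's DATEDIFF, so gaps are compared in seconds (3600). The sample rows are hypothetical, and ISLAND_DURATION_HOUR is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; the NULL reading is dropped by CTE_CONDITION
conn.execute("CREATE TABLE ExampleTable (BP_DATETIME TEXT, FIRST_BP_READING INTEGER)")
conn.executemany(
    "INSERT INTO ExampleTable (BP_DATETIME, FIRST_BP_READING) VALUES (?, ?)",
    [
        ("2023-08-27 13:00:00", 120),
        ("2023-08-27 13:45:00", 118),
        ("2023-08-27 14:30:00", 125),
        ("2023-08-27 16:00:00", 130),
        ("2023-08-27 16:10:00", None),
        ("2023-08-27 16:20:00", 122),
    ],
)

query = """
WITH CTE_CONDITION AS (
    SELECT BP_DATETIME AS dtm
    FROM ExampleTable
    WHERE FIRST_BP_READING IS NOT NULL AND BP_DATETIME IS NOT NULL
),
CTE_LAGGED AS (
    SELECT dtm,
           LAG(dtm)  OVER (ORDER BY dtm) AS previous_datetime,
           LEAD(dtm) OVER (ORDER BY dtm) AS next_datetime
    FROM CTE_CONDITION
),
CTE_ISLAND_START AS (
    -- First row, or more than 3600 s after the previous row
    SELECT ROW_NUMBER() OVER (ORDER BY dtm) AS island_number,
           dtm AS island_start_datetime
    FROM CTE_LAGGED
    WHERE previous_datetime IS NULL
       OR CAST(strftime('%s', dtm) AS INTEGER)
          - CAST(strftime('%s', previous_datetime) AS INTEGER) > 3600
),
CTE_ISLAND_END AS (
    -- Last row, or more than 3600 s before the next row
    SELECT ROW_NUMBER() OVER (ORDER BY dtm) AS island_number,
           dtm AS island_end_datetime
    FROM CTE_LAGGED
    WHERE next_datetime IS NULL
       OR CAST(strftime('%s', next_datetime) AS INTEGER)
          - CAST(strftime('%s', dtm) AS INTEGER) > 3600
)
-- The n-th start pairs with the n-th end
SELECT s.island_start_datetime,
       e.island_end_datetime,
       (SELECT COUNT(*) FROM CTE_LAGGED AS l
         WHERE l.dtm BETWEEN s.island_start_datetime AND e.island_end_datetime) AS island_row_count
FROM CTE_ISLAND_START AS s
JOIN CTE_ISLAND_END AS e ON e.island_number = s.island_number
ORDER BY s.island_number
"""
islands = conn.execute(query).fetchall()
for island in islands:
    print(island)
```

The 90-minute gap between 14:30 and 16:00 splits the data into two islands. A DATEDIFF(HOUR, ...) > 60 test would compare hours against 60, so a gap like this would not split anything; the threshold has to be expressed in minutes (or seconds).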
Postgres
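One way to assign group IDs to rows based on time intervals in SQL is a recursive CTE (common table expression) combined with a window function. Here's an example query that starts a new group whenever a row comes more than 60 minutes after the previous row: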
WITH RECURSIVE t AS (
    -- Number the rows in timestamp order
    SELECT timestamp,
           ROW_NUMBER() OVER (ORDER BY timestamp) AS row_num
    FROM your_table
), rcte AS (
    -- Anchor: the earliest row opens group 1
    SELECT row_num,
           timestamp,
           1 AS group_id
    FROM t
    WHERE row_num = 1
    UNION ALL
    -- Recursive step: compare each row with the row just before it
    SELECT t.row_num,
           t.timestamp,
           CASE WHEN EXTRACT(epoch FROM (t.timestamp - rcte.timestamp)) <= 3600
                THEN rcte.group_id
                ELSE rcte.group_id + 1
           END AS group_id
    FROM t
    JOIN rcte ON t.row_num = rcte.row_num + 1
)
SELECT row_num, timestamp, group_id
FROM rcte
ORDER BY row_num;
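Here's what the query does:
1. t numbers the rows in timestamp order with ROW_NUMBER().
2. The anchor member of rcte takes row 1 and assigns it group 1.
3. The recursive member joins each row to the row before it (row_num + 1) and keeps the previous group_id if the gap is at most 3600 seconds (60 minutes); otherwise it increments group_id.
4. The final SELECT returns every row with its group_id.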
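Note that the query doesn't depend on the order in which the rows are stored: ROW_NUMBER() OVER (ORDER BY timestamp) in t fixes the processing order, and the final ORDER BY row_num controls the output order. If two rows share the same timestamp, ROW_NUMBER() breaks the tie arbitrarily, which doesn't change their group_id because the gap between them is zero.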
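If the requirement is the one described in the question (a counter that restarts at 1 once a row is more than 60 minutes after the first row of its group, not after the previous row), the recursive CTE can carry the group's start time and a running counter instead of a group number. This is a variation on the query above, not the answerer's code. A minimal sketch, assuming SQLite 3.25+ run from Python, a hypothetical table your_table with an ISO-8601 text column ts, and strftime('%s', ...) for the gap in seconds:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample readings
conn.execute("CREATE TABLE your_table (ts TEXT)")
conn.executemany(
    "INSERT INTO your_table (ts) VALUES (?)",
    [("2023-08-27 13:00:00",), ("2023-08-27 13:20:00",), ("2023-08-27 13:55:00",),
     ("2023-08-27 14:10:00",), ("2023-08-27 14:30:00",)],
)

query = """
WITH RECURSIVE t AS (
    -- Fix the processing order once
    SELECT ts, ROW_NUMBER() OVER (ORDER BY ts) AS row_num
    FROM your_table
), rcte AS (
    -- Anchor: the earliest row opens the first group with counter 1
    SELECT row_num, ts, ts AS group_start, 1 AS seq
    FROM t
    WHERE row_num = 1
    UNION ALL
    -- Recursive step: stay in the group while within 3600 s of the group's FIRST row
    SELECT t.row_num,
           t.ts,
           CASE WHEN CAST(strftime('%s', t.ts) AS INTEGER)
                   - CAST(strftime('%s', rcte.group_start) AS INTEGER) <= 3600
                THEN rcte.group_start ELSE t.ts END,
           CASE WHEN CAST(strftime('%s', t.ts) AS INTEGER)
                   - CAST(strftime('%s', rcte.group_start) AS INTEGER) <= 3600
                THEN rcte.seq + 1 ELSE 1 END
    FROM t
    JOIN rcte ON t.row_num = rcte.row_num + 1
)
SELECT row_num, ts, seq FROM rcte ORDER BY row_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For these times the counter comes out as 1, 2, 3, 1, 2: 14:10 is 70 minutes after 13:00, so it opens a new group. In PostgreSQL the same condition could be written as t.timestamp - rcte.group_start <= interval '60 minutes'.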
Microsoft SQL Server
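Microsoft SQL Server does not support the WITH RECURSIVE syntax. Instead, write a plain WITH: a CTE whose member after UNION ALL references the CTE itself is treated as recursive automatically. Here's the query converted: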
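WITH t AS (
    -- Number the rows in timestamp order
    SELECT timestamp,
           ROW_NUMBER() OVER (ORDER BY timestamp) AS row_num
    FROM your_table
), rcte AS (
    -- Anchor: the earliest row opens group 1
    SELECT row_num,
           timestamp,
           1 AS group_id
    FROM t
    WHERE row_num = 1
    UNION ALL
    -- Recursive step: DATEDIFF(second, earlier, later) is the gap to the previous row
    SELECT t.row_num,
           t.timestamp,
           CASE WHEN DATEDIFF(second, rcte.timestamp, t.timestamp) <= 3600
                THEN rcte.group_id
                ELSE rcte.group_id + 1
           END AS group_id
    FROM t
    JOIN rcte ON t.row_num = rcte.row_num + 1
)
SELECT row_num, timestamp, group_id
FROM rcte
ORDER BY row_num
-- The default recursion limit is 100 levels; 0 removes it (each row adds one level here)
OPTION (MAXRECURSION 0);
Note that I replaced the EXTRACT(epoch FROM ...) function with DATEDIFF(second, ...), which returns the time difference in seconds. DATEDIFF(datepart, startdate, enddate) returns enddate minus startdate, so the previous row's timestamp goes in the startdate position; with the arguments reversed the difference is negative and every row lands in group 1. The OPTION (MAXRECURSION 0) hint is needed for tables with more than 100 rows, because SQL Server stops a recursive CTE after 100 levels by default.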