I have a table (`#tmstmp`) with 2 columns: `dt` (`DATETIME`) and `payload` (`INT`). Ultimately I want to sum `payload` for each 5-minute interval.
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DROP TABLE IF EXISTS #tmstmp
, #numbers;
CREATE TABLE #tmstmp (
dt DATETIME PRIMARY KEY
, payload INT NOT NULL
);
CREATE TABLE #numbers (
n INT PRIMARY KEY
);
WITH numbers (n) AS (
SELECT 0 AS n
UNION ALL
SELECT n + 1 AS n
FROM numbers
WHERE n < 100
)
INSERT
INTO #numbers
SELECT n
FROM numbers;
WITH rnd (mins, secs) AS (
SELECT n2.n AS mins
, CAST(ABS(CHECKSUM(NEWID())) % 60 AS INT) AS secs
FROM #numbers AS n1
, #numbers as n2
WHERE n1.n < 5
AND n2.n < 15
), tmstmp (dt) AS (
SELECT DATEADD(SECOND, secs, DATEADD(MINUTE, mins, @start)) AS dt
FROM rnd
)
INSERT
INTO #tmstmp
SELECT DISTINCT dt
, -1 AS payload
FROM tmstmp
ORDER BY dt;
UPDATE #tmstmp
SET payload = CAST(ABS(CHECKSUM(NEWID())) % 10 AS INT);
GO
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
WITH agg (slot, sum_payload) AS (
SELECT DATEDIFF(MINUTE, @start, dt) / @slotDuration AS slot
, SUM(payload) AS sum_payload
FROM #tmstmp
GROUP BY DATEDIFF(MINUTE, @start, dt) / @slotDuration
)
SELECT DATEADD(MINUTE, slot * @slotDuration, @start) AS [from]
, DATEADD(MINUTE, (slot + 1) * @slotDuration, @start) AS [to]
, sum_payload
FROM agg;
from | to | sum_payload |
---|---|---|
2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
2024-01-01 12:05:00 | 2024-01-01 12:10:00 | 106 |
2024-01-01 12:10:00 | 2024-01-01 12:15:00 | 95 |
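However, I would like to get every interval in the range, i.e. `12:00-12:05`, `12:01-12:06`, `12:02-12:07` and so on, up to the last time slot.
I could build the limits for the whole range beforehand and use them in a `JOIN`, like so: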
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);
WITH ranges ([from], [to], slot) AS (
SELECT DATEADD(MINUTE, n, @start) AS [from]
, DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
, n AS slot
FROM #numbers
WHERE n <= @intervals
), tm_mult (slot, [from], [to], dt, payload) AS (
SELECT slot
, [from]
, [to]
, dt
, payload
FROM #tmstmp
INNER JOIN ranges
ON [from] <= dt
AND dt < [to]
)
SELECT MIN([from]) AS [from]
, MAX([to]) AS [to]
, SUM(payload) AS sum_payload
FROM tm_mult
GROUP BY slot
ORDER BY slot;
from | to | sum_payload |
---|---|---|
2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
2024-01-01 12:01:00 | 2024-01-01 12:06:00 | 120 |
2024-01-01 12:02:00 | 2024-01-01 12:07:00 | 125 |
... | ... | ... |
2024-01-01 12:14:00 | 2024-01-01 12:19:00 | 19 |
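While this works in this toy example, my real data has hundreds of thousands of timestamps and, worst of all, I have little influence on the indexes. My gut feeling tells me that my inequality `JOIN` creates quite a lot of duplication, and I wonder whether this is the canonical way of doing it or whether there is a more SQL-onic way? (In the same sense that Pythonistas like to call code pythonic if it uses concepts inherent to the language rather than solving the problem with generic tools.)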
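Window functions in SQL (WINDOW - microsoft.com / OVER - microsoft.com) are a great asset to add to your SQL tool belt. They are also quite canonical: the `OVER` clause has been around since SQL Server 2005, and the `ROWS BETWEEN` frame used here since SQL Server 2012.
Here is an example: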
SELECT
[From],
DATEADD(MINUTE, 1, [To]) [To],
payload
FROM (
SELECT
dt,
MIN(dt) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) [From],
dt [To],
SUM(payload) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) payload
FROM (
SELECT
DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0) dt,
SUM(payload) payload
FROM #tmstmp
GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
) q
) q
WHERE DATEDIFF(MINUTE, [From], [To]) > 3
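I would like to draw attention to `4 PRECEDING` and `DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)`. Since the latter effectively floors the datetime to the minute, the `2024-01-01 12:04:00.000` bucket covers everything up to `2024-01-01 12:04:59.999`, but not `2024-01-01 12:05:00.000`. The current row plus `4 PRECEDING` rows therefore spans five one-minute buckets (provided every minute has at least one row), which is why `[To]` is shifted forward by one minute in the outer query. Hopefully this is the functionality you are looking for.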
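To make the flooring step concrete, here is a small sketch that runs the same expression over three hand-picked values. The values and the alias `minute_bucket` are illustrative only and are not part of the query above.

```sql
-- Illustrative values only. DATETIME stores fractional seconds in steps of
-- roughly 1/300 s, so .997 is the last representable instant before the
-- next full second (a literal ending in .999 would round up to the next second).
SELECT v.dt
     , DATEADD(MINUTE, DATEDIFF(MINUTE, 0, v.dt), 0) AS minute_bucket
FROM (VALUES
      (CAST('2024-01-01T12:04:00.000' AS DATETIME))
    , (CAST('2024-01-01T12:04:59.997' AS DATETIME))
    , (CAST('2024-01-01T12:05:00.000' AS DATETIME))
) AS v (dt);
```

The first two rows should land in the `12:04:00.000` bucket and the third in `12:05:00.000`, which is the half-open behaviour described above.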
Here is a fiddle.
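One caveat worth keeping in mind: a `ROWS` frame counts rows, not minutes, so a minute without any timestamps would silently stretch the window beyond five minutes. The following is a minimal sketch of one possible way around that, not a definitive implementation. It assumes the `#tmstmp` and `#numbers` tables from the question and a slot length fixed at 5 minutes; the frame offset has to be a literal, so `4` stands in for `@slotDuration - 1`. The CTE names `per_minute` and `dense` are made up for illustration.

```sql
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @lastMinute INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);

WITH per_minute (n, payload) AS (
    -- Collapse the raw rows to one row per minute offset from @start.
    SELECT DATEDIFF(MINUTE, @start, dt) AS n
         , SUM(payload) AS payload
    FROM #tmstmp
    GROUP BY DATEDIFF(MINUTE, @start, dt)
), dense (n, payload) AS (
    -- One row for every minute in the range, with 0 for minutes that have
    -- no data, so a frame of 5 rows always spans exactly 5 minutes.
    SELECT num.n
         , COALESCE(pm.payload, 0) AS payload
    FROM #numbers AS num
    LEFT JOIN per_minute AS pm
        ON pm.n = num.n
    WHERE num.n <= @lastMinute
)
SELECT DATEADD(MINUTE, n, @start) AS [from]
     , DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
       -- Current minute plus the 4 following minutes; 4 = @slotDuration - 1.
     , SUM(payload) OVER (ORDER BY n ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING) AS sum_payload
FROM dense
ORDER BY n;
```

Because the frame looks forward from each minute, this variant also returns the trailing, partially filled slots (for example `12:14-12:19`), like the `JOIN` version in the question. Two differences to be aware of: slots without any data show up with a sum of 0 instead of being dropped, and the range is capped by the size of `#numbers` (0 to 100 here).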