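I have a list of data-acquisition timestamps. Timestamps that are close together belong to one cycle, and I want to number these cycles. Whenever more than 100 seconds pass between two consecutive timestamps, a new cycle begins.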
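The cycle rule can be sketched in plain Python. This is an illustration only. It assumes the timestamps are already sorted, which ORDER BY datatimestamp guarantees in SQL, and the function name split_into_cycles is hypothetical. A gap of exactly 100 seconds keeps the current cycle open. Only a gap strictly greater than 100 seconds starts a new one.

```python
from datetime import datetime, timedelta

# Threshold from the question: a gap of MORE than 100 s starts a new cycle.
GAP = timedelta(seconds=100)


def split_into_cycles(timestamps, gap=GAP):
    """Hypothetical helper: return (start, end) pairs for sorted timestamps."""
    cycles = []
    for ts in timestamps:
        if cycles and ts - cycles[-1][1] <= gap:
            # Close enough to the previous timestamp: extend the current cycle.
            cycles[-1][1] = ts
        else:
            # First timestamp, or gap > 100 s: open a new cycle.
            cycles.append([ts, ts])
    return [tuple(c) for c in cycles]


# Sample rows from the Data table in the question.
data = [datetime.fromisoformat(s) for s in (
    "2023-12-05T00:05:20", "2023-12-05T00:05:21", "2023-12-05T00:05:22",
    "2023-12-05T00:10:01", "2023-12-05T00:10:02", "2023-12-05T00:10:03",
)]

for start, end in split_into_cycles(data):
    print(start, end)
```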
CREATE TABLE [Cycles](
[Cycle] [int] NOT NULL,
[CycleStart] [datetime] NOT NULL,
[CycleEnd] [datetime] NOT NULL,
CONSTRAINT [PK_Cycles] PRIMARY KEY CLUSTERED
(
[Cycle] DESC
))
INSERT INTO [Cycles] VALUES
(10,'2023-12-04T09:00:00','2023-12-04T10:00:00'),
(11,'2023-12-04T21:00:00','2023-12-04T22:00:00'),
(12,'2023-12-04T23:00:00','2023-12-05T00:00:00')
CREATE TABLE [Data](
[datatimestamp] [datetime] NOT NULL,
CONSTRAINT [PK_Data] PRIMARY KEY NONCLUSTERED
(
[datatimestamp] ASC
))
INSERT INTO [Data] VALUES
('2023-12-05T00:05:20'),
('2023-12-05T00:05:21'),
('2023-12-05T00:05:22'),
('2023-12-05T00:10:01'),
('2023-12-05T00:10:02'),
('2023-12-05T00:10:03')
So I need to add cycles 13 and 14.
Here is what I can do with a SELECT:
DECLARE @lastCycle int = (SELECT TOP 1 Cycle FROM Cycles ORDER BY Cycle DESC);
DECLARE @lastCycleEnd datetime = (SELECT TOP 1 CycleEnd FROM Cycles ORDER BY Cycle DESC);
WITH marks AS (
SELECT datatimestamp,
CASE
WHEN DATEDIFF(Second, LAG(datatimestamp, 1, DATEADD(Second, -101, datatimestamp)) OVER (ORDER BY datatimestamp), datatimestamp) > 100
THEN 1 ELSE 0
END AS NextC
FROM [Data]
WHERE datatimestamp > @lastCycleEnd
)
SELECT @lastCycle + ROW_NUMBER() OVER (ORDER BY d.datatimestamp) AS Cycle, d.datatimestamp AS CycleBegin
FROM [Data] d
INNER JOIN marks m On m.datatimestamp = d.datatimestamp
WHERE m.NextC = 1
This returns the new cycles and their start times. For the sample data, the result is:
Cycle | CycleBegin |
---|---|
13 | 2023-12-05 00:05:20 |
14 | 2023-12-05 00:10:01 |
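How can I also get CycleEnd as a third column?
You're close. Add one more step that computes a running sum of NextC, ordered by timestamp. This gives each "set" of timestamps its own number, and you can then group by that number.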
with cte1 as (
-- flag = 1 on the first row (lag is NULL) and wherever the gap is more than 100 seconds
select datatimestamp, case when datediff(second, lag(datatimestamp) over (order by datatimestamp), datatimestamp) <= 100 then 0 else 1 end as flag
from data
), cte2 as (
select *, sum(flag) over (order by datatimestamp) as grpnum
from cte1
)
select min(datatimestamp), max(datatimestamp)
from cte2
group by grpnum
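The following sketch lets you try the cte1/cte2 logic without SQL Server. It runs the same query in SQLite through Python's sqlite3 module. It rests on a few assumptions. Window functions require SQLite 3.25 or later. Timestamps are stored as ISO-8601 text. SQLite has no DATEDIFF, so the query subtracts strftime('%s', ...) values instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Data (datatimestamp TEXT PRIMARY KEY)")
con.executemany("INSERT INTO Data VALUES (?)", [
    ("2023-12-05T00:05:20",), ("2023-12-05T00:05:21",), ("2023-12-05T00:05:22",),
    ("2023-12-05T00:10:01",), ("2023-12-05T00:10:02",), ("2023-12-05T00:10:03",),
])

ISLANDS = """
WITH cte1 AS (
  SELECT datatimestamp,
         -- seconds since the previous timestamp; NULL for the first row
         CAST(strftime('%s', datatimestamp) AS INTEGER)
           - CAST(strftime('%s', LAG(datatimestamp) OVER (ORDER BY datatimestamp)) AS INTEGER) AS gap
  FROM Data
), cte2 AS (
  SELECT datatimestamp,
         -- NULL gap or gap > 100 s counts as 1, so the running sum numbers the groups
         SUM(CASE WHEN gap <= 100 THEN 0 ELSE 1 END) OVER (ORDER BY datatimestamp) AS grpnum
  FROM cte1
)
SELECT MIN(datatimestamp), MAX(datatimestamp)
FROM cte2
GROUP BY grpnum
ORDER BY grpnum
"""
print(con.execute(ISLANDS).fetchall())
```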
Once you have the data from the marks CTE, stop filtering on NextC and keep every row:
datatimestamp | NextC |
---|---|
2023-12-05T00:05:20 | 1 |
2023-12-05T00:05:21 | 0 |
2023-12-05T00:05:22 | 0 |
2023-12-05T00:10:01 | 1 |
2023-12-05T00:10:02 | 0 |
2023-12-05T00:10:03 | 0 |
Instead, compute SUM(NextC) OVER (ORDER BY datatimestamp), which assigns each timestamp its group number:
datatimestamp | Cycle |
---|---|
2023-12-05T00:05:20 | 1 |
2023-12-05T00:05:21 | 1 |
2023-12-05T00:05:22 | 1 |
2023-12-05T00:10:01 | 2 |
2023-12-05T00:10:02 | 2 |
2023-12-05T00:10:03 | 2 |
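You can check the running-sum step outside the database. This sketch hard-codes the NextC values from the marks table above. It uses itertools.accumulate in place of SUM(NextC) OVER (ORDER BY datatimestamp), and itertools.groupby in place of GROUP BY with MIN/MAX. It assumes the rows are already sorted by timestamp, so each group is contiguous.

```python
from itertools import accumulate, groupby

# (datatimestamp, NextC) pairs from the marks table; 1 = a new cycle starts here.
rows = [
    ("2023-12-05T00:05:20", 1), ("2023-12-05T00:05:21", 0), ("2023-12-05T00:05:22", 0),
    ("2023-12-05T00:10:01", 1), ("2023-12-05T00:10:02", 0), ("2023-12-05T00:10:03", 0),
]

# Running total of NextC, like SUM(NextC) OVER (ORDER BY datatimestamp).
cycle_numbers = list(accumulate(flag for _, flag in rows))
print(cycle_numbers)  # [1, 1, 1, 2, 2, 2]

# Like GROUP BY Cycle with MIN/MAX; ISO-8601 strings sort chronologically.
bounds = []
labelled = zip(cycle_numbers, (ts for ts, _ in rows))
for cycle, group in groupby(labelled, key=lambda pair: pair[0]):
    stamps = [ts for _, ts in group]
    bounds.append((cycle, min(stamps), max(stamps)))
print(bounds)
```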
Then group by this column and take the minimum and maximum timestamp of each group as its start and end times. Your final query becomes:
DECLARE @lastCycle int = (SELECT TOP 1 Cycle FROM Cycles ORDER BY Cycle DESC);
DECLARE @lastCycleEnd datetime = (SELECT TOP 1 CycleEnd FROM Cycles ORDER BY Cycle DESC);
WITH marks AS (
SELECT datatimestamp,
CASE
WHEN DATEDIFF(Second, LAG(datatimestamp, 1, DATEADD(Second, -101, datatimestamp)) OVER (ORDER BY datatimestamp), datatimestamp) > 100
THEN 1 ELSE 0
END AS NextC
FROM [Data]
WHERE datatimestamp > @lastCycleEnd
), marks2 AS (
SELECT m.DataTimeStamp, SUM(m.NextC) OVER (ORDER BY m.DataTimeStamp) AS Cycle
FROM marks AS m)
SELECT @lastCycle + ROW_NUMBER() OVER (ORDER BY m.Cycle) AS Cycle,
MIN(m.datatimestamp) AS CycleBegin ,
MAX(m.datatimestamp) AS CycleEnd
FROM marks2 m
GROUP BY m.Cycle;
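For an end-to-end check, the sketch below ports the final query to SQLite using Python's sqlite3 module. It computes the numbering offset from the last stored cycle and inserts the new rows into Cycles. The port makes several assumptions:

* Window functions require SQLite 3.25 or later.
* Timestamps are stored as ISO-8601 text. The 9:00 start is written as 09:00 because SQLite expects two-digit hours.
* strftime('%s', ...) replaces DATEDIFF.
* A CTE named last replaces the T-SQL variables.

The sketch also adds the running total to the offset directly instead of using ROW_NUMBER(). The first filtered row always has NextC = 1, so the running total already counts 1, 2, 3, and so on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Cycles (Cycle INTEGER PRIMARY KEY, CycleStart TEXT NOT NULL, CycleEnd TEXT NOT NULL);
INSERT INTO Cycles VALUES
  (10, '2023-12-04T09:00:00', '2023-12-04T10:00:00'),
  (11, '2023-12-04T21:00:00', '2023-12-04T22:00:00'),
  (12, '2023-12-04T23:00:00', '2023-12-05T00:00:00');
CREATE TABLE Data (datatimestamp TEXT PRIMARY KEY);
INSERT INTO Data VALUES
  ('2023-12-05T00:05:20'), ('2023-12-05T00:05:21'), ('2023-12-05T00:05:22'),
  ('2023-12-05T00:10:01'), ('2023-12-05T00:10:02'), ('2023-12-05T00:10:03');
""")

NEW_CYCLES = """
WITH last AS (
  -- stands in for @lastCycle / @lastCycleEnd
  SELECT Cycle, CycleEnd FROM Cycles ORDER BY Cycle DESC LIMIT 1
), secs AS (
  -- only rows after the last stored cycle, with Unix seconds for the gap arithmetic
  SELECT datatimestamp, CAST(strftime('%s', datatimestamp) AS INTEGER) AS s
  FROM Data
  WHERE datatimestamp > (SELECT CycleEnd FROM last)
), marks AS (
  -- NextC = 1 on the first row (default of s - 101) and wherever the gap exceeds 100 s
  SELECT datatimestamp,
         CASE WHEN s - COALESCE(LAG(s) OVER (ORDER BY datatimestamp), s - 101) > 100
              THEN 1 ELSE 0 END AS NextC
  FROM secs
), marks2 AS (
  -- running total numbers the new cycles 1, 2, 3, ...
  SELECT datatimestamp, SUM(NextC) OVER (ORDER BY datatimestamp) AS grp
  FROM marks
)
SELECT (SELECT Cycle FROM last) + grp AS Cycle,
       MIN(datatimestamp) AS CycleBegin,
       MAX(datatimestamp) AS CycleEnd
FROM marks2
GROUP BY grp
ORDER BY grp
"""

new_rows = con.execute(NEW_CYCLES).fetchall()
con.executemany("INSERT INTO Cycles VALUES (?, ?, ?)", new_rows)
print(new_rows)
```

Running NEW_CYCLES a second time returns no rows. The newly stored CycleEnd is 2023-12-05T00:10:03, and no data lies after it.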