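I use SQL to store log data. The log data contains runs of consecutive duplicate rows that I want to condense into a single row. The new row needs to reflect the start and end time of each block of consecutive duplicate rows, as well as the number of occurrences. However, I still want to distinguish when the rows change.
Example: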
Timestamp | Column A | Column B | Column C | Column D | Column E |
---|---|---|---|---|---|
Time1 | 2 | 2 | 2 | 2 | 2 |
Time2 | 2 | 2 | 2 | 2 | 2 |
Time3 | 9 | 9 | 9 | 9 | 9 |
Time4 | 2 | 2 | 2 | 2 | 2 |
Time5 | 2 | 2 | 2 | 2 | 2 |
Time6 | 8 | 8 | 8 | 8 | 8 |
Time7 | 8 | 8 | 8 | 8 | 8 |
Time8 | 2 | 2 | 2 | 2 | 2 |
Time9 | 2 | 2 | 2 | 2 | 2 |
Time10 | 2 | 2 | 2 | 2 | 2 |
Desired result:
Start Time | End Time | Column A | Column B | Column C | Column D | Column E | Number of Occurrences |
---|---|---|---|---|---|---|---|
Time1 | Time2 | 2 | 2 | 2 | 2 | 2 | 2 |
Time3 | Time3 | 9 | 9 | 9 | 9 | 9 | 1 |
Time4 | Time5 | 2 | 2 | 2 | 2 | 2 | 2 |
Time6 | Time7 | 8 | 8 | 8 | 8 | 8 | 2 |
Time8 | Time10 | 2 | 2 | 2 | 2 | 2 | 3 |
This is the result I get:
Start Time | End Time | Column A | Column B | Column C | Column D | Column E | Number of Occurrences |
---|---|---|---|---|---|---|---|
Time1 | Time10 | 2 | 2 | 2 | 2 | 2 | 7 |
Time3 | Time3 | 9 | 9 | 9 | 9 | 9 | 1 |
Time6 | Time7 | 8 | 8 | 8 | 8 | 8 | 2 |
I tried the following, but did not get the result I want:
SELECT MIN(TimeStamp) AS StartTime, MIN(TimeStamp) AS EndTime, Column A, Column B, Column C, Column D, Column E, count(*)
FROM table
GROUP BY Column A, Column B, Column C, Column D, Column E,
HAVING COUNT(*) > 1
I also tried using a partition, but with limited success:
SELECT * FROM (
SELECT MIN(TimeStamp) AS StartTime, MIN(TimeStamp) AS EndTime, Column A, Column B, Column C, Column D, Column E
ROW_NUMBER () OVER(Partition by Column A, Column B, Column C, Column D, Column E
ORDER BY TimeStamp) RowNum
FROM table
) d
I tried using MIN and MAX, but without success.
CREATE TABLE Log (
Timestamp datetime NOT NULL DEFAULT getdate(),
ColumnA int NOT NULL,
ColumnB int NOT NULL,
ColumnC int NOT NULL,
ColumnD int NOT NULL,
ColumnE int NOT NULL
)
INSERT INTO Log (Timestamp, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE)
VALUES
('2024-04-30 17:10:01', 2, 2, 2, 2, 2),
('2024-04-30 17:10:02', 2, 2, 2, 2, 2),
('2024-04-30 17:10:03', 9, 9, 9, 9, 9),
('2024-04-30 17:10:04', 2, 2, 2, 2, 2),
('2024-04-30 17:10:05', 2, 2, 2, 2, 2),
('2024-04-30 17:10:06', 8, 8, 8, 8, 8),
('2024-04-30 17:10:07', 8, 8, 8, 8, 8),
('2024-04-30 17:10:08', 2, 2, 2, 2, 2),
('2024-04-30 17:10:09', 2, 2, 2, 2, 2),
('2024-04-30 17:10:10', 2, 2, 2, 2, 2)
SELECT * FROM Log;
WITH MarkedChanges AS (
SELECT
TimeStamp,
ColumnA,
ColumnB,
ColumnC,
ColumnD,
ColumnE,
CASE WHEN LAG(ColumnA) OVER (ORDER BY TimeStamp) = ColumnA AND
LAG(ColumnB) OVER (ORDER BY TimeStamp) = ColumnB AND
LAG(ColumnC) OVER (ORDER BY TimeStamp) = ColumnC AND
LAG(ColumnD) OVER (ORDER BY TimeStamp) = ColumnD AND
LAG(ColumnE) OVER (ORDER BY TimeStamp) = ColumnE
THEN 0
ELSE 1
END AS ChangeFlag
FROM Log
), GroupedRows AS (
SELECT
TimeStamp,
ColumnA,
ColumnB,
ColumnC,
ColumnD,
ColumnE,
SUM(ChangeFlag) OVER (ORDER BY TimeStamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GroupID
FROM MarkedChanges
), AggregatedResults AS (
SELECT
MIN(TimeStamp) AS StartTime,
MAX(TimeStamp) AS EndTime,
ColumnA,
ColumnB,
ColumnC,
ColumnD,
ColumnE,
COUNT(*) AS NumberOfOccurrences
FROM GroupedRows
GROUP BY GroupID, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE
)
SELECT
StartTime,
EndTime,
ColumnA,
ColumnB,
ColumnC,
ColumnD,
ColumnE,
NumberOfOccurrences
FROM AggregatedResults
ORDER BY StartTime;
Before:
| Timestamp | ColumnA | ColumnB | ColumnC | ColumnD | ColumnE |
|-------------------------|---------|---------|---------|---------|---------|
| 2024-04-30 17:10:01.000 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:02.000 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:03.000 | 9 | 9 | 9 | 9 | 9 |
| 2024-04-30 17:10:04.000 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:05.000 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:06.000 | 8 | 8 | 8 | 8 | 8 |
| 2024-04-30 17:10:07.000 | 8 | 8 | 8 | 8 | 8 |
| 2024-04-30 17:10:08.000 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:09.000 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:10.000 | 2 | 2 | 2 | 2 | 2 |
After:
| StartTime | EndTime | ColumnA | ColumnB | ColumnC | ColumnD | ColumnE | NumberOfOccurrences |
|-------------------------|-------------------------|---------|---------|---------|---------|---------|---------------------|
| 2024-04-30 17:10:01.000 | 2024-04-30 17:10:02.000 | 2 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:03.000 | 2024-04-30 17:10:03.000 | 9 | 9 | 9 | 9 | 9 | 1 |
| 2024-04-30 17:10:04.000 | 2024-04-30 17:10:05.000 | 2 | 2 | 2 | 2 | 2 | 2 |
| 2024-04-30 17:10:06.000 | 2024-04-30 17:10:07.000 | 8 | 8 | 8 | 8 | 8 | 2 |
| 2024-04-30 17:10:08.000 | 2024-04-30 17:10:10.000 | 2 | 2 | 2 | 2 | 2 | 3 |
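
As an alternative to the LAG-based query, the same grouping can probably be obtained with the "difference of two row numbers" trick, which is closer to the ROW_NUMBER attempt in the question. This is only a sketch against the `Log` table defined above: the CTE name `Numbered` and the column `IslandKey` are names I made up for illustration, and it assumes the timestamps are unique (with ties, ROW_NUMBER ordering is not deterministic and the groups may come out differently).

```sql
WITH Numbered AS (
    SELECT
        [Timestamp],
        ColumnA, ColumnB, ColumnC, ColumnD, ColumnE,
        -- Position in the whole log minus position among rows with the same values.
        -- The difference stays constant as long as identical rows are consecutive,
        -- and jumps as soon as a different row appears in between.
        ROW_NUMBER() OVER (ORDER BY [Timestamp])
          - ROW_NUMBER() OVER (
                PARTITION BY ColumnA, ColumnB, ColumnC, ColumnD, ColumnE
                ORDER BY [Timestamp]
            ) AS IslandKey
    FROM Log
)
SELECT
    MIN([Timestamp]) AS StartTime,
    MAX([Timestamp]) AS EndTime,
    ColumnA, ColumnB, ColumnC, ColumnD, ColumnE,
    COUNT(*) AS NumberOfOccurrences
FROM Numbered
-- IslandKey is only meaningful together with the column values,
-- so both must be part of the grouping.
GROUP BY ColumnA, ColumnB, ColumnC, ColumnD, ColumnE, IslandKey
ORDER BY StartTime;
```

On the sample data this should produce the same five rows as the "After" table. For example, the seven rows with value 2 get the differences 0, 0, 1, 1, 3, 3, 3, which splits them into blocks of 2, 2 and 3 rows. Which variant is faster likely depends on the indexes on the table, so it is worth comparing the execution plans.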