Merge repeated consecutive rows and remove duplicates


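I use SQL to store log data. The log data contains blocks of consecutive rows that repeat the same values, and I want to condense each such block into a single row. The new row needs to show the start and end time of the block and how many times the row was repeated. However, I still want to see when the values change: if a value stops and later reappears, that must start a new block instead of being merged with the earlier one.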

Example:

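| Timestamp | Column A | Column B | Column C | Column D | Column E |
|-----------|----------|----------|----------|----------|----------|
| Time1     | 2        | 2        | 2        | 2        | 2        |
| Time2     | 2        | 2        | 2        | 2        | 2        |
| Time3     | 9        | 9        | 9        | 9        | 9        |
| Time4     | 2        | 2        | 2        | 2        | 2        |
| Time5     | 2        | 2        | 2        | 2        | 2        |
| Time6     | 8        | 8        | 8        | 8        | 8        |
| Time7     | 8        | 8        | 8        | 8        | 8        |
| Time8     | 2        | 2        | 2        | 2        | 2        |
| Time9     | 2        | 2        | 2        | 2        | 2        |
| Time10    | 2        | 2        | 2        | 2        | 2        |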

Desired result:

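| StartTime | EndTime | Column A | Column B | Column C | Column D | Column E | Occurrences |
|-----------|---------|----------|----------|----------|----------|----------|-------------|
| Time1     | Time2   | 2        | 2        | 2        | 2        | 2        | 2           |
| Time3     | Time3   | 9        | 9        | 9        | 9        | 9        | 1           |
| Time4     | Time5   | 2        | 2        | 2        | 2        | 2        | 2           |
| Time6     | Time7   | 8        | 8        | 8        | 8        | 8        | 2           |
| Time8     | Time10  | 2        | 2        | 2        | 2        | 2        | 3           |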
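The requirement is essentially run-length encoding: only *adjacent* identical rows may be merged. As a language-neutral illustration (not SQL, and assuming the rows are already sorted by timestamp), Python's `itertools.groupby`, which only groups consecutive equal keys, reproduces the desired result from the sample data. The helper name `collapse_runs` is hypothetical:

```python
from itertools import groupby

# Sample data from the question, already ordered by timestamp:
# (timestamp label, (Column A, Column B, Column C, Column D, Column E)).
values = [2, 2, 9, 2, 2, 8, 8, 2, 2, 2]
rows = [(f"Time{i + 1}", (v,) * 5) for i, v in enumerate(values)]


def collapse_runs(rows):
    """Return (start, end, columns, count) for each run of consecutive
    rows whose column values are identical."""
    result = []
    # groupby starts a new group whenever the key changes, so equal values
    # separated by other values end up in different groups.
    for columns, run in groupby(rows, key=lambda row: row[1]):
        run = list(run)
        result.append((run[0][0], run[-1][0], columns, len(run)))
    return result


for start, end, columns, count in collapse_runs(rows):
    print(start, end, *columns, count)
```

Each printed line corresponds to one row of the desired result; the SQL solutions below do the same thing with window functions.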

This is the result I get instead:

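| StartTime | EndTime | Column A | Column B | Column C | Column D | Column E | Occurrences |
|-----------|---------|----------|----------|----------|----------|----------|-------------|
| Time1     | Time10  | 2        | 2        | 2        | 2        | 2        | 7           |
| Time3     | Time3   | 9        | 9        | 9        | 9        | 9        | 1           |
| Time6     | Time7   | 8        | 8        | 8        | 8        | 8        | 2           |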

I tried this, but it did not give me the result I wanted:

SELECT MIN([TimeStamp]) AS StartTime, MAX([TimeStamp]) AS EndTime,
       [Column A], [Column B], [Column C], [Column D], [Column E], COUNT(*) AS Occurrences
FROM [table]
GROUP BY [Column A], [Column B], [Column C], [Column D], [Column E]
HAVING COUNT(*) > 1
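This fails because `GROUP BY` puts all rows with equal values into one group regardless of where they occur in time, so the three separate runs of `2`s collapse into a single row with a count of 7. A minimal sketch reproducing this with Python's built-in `sqlite3` module as a stand-in for SQL Server; the table `log` and the columns `ts` and `a` to `e` are hypothetical simplifications, timestamps are replaced by the integers 1 to 10, and the `HAVING` filter is left out so the single `9` row is kept:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: ts stands in for the timestamp column.
conn.execute("CREATE TABLE log (ts INTEGER, a INT, b INT, c INT, d INT, e INT)")
values = [2, 2, 9, 2, 2, 8, 8, 2, 2, 2]
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?, ?)",
    [(i + 1, v, v, v, v, v) for i, v in enumerate(values)],
)

# Plain GROUP BY: rows are grouped by value only, so non-adjacent runs merge.
grouped = conn.execute("""
    SELECT MIN(ts), MAX(ts), a, b, c, d, e, COUNT(*)
    FROM log
    GROUP BY a, b, c, d, e
    ORDER BY MIN(ts)
""").fetchall()

for row in grouped:
    print(row)
```

The output has the same shape as the incorrect result above: one row for all the `2`s spanning timestamps 1 to 10.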

I also tried using PARTITION BY, but with limited success:

SELECT * FROM (
    SELECT [TimeStamp], [Column A], [Column B], [Column C], [Column D], [Column E],
           ROW_NUMBER() OVER (PARTITION BY [Column A], [Column B], [Column C], [Column D], [Column E]
                              ORDER BY [TimeStamp]) AS RowNum
    FROM [table]
) d

I also tried using MIN and MAX, but without success.
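One way to make the PARTITION BY idea work is the classic "gaps and islands" trick, which is a different technique from the `LAG` answer below: subtract a per-value `ROW_NUMBER()` from a global `ROW_NUMBER()`. Within one unbroken run both counters increase together, so the difference stays constant; any interruption by other values shifts it. Grouping by the values plus that difference yields one row per run. The sketch again uses SQLite through Python's `sqlite3` (window functions need SQLite 3.25 or later) and the same hypothetical `log(ts, a, b, c, d, e)` table; the query itself only uses standard `ROW_NUMBER()` window functions that SQL Server also supports:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified schema: ts stands in for the timestamp column.
conn.execute("CREATE TABLE log (ts INTEGER, a INT, b INT, c INT, d INT, e INT)")
values = [2, 2, 9, 2, 2, 8, 8, 2, 2, 2]
conn.executemany(
    "INSERT INTO log VALUES (?, ?, ?, ?, ?, ?)",
    [(i + 1, v, v, v, v, v) for i, v in enumerate(values)],
)

islands = conn.execute("""
    SELECT MIN(ts) AS start_time, MAX(ts) AS end_time,
           a, b, c, d, e, COUNT(*) AS occurrences
    FROM (
        -- grp is constant inside a run of identical consecutive rows.
        SELECT ts, a, b, c, d, e,
               ROW_NUMBER() OVER (ORDER BY ts)
             - ROW_NUMBER() OVER (PARTITION BY a, b, c, d, e ORDER BY ts) AS grp
        FROM log
    ) AS t
    -- grp alone is not unique across different values, so group by both.
    GROUP BY a, b, c, d, e, grp
    ORDER BY start_time
""").fetchall()

for row in islands:
    print(row)
```

For the sample data the `2`s get `grp` values 0, 1 and 3 for their three runs, which is what keeps them apart.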

Tags: sql, sql-server

1 Answer

CRME

CREATE TABLE Log (
  Timestamp datetime NOT NULL DEFAULT getdate(),
  ColumnA int NOT NULL,
  ColumnB int NOT NULL,
  ColumnC int NOT NULL,
  ColumnD int NOT NULL,
  ColumnE int NOT NULL
)

INSERT INTO Log (Timestamp, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE)
VALUES
  ('2024-04-30 17:10:01', 2,  2,  2,  2,  2),
  ('2024-04-30 17:10:02', 2,  2,  2,  2,  2),
  ('2024-04-30 17:10:03', 9,  9,  9,  9,  9),
  ('2024-04-30 17:10:04', 2,  2,  2,  2,  2),
  ('2024-04-30 17:10:05', 2,  2,  2,  2,  2),
  ('2024-04-30 17:10:06', 8,  8,  8,  8,  8),
  ('2024-04-30 17:10:07', 8,  8,  8,  8,  8),
  ('2024-04-30 17:10:08', 2,  2,  2,  2,  2),
  ('2024-04-30 17:10:09', 2,  2,  2,  2,  2),
  ('2024-04-30 17:10:10', 2,  2,  2,  2,  2)

SELECT * FROM Log;

Collapsing runs of duplicate rows

WITH markedChanges AS (
    -- Flag a row with 1 when any column differs from the previous row
    -- (ordered by Timestamp). The first row has no predecessor, so LAG
    -- returns NULL, the comparison is not true, and the row is flagged 1.
    SELECT
        Timestamp,
        ColumnA,
        ColumnB,
        ColumnC,
        ColumnD,
        ColumnE,
        CASE WHEN LAG(ColumnA) OVER (ORDER BY Timestamp) = ColumnA AND
                  LAG(ColumnB) OVER (ORDER BY Timestamp) = ColumnB AND
                  LAG(ColumnC) OVER (ORDER BY Timestamp) = ColumnC AND
                  LAG(ColumnD) OVER (ORDER BY Timestamp) = ColumnD AND
                  LAG(ColumnE) OVER (ORDER BY Timestamp) = ColumnE
             THEN 0
             ELSE 1
        END AS ChangeFlag
    FROM Log
), groupedRows AS (
    -- A running total of the change flags gives every run of identical
    -- consecutive rows its own GroupID.
    SELECT
        Timestamp,
        ColumnA,
        ColumnB,
        ColumnC,
        ColumnD,
        ColumnE,
        SUM(ChangeFlag) OVER (ORDER BY Timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GroupID
    FROM markedChanges
), aggregatedResults AS (
    -- Collapse each run into one row with its start, end and length.
    SELECT
        MIN(Timestamp) AS StartTime,
        MAX(Timestamp) AS EndTime,
        ColumnA,
        ColumnB,
        ColumnC,
        ColumnD,
        ColumnE,
        COUNT(*) AS NumberOfOccurrences
    FROM groupedRows
    GROUP BY GroupID, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE
)
SELECT
    StartTime,
    EndTime,
    ColumnA,
    ColumnB,
    ColumnC,
    ColumnD,
    ColumnE,
    NumberOfOccurrences
FROM aggregatedResults
ORDER BY StartTime;

Before

| Timestamp               | ColumnA | ColumnB | ColumnC | ColumnD | ColumnE |
|-------------------------|---------|---------|---------|---------|---------|
| 2024-04-30 17:10:01.000 | 2       | 2       | 2       | 2       | 2       |
| 2024-04-30 17:10:02.000 | 2       | 2       | 2       | 2       | 2       |
| 2024-04-30 17:10:03.000 | 9       | 9       | 9       | 9       | 9       |
| 2024-04-30 17:10:04.000 | 2       | 2       | 2       | 2       | 2       |
| 2024-04-30 17:10:05.000 | 2       | 2       | 2       | 2       | 2       |
| 2024-04-30 17:10:06.000 | 8       | 8       | 8       | 8       | 8       |
| 2024-04-30 17:10:07.000 | 8       | 8       | 8       | 8       | 8       |
| 2024-04-30 17:10:08.000 | 2       | 2       | 2       | 2       | 2       |
| 2024-04-30 17:10:09.000 | 2       | 2       | 2       | 2       | 2       |
| 2024-04-30 17:10:10.000 | 2       | 2       | 2       | 2       | 2       |

After

| StartTime               | EndTime                 | ColumnA | ColumnB | ColumnC | ColumnD | ColumnE | NumberOfOccurrences |
|-------------------------|-------------------------|---------|---------|---------|---------|---------|---------------------|
| 2024-04-30 17:10:01.000 | 2024-04-30 17:10:02.000 | 2       | 2       | 2       | 2       | 2       | 2                   |
| 2024-04-30 17:10:03.000 | 2024-04-30 17:10:03.000 | 9       | 9       | 9       | 9       | 9       | 1                   |
| 2024-04-30 17:10:04.000 | 2024-04-30 17:10:05.000 | 2       | 2       | 2       | 2       | 2       | 2                   |
| 2024-04-30 17:10:06.000 | 2024-04-30 17:10:07.000 | 8       | 8       | 8       | 8       | 8       | 2                   |
| 2024-04-30 17:10:08.000 | 2024-04-30 17:10:10.000 | 2       | 2       | 2       | 2       | 2       | 3                   |
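To check the answer's logic without a SQL Server instance, here is a sketch that runs the same `LAG` / running-`SUM` query (with the final projection folded into one `SELECT`) against an in-memory SQLite database via Python's `sqlite3` module. SQLite is only a stand-in here; it needs version 3.25 or later for `LAG` and windowed `SUM`, and timestamps are stored as ISO-formatted text so that they sort correctly. It should return the five rows of the "After" table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Log (Timestamp TEXT, ColumnA INT, ColumnB INT, "
             "ColumnC INT, ColumnD INT, ColumnE INT)")
values = [2, 2, 9, 2, 2, 8, 8, 2, 2, 2]
conn.executemany(
    "INSERT INTO Log VALUES (?, ?, ?, ?, ?, ?)",
    [(f"2024-04-30 17:10:{i + 1:02d}", v, v, v, v, v) for i, v in enumerate(values)],
)

QUERY = """
WITH markedChanges AS (
    -- 1 when the row differs from its predecessor (or has none), else 0.
    SELECT Timestamp, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE,
           CASE WHEN LAG(ColumnA) OVER (ORDER BY Timestamp) = ColumnA
                 AND LAG(ColumnB) OVER (ORDER BY Timestamp) = ColumnB
                 AND LAG(ColumnC) OVER (ORDER BY Timestamp) = ColumnC
                 AND LAG(ColumnD) OVER (ORDER BY Timestamp) = ColumnD
                 AND LAG(ColumnE) OVER (ORDER BY Timestamp) = ColumnE
                THEN 0 ELSE 1 END AS ChangeFlag
    FROM Log
), groupedRows AS (
    -- Running total of change flags = one GroupID per run.
    SELECT Timestamp, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE,
           SUM(ChangeFlag) OVER (ORDER BY Timestamp
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GroupID
    FROM markedChanges
)
SELECT MIN(Timestamp) AS StartTime, MAX(Timestamp) AS EndTime,
       ColumnA, ColumnB, ColumnC, ColumnD, ColumnE,
       COUNT(*) AS NumberOfOccurrences
FROM groupedRows
GROUP BY GroupID, ColumnA, ColumnB, ColumnC, ColumnD, ColumnE
ORDER BY StartTime
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The change flags for the sample data are 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, so the running total produces GroupIDs 1 to 5, one per run.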

dbFiddle
