Delete only consecutive duplicate rows

Question  Votes: 3  Answers: 2

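I have collected data from an API to build a history. Initially, I saved all values every five minutes. Later, I changed the program to save only data that had changed.

Now I want to clean up the old data and delete every row whose count is unchanged from the previous record for the same account and id.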

account  id     count  time
42       12147  492    2015-09-20 11:31:14.0
42       12147  492    2015-09-20 11:36:19.0  // delete
13       12147  246    2015-09-20 11:31:14.0
2        12253  183    2015-09-20 11:36:19.0
2        19684  805    2015-09-20 12:00:41.0  // note in next comment
2        19684  810    2015-09-20 12:05:41.0
2        19684  805    2015-09-20 12:10:41.0  // we had this combination, but don't delete this record because the previous value was different
2        19684  805    2015-09-20 12:15:41.0  // delete
2        19684  805    2015-09-20 12:20:41.0  // delete
2        19684  806    2015-09-20 12:25:41.0

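I tried to solve this with a GROUP BY on account, id, and count. With that approach, however, non-consecutive duplicates are deleted as well: if a record takes the same value again some time later, it falls into the same group.

I also considered writing a small script that walks through all the data and deletes the current row whenever account, id, and count match the previous record, but I am curious: is this possible with a single SQL statement?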
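To make the difference between the two ideas concrete, here is a minimal Python sketch of the "small script" approach next to the GROUP BY behaviour. It is only an illustration on the sample rows above: the function names `consecutive_duplicates` and `grouped_duplicates` are made up for this example, and the rows are held in a plain list rather than read from the real table.

```python
from itertools import groupby

# Sample rows from the question: (account, id, count, time).
rows = [
    (42, 12147, 492, "2015-09-20 11:31:14.0"),
    (42, 12147, 492, "2015-09-20 11:36:19.0"),
    (13, 12147, 246, "2015-09-20 11:31:14.0"),
    (2, 12253, 183, "2015-09-20 11:36:19.0"),
    (2, 19684, 805, "2015-09-20 12:00:41.0"),
    (2, 19684, 810, "2015-09-20 12:05:41.0"),
    (2, 19684, 805, "2015-09-20 12:10:41.0"),
    (2, 19684, 805, "2015-09-20 12:15:41.0"),
    (2, 19684, 805, "2015-09-20 12:20:41.0"),
    (2, 19684, 806, "2015-09-20 12:25:41.0"),
]


def consecutive_duplicates(rows):
    """Rows whose count equals the previous row (by time) of the same (account, id)."""
    doomed = []
    # Timestamps share one fixed format, so string order equals time order.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[3]))
    for _, series in groupby(ordered, key=lambda r: (r[0], r[1])):
        previous_count = None
        for row in series:
            if row[2] == previous_count:
                doomed.append(row)
            previous_count = row[2]
    return doomed


def grouped_duplicates(rows):
    """GROUP BY-style: every row except the earliest per (account, id, count)."""
    first_seen = {}
    for row in rows:
        key = row[:3]
        if key not in first_seen or row[3] < first_seen[key]:
            first_seen[key] = row[3]
    return [r for r in rows if r[3] > first_seen[r[:3]]]


for label, result in (("consecutive", consecutive_duplicates(rows)),
                      ("grouped", grouped_duplicates(rows))):
    print(label, [r[3] for r in result])
```

The grouped variant also flags the 12:10:41 row, which has to stay because the value before it was 810; the consecutive variant flags only the three rows marked "delete" in the table.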
mysql sql sql-delete
2 Answers
2
votes
You can use the following query:

    DELETE history
    FROM history
    INNER JOIN (
        SELECT MIN(time) AS minTime, account, id, count
        FROM history
        GROUP BY account, id, count
    ) AS h
        ON history.account = h.account
       AND history.id = h.id
       AND history.count = h.count
    WHERE history.time > h.minTime


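Edit:

After the edit to the question, I think there are still some errors in the OP's sample data (the time field should be in ascending order).

Under the additional assumption that the table has a PK, you can use the following query: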

    SELECT pk
    FROM history AS h1
    WHERE account = (SELECT account
                     FROM history AS h2
                     WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                     ORDER BY time DESC LIMIT 1)
      AND id = (SELECT id
                FROM history AS h2
                WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                ORDER BY time DESC LIMIT 1)
      AND count = (SELECT count
                   FROM history AS h2
                   WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                   ORDER BY time DESC LIMIT 1)

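This identifies the records to delete: each correlated subquery looks up the most recent earlier row for the same account and id, so a row matches only when its count equals that of its immediate predecessor.

You can now easily delete the unwanted rows with the IN operator: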

    DELETE FROM history
    WHERE pk IN (
        SELECT x.pk
        FROM (
            SELECT pk
            FROM history AS h1
            WHERE account = (SELECT account
                             FROM history AS h2
                             WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                             ORDER BY time DESC LIMIT 1)
              AND id = (SELECT id
                        FROM history AS h2
                        WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                        ORDER BY time DESC LIMIT 1)
              AND count = (SELECT count
                           FROM history AS h2
                           WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                           ORDER BY time DESC LIMIT 1)
        ) AS x
    )


Edit 2:

Using variables to locate the pk values to delete may make the query considerably faster:

    SELECT pk
    FROM (
        SELECT pk, account, id, count, time,
               -- rn counts how many rows in a row share the same account, id, and count
               @rn := IF(account = @acc AND id = @id AND count = @count, @rn + 1, 1) AS rn,
               @acc := account,
               @id := id,
               @count := count
        FROM history
        -- initialise the variables with := (a plain = would only compare them)
        CROSS JOIN (SELECT @rn := 0, @acc := 0, @id := 0, @count := 0) AS vars
        ORDER BY account, id, time, count
    ) AS t
    WHERE t.rn > 1

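On engines with window functions, the same "compare with the previous row" test can be written with LAG instead of user variables. This is a different technique from the queries above, sketched here with Python's built-in sqlite3 module so it can be run without a MySQL server; it assumes SQLite 3.25 or later, and in MySQL the LAG expression would need version 8.0 or later. The table layout, the auto-assigned pk values, and the names `FIND_SQL`, `doomed`, and `remaining` are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE history ('
    'pk INTEGER PRIMARY KEY, account INT, id INT, "count" INT, "time" TEXT)'
)
# Sample rows from the question; pk is assigned 1..10 in insertion order.
conn.executemany(
    'INSERT INTO history (account, id, "count", "time") VALUES (?, ?, ?, ?)',
    [
        (42, 12147, 492, "2015-09-20 11:31:14.0"),
        (42, 12147, 492, "2015-09-20 11:36:19.0"),
        (13, 12147, 246, "2015-09-20 11:31:14.0"),
        (2, 12253, 183, "2015-09-20 11:36:19.0"),
        (2, 19684, 805, "2015-09-20 12:00:41.0"),
        (2, 19684, 810, "2015-09-20 12:05:41.0"),
        (2, 19684, 805, "2015-09-20 12:10:41.0"),
        (2, 19684, 805, "2015-09-20 12:15:41.0"),
        (2, 19684, 805, "2015-09-20 12:20:41.0"),
        (2, 19684, 806, "2015-09-20 12:25:41.0"),
    ],
)

# LAG fetches the count of the preceding row within the same (account, id),
# ordered by time; a row is redundant when it repeats that value.
FIND_SQL = '''
SELECT pk
FROM (
    SELECT pk, "count",
           LAG("count") OVER (PARTITION BY account, id ORDER BY "time") AS prev_count
    FROM history
) AS t
WHERE "count" = prev_count
'''

doomed = sorted(pk for (pk,) in conn.execute(FIND_SQL))
conn.executemany("DELETE FROM history WHERE pk = ?", [(pk,) for pk in doomed])
remaining = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
print(doomed, remaining)
```

Collecting the pk values first and deleting them in a second step keeps the sketch portable; it avoids selecting from the table that is being modified inside the DELETE itself.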

0
votes
You can use this (untested) code to delete everything except the first row of each combination:

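    -- Multi-table DELETE with a self-join: removes every row for which an earlier
    -- row with the same account, id, and count exists. Non-consecutive repeats
    -- are removed as well.
    DELETE h1
    FROM history AS h1
    INNER JOIN history AS h2
        ON h1.account = h2.account
       AND h1.id = h2.id
       AND h1.count = h2.count
       AND h2.time < h1.time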
