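I have been collecting data from an API to build up a history. Initially, I saved all values every five minutes. Later, I changed the program to save only data that had changed.
Now I want to clean up the old data and delete every record whose count is unchanged from the previous record with the same account and id.

account id count time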
42 12147 492 2015-09-20 11:31:14.0
42 12147 492 2015-09-20 11:36:19.0 // delete
13 12147 246 2015-09-20 11:31:14.0
2 12253 183 2015-09-20 11:36:19.0
2 19684 805 2015-09-20 12:00:41.0 // note in next comment
2 19684 810 2015-09-20 12:05:41.0
2 19684 805 2015-09-20 12:10:41.0 // we had this combination, but don't delete this record because the previous value was different
2 19684 805 2015-09-20 12:15:41.0 // delete
2 19684 805 2015-09-20 12:20:41.0 // delete
2 19684 806 2015-09-20 12:25:41.0
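I tried to solve this with a GROUP BY on account, id, and count. However, that approach also deletes non-consecutive duplicates: if a record has the same value again after some time, it falls into the same group. I also thought about writing a small script that iterates through all the data and deletes the current row whenever account, id, and count match the previous record, but I am curious: is this possible with a single SQL statement?

DELETE history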
FROM history
INNER JOIN (SELECT MIN(time) AS minTime, account, id, count
FROM history
GROUP BY account, id, count) AS h
ON history.account = h.account AND history.id = h.id AND history.count = h.count
WHERE history.time > h.minTime
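Demo here

Edit:
After your edit, I think the OP's sample data still contains an error (the time field should be in ascending order). With the additional assumption that the table has a PK, you can use the following query: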
SELECT pk
FROM history AS h1
WHERE account = (SELECT account
                 FROM history AS h2
                 WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                 ORDER BY time DESC LIMIT 1)
  AND id = (SELECT id
            FROM history AS h2
            WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
            ORDER BY time DESC LIMIT 1)
  AND count = (SELECT count
               FROM history AS h2
               WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
               ORDER BY time DESC LIMIT 1)
This identifies the records to delete (see this demo).
Now you can easily delete the unwanted rows using the IN operator:
DELETE FROM history
WHERE pk IN (
  SELECT x.pk
  FROM (
    SELECT pk
    FROM history AS h1
    WHERE account = (SELECT account
                     FROM history AS h2
                     WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                     ORDER BY time DESC LIMIT 1)
      AND id = (SELECT id
                FROM history AS h2
                WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                ORDER BY time DESC LIMIT 1)
      AND count = (SELECT count
                   FROM history AS h2
                   WHERE h1.account = h2.account AND h1.id = h2.id AND h2.time < h1.time
                   ORDER BY time DESC LIMIT 1)
  ) AS x
)
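The DELETE above can be checked against the sample data from the question. The following is only a sketch: it uses Python's sqlite3 module as a stand-in for MySQL, the integer pk values are an assumption (the question does not show a key column), and the subquery is reduced to the count comparison, since the account and id comparisons hold whenever an earlier row exists.

```python
import sqlite3

# Sample rows from the question. The pk column (1..10) is an assumption.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]

# Delete a row when the closest earlier row with the same account and id
# carries the same count. Rows without an earlier row compare against NULL
# and are therefore kept.
DELETE_SQL = """
DELETE FROM history
WHERE pk IN (
    SELECT h1.pk
    FROM history AS h1
    WHERE h1.count = (
        SELECT h2.count
        FROM history AS h2
        WHERE h2.account = h1.account
          AND h2.id = h1.id
          AND h2.time < h1.time
        ORDER BY h2.time DESC
        LIMIT 1
    )
)
"""


def remaining_pks_after_cleanup():
    """Load the sample rows, run the cleanup, and return the surviving pks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history ("
        "pk INTEGER PRIMARY KEY, account INTEGER, id INTEGER, "
        "count INTEGER, time TEXT)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.execute(DELETE_SQL)
    pks = [row[0] for row in conn.execute("SELECT pk FROM history ORDER BY pk")]
    conn.close()
    return pks


print(remaining_pks_after_cleanup())  # prints [1, 3, 4, 5, 6, 7, 10]
```

Rows 2, 8, and 9 are removed, which matches the three rows marked "delete" in the question, while row 7 survives because the value before it was different.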
Demo here

Edit 2:
Using variables to locate the pk values to delete may make the query considerably faster:
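SELECT pk
FROM (
  SELECT pk, account, id, count, time,
         -- rn restarts at 1 whenever account, id, or count differs from the previous row
         @rn := IF(account = @acc AND id = @id AND count = @count, @rn + 1, 1) AS rn,
         @acc := account,
         @id := id,
         @count := count
  FROM history
  -- initialize with := (a plain = here would be a comparison, not an assignment)
  CROSS JOIN (SELECT @rn := 0, @acc := 0, @id := 0, @count := 0) AS vars
  ORDER BY account, id, time, count
) AS t
WHERE t.rn > 1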
Demo here
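
The variable-based query amounts to one ordered pass that remembers the previous row's account, id, and count. As an illustration only, here is the same pass written as a small Python function; the function name and the pk values in the sample rows are hypothetical, not part of the original answer.

```python
# Sample rows from the question as (pk, account, id, count, time).
# The pk values are an assumption.
ROWS = [
    (1, 42, 12147, 492, "2015-09-20 11:31:14"),
    (2, 42, 12147, 492, "2015-09-20 11:36:19"),
    (3, 13, 12147, 246, "2015-09-20 11:31:14"),
    (4, 2, 12253, 183, "2015-09-20 11:36:19"),
    (5, 2, 19684, 805, "2015-09-20 12:00:41"),
    (6, 2, 19684, 810, "2015-09-20 12:05:41"),
    (7, 2, 19684, 805, "2015-09-20 12:10:41"),
    (8, 2, 19684, 805, "2015-09-20 12:15:41"),
    (9, 2, 19684, 805, "2015-09-20 12:20:41"),
    (10, 2, 19684, 806, "2015-09-20 12:25:41"),
]


def consecutive_duplicate_pks(rows):
    """Return the pks of rows that repeat the previous row's (account, id, count)."""
    duplicates = []
    previous = None  # (account, id, count) of the last row visited
    # Visit rows ordered by account, id, then time, like the query's ORDER BY.
    for pk, account, id_, count, _time in sorted(rows, key=lambda r: (r[1], r[2], r[4])):
        current = (account, id_, count)
        if current == previous:
            duplicates.append(pk)
        previous = current
    return sorted(duplicates)


print(consecutive_duplicate_pks(ROWS))  # prints [2, 8, 9]
```

Because only the immediately preceding row is compared, a value that returns after a different one (row 7 in the sample) is not flagged.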
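-- Delete each record that has an earlier record with the same account, id, and count.
-- Like the GROUP BY approach, this also removes non-consecutive repeats.
delete from history h1
where exists (select 1
              from history h2
              where h1.account = h2.account and
                    h1.id = h2.id and
                    h1.count = h2.count and
                    h2.time < h1.time)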