I have a PostgreSQL table with these columns:
id (int8)
user_id (varchar)
is_favorite (boolean)
join_time (timestamp)
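I want to delete some rows from this table under certain conditions.
Conditions: each user_id may have at most 10 rows. For each user_id, rows with is_favorite=true are kept first (there is already a constraint that each user_id has at most 5 rows with is_favorite=true, and there can be fewer than 5), then the remaining rows by join_time, newest first. I want to delete the rows beyond the first 10 for each user_id according to these conditions. Is this possible in PostgreSQL?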
For example, in the table below:
for user_id = 655caab8-ce81-11ed-afa1-0242ac120002 -> rows 11, 12, 13, 14 must be deleted.
for user_id = 81c126b6-ce81-11ed-afa1-0242ac120002 -> rows 25, 26 must be deleted.
id|user_id                             |is_favorite|join_time
--+------------------------------------+-----------+-----------------------------
1 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-04 15:16:40.000 +0300
2 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-03 15:16:25.000 +0300
3 |655caab8-ce81-11ed-afa1-0242ac120002|true |2023-03-02 15:16:40.000 +0300
4 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-04-22 15:16:40.000 +0300
5 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-23 15:16:25.000 +0300
6 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-21 15:16:25.000 +0300
7 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-20 15:16:40.000 +0300
8 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-19 15:16:25.000 +0300
9 |655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-18 15:16:40.000 +0300
10|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-17 15:16:25.000 +0300
11|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-16 15:16:40.000 +0300
12|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-15 15:16:25.000 +0300
13|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:40.000 +0300
14|655caab8-ce81-11ed-afa1-0242ac120002|false |2023-03-14 15:16:39.000 +0300
15|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 12:16:25.000 +0300
16|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 11:16:25.000 +0300
17|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 10:16:25.000 +0300
18|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 09:16:25.000 +0300
19|81c126b6-ce81-11ed-afa1-0242ac120002|true |2023-03-01 08:16:25.000 +0300
20|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 07:16:25.000 +0300
21|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 06:16:25.000 +0300
22|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 05:16:25.000 +0300
23|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 04:16:25.000 +0300
24|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 03:16:25.000 +0300
25|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 02:16:25.000 +0300
26|81c126b6-ce81-11ed-afa1-0242ac120002|false |2023-03-01 01:16:25.000 +0300
Since you are processing the whole table, a simple subquery with row_number() should be fastest:
DELETE FROM tbl t
USING (
   SELECT id, row_number() OVER (PARTITION BY user_id
                                 ORDER BY is_favorite DESC, join_time DESC
                                 ROWS UNBOUNDED PRECEDING) AS rn
   FROM   tbl t
   ) del
WHERE  t.id = del.id
AND    del.rn > 10;
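Adding ROWS UNBOUNDED PRECEDING is optional, but should speed things up considerably (until Postgres 16 is released). row_number() returns the same numbers with either frame mode, and ROWS is cheaper to evaluate than the default RANGE frame.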
With the right sort order applied, this skips the 10 most wanted rows per user and deletes the rest. true sorts before false in descending order.
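Before deleting anything, it can help to preview which rows the ranking would remove. A minimal sketch, assuming the table is named tbl as in the query above; it only reads and changes nothing:

```sql
-- Dry run: list the rows that rank beyond the first 10 per user_id.
-- Same ordering as the DELETE: favorites first, then newest join_time.
SELECT id, user_id, is_favorite, join_time, rn
FROM  (
   SELECT *, row_number() OVER (PARTITION BY user_id
                                ORDER BY is_favorite DESC, join_time DESC) AS rn
   FROM   tbl
   ) sub
WHERE  rn > 10
ORDER  BY user_id, rn;
```

For the sample data in the question, this should list ids 11 to 14 for the first user and ids 25 and 26 for the second, matching the rows the question wants deleted.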
If there can be null values, you need to do more. For starters, clarify your question.
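As a sketch of what "more" could look like: in Postgres, null values sort first in descending order by default, so a row with null in is_favorite or join_time would rank ahead of all others and be kept. Assuming such rows should instead count as least wanted (an assumption, since the question does not say), NULLS LAST moves them to the end. The id tiebreaker is also an addition for illustration, not part of the original query:

```sql
-- Assumption: rows with null is_favorite or null join_time are least wanted.
DELETE FROM tbl t
USING (
   SELECT id, row_number() OVER (PARTITION BY user_id
                                 ORDER BY is_favorite DESC NULLS LAST
                                        , join_time   DESC NULLS LAST
                                        , id  -- tiebreaker, keeps the result deterministic
                                 ROWS UNBOUNDED PRECEDING) AS rn
   FROM   tbl
   ) del
WHERE  t.id = del.id
AND    del.rn > 10;
```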
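Obviously, there is a race condition under concurrent writes. If concurrent write load is possible, take a write lock on the table in the same transaction first ...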
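One way this could look, as a sketch only. The lock mode is an assumption chosen for illustration: SHARE ROW EXCLUSIVE blocks concurrent INSERT, UPDATE, and DELETE on the table while still allowing plain SELECT.

```sql
BEGIN;

-- Blocks concurrent writers on tbl until COMMIT; readers are not blocked.
LOCK TABLE tbl IN SHARE ROW EXCLUSIVE MODE;

DELETE FROM tbl t
USING (
   SELECT id, row_number() OVER (PARTITION BY user_id
                                 ORDER BY is_favorite DESC, join_time DESC
                                 ROWS UNBOUNDED PRECEDING) AS rn
   FROM   tbl
   ) del
WHERE  t.id = del.id
AND    del.rn > 10;

COMMIT;
```

The lock is released automatically at the end of the transaction, so writers wait only for the duration of the DELETE.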
If this deletes most of the rows, creating a new table of survivors might be cheaper ...
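A rough sketch of the survivor-table idea, under stated assumptions: tbl_survivors is a hypothetical name, and the step of swapping it in for the original table is deliberately left out, because it depends on the indexes, constraints, defaults, privileges, and dependent objects of the real table.

```sql
-- CREATE TABLE AS copies only the data, not indexes, constraints,
-- defaults, or privileges; those would have to be recreated separately.
CREATE TABLE tbl_survivors AS
SELECT id, user_id, is_favorite, join_time
FROM  (
   SELECT *, row_number() OVER (PARTITION BY user_id
                                ORDER BY is_favorite DESC, join_time DESC
                                ROWS UNBOUNDED PRECEDING) AS rn
   FROM   tbl
   ) sub
WHERE  rn <= 10;  -- keep the 10 most wanted rows per user_id
```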
There are other ways. Like: