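In a Postgres database, I have a view named school_id_cards. The view captures a list of ID cards for every school and student in a school system. ID cards are periodically reissued to students, so each student may have any number of ID cards. Every card_id is unique. A student may belong to multiple schools. A sample of the records looks like this: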
card_id | school_id | student_id |
---|---|---|
1 | 1 | 1 |
2 | 1 | 1 |
3 | 1 | 2 |
4 | 1 | 2 |
5 | 1 | 3 |
6 | 1 | 3 |
7 | 1 | 4 |
8 | 1 | 4 |
9 | 2 | 5 |
10 | 2 | 5 |
11 | 2 | 6 |
12 | 3 | 7 |
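Given a list of school_ids, I want to retrieve the most recently created ID card for each school + student pair, limited to a chosen number of students per school_id.
I have the following query, which gets what I need without any limit: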
SELECT card_id
FROM   school_id_cards
WHERE  card_id IN (
   SELECT MAX(card_id)
   FROM   school_id_cards
   WHERE  school_id IN (1, 2, 3)
   GROUP  BY school_id, student_id
   );
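... which, for the sample above, returns (2,4,6,8,10,11,12).
However, in my subquery I want to limit the number of school + student records returned for each school_id listed in the WHERE clause. For example, with a limit of 2, I want at most the 2 most recently added students for school 1, at most the 2 most recently added students for school 2, and at most the 2 most recently added students for school 3. In that case, the final result would be (6,8,10,11,12).
Is there a single query that can accomplish this?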
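Use dense_rank.
First, rank the cards of each student/school pair so that the latest card gets rank 1. (Note: IDs are a poor proxy for chronological order. Add a datetime column.)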
select
  *,
  dense_rank() over(partition by school_id, student_id order by card_id desc) as student_card_order
from school_id_cards
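The note about adding a datetime column can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres (window functions need SQLite 3.25 or newer). The table name cards, the created_at column, and the three rows are hypothetical and not part of the question's view; they only show how ordering by a timestamp can disagree with ordering by card_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the question's columns plus an assumed created_at column.
conn.execute(
    "create table cards ("
    " card_id integer primary key, school_id integer,"
    " student_id integer, created_at text)"
)
conn.executemany(
    "insert into cards values (?, ?, ?, ?)",
    [
        # Made-up data: card 1 was created AFTER card 2 despite the lower id.
        (1, 1, 1, "2024-03-01 09:00:00"),
        (2, 1, 1, "2024-01-15 09:00:00"),
        (3, 1, 2, "2024-02-01 09:00:00"),
    ],
)

# Same ranking as in the answer, but ordered by the timestamp;
# card_id is only used as a tie-breaker.
rows = conn.execute(
    """
    select card_id, student_id,
           dense_rank() over (partition by school_id, student_id
                              order by created_at desc, card_id desc)
             as student_card_order
    from cards
    order by student_id, student_card_order
    """
).fetchall()
conn.close()

# Keep only the newest card per student.
latest = [card_id for card_id, _student_id, rank in rows if rank == 1]
print(latest)  # [1, 3]
```

Ordering by card_id alone would have picked card 2 for student 1, which is the kind of mismatch the note warns about.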
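Next, we use that as a subquery to rank, within each school, the most recent cards issued to its students. The most recent card of each student has student_card_order = 1.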
with ordered_student_cards as (
  select
    *,
    dense_rank() over(partition by school_id, student_id order by card_id desc) as student_card_order
  from school_id_cards
)
select
  *,
  dense_rank() over(partition by school_id order by card_id desc) as school_card_order
from ordered_student_cards
where student_card_order = 1
Finally, we keep only the first two per school by filtering on school_card_order <= 2.
with ordered_student_cards as (
  select
    *,
    dense_rank() over(partition by school_id, student_id order by card_id desc) as student_card_order
  from school_id_cards
), ordered_school_cards as (
  select
    *,
    dense_rank() over(partition by school_id order by card_id desc) as school_card_order
  from ordered_student_cards
  where student_card_order = 1
)
select card_id
from ordered_school_cards
where school_card_order <= 2;
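Demonstration.
There may be more compact or more efficient ways to do this, but window functions and subqueries are one way to break a complex query into steps.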
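As a quick check of the approach, here is a minimal sketch that runs the final query against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for Postgres (window functions need SQLite 3.25 or newer). The function name latest_cards and its limit_per_school parameter are made up for this example, and the school_id filter from the question has been added to the first CTE, which the query above leaves out.

```python
import sqlite3

# Sample rows from the question: (card_id, school_id, student_id).
ROWS = [
    (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2),
    (5, 1, 3), (6, 1, 3), (7, 1, 4), (8, 1, 4),
    (9, 2, 5), (10, 2, 5), (11, 2, 6), (12, 3, 7),
]

QUERY = """
with ordered_student_cards as (
  select *,
         dense_rank() over (partition by school_id, student_id
                            order by card_id desc) as student_card_order
  from school_id_cards
  where school_id in (1, 2, 3)   -- filter taken from the question
), ordered_school_cards as (
  select *,
         dense_rank() over (partition by school_id
                            order by card_id desc) as school_card_order
  from ordered_student_cards
  where student_card_order = 1
)
select card_id
from ordered_school_cards
where school_card_order <= ?
order by card_id
"""


def latest_cards(limit_per_school):
    """Return the newest card of the last `limit_per_school` students per school."""
    conn = sqlite3.connect(":memory:")
    # A plain table stands in for the view from the question.
    conn.execute(
        "create table school_id_cards ("
        " card_id integer primary key, school_id integer, student_id integer)"
    )
    conn.executemany("insert into school_id_cards values (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY, (limit_per_school,))]
    conn.close()
    return result


print(latest_cards(2))  # [6, 8, 10, 11, 12]
```

With a limit of 2 this reproduces the result the question asks for.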
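If your table is big, you want to avoid expensive sequential scans over the whole table. Use a smart query that picks qualifying rows with index(-only) scans from a matching index. That is much faster.
Typically, there should be some kind of "school" table in your database, with exactly one row per relevant school. That makes the query simpler and faster: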
WITH RECURSIVE latest_card AS (
   SELECT c.*
   FROM   school s
   CROSS  JOIN LATERAL (
      SELECT c.school_id, c.card_id, ARRAY[c.student_id] AS leading_ids
      FROM   school_id_cards c
      WHERE  c.school_id = s.school_id
      ORDER  BY c.card_id DESC
      LIMIT  1
      ) c

   UNION ALL
   SELECT c.*
   FROM   latest_card l
   JOIN   LATERAL (
      SELECT l.school_id, c.card_id, l.leading_ids || student_id
      FROM   school_id_cards c
      WHERE  c.school_id = l.school_id
      AND    c.card_id < l.card_id
      AND    c.student_id <> ALL (l.leading_ids)
      ORDER  BY c.card_id DESC
      LIMIT  1
      ) c ON cardinality(l.leading_ids) < 2  -- your limit per school here!
   )
SELECT card_id
FROM   latest_card
ORDER  BY card_id;
This scales well for a small limit per school, like the one you demonstrated. For a large limit, I would switch to a different query.
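To make the iteration of the recursive CTE easier to follow, here is a rough simulation of its logic in plain Python. It is not how Postgres executes the query; the sorted per-school list merely stands in for the index on (school_id, card_id), and the function name latest_cards_per_school is made up for this sketch. Each loop iteration mirrors one recursive step: take the newest card below the previously picked one whose student is not yet in leading_ids.

```python
from collections import defaultdict

# Sample rows from the question: (card_id, school_id, student_id).
ROWS = [
    (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2),
    (5, 1, 3), (6, 1, 3), (7, 1, 4), (8, 1, 4),
    (9, 2, 5), (10, 2, 5), (11, 2, 6), (12, 3, 7),
]


def latest_cards_per_school(rows, limit):
    """Simulate the rCTE: per school, walk cards newest-first, skipping seen students."""
    by_school = defaultdict(list)
    for card_id, school_id, student_id in rows:
        by_school[school_id].append((card_id, student_id))

    picked = []
    for cards in by_school.values():
        cards.sort(reverse=True)   # newest card first, like ORDER BY card_id DESC
        leading_ids = []           # students already picked for this school
        last_card = None           # card_id picked in the previous step
        while len(leading_ids) < limit:   # cardinality(leading_ids) < limit
            step = next(
                ((card, student) for card, student in cards
                 if (last_card is None or card < last_card)
                 and student not in leading_ids),
                None,
            )
            if step is None:       # no more students in this school
                break
            last_card, student = step
            leading_ids.append(student)
            picked.append(last_card)
    return sorted(picked)


print(latest_cards_per_school(ROWS, 2))  # [6, 8, 10, 11, 12]
```

The number of steps per school is bounded by the limit, which is why the approach suits small limits: each step corresponds to one short index probe rather than a scan over all of a school's cards.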
About the use of a recursive CTE (rCTE):
Be sure to have a matching index, like:
CREATE INDEX ON school_id_cards (school_id DESC, card_id DESC);
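A simpler index with the (default) ascending sort order is hardly any worse. Postgres can scan B-tree indexes backwards. Only opposing sort orders across the two columns would be less ideal.
If there is no school table: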
WITH RECURSIVE latest_card AS (
   (
   SELECT DISTINCT ON (school_id)
          school_id, card_id, ARRAY[student_id] AS leading_ids
   FROM   school_id_cards c
   ORDER  BY school_id DESC, card_id DESC
   )

   UNION ALL
   SELECT c.*
   FROM   latest_card l
   JOIN   LATERAL (
      SELECT l.school_id, c.card_id, l.leading_ids || student_id
      FROM   school_id_cards c
      WHERE  c.school_id = l.school_id
      AND    c.card_id < l.card_id
      AND    c.student_id <> ALL (l.leading_ids)
      ORDER  BY c.card_id DESC
      LIMIT  1
      ) c ON cardinality(l.leading_ids) < 2  -- your limit per school here!
   )
SELECT card_id
FROM   latest_card
ORDER  BY card_id;
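About DISTINCT ON:
You could replace the non-recursive term with another, nested rCTE to generate the list of schools (possibly together with the latest card, to kick things off) ...
But there really should be a school table. Create it if you don't have one.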