我在 PostgreSQL 中将(二叉)树表示为表。每一行都是树的一个节点,每一行都有一个父列和一种颜色。
树叶上的节点有颜色(它们的颜色列不为空,可以是绿色或红色)。
我的目标是给整棵树上色。有了这些规则:
如何在 PostgreSQL 中为该算法编写递归查询?
树的例子:
CREATE TEMPORARY TABLE family (
node text
, parent text
, color text
);
INSERT INTO family (parent, node, color)
VALUES
(null , 'A' , null)
, ('A' , 'AB' , null)
, ('A' , 'AC' , null)
, ('AB' , 'ABD' , null)
, ('ABD', 'ABDE', 'green')
, ('AC' , 'ACF' , null)
, ('AC' , 'ACG' , null)
, ('ACF', 'ACFH', 'red')
, ('ACF', 'ACFI', 'green')
, ('ACG', 'ACGJ', 'red')
, ('ACG', 'ACGK', 'red')
;
这棵树,全彩色:
这是一个基于您的规则的示例查询:
WITH RECURSIVE colored_tree AS (
-- Anchor query: select the leaves of the tree and assign their colors
SELECT node, parent, color
FROM family
WHERE color IS NOT NULL
UNION ALL
-- Recursive query: traverse the tree upwards and apply coloring rules
SELECT f.node, f.parent,
CASE
-- If parent has one child, parent's color is child's color
WHEN COUNT(ct.node) = 1 THEN ct.color
-- If parent has two children with the same color, parent's color is children's color
WHEN COUNT(DISTINCT ct.color) = 1 THEN ct.color
-- Otherwise, parent's color is gray
ELSE 'gray'
END AS color
FROM family f
JOIN colored_tree ct ON ct.node = f.parent
GROUP BY f.node, f.parent, ct.color
)
-- Final query: select all nodes and their colors
SELECT node, color
FROM colored_tree
ORDER BY node;
以下是查询的工作原理:
family
表与 colored_tree
CTE 连接起来
关于亲子关系。这遍历树向上,
从叶子到根。colored_tree
CTE 和其中不同颜色的数量
孩子们。基于这些计数,它应用着色规则和
为父节点分配颜色。GROUP BY
子句是聚合子节点所必需的
他们的父母,以便可以应用计数和着色规则
正确。colored_tree
CTE 中选择所有节点及其颜色。您可以使用递归
cte
,聚合after您在子查询中的父节点上执行join
:
with recursive cte(parent, node, color) as (
select t.parent, t.node, case when t.c1 = t.c2 then t.c1 else 'gray' end
from (select f1.parent, f1.node, max(f.color) c1, min(f.color) c2
from family f
join family f1 on f1.node = f.parent where f.color is not null
group by f1.parent, f1.node) t
union all
select t1.parent, t1.node, case when t1.c1 = t1.c2 then t1.c1 else 'gray' end
from (select t.parent, t.node, max(t.color) c1, min(t.color) c2
from (select f.parent, f.node, c.color from cte c
join family f on f.node = c.parent) t
group by t.parent, t.node) t1
)
select c.parent, c.node, c.color from cte c
union all
select f.parent, f.node, f.color from family f where f.color is not null
见小提琴.
A recursive CTE 不允许在递归项中聚合。
我建议一次更新原始表一层,自上而下:
DO
$do$
DECLARE
_max_depth int := (SELECT max(length(node)) - 1 FROM family);
_depth int;
BEGIN
FOR _depth IN REVERSE _max_depth .. 1 LOOP
UPDATE family p
SET color = c.color
FROM (
SELECT parent, CASE WHEN min(color) = max(color) THEN min(color) ELSE 'gray' END AS color
FROM family
WHERE length(node) = _depth + 1
GROUP BY 1
) c
WHERE p.node = c.parent
AND p.color IS DISTINCT FROM c.color; -- optional
END LOOP;
END
$do$;
如果你不想接触原始表,在表的副本上操作,也许是
TEMPORARY
表以获得最佳性能。
我使用
length(node)
作为嵌套级别的指标。如果那不可靠,请将其替换为可靠的嵌套级别。无论哪种方式,如果表很大并且有很多级别,级别指示器上的索引将有助于提高性能。对于将是的演示:
CREATE INDEX family_level_idx ON family (length(node));
添加的
WHERE
子句p.color IS DISTINCT FROM c.color
跳过空更新。参见:
相关:
您可以使用递归公用表表达式 (CTE) 来实现此算法。这是一个适用于您的用例的示例查询:
WITH RECURSIVE colored_tree AS (
-- Base case: leaf nodes
SELECT node, color
FROM family
WHERE color IS NOT NULL
UNION ALL
-- Recursive case: internal nodes
SELECT f.node, (
CASE
WHEN ct_l.color IS NOT NULL AND ct_r.color IS NOT NULL THEN 'gray'
WHEN ct_l.color IS NOT NULL THEN ct_l.color
ELSE ct_r.color
END
) AS color
FROM family f
JOIN colored_tree ct_l ON f.node = ct_l.parent AND f.color IS NULL
JOIN colored_tree ct_r ON f.node = ct_r.parent AND f.color IS NULL)
SELECT * FROM colored_tree;
查询首先定义一个名为 colored_tree 的递归 CTE。递归的基本情况定义为叶节点集,叶节点是具有非空颜色值的节点。递归情况被定义为内部节点的集合,这些节点是具有空颜色值的节点。
递归案例将 family 表与 colored_tree CTE 连接两次:一次用于左孩子,一次用于右孩子。它使用 CASE 语句根据其子节点的颜色确定父节点的颜色。如果两个子项都有一种颜色,则父项颜色为灰色。否则,父颜色是具有颜色的孩子的颜色。
查询从 colored_tree CTE 中选择所有行,它为您提供树中每个节点的颜色。请注意,此查询假定树是非循环的并且没有孤立节点(具有父值的节点在 family 表中不存在)。如果这些假设不成立,您可能需要修改查询以处理这些情况。