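Data structure: the table has a PK "id" and a jsonb column "data". "data" contains an array of "instances" objects. Each "instance" has some values and a "path" array. The "path" array is a flat representation of a deeply nested, tree-like hierarchy. Each "path" consists of objects with a string "id" (not unique) and an integer "index" (unique only on the same level, i.e. within the same parent structure, but it can repeat on different levels).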
Example:
"instances": [
{
"path": [
{"id": "root", "index": 2},
{"id": "folder1", "index": 0},
{"id": "folder2", "index": 0},
{"id": "folder3", "index": 0}
],
"pdf": "pdf in 1,2,3",
"info": "some other data"
},
...
]
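Because "id" repeats and "index" is only unique among siblings, a single path element cannot identify a folder on its own; only the whole prefix leading to it can. A minimal Python model of that idea (Python stands in for the jsonb data here, and the helper node_key and the sample paths are hypothetical):

```python
def node_key(path, depth):
    """Identify the node at 1-based position `depth` by its full ancestor prefix.

    A bare (id, index) pair is ambiguous because ids repeat and indexes are
    only unique among siblings, so the whole prefix of pairs is used instead.
    """
    return tuple((el["id"], el["index"]) for el in path[:depth])


# Hypothetical paths: two "folder2" nodes with the same id and index
# that differ only in their parent folder1 (index 0 vs index 1).
path_a = [{"id": "root", "index": 2}, {"id": "folder1", "index": 0}, {"id": "folder2", "index": 0}]
path_b = [{"id": "root", "index": 2}, {"id": "folder1", "index": 1}, {"id": "folder2", "index": 0}]

print(path_a[-1] == path_b[-1])                    # True: the last elements look identical
print(node_key(path_a, 3) == node_key(path_b, 3))  # False: they are different folders
```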
I need to be able to search for multiple specific values within the hierarchy of the same specified folder in a Postgres jsonb table.
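For example, search for items that have both the "pdf in 1,2,3" and the "jpg in 1,2,3" values within the hierarchy of the same folder2 (meaning a folder2 within the same parent structure).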
This is the query I came up with:
WITH indexed_paths AS (
SELECT id, instance -> 'index' as inst_idx,
MIN(CASE WHEN path_element @> '{"id": "folder2"}' THEN path_idx END)
OVER (PARTITION BY id, instance -> 'index') AS searched_index,
path_idx, path_element, instance
FROM "flat",
jsonb_array_elements(data -> 'instances') AS instance,
jsonb_array_elements(instance -> 'path') WITH ORDINALITY arr(path_element, path_idx)
WHERE
instance -> 'path' @> '[{"id": "folder2"}]'
ORDER BY id, inst_idx, path_idx
), combined_paths AS (
SELECT id, jsonb_agg(path_element) as path, instance
FROM indexed_paths
WHERE path_idx <= searched_index
GROUP BY id, inst_idx, instance
), combined_instances AS (
SELECT id, path, jsonb_agg(instance) as instances
FROM combined_paths
GROUP BY id, path
)
SELECT *
FROM "flat" f
WHERE EXISTS (
SELECT 1
FROM combined_instances ci
WHERE
ci.id = f.id
AND ci.instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
);
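The indexed_paths CTE expands the instances of each row, and every path of each instance, into separate "path_element" rows, enumerating all path_elements with an index. If a path_element is the searched one, its index is taken and written into a new column for all rows with the matching id and instance index. If the searched path_element occurs more than once within the same id and instance index, the smallest index is taken.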
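The searched_index window expression can be modeled in a few lines of Python. This is only a sketch: the helper name searched_index and the sample path are hypothetical, and the 1-based numbering of WITH ORDINALITY is reproduced with enumerate(..., start=1):

```python
def searched_index(path, folder="folder2"):
    """Model of MIN(CASE WHEN path_element @> '{"id": "folder2"}' THEN path_idx END).

    path_idx comes from WITH ORDINALITY and is therefore 1-based. Returns None
    (SQL NULL) when the folder does not occur in the path.
    """
    hits = [pos for pos, el in enumerate(path, start=1) if el.get("id") == folder]
    return min(hits) if hits else None


# folder2 appears at positions 2 and 4; the minimum wins
path = [{"id": "root", "index": 2}, {"id": "folder2", "index": 0},
        {"id": "folder3", "index": 0}, {"id": "folder2", "index": 1}]
print(searched_index(path))  # 2
```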
The combined_paths CTE aggregates the path_elements, grouping them by id and instance index and keeping only those whose path_idx is <= the searched index; this way it reconstructs the paths, but only up to the searched path_element.
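Likewise, a hypothetical Python helper for the combined_paths step, keeping every path element up to and including the first match (a model of the logic, not the SQL itself):

```python
def truncate_at_first(path, folder="folder2"):
    """Model of combined_paths: keep the elements whose path_idx <= searched_index."""
    positions = [pos for pos, el in enumerate(path, start=1) if el.get("id") == folder]
    if not positions:
        return None  # such instances are already removed by the CTE's WHERE clause
    cut = min(positions)
    return [el for pos, el in enumerate(path, start=1) if pos <= cut]


path = [{"id": "root", "index": 2}, {"id": "folder1", "index": 0},
        {"id": "folder2", "index": 0}, {"id": "folder3", "index": 0}]
print([el["id"] for el in truncate_at_first(path)])  # ['root', 'folder1', 'folder2']
```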
The combined_instances CTE aggregates the data of all instances that share the same id and reconstructed path.
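Grouping by id and reconstructed path can be sketched with a dictionary keyed on (row id, serialized prefix). The helper group_instances and the sample rows are hypothetical; serializing with sorted keys stands in for jsonb equality, which ignores object key order:

```python
import json
from collections import defaultdict


def group_instances(rows, folder="folder2"):
    """Model of combined_instances: GROUP BY id, path with jsonb_agg(instance).

    `rows` maps a row id to its `data` value. Each path is cut after the
    first element whose id equals `folder`.
    """
    groups = defaultdict(list)
    for row_id, data in rows.items():
        for inst in data["instances"]:
            path = inst["path"]
            hits = [i for i, el in enumerate(path) if el.get("id") == folder]
            if not hits:
                continue  # mirrors WHERE instance -> 'path' @> '[{"id": "folder2"}]'
            prefix = path[: hits[0] + 1]
            # jsonb ignores object key order, so serialize with sorted keys
            groups[(row_id, json.dumps(prefix, sort_keys=True))].append(inst)
    return groups


rows = {1: {"instances": [
    {"path": [{"id": "root", "index": 2}, {"id": "folder2", "index": 0}, {"id": "a", "index": 0}], "pdf": "p"},
    {"path": [{"id": "root", "index": 2}, {"id": "folder2", "index": 0}], "jpg": "j"},
    {"path": [{"id": "root", "index": 3}, {"id": "folder2", "index": 0}], "jpg": "k"},
]}}
for (row_id, key), insts in group_instances(rows).items():
    print(row_id, key, len(insts))
```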
The final SELECT statement searches combined_instances for the specified data and joins the result with the original table by id.
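The final filter relies on jsonb containment: an array contains another array if every element on the right is contained in some element on the left. A simplified Python model of @> (hypothetical helper; it skips PostgreSQL's special case where a top-level array contains a bare scalar, and compares numbers with plain Python equality):

```python
def jsonb_contains(left, right):
    """Simplified model of PostgreSQL's jsonb @> operator (left @> right)."""
    if isinstance(right, dict):
        # every key of the right object must exist on the left with a contained value
        return isinstance(left, dict) and all(
            k in left and jsonb_contains(left[k], v) for k, v in right.items()
        )
    if isinstance(right, list):
        # every element on the right must be contained in SOME element on the left;
        # order and duplicates are ignored
        return isinstance(left, list) and all(
            any(jsonb_contains(item, r) for item in left) for r in right
        )
    # scalars: plain equality, but keep Python from treating True as 1
    return (not isinstance(left, (dict, list))
            and isinstance(left, bool) == isinstance(right, bool)
            and left == right)


needle = [{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]
group = [{"pdf": "pdf in 1,2,3", "info": "x"}, {"jpg": "jpg in 1,2,3"}]
print(jsonb_contains(group, needle))      # True
print(jsonb_contains(group[:1], needle))  # False
```

One consequence of these rules: a single instance carrying both keys also satisfies the needle, because both right-hand elements may be matched by the same left-hand element.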
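It works exactly the way I want, but it is far too verbose and long. Is there any way to simplify it? Changing the data structure of the jsonb column is an option. Other algorithms are also welcome. Basically, any help would be much appreciated.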
I would try to use subqueries and lateral joins instead of expanding and regrouping rows across multiple CTEs:
SELECT id, to_jsonb(ancestor_path) AS ancestor_path, instances
FROM "flat" f,
LATERAL (
SELECT element->>'id' AS ancestor_name, path[0:ancestor.idx] AS ancestor_path, jsonb_agg(instance) AS instances
FROM jsonb_array_elements(f.data -> 'instances') AS instance,
jsonb_to_record(instance) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
GROUP BY path[0:ancestor.idx], element->>'id'
) AS data
WHERE ancestor_name = 'folder2'
AND instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
(online demo of this and the following approaches)
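What the LATERAL subquery computes can be modeled in Python as follows. The helpers ancestor_groups and matching_ancestors are hypothetical, and the containment check is reduced to key/value lookups, which matches @> for the flat string values used here:

```python
import json
from collections import defaultdict


def ancestor_groups(instances):
    """Model of the LATERAL subquery: one group per distinct path prefix.

    unnest(path) WITH ORDINALITY yields (element, idx); path[0:idx] is the
    prefix ending at that element, and jsonb_agg collects every instance
    sharing that prefix.
    """
    groups = defaultdict(list)
    for inst in instances:
        path = inst["path"]
        for idx in range(1, len(path) + 1):
            groups[json.dumps(path[:idx], sort_keys=True)].append(inst)
    return groups


def matching_ancestors(instances, folder, wanted):
    """Prefixes ending in `folder` whose instances together hold every wanted pair."""
    hits = []
    for key, group in ancestor_groups(instances).items():
        prefix = json.loads(key)
        if prefix[-1]["id"] != folder:  # WHERE ancestor_name = 'folder2'
            continue
        # stands in for instances @> '[{"pdf": ...}, {"jpg": ...}]'
        if all(any(inst.get(k) == v for inst in group) for k, v in wanted.items()):
            hits.append(prefix)
    return hits


root, f1, f2 = {"id": "root", "index": 2}, {"id": "folder1", "index": 0}, {"id": "folder2", "index": 0}
instances = [
    {"path": [root, f1, f2, {"id": "folder3", "index": 0}], "pdf": "pdf in 1,2,3"},
    {"path": [root, f1, f2], "jpg": "jpg in 1,2,3"},
]
print(matching_ancestors(instances, "folder2", {"pdf": "pdf in 1,2,3", "jpg": "jpg in 1,2,3"}))
```

Unlike the question's query, which cuts each path at the first folder2 only, this groups on every prefix ending in folder2, so a path that passes through folder2 at two levels is checked at both.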
Or use EXISTS if you only want the whole flat rows (regardless of how many matches there are in their data):
SELECT *
FROM "flat" f
WHERE EXISTS (
SELECT 1
FROM jsonb_array_elements(f.data -> 'instances') AS instance,
jsonb_to_record(instance) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
WHERE element->>'id' = 'folder2'
GROUP BY path[0:ancestor.idx]
HAVING jsonb_agg(instance) @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
)
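The EXISTS variant boils down to a boolean per row, with the WHERE filter applied before grouping. A Python sketch under the same assumptions (hypothetical helper row_matches, flat string values only):

```python
import json
from collections import defaultdict


def row_matches(data, folder="folder2", wanted=None):
    """Model of the EXISTS query: does any prefix ending in `folder` hold all wanted pairs?"""
    wanted = wanted or {}
    groups = defaultdict(list)
    for inst in data["instances"]:
        path = inst["path"]
        for idx, element in enumerate(path, start=1):
            if element["id"] == folder:  # the WHERE clause runs before GROUP BY
                groups[json.dumps(path[:idx], sort_keys=True)].append(inst)
    # HAVING: some group must contain every wanted key/value pair
    return any(
        all(any(inst.get(k) == v for inst in group) for k, v in wanted.items())
        for group in groups.values()
    )


root, f2 = {"id": "root", "index": 2}, {"id": "folder2", "index": 0}
data = {"instances": [
    {"path": [root, f2], "pdf": "pdf in 1,2,3"},
    {"path": [root, f2, {"id": "folder3", "index": 0}], "jpg": "jpg in 1,2,3"},
]}
print(row_matches(data, "folder2", {"pdf": "pdf in 1,2,3", "jpg": "jpg in 1,2,3"}))  # True
```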
An alternative to GROUP BY and searching in the aggregated instances array is a self-join of a CTE relation:
SELECT *
FROM "flat" f
WHERE EXISTS (
WITH instances AS (
SELECT value, element->>'id' AS ancestor_name, path[0:ancestor.idx] AS ancestor_path
FROM jsonb_array_elements(f.data -> 'instances') AS el(value),
jsonb_to_record(value) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
)
SELECT *
FROM instances a JOIN instances b USING (ancestor_path, ancestor_name)
WHERE ancestor_name = 'folder2'
AND a.value @> '{"pdf": "pdf in 1,2,3"}'
AND b.value @> '{"jpg": "jpg in 1,2,3"}'
)
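The self-join pairs every instance with every other instance that shares the same ancestor path and name. A small Python sketch of that pairing (hypothetical helper self_join_matches; it checks all pairs, which is fine for illustrating the logic but says nothing about how PostgreSQL executes the join):

```python
import json
from itertools import product


def self_join_matches(instances, folder, a_pair, b_pair):
    """Model of: instances a JOIN instances b USING (ancestor_path, ancestor_name).

    Returns the ancestor paths (ending in `folder`) for which some instance `a`
    has the key/value a_pair and some instance `b` has b_pair.
    """
    # the "instances" CTE: one (value, ancestor_name, ancestor_path) row per path element
    cte = []
    for value in instances:
        path = value["path"]
        for idx, element in enumerate(path, start=1):
            cte.append((value, element["id"], json.dumps(path[:idx], sort_keys=True)))
    (ak, av), (bk, bv) = a_pair, b_pair
    return [
        json.loads(a[2])
        for a, b in product(cte, cte)  # every pair of CTE rows, then filter
        if a[1] == b[1] == folder and a[2] == b[2]
        and a[0].get(ak) == av and b[0].get(bk) == bv
    ]


root, f1, f2 = {"id": "root", "index": 2}, {"id": "folder1", "index": 0}, {"id": "folder2", "index": 0}
instances = [
    {"path": [root, f1, f2, {"id": "folder3", "index": 0}], "pdf": "pdf in 1,2,3"},
    {"path": [root, f1, f2], "jpg": "jpg in 1,2,3"},
]
print(bool(self_join_matches(instances, "folder2", ("pdf", "pdf in 1,2,3"), ("jpg", "jpg in 1,2,3"))))
```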
Instead of returning ancestor_name as a separate column from the subquery (which makes the GROUP BY or the JOIN … USING uglier), you can also just access the last element of ancestor_path:
SELECT *
FROM "flat" f,
LATERAL (
SELECT to_jsonb(path[0:ancestor.idx]) AS ancestor_path, jsonb_agg(instance) AS instances
FROM jsonb_array_elements(f.data -> 'instances') AS instance,
jsonb_to_record(instance) AS _i(path jsonb[]),
unnest(path) WITH ORDINALITY AS ancestor(element, idx)
GROUP BY path[0:ancestor.idx]
) AS data
WHERE ancestor_path->-1->>'id' = 'folder2'
AND instances @> '[{"pdf": "pdf in 1,2,3"}, {"jpg": "jpg in 1,2,3"}]'::jsonb
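I guess the main trick that makes these queries shorter than yours is using array slicing to generate the ancestor paths, and using jsonb_to_record to convert the jsonb array into a Postgres array. That could probably be achieved in several other ways as well.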
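The slice path[0:ancestor.idx] works because PostgreSQL array subscripts are 1-based and inclusive, and a slice that partially overlaps the array bounds is reduced to the overlapping part, so a lower bound of 0 simply means "from the start". A Python model of that rule (hypothetical helper pg_slice; it assumes a one-dimensional array with the default lower bound of 1):

```python
def pg_slice(arr, lower, upper, lower_bound=1):
    """Model of PostgreSQL arr[lower:upper] on a one-dimensional array.

    Subscripts are inclusive and start at `lower_bound` (1 by default).
    Out-of-range bounds are clamped; a slice entirely outside the array
    yields an empty array.
    """
    first = max(lower, lower_bound)
    last = min(upper, lower_bound + len(arr) - 1)
    if first > last:
        return []
    return arr[first - lower_bound : last - lower_bound + 1]


path = ["root", "folder1", "folder2"]
# every ancestor prefix, as unnest(path) WITH ORDINALITY + path[0:idx] produces them
print([pg_slice(path, 0, idx) for idx in range(1, len(path) + 1)])
# [['root'], ['root', 'folder1'], ['root', 'folder1', 'folder2']]
```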