Understanding execution plans and parallelization in PostgreSQL

Question

I have a table that stores 1M 384-dimensional vectors in a column named vector of type "char"[]. I am trying to use parallelization to speed up the following query:

EXPLAIN ANALYZE
WITH Vars(key) as (
    VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key
LIMIT 10;

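key is just a toy vector consisting of all ones. <#> is the dot product operator of the pgvector extension, and vector is the type defined by that extension, which, as I understand it, is similar to real[].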

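I am running this query on the AWS RDS free tier. The Postgres instance has two vCPUs, so I expect an improvement from using two workers. Given the high dimensionality of the vectors, I expect computing the dot products to dominate the execution time, so two workers should improve performance by close to 2x.

To run without concurrency, I do this: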

set max_parallel_workers = 1;
set max_parallel_workers_per_gather = 1;
set enable_parallel_hash = off;

The output is:

 Limit  (cost=94267.29..94268.44 rows=10 width=12) (actual time=15624.961..15634.530 rows=10 loops=1)
   ->  Gather Merge  (cost=94267.29..161913.85 rows=588231 width=12) (actual time=15624.958..15634.524 rows=10 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=93267.28..94737.86 rows=588231 width=12) (actual time=15607.302..15607.305 rows=7 loops=2)
               Sort Key: ((((mytable.vector)::integer[])::vector <#> '[1,1,...,1]'::vector))
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Seq Scan on mytable  (cost=0.00..80555.82 rows=588231 width=12) (actual time=0.413..15452.274 rows=500000 loops=2)
 Planning Time: 10.502 ms
 Execution Time: 15635.121 ms
(11 rows)

Next, we force two workers:

set force_parallel_mode = on;
set max_parallel_workers = 2;
set max_parallel_workers_per_gather = 2;
set enable_parallel_hash = on;

Here is the output:

 Limit  (cost=83268.20..83269.37 rows=10 width=12) (actual time=14369.219..14379.656 rows=10 loops=1)
   ->  Gather Merge  (cost=83268.20..180496.59 rows=833328 width=12) (actual time=14369.217..14379.647 rows=10 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=82268.18..83309.84 rows=416664 width=12) (actual time=14352.711..14352.714 rows=9 loops=3)
               Sort Key: ((((mytable.vector)::integer[])::vector <#> '[1,1,...,1]'::vector))
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Seq Scan on mytable  (cost=0.00..73264.22 rows=416664 width=12) (actual time=0.611..14204.459 rows=333333 loops=3)
 Planning Time: 7.062 ms
 Execution Time: 14380.487 ms
(12 rows)

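The main question is: why is the expected time improvement not observed? Beyond that, I would really appreciate a walkthrough of the EXPLAIN ANALYZE output. In particular:

The row counts shown do not seem to reflect the actual number of rows in the table (which is exactly 1M).
Where can I see the time measurements/estimates related to computing the dot product?
Why does the second output say 12 rows? (I thought it should be 11: one for the header and ten for the data.)
Most importantly, what in the EXPLAIN ANALYZE output reveals why the expected time improvement was not achieved?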

Answer 1

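You have a fundamental misunderstanding: parallel worker processes are started in addition to your backend process, and by default the backend process also does its share of the work. So if you set max_parallel_workers_per_gather = 1, you do not disable parallel query; you limit it to one additional process. Set the parameter to 0 to disable parallel query.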
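A minimal sketch of a cleaner serial-versus-parallel comparison, based on the setting described above. It reuses the table and query from the question. `parallel_leader_participation` is not mentioned in the answer; it is shown here on the assumption that the server is PostgreSQL 11 or later, where that setting controls whether the leader process does a share of the work.

```sql
-- Baseline: truly serial execution, no parallel workers are planned.
SET max_parallel_workers_per_gather = 0;

EXPLAIN ANALYZE
WITH Vars(key) AS (
    VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key
LIMIT 10;

-- Parallel run: up to two workers in addition to the leader process.
SET max_parallel_workers_per_gather = 2;
-- Re-run the same EXPLAIN ANALYZE statement here and compare "Execution Time".

-- Assumption: PostgreSQL 11 or later. Shows whether the leader also scans rows.
SHOW parallel_leader_participation;

-- Restore the default for the rest of the session.
RESET max_parallel_workers_per_gather;
```

With these settings, the two runs in the question compare one worker plus the leader against two workers plus the leader, rather than serial against parallel.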

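With that, it is easier to answer your questions:

If you use EXPLAIN (ANALYZE, VERBOSE), you can see them: the function calls then appear in the output columns of the node where the function is computed. The function execution time is not listed separately; it is included in the execution time of the node.
Note that if you enable track_functions, you can get function execution statistics from pg_stat_user_functions.
That is the number of lines in the EXPLAIN output, not in the query's output.
[...] track_io_timing.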
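The two tools named above can be sketched as follows. This is an illustration under assumptions, not a verified recipe for this setup: it assumes the current role is allowed to change `track_functions` (on a managed service it may have to be set through the parameter group instead), and which pgvector functions show up in the statistics, and under what names, is something to check rather than take for granted.

```sql
-- Show each node's output columns, including the <#> expression.
EXPLAIN (ANALYZE, VERBOSE)
WITH Vars(key) AS (
    VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key
LIMIT 10;

-- Track calls and timing for all functions, including SQL and C-language ones.
-- Assumption: the current role has permission to change this setting.
SET track_functions = 'all';

-- ... run the query itself (without EXPLAIN) here ...

-- Per-function call counts and times in milliseconds, most expensive first.
SELECT schemaname, funcname, calls, total_time, self_time
FROM pg_stat_user_functions
ORDER BY total_time DESC;
```

Function tracking adds overhead of its own, so timings taken with `track_functions` enabled are best read as relative rather than absolute.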
