Large COUNT DISTINCT query runs slowly in PostgreSQL

Question

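I am running a large grouped count(DISTINCT) query against a table on PostgreSQL 12. The table is about 32 GB with roughly 300 million rows, and it is partitioned by YEAR. The groups are distributed almost perfectly evenly.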

EXPLAIN (ANALYZE,BUFFERS) 
SELECT 
date_trunc('month', condition_start_date::timestamp) as dt, 
condition_source_value, 
COUNT(DISTINCT person_id) 
FROM synpuf5.condition_occurrence_yrpart 
GROUP BY date_trunc('month', condition_start_date::timestamp), condition_source_value 
ORDER BY COUNT(DISTINCT person_id) DESC LIMIT 10;

Here is the output from the query planner:

QUERY PLAN
Limit  (cost=50052765961.82..50052765961.85 rows=10 width=21) (actual time=691022.306..691022.308 rows=10 loops=1)
   Buffers: shared hit=3062256 read=222453
   ->  Sort  (cost=50052765961.82..50052777188.87 rows=4490820 width=21) (actual time=690786.364..690786.364 rows=10 loops=1)
         Sort Key: (count(DISTINCT condition_occurrence_yrpart_2007.person_id)) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=3062256 read=222453
         ->  GroupAggregate  (cost=50049709699.80..50052668916.82 rows=4490820 width=21) (actual time=567099.326..690705.612 rows=360849 loops=1)
               Group Key: (date_trunc('month'::text, (condition_occurrence_yrpart_2007.condition_start_date)::timestamp without time zone)), condition_occurrence_yrpart_2007.condition_source_value
               Buffers: shared hit=3062253 read=222453
               ->  Sort  (cost=50049709699.80..50050432663.48 rows=289185472 width=17) (actual time=567098.345..619461.044 rows=289182385 loops=1)
                     Sort Key: (date_trunc('month'::text, (condition_occurrence_yrpart_2007.condition_start_date)::timestamp without time zone)), condition_occurrence_yrpart_2007.condition_source_value
                     Sort Method: quicksort  Memory: 30333184kB
                     Buffers: shared hit=3062246 read=222453
                     ->  Append  (cost=10000000000.00..50009068412.44 rows=289185472 width=17) (actual time=0.065..74222.771 rows=289182385 loops=1)
                           Buffers: shared hit=3062240 read=222453
                           ->  Seq Scan on condition_occurrence_yrpart_2007  (cost=10000000000.00..10000001125.61 rows=42774 width=17) (actual time=0.064..13.756 rows=42774 loops=1)
                                 Buffers: shared read=484
                           ->  Seq Scan on condition_occurrence_yrpart_2008  (cost=10000000000.00..10002732063.72 rows=103678448 width=17) (actual time=0.039..21209.532 rows=103676930 loops=1)
                                 Buffers: shared hit=954918 read=221969
                           ->  Seq Scan on condition_occurrence_yrpart_2009  (cost=10000000000.00..10003024874.44 rows=114743696 width=17) (actual time=0.142..20191.131 rows=114743002 loops=1)
                                 Buffers: shared hit=1303719
                           ->  Seq Scan on condition_occurrence_yrpart_2010  (cost=10000000000.00..10001864406.36 rows=70720224 width=17) (actual time=0.050..12464.117 rows=70719679 loops=1)
                                 Buffers: shared hit=803603
                           ->  Seq Scan on condition_occurrence_yrpart_2011  (cost=10000000000.00..10000000014.95 rows=330 width=17) (actual time=0.022..0.022 rows=0 loops=1)

I have also tuned the PostgreSQL configuration heavily to try to keep all of the data in memory, including:

shared_buffers = 80GB
work_mem = 32GB
max_worker_processes = 32 
max_parallel_workers_per_gather = 16
max_parallel_workers = 32
wal_compression = on
max_wal_size = 8GB
enable_seqscan = off
enable_partitionwise_join = on
enable_partitionwise_aggregate = on
parallel_tuple_cost = 0.01
parallel_setup_cost = 100.0
shared_preload_libraries = 'pg_prewarm'
effective_cache_size = 192GB

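The VM I am running on is a beast: 256 GB of RAM and 32 cores. The postgres directory lives on SSD...

I have a few questions:

Why is this so slow?
Why does it not run in parallel?
Why does performance not improve when I run it again, despite pg_prewarm?
Why is the memory released when my session ends? I am using prewarm?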

Answer 1

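Why is this so slow?
Sorting 300 million rows takes a while, even with a generous work_mem. More than 9 minutes of the query execution time is spent on the sort for the GROUP BY.
Why does it not run in parallel?
Because sorting cannot be parallelized in PostgreSQL.
Why does performance not improve when I run it again, despite pg_prewarm?
Because everything was already cached.
Why is the memory released when my session ends? I am using prewarm?
The private memory used by your backend is certainly released when your session ends. The memory used for shared_buffers is not released, because that is a cache shared by all processes of the database. You would not want that memory to be released.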
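
For reference, a minimal sketch of loading one partition into shared_buffers by hand. It assumes the pg_prewarm extension can be created in the current database and that the partition named in the plan lives in the synpuf5 schema. As far as I know, listing pg_prewarm in shared_preload_libraries only starts the autoprewarm worker, which reloads a previously saved buffer list at server start; it does not pin any table in memory.

```sql
-- Assumption: the extension is available but not yet created in this database.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read one leaf partition into shared_buffers.
-- The return value is the number of blocks that were prewarmed.
-- Assumption: the partition is in the synpuf5 schema, like its parent.
SELECT pg_prewarm('synpuf5.condition_occurrence_yrpart_2008');
```

Blocks loaded this way sit in the shared cache and stay there independently of any session, until they are evicted to make room for other data.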

This is a heavy query, and it takes some time. I don't think this can be improved.

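You didn't tell us what the partitioning expression is, but since it is probably not date_trunc('month', condition_start_date::timestamp), you won't get partitionwise aggregation, despite enable_partitionwise_aggregate = on. PostgreSQL is not smart enough to deduce that it could actually do so here (assuming you partition on condition_start_date).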
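
To find out what the partitioning expression is, something like the following should work. It is a sketch that assumes the table uses declarative partitioning and uses the parent table name from the question.

```sql
-- Show the partition key definition of the parent table
-- (the strategy plus the column or expression it partitions on).
SELECT pg_get_partkeydef('synpuf5.condition_occurrence_yrpart'::regclass);
```

If the result is an expression other than the one in the GROUP BY clause, the planner has no way to match the groups to individual partitions.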

Answer 2

It is slow because doing anything with 300 million rows takes some time.

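It doesn't run in parallel, I think, because the COUNT(DISTINCT ...) code is very old and hasn't received much attention lately. It knows neither how to use hash aggregation nor how to operate in parallel. (In my hands, if I lower parallel_tuple_cost all the way to 0, it does run in parallel, but the Gather sits below the massive sort, so there is no benefit. But I am not using your real data, so you might get different results.)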
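
A sketch of how that experiment could be repeated on the real data, using the query from the question. The setting is changed for the current session only, and plain EXPLAIN is used so the query is planned but not executed.

```sql
-- Session-local change; postgresql.conf and other sessions are unaffected.
SET parallel_tuple_cost = 0;

-- Plain EXPLAIN (no ANALYZE) prints the chosen plan without running the query.
EXPLAIN
SELECT date_trunc('month', condition_start_date::timestamp) AS dt,
       condition_source_value,
       COUNT(DISTINCT person_id)
FROM synpuf5.condition_occurrence_yrpart
GROUP BY date_trunc('month', condition_start_date::timestamp), condition_source_value
ORDER BY COUNT(DISTINCT person_id) DESC
LIMIT 10;

-- Restore the configured value for the rest of the session.
RESET parallel_tuple_cost;
```

In the output, check where any Gather node sits relative to the Sort. If it is below the Sort, only the scan is parallel and the expensive step still runs in a single process.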

You can work around the inflexible COUNT(DISTINCT ...) by performing the DISTINCT and the COUNT in separate steps:

select dt, condition_source_value, count(person_id) from (
   SELECT distinct                                             
   date_trunc('month', condition_start_date::timestamp) as dt, 
   condition_source_value, 
   person_id                        
   FROM condition_occurrence_yrpart
) foo 
GROUP BY dt, condition_source_value 
ORDER BY COUNT(person_id) DESC LIMIT 10;

But it still might not parallelize in the right place.
