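Hi, I have a database with 10 tables. Each table has roughly 50 million to 1 billion rows and is partitioned by range and then by hash (10x10 = 100 partitions). Each table is indexed on the column used for the searches below (id). The database is hosted on Azure PostgreSQL Single Server.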
A test query shows that almost all of the time is spent in "I/O Timings: read":
postgres=> EXPLAIN (ANALYZE, BUFFERS) select count(*) from table_id4 where id=654321;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7458.09..7458.10 rows=1 width=8) (actual time=21141.393..21141.396 rows=1 loops=1)
   Buffers: shared read=2256
   I/O Timings: read=21096.814
   ->  Append  (cost=41.26..7452.66 rows=2171 width=0) (actual time=197.168..21138.495 rows=2247 loops=1)
         Buffers: shared read=2256
         I/O Timings: read=21096.814
         ->  Bitmap Heap Scan on table_id4_r2_h5  (cost=41.26..7441.80 rows=2171 width=0) (actual time=197.167..21137.471 rows=2247 loops=1)
               Recheck Cond: (id = 244730)
               Heap Blocks: exact=2247
               Buffers: shared read=2256
               I/O Timings: read=21096.814
               ->  Bitmap Index Scan on table_id4_r2_h5_id_idx  (cost=0.00..40.72 rows=2171 width=0) (actual time=117.586..117.586 rows=2247 loops=1)
                     Index Cond: (id = 244730)
                     Buffers: shared read=9
                     I/O Timings: read=116.929
Planning Time: 2.882 ms
Execution Time: 21141.449 ms
(17 rows)
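To put the plan in perspective, here is some back-of-the-envelope arithmetic on the figures copied from the EXPLAIN output. It is only a rough sketch: the variable names are mine, and the 8 kB block size is an assumption (the PostgreSQL default, which I have not checked on this server).

```python
# Values copied from the EXPLAIN (ANALYZE, BUFFERS) output in this post.
io_read_ms = 21096.814     # "I/O Timings: read" for the whole query
blocks_read = 2256         # "Buffers: shared read"
heap_blocks = 2247         # "Heap Blocks: exact"
rows_matched = 2247        # actual rows returned by the Bitmap Heap Scan
execution_ms = 21141.449   # "Execution Time"

# Assumption: default PostgreSQL block size, not verified on this server.
BLOCK_SIZE_BYTES = 8192

ms_per_block = io_read_ms / blocks_read            # average latency of one block read
reads_per_second = 1000.0 / ms_per_block           # effective read rate during the query
io_share = io_read_ms / execution_ms               # fraction of runtime spent waiting on reads
heap_blocks_per_row = heap_blocks / rows_matched   # how scattered the matching rows are
data_read_mib = blocks_read * BLOCK_SIZE_BYTES / (1024 * 1024)

print(f"average read latency : {ms_per_block:.2f} ms per block")
print(f"effective read rate  : {reads_per_second:.0f} blocks/s")
print(f"share of time in I/O : {io_share:.1%}")
print(f"heap blocks per row  : {heap_blocks_per_row:.2f}")
print(f"data read from disk  : {data_read_mib:.3f} MiB")
```

If I read this correctly, each of the ~2,200 matching rows sits on its own heap block, nothing was found in shared buffers, and every block read took roughly 9 ms, so the query spends about 21 s fetching under 20 MiB.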
I also ran a batch test, and the query times within the same loop varied considerably:
FOR idx IN SELECT (random()*total_IDs)::int AS id from generate_series (1,10)
LOOP ...
select count(*) from table_id4 where id=idx;
...
END LOOP;
NOTICE: id: 321158 count#: 2154, time: 46.734967s
NOTICE: id: 487596 count#: 2238, time: 0.968759s
NOTICE: id: 548334 count#: 2180, time: 1.062516s
NOTICE: id: 404978 count#: 2179, time: 29.750295s
NOTICE: id: 370904 count#: 2123, time: 22.203384s
NOTICE: id: 228857 count#: 2223, time: 29.094126s
NOTICE: id: 327134 count#: 2169, time: 24.750242s
NOTICE: id: 372101 count#: 2180, time: 28.062825s
NOTICE: id: 341814 count#: 2130, time: 30.250353s
NOTICE: id: 248316 count#: 2195, time: 32.375377s
However, if I repeat the queries for the same ids, the times drop to the desired millisecond range:
psql -c " ...
select count(*) from table_id4 where id=321158;
select count(*) from table_id4 where id=487596;
select count(*) from table_id4 where id=548334;
select count(*) from table_id4 where id=404978;
select count(*) from table_id4 where id=370904;
select count(*) from table_id4 where id=228857;
"
Time: 5267.168 ms (00:05.267)
Time: 171.925 ms
Time: 24.942 ms
Time: 11.387 ms
Time: 6.753 ms
Time: 17.573 ms
The other tables behave similarly when queried. Here are the table details:
postgres=> \d+ table_id4
Unlogged table "table_id4"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+----------+-----------+----------+---------+---------+--------------+-------------
date | date | | not null | | plain | |
field1 | real | | | | plain | |
field2 | real | | | | plain | |
field3 | smallint | | | | plain | |
id | integer | | not null | | plain | |
Partition key: RANGE (id)
Indexes:
"table_id4_date_idx" btree (date)
"table_id4_id_idx" btree (id)
Partitions: table_id4_r1 FOR VALUES FROM (0) TO (1...5), PARTITIONED,
table_id4_r10 FOR VALUES FROM (...93) TO (MAXVALUE), PARTITIONED,
table_id4_r2 FOR VALUES FROM (1...) TO (3...), PARTITIONED,
table_id4_r3 FOR VALUES FROM (3...) TO (4...), PARTITIONED,
table_id4_r4 FOR VALUES FROM (4...) TO (6...), PARTITIONED,
table_id4_r5 FOR VALUES FROM (6...) TO (7...), PARTITIONED,
table_id4_r6 FOR VALUES FROM (7...) TO (9...), PARTITIONED,
table_id4_r7 FOR VALUES FROM (9...) TO (1...1), PARTITIONED,
table_id4_r8 FOR VALUES FROM (1...1) TO (...32), PARTITIONED,
table_id4_r9 FOR VALUES FROM (1...2) TO (...93), PARTITIONED
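I found some similar discussions here and here, but I'm not very familiar with databases and these tables are very large, so I'd like to understand the problem better before running REINDEX or VACUUM etc., which could take days to complete.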
[Update]: The resource monitor for the PostgreSQL server in the Azure portal shows MAX usage of [cpu, memory, storage]: 55%, 30%, <5%. So resources don't seem to be the problem?
Some server parameters:
CPU: vCore 2
total memory: 4GB
storage: 1TB
shared_buffers: 512MB
work_mem: 4MB (changed to 256MB, but it did not help)
max_parallel_workers: 10
max_parallel_maintenance_workers: 8
Would changing the tables to LOGGED and running ENABLE TRIGGER ALL help?
Any suggestions are appreciated!