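I am trying to optimize a query on a simple table with about 21M records. The table's two main columns, node_ip_addr and nbr_ip_addr, are both of type inet and store neighbor information. So if A is a neighbor of B, the table can contain the following two entries: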
A -> B
B -> A
Here is the table DDL:
CREATE TABLE tbl_relation (
id serial NOT NULL,
node_ip_addr inet NULL,
nbr_ip_addr inet NULL
);
Indexes on the table:
idx_tbl_relation_id CREATE INDEX idx_tbl_relation_id ON tbl_relation USING btree (id)
idx_tbl_relation_node_ip_addr_gist CREATE INDEX idx_tbl_relation_node_ip_addr_gist ON tbl_relation USING gist (node_ip_addr inet_ops)
idx_tbl_relation_nbr_ip_addr_gist CREATE INDEX idx_tbl_relation_nbr_ip_addr_gist ON tbl_relation USING gist (nbr_ip_addr inet_ops)
Note that I have already run VACUUM ANALYZE on the table:
vacuum analyze tbl_relation;
Here is the query to optimize:
explain (analyze,buffers) SELECT * FROM tbl_relation WHERE (node_ip_addr = '10.14.221.167' OR nbr_ip_addr = '10.14.221.167') AND (node_ip_addr = '10.14.9.185' OR nbr_ip_addr = '10.14.9.185');
Bitmap Heap Scan on tbl_relation (cost=459.24..463.26 rows=1 width=71) (actual time=142.336..142.336 rows=0 loops=1)
Recheck Cond: (((node_ip_addr = '10.14.221.167'::inet) OR (nbr_ip_addr = '10.14.221.167'::inet)) AND ((node_ip_addr = '10.14.9.185'::inet) OR (nbr_ip_addr = '10.14.9.185'::inet)))
Buffers: shared hit=13789
-> BitmapAnd (cost=459.24..459.24 rows=1 width=0) (actual time=142.332..142.332 rows=0 loops=1)
Buffers: shared hit=13789
-> BitmapOr (cost=33.05..33.05 rows=1095 width=0) (actual time=70.667..70.667 rows=0 loops=1)
Buffers: shared hit=6894
-> Bitmap Index Scan on idx_tbl_relation_node_ip_addr_gist (cost=0.00..11.30 rows=385 width=0) (actual time=44.895..44.895 rows=10 loops=1)
Index Cond: (node_ip_addr = '10.14.221.167'::inet)
Buffers: shared hit=4256
-> Bitmap Index Scan on idx_tbl_relation_nbr_ip_addr_gist (cost=0.00..21.74 rows=710 width=0) (actual time=25.767..25.767 rows=3 loops=1)
Index Cond: (nbr_ip_addr = '10.14.221.167'::inet)
Buffers: shared hit=2638
-> BitmapOr (cost=425.94..425.94 rows=16147 width=0) (actual time=71.651..71.651 rows=0 loops=1)
Buffers: shared hit=6895
-> Bitmap Index Scan on idx_tbl_relation_node_ip_addr_gist (cost=0.00..404.19 rows=15437 width=0) (actual time=45.983..45.983 rows=15831 loops=1)
Index Cond: (node_ip_addr = '10.14.9.185'::inet)
Buffers: shared hit=4262
-> Bitmap Index Scan on idx_tbl_relation_nbr_ip_addr_gist (cost=0.00..21.74 rows=710 width=0) (actual time=25.662..25.662 rows=0 loops=1)
Index Cond: (nbr_ip_addr = '10.14.9.185'::inet)
Buffers: shared hit=2633
Planning Time: 0.159 ms
Execution Time: 142.461 ms
Some information about the data:
select count(*) from tbl_relation;
-- 21,058,705
select nbr_ip_addr , count(*) from tbl_relation group by nbr_ip_addr order by count(*) desc;
10.81.255.11 76788
10.72.0.202 50299
10.72.9.75 40949
10.72.65.150 38533
10.64.1.176 37262
10.72.65.146 33601
10.72.73.40 33566
...
select node_ip_addr , count(*) from tbl_relation group by node_ip_addr order by count(*) desc;
10.72.9.75 39310
10.72.0.202 34655
10.81.255.11 25730
10.64.1.176 18443
10.109.64.25 17206
10.72.65.150 16006
10.14.9.185 15831
...
It is an 8-core machine with 32 GB of RAM, all of which is available to Postgres.
Postgres version:
PostgreSQL 11.6 (Ubuntu 11.6-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
Here are the Postgres settings:
maintenance_work_mem 65536 kB
work_mem 409600 kB
shared_buffers 393216 8kB
commit_delay 100000
max_wal_size 10240 MB
min_wal_size 1024 MB
effective_io_concurrency 8
select pg_size_pretty (pg_relation_size('tbl_relation'));
-- 1834 MB
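Given the table size and configuration, is this the best we can get? Are there other index combinations we could try? Are there any Postgres settings that would help, or another way to write the query? Any help would be appreciated. Thanks in advance!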
SELECT version();
CREATE TABLE tbl_relation (
node_ip_addr inet NOT NULL -- <<-- NOT NULL
, nbr_ip_addr inet NOT NULL -- <<-- (NULL keyvalues in a junction table make no sense)
, PRIMARY KEY( node_ip_addr, nbr_ip_addr) -- <<-- will imply an index
, UNIQUE (nbr_ip_addr, node_ip_addr) -- <<-- will imply an index
);
ANALYZE tbl_relation;
EXPLAIN
SELECT * FROM tbl_relation
WHERE node_ip_addr IN ('10.14.221.167' , '10.14.9.185')
AND nbr_ip_addr IN ('10.14.221.167' , '10.14.9.185')
;
Result (the table is empty here, but the single index scan will probably be kept with real data):
DROP SCHEMA
CREATE SCHEMA
SET
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 11.6 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Raspbian 8.3.0-6+rpi1) 8.3.0, 32-bit
(1 row)
CREATE TABLE
ANALYZE
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using tbl_relation_nbr_ip_addr_node_ip_addr_key on tbl_relation (cost=0.15..2.85 rows=1 width=64)
Index Cond: ((nbr_ip_addr = ANY ('{10.14.221.167,10.14.9.185}'::inet[])) AND (node_ip_addr = ANY ('{10.14.221.167,10.14.9.185}'::inet[])))
(2 rows)
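One caveat with the IN/IN form: it also matches rows where both columns hold the same address (for example 10.14.9.185 in both), which the original OR/AND predicate does not. Assuming the two addresses differ, the original condition reduces to the two ordered pairs, so a row-value comparison is a closer match. This is a sketch that has not been run against the real data:

```sql
-- Matches only A -> B and B -> A, like the original predicate
-- (assumes the two addresses are different).
EXPLAIN
SELECT * FROM tbl_relation
WHERE (node_ip_addr, nbr_ip_addr) IN (
      ('10.14.221.167'::inet, '10.14.9.185'::inet)
    , ('10.14.9.185'::inet, '10.14.221.167'::inet)
    );
```

With the primary key on (node_ip_addr, nbr_ip_addr) in place, the planner should be able to resolve each pair with its own B-tree lookup, but it is worth confirming that with EXPLAIN on the populated table.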
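Extra: you could try CLUSTER to optimize further. It leaves the rows more or less sorted on disk, but it needs regular maintenance (re-clustering), especially if the table contents change often: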
CLUSTER tbl_relation USING tbl_relation_pkey;
-- Or:
-- CLUSTER tbl_relation USING tbl_relation_nbr_ip_addr_node_ip_addr_key;
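CLUSTER is a one-time reordering: rows inserted or updated afterwards are not kept in index order. Below is a minimal maintenance sketch. It assumes the table has already been clustered once as shown above, and that a maintenance window is acceptable, since CLUSTER holds an ACCESS EXCLUSIVE lock on the table while it runs:

```sql
-- Re-cluster on the index remembered from the earlier CLUSTER ... USING.
CLUSTER tbl_relation;
-- Refresh planner statistics after the table has been rewritten.
ANALYZE tbl_relation;
```

How often to repeat this depends on how quickly the table changes; that schedule is not something the question specifies.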
You can try this approach:
SELECT *
FROM tbl_relation r
WHERE node_ip_addr = '10.14.221.167'::inet AND nbr_ip_addr IN ('10.14.221.167'::inet, '10.14.9.185'::inet)
UNION ALL
SELECT *
FROM tbl_relation r
WHERE node_ip_addr = '10.14.9.185'::inet AND nbr_ip_addr IN ('10.14.221.167'::inet, '10.14.9.185'::inet);
I would then try a standard index on tbl_relation(node_ip_addr, nbr_ip_addr).
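Postgres does not yet support "skip scans" on indexes. This should turn into two direct index lookups. If that meets your performance requirements, there may be other ways to get a similar plan.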
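To try this on the existing table, something like the following could work. The index name idx_tbl_relation_node_nbr is made up for illustration; the table, columns, and addresses come from the question. Running the query under EXPLAIN (ANALYZE, BUFFERS) allows the buffer counts to be compared with the 13789 shared hits in the original plan:

```sql
-- Plain B-tree index on both columns (the index name is hypothetical).
CREATE INDEX idx_tbl_relation_node_nbr
    ON tbl_relation (node_ip_addr, nbr_ip_addr);
ANALYZE tbl_relation;

-- Check whether each branch becomes a direct lookup on the new index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM tbl_relation r
WHERE node_ip_addr = '10.14.221.167'::inet
  AND nbr_ip_addr IN ('10.14.221.167'::inet, '10.14.9.185'::inet)
UNION ALL
SELECT *
FROM tbl_relation r
WHERE node_ip_addr = '10.14.9.185'::inet
  AND nbr_ip_addr IN ('10.14.221.167'::inet, '10.14.9.185'::inet);
```

If the plan still uses the GiST indexes, the statistics or cost estimates may need a closer look before concluding that the B-tree index does not help.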