进一步优化〜20M记录表的'inet'列上的“位图索引扫描”的可能性

问题描述 投票:1回答:2

我正在努力优化具有约21M条记录的简单表上的查询。该表的两个主要列是node_ip_addrnbr_ip_addr,它们都是inet类型,用于存储邻居信息数据。因此,如果A是B的邻居,则表中可以包含以下2个条目:

A -> B
B -> A

下面是表DDL:

CREATE TABLE tbl_relation (
    id serial NOT NULL,
    node_ip_addr inet NULL,
    nbr_ip_addr inet NULL,
);

表上的索引:

idx_tbl_relation_id                     CREATE INDEX idx_tbl_relation_id ON tbl_relation USING btree (id)
idx_tbl_relation_node_ip_addr_gist      CREATE INDEX idx_tbl_relation_node_ip_addr_gist ON tbl_relation USING gist (node_ip_addr inet_ops)
idx_tbl_relation_nbr_ip_addr_gist       CREATE INDEX idx_tbl_relation_nbr_ip_addr_gist ON tbl_relation USING gist (nbr_ip_addr inet_ops)

注意已经在桌子上尝试过抽真空:

vacuum analyze tbl_relation;

以下是要优化的查询:

explain (analyze,buffers) SELECT * FROM  tbl_relation WHERE (node_ip_addr = '10.14.221.167' OR nbr_ip_addr = '10.14.221.167') AND (node_ip_addr = '10.14.9.185'   OR nbr_ip_addr = '10.14.9.185');
Bitmap Heap Scan on tbl_relation  (cost=459.24..463.26 rows=1 width=71) (actual time=142.336..142.336 rows=0 loops=1)
      Recheck Cond: (((node_ip_addr = '10.14.221.167'::inet) OR (nbr_ip_addr = '10.14.221.167'::inet)) AND ((node_ip_addr = '10.14.9.185'::inet) OR (nbr_ip_addr = '10.14.9.185'::inet)))
      Buffers: shared hit=13789
      ->  BitmapAnd  (cost=459.24..459.24 rows=1 width=0) (actual time=142.332..142.332 rows=0 loops=1)
            Buffers: shared hit=13789
            ->  BitmapOr  (cost=33.05..33.05 rows=1095 width=0) (actual time=70.667..70.667 rows=0 loops=1)
                  Buffers: shared hit=6894
                  ->  Bitmap Index Scan on idx_tbl_relation_node_ip_addr_gist  (cost=0.00..11.30 rows=385 width=0) (actual time=44.895..44.895 rows=10 loops=1)
                        Index Cond: (node_ip_addr = '10.14.221.167'::inet)
                        Buffers: shared hit=4256
                  ->  Bitmap Index Scan on idx_tbl_relation_nbr_ip_addr_gist  (cost=0.00..21.74 rows=710 width=0) (actual time=25.767..25.767 rows=3 loops=1)
                        Index Cond: (nbr_ip_addr = '10.14.221.167'::inet)
                        Buffers: shared hit=2638
            ->  BitmapOr  (cost=425.94..425.94 rows=16147 width=0) (actual time=71.651..71.651 rows=0 loops=1)
                  Buffers: shared hit=6895
                  ->  Bitmap Index Scan on idx_tbl_relation_node_ip_addr_gist  (cost=0.00..404.19 rows=15437 width=0) (actual time=45.983..45.983 rows=15831 loops=1)
                        Index Cond: (node_ip_addr = '10.14.9.185'::inet)
                        Buffers: shared hit=4262
                  ->  Bitmap Index Scan on idx_tbl_relation_nbr_ip_addr_gist  (cost=0.00..21.74 rows=710 width=0) (actual time=25.662..25.662 rows=0 loops=1)
                        Index Cond: (nbr_ip_addr = '10.14.9.185'::inet)
                        Buffers: shared hit=2633
    Planning Time: 0.159 ms
    Execution Time: 142.461 ms

有关数据的一些信息:

select count(*) from tbl_relation;
-- 21,058,705
select nbr_ip_addr , count(*) from tbl_relation group by nbr_ip_addr order by count(*) desc;
10.81.255.11    76788
10.72.0.202     50299
10.72.9.75      40949
10.72.65.150    38533
10.64.1.176     37262
10.72.65.146    33601
10.72.73.40     33566
.
.
.
select node_ip_addr , count(*) from tbl_relation group by node_ip_addr order by count(*) desc;
10.72.9.75      39310
10.72.0.202     34655
10.81.255.11    25730
10.64.1.176     18443
10.109.64.25    17206
10.72.65.150    16006
10.14.9.185     15831
.
.
.
.

它是8核32GB RAM机器,所有这些都可用于postgres。

Postgres版本:

PostgreSQL 11.6 (Ubuntu 11.6-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit

以下是postgres设置:

maintenance_work_mem        65536   kB
work_mem                    409600  kB
shared_buffers              393216  8kB
commit_delay                100000
max_wal_size                10240   MB
min_wal_size                1024    MB
effective_io_concurrency    8
select pg_size_pretty (pg_relation_size('tbl_relation'));
-- 1834 MB

使用给定的表大小和配置,这是我们能得到的最好的吗?我们还可以尝试其他索引组合吗? postgres中有任何设置吗?还是任何其他查询方式?任何帮助,将不胜感激!预先感谢!

sql postgresql indexing query-optimization
2个回答
2
投票
SELECT version();

CREATE TABLE tbl_relation (
    node_ip_addr inet NOT NULL  -- <<-- NOT NULL
    , nbr_ip_addr inet NOT NULL -- <<-- (NULL keyvalues in a junction table make no sense)
        , PRIMARY KEY( node_ip_addr, nbr_ip_addr)       -- <<-- will imply an index 
        , UNIQUE (nbr_ip_addr, node_ip_addr)            -- <<-- will imply an index 
);

ANALYZE tbl_relation;

EXPLAIN
SELECT * FROM tbl_relation
WHERE node_ip_addr IN ('10.14.221.167' , '10.14.9.185')
AND nbr_ip_addr IN ('10.14.221.167' , '10.14.9.185')
        ;

结果:(无数据,但可能会保留单个索引扫描)


DROP SCHEMA
CREATE SCHEMA
SET
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 11.6 on armv7l-unknown-linux-gnueabihf, compiled by gcc (Raspbian 8.3.0-6+rpi1) 8.3.0, 32-bit
(1 row)

CREATE TABLE
ANALYZE
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using tbl_relation_nbr_ip_addr_node_ip_addr_key on tbl_relation  (cost=0.15..2.85 rows=1 width=64)
   Index Cond: ((nbr_ip_addr = ANY ('{10.14.221.167,10.14.9.185}'::inet[])) AND (node_ip_addr = ANY ('{10.14.221.167,10.14.9.185}'::inet[])))
(2 rows)

Extra:您可以尝试使用CLUSTER进一步优化,这将使记录或多或少地保持排序,但是这需要定期维护(重新整理),尤其是当表内容经常更改时:

CLUSTER tbl_relation USING tbl_relation_pkey;
-- Or:
-- CLUSTER tbl_relation USING tbl_relation_nbr_ip_addr_node_ip_addr_key;

1
投票

您可以尝试这种方法:

SELECT *
FROM tbl_relation r
WHERE node_ip_addr = '10.14.221.167'::inet AND nbr_ip_addr IN ('10.14.221.167'::inet, '10.14.9.185'::inet)
UNION ALL
SELECT *
FROM tbl_relation r
WHERE node_ip_addr = '10.14.9.185'::inet AND nbr_ip_addr IN ('10.14.221.167'::inet, '10.14.9.185'::inet)

然后我将尝试在tbl_relation(node_ip_addr, nbr_ip_addr)上使用标准索引。

Postgres尚不支持索引的“跳过扫描”。这应该变成两个直接索引查找。如果这符合您对性能的要求,则可能会有其他方法来获得类似的计划。

© www.soinside.com 2019 - 2024. All rights reserved.