How to improve data retrieval efficiency in a Postgres DB

Question

I am trying to retrieve certain subsets of a table in Postgres, but I am running into query performance problems.

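The table I am retrieving data from (table1) has 145 million rows and 25 columns. The main columns used to select the subsets are id, name, first and last (where first and last are integer year values).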

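The current queries (shown below) take about 20-30 minutes to run and retrieve the origin and destination subsets, and I would like to reduce that time as much as possible.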

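I have tried a number of things to make these queries more efficient, but none of them seems to have helped. Query 1 takes about 2 minutes, Query 2 about 3-8 minutes, and Query 3 about 15-25 minutes. Attempts include: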

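- Separating the ids and fullnames into different materialized views to ensure there are no duplicates.
- Removing the DISTINCT clauses from Queries 2 and 3.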

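I think the main difficulty is that table1 contains many duplicates that are unavoidable and must be kept. As a result, running EXPLAIN on Queries 2 and 3 shows large costs for the Unique, Sort and Gather nodes. The planner also never seems to use indexes in these queries. Any pointers on how to improve the efficiency/structure of Queries 2 and 3 would be much appreciated. If there are any further questions, I will do my best to answer them.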
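
For reference, plans with timing and buffer detail can be captured by prefixing a query with EXPLAIN (ANALYZE, BUFFERS). A minimal sketch using Query 2 exactly as it is written further down in this post:

```sql
-- ANALYZE actually executes the statement, so this takes as long as the query itself.
-- BUFFERS adds the shared/temp block counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT a.*
FROM table1 a
    INNER JOIN (SELECT DISTINCT id, first FROM MMgB_1) b
    ON a.id = b.id
WHERE a.first <= (b.first+1) OR a.first >= (b.first-1);
```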

Thanks in advance.

    -- Query 1 - Creating Materialised View
    DROP MATERIALIZED VIEW IF EXISTS MMgB_1;
    
    CREATE MATERIALIZED VIEW MMgB_1 AS 
    SELECT distinct a.id, a.fullname, a.first
    FROM table1 a 
        LEFT JOIN groups b
        ON (COALESCE(substr(a.forename, 1,1),'') = COALESCE(b.f_initial, '')
            AND COALESCE(substr(a.surname, 1,1),'') = COALESCE(b.s_initial, '')
            AND a.first = b.first_year)
        -- Note: filtering on b.group here discards unmatched rows, so the LEFT JOIN acts as an INNER JOIN.
        WHERE b.group = 1; 
    
    CREATE INDEX ON jt_lcrtest.MMgB_1 (id);
    CREATE INDEX ON jt_lcrtest.MMgB_1 (fullname);
    CREATE INDEX ON jt_lcrtest.MMgB_1 (first);
    
    -- Query 2 - Getting Origin Subset
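    SELECT DISTINCT a.* FROM table1 a 
        INNER JOIN (SELECT DISTINCT id, first FROM MMgB_1) b 
        on a.id = b.id
        -- Note: with OR, this predicate is true whenever both values are non-NULL;
        -- Query 3 uses AND for its range check, so AND may have been intended here.
        WHERE a.first <= (b.first+1) OR a.first >= (b.first-1);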
    
    -- Query 3 - Getting Destination Subset 
    SELECT DISTINCT z.id as d_id, z.pcd as d_pcd, z.forename as d_forename, z.surname as d_surname, z.first as d_first, z.last as d_last, z.eastings as d_eastings, z.northings as d_northings, z.rn as d_rn
            FROM table1 z 
            INNER JOIN (
                SELECT DISTINCT a.id, b.first
                FROM table1 a
                INNER JOIN (SELECT DISTINCT fullname, first FROM MMgB_1) b
                ON (a.fullname = b.fullname)) x
            ON z.id = x.id
            WHERE z.last <= (x.first+1) and z.last >= (x.first-3);
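
One possible restructuring of Query 3, offered only as an untested sketch: the inner DISTINCT subqueries exist solely to decide whether a row of table1 qualifies, so the same condition can be written as an EXISTS semi-join. The outer DISTINCT is kept because table1 itself contains duplicate rows. Whether the planner finds a cheaper plan for this form is an assumption, and it would likely depend on indexes on table1 (id) and table1 (fullname) being available. The plan posted below is for the original Query 3, not for this variant.

```sql
-- Sketch: same result set as Query 3, expressed as a semi-join.
SELECT DISTINCT z.id AS d_id, z.pcd AS d_pcd, z.forename AS d_forename,
       z.surname AS d_surname, z.first AS d_first, z.last AS d_last,
       z.eastings AS d_eastings, z.northings AS d_northings, z.rn AS d_rn
FROM table1 z
WHERE EXISTS (
    SELECT 1
    FROM table1 a
        INNER JOIN MMgB_1 b ON a.fullname = b.fullname
    WHERE a.id = z.id
      -- same range as: z.last <= (first+1) AND z.last >= (first-3)
      AND z.last BETWEEN b.first - 3 AND b.first + 1
);
```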

The query plan for Query 3 is as follows:

"Unique  (cost=233871968.33..244234487.69 rows=143516672 width=74) (actual time=909563.935..910207.026 rows=1130680 loops=1)"
"  Buffers: shared hit=10742190 read=22436208, temp read=10707753 written=10713847"
"  ->  Sort  (cost=233871968.33..234814015.54 rows=376818886 width=74) (actual time=909563.933..909957.546 rows=1130680 loops=1)"
"        Sort Key: z.id, z.pcd, z.forename, z.surname, z.first, z.last, z.eastings, z.northings, z.rn"
"        Sort Method: external merge  Disk: 99472kB"
"        Buffers: shared hit=10742190 read=22436208, temp read=10707753 written=10713847"
"        ->  Gather  (cost=39807897.35..113221768.36 rows=376818886 width=74) (actual time=844994.248..906225.121 rows=1130680 loops=1)"
"              Workers Planned: 2"
"              Workers Launched: 2"
"              Buffers: shared hit=10742184 read=22436208, temp read=10684975 written=10691023"
"              ->  Merge Join  (cost=39806897.35..75538879.76 rows=157007869 width=74) (actual time=844988.880..905816.659 rows=376893 loops=3)"
"                    Merge Cond: (z.id = a.id)"
"                    Join Filter: ((z.last <= (b.first + 1)) AND (z.last >= (b.first - 3)))"
"                    Rows Removed by Join Filter: 3086353"
"                    Buffers: shared hit=10742184 read=22436208, temp read=10684975 written=10691023"
"                    ->  Sort  (cost=24584683.47..24734180.01 rows=59798613 width=74) (actual time=442706.553..492701.960 rows=47784204 loops=3)"
"                          Sort Key: z.id"
"                          Sort Method: external merge  Disk: 4216944kB"
"                          Worker 0:  Sort Method: external merge  Disk: 4242552kB"
"                          Worker 1:  Sort Method: external merge  Disk: 4203736kB"
"                          Buffers: shared hit=10 read=8291416, temp read=7148113 written=7154071"
"                          ->  Parallel Seq Scan on table1 z  (cost=0.00..8889402.13 rows=59798613 width=74) (actual time=1.635..146825.594 rows=47784263 loops=3)"
"                                Buffers: shared read=8291416"
"                    ->  Materialize  (cost=15222213.88..15343607.04 rows=6069658 width=16) (actual time=402282.178..404494.609 rows=4107624 loops=3)"
"                          Buffers: shared hit=10742174 read=14144792, temp read=3536862 written=3536952"
"                          ->  Unique  (cost=15222213.88..15267736.31 rows=6069658 width=16) (actual time=402282.168..404197.446 rows=1251970 loops=3)"
"                                Buffers: shared hit=10742174 read=14144792, temp read=3536862 written=3536952"
"                                ->  Sort  (cost=15222213.88..15237388.02 rows=6069658 width=16) (actual time=402282.165..403893.050 rows=1388044 loops=3)"
"                                      Sort Key: a.id, b.first"
"                                      Sort Method: external merge  Disk: 38280kB"
"                                      Worker 0:  Sort Method: external merge  Disk: 38280kB"
"                                      Worker 1:  Sort Method: external merge  Disk: 38280kB"
"                                      Buffers: shared hit=10742174 read=14144792, temp read=3536862 written=3536952"
"                                      ->  Hash Join  (cost=12514199.34..14330904.28 rows=6069658 width=16) (actual time=306633.836..397293.362 rows=1388044 loops=3)"
"                                            Hash Cond: (b.fullname = a.fullname)"
"                                            Buffers: shared hit=10742160 read=14144792, temp read=3514146 written=3514146"
"                                            ->  Subquery Scan on b  (cost=12587.22..12957.18 rows=18498 width=18) (actual time=432.249..470.094 rows=89294 loops=3)"
"                                                  Buffers: shared hit=96 read=12576"
"                                                  ->  HashAggregate  (cost=12587.22..12772.20 rows=18498 width=18) (actual time=432.247..457.108 rows=89294 loops=3)"
"                                                        Group Key: mmgb_1.fullname, mmgb_1.first"
"                                                        Buffers: shared hit=96 read=12576"
"                                                        ->  Seq Scan on mmgb_1  (cost=0.00..9799.48 rows=557548 width=18) (actual time=21.418..252.257 rows=557548 loops=3)"
"                                                              Buffers: shared hit=96 read=12576"
"                                            ->  Hash  (cost=9726582.72..9726582.72 rows=143516672 width=26) (actual time=306035.037..306035.038 rows=143352788 loops=3)"
"                                                  Buckets: 65536 (originally 65536)  Batches: 8192 (originally 4096)  Memory Usage: 3585kB"
"                                                  Buffers: shared hit=10742032 read=14132216, temp written=2455650"
"                                                  ->  Seq Scan on table1 a  (cost=0.00..9726582.72 rows=143516672 width=26) (actual time=1.108..233726.194 rows=143352788 loops=3)"
"                                                        Buffers: shared hit=10742032 read=14132216"
"Planning Time: 22.971 ms"
"Execution Time: 910271.228 ms"

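Reading the plan above: both scans of table1 are sequential, the sort on z.id spills roughly 4 GB per process to disk (external merge), and the hash on table1 a is split into 8192 batches with about 3.5 MB of memory, which suggests a small work_mem. Some row estimates are also far from the actual counts (for example 18498 estimated versus 89294 actual rows for the MMgB_1 aggregate). A minimal sketch of settings that could be tried on that basis; the work_mem value and the index names are illustrative assumptions, not measured recommendations:

```sql
-- Assumption: raise work_mem for this session only. The limit applies per sort/hash
-- node and per parallel worker, so total memory use can be several times this value.
SET work_mem = '256MB';

-- Assumption: table1 has no indexes on the join columns yet (index names are hypothetical).
CREATE INDEX IF NOT EXISTS table1_id_idx ON table1 (id);
CREATE INDEX IF NOT EXISTS table1_fullname_idx ON table1 (fullname);

-- Refresh planner statistics so row estimates are closer to the actual counts.
ANALYZE table1;
ANALYZE MMgB_1;
```

After these changes, re-running EXPLAIN (ANALYZE, BUFFERS) on Query 3 would show whether the sorts still spill to disk and whether the new indexes are used.
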