PostgreSQL extremely slow on WHERE and GROUP BY compared with MS SQL

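After five days of trying to solve a database performance problem in PostgreSQL, I've decided to ask you for help! A week ago we decided to try migrating a database with 60 million records from MSSQL to PostgreSQL, and the SQL below runs very slowly on PostgreSQL.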

set random_page_cost=1;
set seq_page_cost=5;
set enable_seqscan=on;
set work_mem = '100MB';

SELECT 
       DATE("DateStamp"), "Result", Count(*), Sum("ConversionCost")
FROM 
     "Log"
WHERE 
      "UserId" = 7841 AND "DateStamp" > '2019-01-01' AND "DateStamp" < '2020-02-26'
GROUP BY 
         1,2

Execution plan:

Finalize GroupAggregate  (cost=1332160.59..1726394.02 rows=3093547 width=21) (actual time=2929.936..3157.049 rows=714 loops=1)
  Output: (date("DateStamp")), "Result", count(*), sum("ConversionCost")
  Group Key: (date("Log"."DateStamp")), "Log"."Result"
  Buffers: shared hit=2292 read=345810
  ->  Gather Merge  (cost=1332160.59..1661945.12 rows=2577956 width=21) (actual time=2929.783..3156.616 rows=2037 loops=1)
        Output: (date("DateStamp")), "Result", (PARTIAL count(*)), (PARTIAL sum("ConversionCost"))
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=6172 read=857125
        ->  Partial GroupAggregate  (cost=1331160.56..1363385.01 rows=1288978 width=21) (actual time=2906.450..3089.056 rows=679 loops=3)
              Output: (date("DateStamp")), "Result", PARTIAL count(*), PARTIAL sum("ConversionCost")
              Group Key: (date("Log"."DateStamp")), "Log"."Result"
              Buffers: shared hit=6172 read=857125
              Worker 0: actual time=2895.531..3058.852 rows=675 loops=1
                Buffers: shared hit=1930 read=255687
              Worker 1: actual time=2894.513..3052.916 rows=673 loops=1
                Buffers: shared hit=1950 read=255628
              ->  Sort  (cost=1331160.56..1334383.01 rows=1288978 width=9) (actual time=2906.435..2968.562 rows=1064916 loops=3)
                    Output: (date("DateStamp")), "Result", "ConversionCost"
                    Sort Key: (date("Log"."DateStamp")), "Log"."Result"
                    Sort Method: quicksort  Memory: 94807kB
                    Worker 0:  Sort Method: quicksort  Memory: 69171kB
                    Worker 1:  Sort Method: quicksort  Memory: 69063kB
                    Buffers: shared hit=6172 read=857125
                    Worker 0: actual time=2895.518..2951.406 rows=951356 loops=1
                      Buffers: shared hit=1930 read=255687
                    Worker 1: actual time=2894.494..2947.892 rows=949038 loops=1
                      Buffers: shared hit=1950 read=255628
                    ->  Parallel Index Scan using "IX_Log_UserId" on public."Log"  (cost=0.56..1200343.50 rows=1288978 width=9) (actual time=0.087..2634.603 rows=1064916 loops=3)
                          Output: date("DateStamp"), "Result", "ConversionCost"
                          Index Cond: ("Log"."UserId" = 7841)
                          Filter: (("Log"."DateStamp" > '2019-01-01 00:00:00'::timestamp without time zone) AND ("Log"."DateStamp" < '2020-02-26 00:00:00'::timestamp without time zone))
                          Buffers: shared hit=6144 read=857123
                          Worker 0: actual time=0.077..2653.065 rows=951356 loops=1
                            Buffers: shared hit=1917 read=255685
                          Worker 1: actual time=0.107..2654.640 rows=949038 loops=1
                            Buffers: shared hit=1935 read=255628
Planning Time: 0.330 ms
Execution Time: 3163.850 ms

Execution plan URL: https://explain.depesz.com/s/zLNI

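The same SQL takes less than 2 seconds on MSSQL, while on PostgreSQL it takes as long as 10 seconds. The Log table contains about 60 million records, and the WHERE clause "UserId" = 7841 AND "DateStamp" > '2019-01-01' AND "DateStamp" < '2020-02-26' selects about 3 million of them.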

The table structure is as follows:

create table "Log"
(
    "Id"                    integer generated by default as identity
        constraint "PK_Log"
            primary key,
    "Result"                boolean   not null,
    "DateStamp"             timestamp not null,
    "ConversionCost"        integer   not null,
    "UserId"                integer   not null
        constraint "FK_Log_User_UserId"
            references "User"
        on delete cascade
);


create index "IX_Log_ConversionCost"
    on "Log" ("ConversionCost");

create index "IX_Log_DateStamp"
    on "Log" ("DateStamp");

create index "IX_Log_Result"
    on "Log" ("Result");

create index "IX_Log_UserId"
    on "Log" ("UserId");

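The PostgreSQL server has 6 CPUs and 16 GB of RAM, compared with 2 CPUs and 8 GB of RAM on our old MSSQL server, so PostgreSQL has more computing resources yet performs much worse. Both servers use SSDs.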

Is the problem perhaps that PostgreSQL simply isn't as advanced as MS SQL in terms of performance, and nothing can be done here?

sql postgresql query-performance postgresql-11
1 Answer

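You can rewrite the query as follows. Note that because "DateStamp" is a timestamp rather than a date, the lower bound >= '2019-01-02' also drops rows stamped after midnight on 2019-01-01; if those rows should be counted, keep > '2019-01-01', which the index below supports just as well.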

SELECT 
  DATE("DateStamp"), "Result", Count(*), Sum("ConversionCost")
FROM "Log"
WHERE "UserId" = 7841 
  AND "DateStamp" >= '2019-01-02' 
  AND "DateStamp" < '2020-02-26'
GROUP BY 1,2
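The lower-bound difference can be checked with a few hypothetical timestamps. This sketch models each date literal as midnight, matching how the plan above casts '2019-01-01' to '2019-01-01 00:00:00'::timestamp; Python's datetime comparison stands in for PostgreSQL's timestamp comparison, and the sample values are made up.

```python
from datetime import datetime

# Hypothetical sample values for "DateStamp"; a timestamp column can hold
# any time of day, not only midnight.
samples = [
    datetime(2019, 1, 1, 0, 0, 0),    # exactly midnight on 2019-01-01
    datetime(2019, 1, 1, 15, 30, 0),  # later in the day on 2019-01-01
    datetime(2019, 1, 2, 9, 0, 0),    # during 2019-01-02
]

# A date literal compared with a timestamp is treated as midnight.
lower_original = datetime(2019, 1, 1)   # "DateStamp" > '2019-01-01'
lower_rewritten = datetime(2019, 1, 2)  # "DateStamp" >= '2019-01-02'

kept_original = [ts for ts in samples if ts > lower_original]
kept_rewritten = [ts for ts in samples if ts >= lower_rewritten]

# Rows matched by the original predicate but not by the rewrite.
dropped = sorted(set(kept_original) - set(kept_rewritten))

print("original keeps:", kept_original)
print("rewrite keeps: ", kept_rewritten)
print("dropped by the rewrite:", dropped)
```

Only the midnight row is excluded by both predicates; the afternoon row on 2019-01-01 is the one the rewrite silently loses.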

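The query would then benefit greatly from a composite index on ("UserId", "DateStamp"), which lets the date range be applied as part of the index search rather than as the separate Filter step seen in the current plan: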

create index "IX_Log_UserId_DateStamp" on "Log" ("UserId", "DateStamp");
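To compare the existing single-column index with a composite one without a PostgreSQL server, this sketch uses Python's built-in sqlite3 module as a stand-in. That is an assumption: SQLite's planner and its EXPLAIN QUERY PLAN wording differ from PostgreSQL's, the table is the question's "Log" schema minus the foreign key, and it is empty, so only the shape of the plan means anything. The index name "IX_Log_UserId_DateStamp" is just an illustrative name.

```python
import sqlite3

# Assumption: SQLite's planner stands in for PostgreSQL's. The table mirrors
# the question's "Log" schema, minus the foreign key to "User".
SCHEMA = """
create table "Log" (
    "Id"             integer primary key,
    "Result"         boolean   not null,
    "DateStamp"      timestamp not null,
    "ConversionCost" integer   not null,
    "UserId"         integer   not null
)
"""

# The answer's rewritten query.
QUERY = """
select date("DateStamp"), "Result", count(*), sum("ConversionCost")
from "Log"
where "UserId" = 7841
  and "DateStamp" >= '2019-01-02'
  and "DateStamp" < '2020-02-26'
group by 1, 2
"""


def plan_with_index(index_ddl):
    """Build an empty in-memory "Log" table with one index and return the
    detail text of each EXPLAIN QUERY PLAN row for QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.execute(index_ddl)
        rows = conn.execute("explain query plan " + QUERY).fetchall()
        # The last column of each plan row is the human-readable detail.
        return [row[-1] for row in rows]
    finally:
        conn.close()


# Existing single-column index: only "UserId" narrows the index search.
single = plan_with_index('create index "IX_Log_UserId" on "Log" ("UserId")')

# Composite index: the "DateStamp" bounds also become index search terms.
composite = plan_with_index(
    'create index "IX_Log_UserId_DateStamp" on "Log" ("UserId", "DateStamp")'
)

print("single-column:", single)
print("composite:    ", composite)
```

On the real database, rerunning the query under EXPLAIN (ANALYZE, BUFFERS) should show the "DateStamp" bounds inside the Index Cond line rather than in a separate Filter line, provided the planner picks the new index.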

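For even better performance, you can create a covering index that contains every column the query reads, so PostgreSQL can use an index-only scan: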

create index "IX_Log_UserId_DateStamp_Covering" on "Log" (
  "UserId",
  "DateStamp",
  "Result",
  "ConversionCost"
);
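The same SQLite stand-in (again an assumption, not PostgreSQL itself) can show what "covering" means in practice: when the index holds every column the query touches, the planner reports a COVERING INDEX and never has to read the table rows. The schema is simplified as before, and both index names are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL. Simplified "Log" table
# (no foreign key) and the answer's rewritten query.
SCHEMA = """
create table "Log" (
    "Id" integer primary key, "Result" boolean not null,
    "DateStamp" timestamp not null, "ConversionCost" integer not null,
    "UserId" integer not null
)
"""
QUERY = """
select date("DateStamp"), "Result", count(*), sum("ConversionCost")
from "Log"
where "UserId" = 7841
  and "DateStamp" >= '2019-01-02' and "DateStamp" < '2020-02-26'
group by 1, 2
"""


def plan_with_index(index_ddl):
    """Return the EXPLAIN QUERY PLAN detail strings for QUERY against an
    empty in-memory "Log" table that has only the given index."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.execute(index_ddl)
        return [row[-1] for row in conn.execute("explain query plan " + QUERY)]
    finally:
        conn.close()


# Lacks "Result" and "ConversionCost", so every match still visits the table.
composite = plan_with_index(
    'create index "IX_Log_UserId_DateStamp" on "Log" ("UserId", "DateStamp")'
)

# Holds every column the query reads, so the table itself is never touched.
covering = plan_with_index(
    'create index "IX_Log_UserId_DateStamp_Covering" on "Log" '
    '("UserId", "DateStamp", "Result", "ConversionCost")'
)

print("composite:", composite)
print("covering: ", covering)
```

In PostgreSQL the corresponding plan node is an Index Only Scan, which can skip heap access only for pages the visibility map marks as all-visible, so it pays off most on a recently vacuumed table.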
© www.soinside.com 2019 - 2024. All rights reserved.