Should Postgres COPY FROM update a BRIN index?

Question

Imagine a table like this...

create table study_value (
    id serial primary key,
    study_id int not null references study (id),
    category text not null,
    subcategory int not null,
    p_value double precision not null
);

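I know it will have 25+ million rows, and they need to be queried quickly by parent study and optionally by category and subcategory, so I chose to add a BRIN index to it.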

create index study_value_idx
    on study_value using brin (study_id, category, subcategory);

All the data for a given study (1mil+ rows) is bulk-inserted through a buffer (sorted by category and subcategory).

    copy study_value from stdin with (format csv, header false);

The study data is uploaded sequentially in study id order, so the insertion order fully respects the BRIN column order.

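The problem I'm seeing is that querying this table with a condition the BRIN should satisfy, e.g. select count(*) from study_value where study_id = 3;, ends up scanning the whole table and takes 30+ seconds. The BRIN itself is 48 kB in size.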

If I reindex index study_value_idx, however, the query takes about 100 ms and the index size is over 100 kB.
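
For reference, the index sizes quoted here can be read from the catalog size functions. A minimal sketch, assuming the index name from the create index statement above (the index_size alias is only illustrative):

```sql
-- On-disk size of the BRIN index, in human-readable units.
select pg_size_pretty(pg_relation_size('study_value_idx')) as index_size;
```

Running it before and after a maintenance command shows whether that command added summary tuples to the index.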

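Everything I've read (in the PG docs, SO, etc.) indicates that one should only need to reindex in very special circumstances (e.g. data corruption or a failed index build).

I don't need to drop the index before loading the data and recreate it afterwards, because copying 1 million records into the table takes only 10 seconds.

Am I doing something wrong? Is there a better way?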

Edit:

I forgot to mention that, before running reindex, I ran analyze study_value and saw no change.

Answer 1

Yes, my mistake. I needed to VACUUM ANALYZE, per @a_horse_withno_name's comment.
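
The reason this helps is that a BRIN index does not automatically summarize page ranges created after the index was built. Those ranges stay unsummarized, and are therefore always scanned, until VACUUM processes the table or brin_summarize_new_values is called. A minimal sketch of both options, assuming the table and index names from the question:

```sql
-- Option 1: vacuum summarizes the page ranges the BRIN index has not
-- summarized yet; analyze refreshes the planner statistics.
vacuum analyze study_value;

-- Option 2: summarize only the new page ranges, without vacuuming the table.
-- Returns the number of page range summaries that were added to the index.
select brin_summarize_new_values('study_value_idx');
```

Either one could be run right after each copy, so the newly loaded study is covered before it is queried.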

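I recreated the table and re-imported the data. After the reload, the index size was 48 kB again and the query time was back to around 30 seconds. I had been misreading the query plan, though: it was using the index, but the actual number of rows was far off from the expected number.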

Aggregate  (cost=231550.86..231550.87 rows=1 width=8) (actual time=32233.141..32233.156 rows=1 loops=1)
->  Bitmap Heap Scan on study_value  (cost=6226.26..229546.26 rows=801840 width=0) (actual time=6555.954..27253.035 rows=781580 loops=1)
     Recheck Cond: (study_id = 920)
     Rows Removed by Index Recheck: 22027434
     Heap Blocks: lossy=213169
     ->  Bitmap Index Scan on study_value_idx  (cost=0.00..6025.80 rows=801840 width=0) (actual time=16.345..16.352 rows=2132480 loops=1)
           Index Cond: (study_id = 920)
Planning time: 0.941 ms
Execution time: 32233.266 ms
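
Plans in this form come from explain analyze. A sketch of the kind of statement that would produce them, assuming the study id shown in the plan and no extra explain options:

```sql
-- Runs the query for real and reports actual timings and row counts.
-- "Rows Removed by Index Recheck" and "Heap Blocks: lossy" show how much
-- of the heap the BRIN index failed to exclude.
explain analyze
select count(*) from study_value where study_id = 920;
```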

After analyze study_value (3 seconds) the idx is still 48 kB, and the query plan is:

Aggregate  (cost=231360.49..231360.50 rows=1 width=8) (actual time=25468.247..25468.259 rows=1 loops=1)
->  Bitmap Heap Scan on study_value  (cost=6161.41..229376.81 rows=793472 width=0) (actual time=2740.866..20419.470 rows=781580 loops=1)
     Recheck Cond: (study_id = 920)
     Rows Removed by Index Recheck: 22027434
     Heap Blocks: lossy=213169
     ->  Bitmap Index Scan on study_value_idx  (cost=0.00..5963.04 rows=793472 width=0) (actual time=17.301..17.306 rows=2132480 loops=1)
           Index Cond: (study_id = 920)
Planning time: 0.101 ms
Execution time: 25468.389 ms

After vacuum analyze study_value (20 seconds) the idx is now 112 kB, and the query plan is:

Aggregate  (cost=231496.34..231496.35 rows=1 width=8) (actual time=10038.873..10038.884 rows=1 loops=1)
->  Bitmap Heap Scan on study_value  (cost=6228.78..229501.25 rows=798037 width=0) (actual time=12.303..5133.281 rows=781580 loops=1)
     Recheck Cond: (study_id = 920)
     Rows Removed by Index Recheck: 17962
     Heap Blocks: lossy=7473
     ->  Bitmap Index Scan on study_value_idx  (cost=0.00..6029.27 rows=798037 width=0) (actual time=1.644..1.650 rows=75520 loops=1)
           Index Cond: (study_id = 920)
Planning time: 0.511 ms
Execution time: 10038.993 ms
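
If running vacuum by hand after every load is inconvenient, newer servers can queue the summarization automatically. A hedged sketch, assuming PostgreSQL 10 or later (where the autosummarize storage parameter exists) and that autovacuum is enabled:

```sql
-- autosummarize is off by default. When on, a summarization request for a
-- filled page range is queued for autovacuum once an insert lands in the
-- next range, so the most recent range may still be unsummarized.
alter index study_value_idx set (autosummarize = on);
```

Because the work is done asynchronously, an explicit vacuum or brin_summarize_new_values call right after a load is still the more predictable choice.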

Running a more specific query (i.e. one that also includes category and subcategory) is much faster, around 400 ms.
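
Such a query might look like the sketch below. The category and subcategory values are hypothetical placeholders, not values from the original data:

```sql
-- 'blood' and 7 are made-up example values.
select count(*)
from study_value
where study_id = 920
  and category = 'blood'
  and subcategory = 7;
```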
