Full-text search for a single word is slow in PostgreSQL


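When I run a full-text search in Postgres, performance is very slow when searching for a single word, and I don't understand why. Does anyone have an idea?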

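On a large posts table, a full-text search on the title field executes quickly when searching for at least two words (even if it is the same word repeated, such as "yorkie yorkie"):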

explain analyze
SELECT * FROM "posts"
WHERE to_tsvector('simple', posts.title) @@ websearch_to_tsquery('simple', 'yorkie yorkie')
ORDER BY "posts"."created_at" DESC
LIMIT 100 OFFSET 0;

                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=299.53..299.69 rows=63 width=712) (actual time=2.812..3.003 rows=56 loops=1)
   ->  Sort  (cost=299.53..299.69 rows=63 width=712) (actual time=2.797..2.887 rows=56 loops=1)
         Sort Key: created_at DESC
         Sort Method: quicksort  Memory: 61kB
         ->  Bitmap Heap Scan on posts  (cost=32.49..297.65 rows=63 width=712) (actual time=1.293..2.277 rows=56 loops=1)
               Recheck Cond: (to_tsvector('simple'::regconfig, (title)::text) @@ '''yorkie'' & ''yorkie'''::tsquery)
               Heap Blocks: exact=56
               ->  Bitmap Index Scan on posts_title_simple_gin  (cost=0.00..32.47 rows=63 width=0) (actual time=1.105..1.109 rows=56 loops=1)
                     Index Cond: (to_tsvector('simple'::regconfig, (title)::text) @@ '''yorkie'' & ''yorkie'''::tsquery)
 Planning Time: 2.324 ms
 Execution Time: 3.700 ms

However, when searching for only one word ("yorkie" in this example), the same full-text search performs far worse (about 4 orders of magnitude):

explain analyze
SELECT * FROM "posts"
WHERE to_tsvector('simple', posts.title) @@ websearch_to_tsquery('simple', 'yorkie')
ORDER BY "posts"."created_at" DESC
LIMIT 100 OFFSET 0;
                                                                                    QUERY PLAN                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1000.45..9190.19 rows=100 width=712) (actual time=2594.425..33396.395 rows=56 loops=1)
   ->  Gather Merge  (cost=1000.45..1032415.71 rows=12594 width=712) (actual time=2594.392..33396.242 rows=56 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Parallel Index Scan Backward using index_posts_on_created_at on posts  (cost=0.43..1029962.02 rows=5248 width=712) (actual time=1487.382..28371.694 rows=19 loops=3)
               Filter: (to_tsvector('simple'::regconfig, (title)::text) @@ '''yorkie'''::tsquery)
               Rows Removed by Filter: 839536
 Planning Time: 4.079 ms
 Execution Time: 33397.521 ms
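
A note on reading the two plans: the row estimates look consistent with the planner assuming a flat 0.5% match rate per search term, which is PostgreSQL's built-in default for full-text matches when it has no usable statistics for a term. That reading is an assumption; nothing in the post confirms which estimation path was taken. The arithmetic below is only a sketch using the table's 2,518,663 rows and the figures printed in the plans, so the results should land close to the plan values rather than match them exactly.

```sql
-- Sketch only: approximates the planner's estimates from figures in the post.
-- The 0.005 per-term selectivity is an assumption, not something the post states.
SELECT
    -- one term: compare with rows=12594 on the Gather Merge node
    round(2518663 * 0.005)         AS est_rows_one_term,
    -- two ANDed terms multiply their selectivities: compare with rows=63
    round(2518663 * 0.005 * 0.005) AS est_rows_two_terms,
    -- LIMIT cost = startup cost + the share of the child's run cost needed to
    -- produce the first 100 of the 12594 expected rows: compare with 9190.19
    round(1000.45 + (1032415.71 - 1000.45) * 100 / 12594, 2) AS est_limit_cost;
```

If that reading is right, the planner expects roughly 12,594 matches spread through the table, so walking index_posts_on_created_at backward and stopping after 100 hits looks far cheaper than it is. Only 56 rows actually match, so the scan never reaches 100 and ends up reading the whole index.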

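The second example performs poorly because Postgres's query planner does not choose the posts_title_simple_gin index. For now, our workaround is to duplicate the text of single-word searches before running the SQL query, but this feels like randomly poking my finger into a black box.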
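
As an alternative to duplicating the search term, one commonly used workaround is to keep the planner from combining the ORDER BY ... LIMIT with the text-search filter, so that it cannot choose the created_at index to drive the scan. The sketch below uses a materialized CTE, which PostgreSQL 12 supports. The CTE name matched is made up for illustration, and this has not been run against the real posts table, so check the resulting plan before relying on it.

```sql
-- Sketch, not verified against the real table: "matched" is a hypothetical name.
-- AS MATERIALIZED (PostgreSQL 12+) makes the CTE an optimization fence, so the
-- text-search filter is planned on its own and the sort and LIMIT run on its result.
EXPLAIN ANALYZE
WITH matched AS MATERIALIZED (
    SELECT *
    FROM posts
    WHERE to_tsvector('simple', posts.title) @@ websearch_to_tsquery('simple', 'yorkie')
)
SELECT *
FROM matched
ORDER BY created_at DESC
LIMIT 100 OFFSET 0;
```

The trade-off is that every matching row is fetched and sorted before the limit applies. That is cheap for 56 rows but would be slower for a very common word.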

Additional information:

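  • 56 rows match in either of the queries above
  • The posts table has 2,518,663 rows
  • Using to_tsquery instead of websearch_to_tsquery gives similar performance
  • With the order by clause removed, the GIN index is used and performance is good
  • PostgreSQL 12.9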

Relevant parts of the DDL for the posts table:

\d posts
                                                 Table "public.posts"
            Column            |            Type             | Collation | Nullable |              Default              
------------------------------+-----------------------------+-----------+----------+-----------------------------------
 id                           | integer                     |           | not null | nextval('posts_id_seq'::regclass)
 title                        | character varying(255)      |           |          | 
 created_at                   | timestamp without time zone |           |          | 
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)
    "index_posts_on_created_at" btree (created_at)
    "posts_title_simple_gin" gin (to_tsvector('simple'::regconfig, title::text))

Thanks in advance for any insights!

Edited to add:

pg_stats data for the created_at and title columns:

SELECT * FROM pg_stats WHERE tablename = 'posts' and attname in ('title', 'created_at');
-[ RECORD 1 ]----------+----------------------------------------
schemaname             | public
tablename              | posts
attname                | created_at
inherited              | f
null_frac              | 0
avg_width              | 8
n_distinct             | -0.23620126
most_common_vals       | {"2010-01-13 05:00:00","2010-01-11 05:00:00","2009-01-07 05:00:00","2009-03-23 04:00:00","2010-12-02 05:00:00","2011-04-13 04:00:00","2008-03-10 04:00:00","2008-03-31 04:00:00","2008-06-06 04:00:00","2009-01-12 05:00:00","2010-01-12 05:00:00","2010-07-02 04:00:00","2010-07-26 04:00:00","2010-09-16 04:00:00","2010-10-28 04:00:00","2011-04-01 04:00:00","2011-04-20 04:00:00","2011-04-29 04:00:00","2011-05-04 04:00:00","2011-06-13 04:00:00"}
most_common_freqs      | {0.0003,0.00026666667,0.00023333334,0.0002,0.0002,0.0002,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666,0.00016666666}
histogram_bounds       | {"2006-05-16 04:00:00","2008-06-23 04:00:00","2009-05-15 04:00:00","2010-03-07 05:00:00","2010-10-17 15:11:18","2011-05-12 04:00:00","2011-10-17 01:08:01.85107","2012-03-04 18:58:37.880235","2012-06-15 03:18:26.697081","2012-09-12 02:07:25.459938","2012-12-10 15:54:54.177231","2013-03-06 21:33:47.627239","2013-05-22 01:38:36.233915","2013-07-28 15:42:51.805947","2013-09-27 13:41:51.771791","2013-11-29 17:06:30.986796","2014-01-22 18:24:05.921479","2014-03-27 10:18:51.343131","2014-05-19 18:14:17.215933","2014-07-07 23:41:22.083837","2014-08-27 01:27:44.550284","2014-10-11 13:24:07.971617","2014-11-22 13:16:15.480388","2015-01-19 21:05:34.840664","2015-03-11 12:15:05.140441","2015-04-27 08:50:48.268397","2015-06-01 17:19:55.511741","2015-07-11 01:08:00.294815","2015-08-24 10:51:48.556368","2015-09-29 02:29:34.326452","2015-11-04 02:16:23.921007","2015-12-14 12:07:01.934066","2016-02-01 12:53:50.167836","2016-03-22 21:07:13.023372","2016-05-02 21:57:00.568695","2016-06-05 22:03:03.22586","2016-07-16 18:15:23.975105","2016-08-26 18:20:47.420994","2016-09-29 18:52:35.151128","2016-11-10 19:21:21.215267","2016-12-30 22:48:24.774753","2017-02-17 00:42:43.375997","2017-04-06 23:34:01.46038","2017-05-18 02:45:03.851118","2017-06-23 17:23:58.861746","2017-08-03 22:18:44.854972","2017-09-09 15:31:56.005093","2017-10-21 14:10:49.289062","2017-12-01 11:17:54.746535","2018-01-22 13:06:07.030717","2018-03-12 21:21:20.332725","2018-04-25 12:44:39.755558","2018-05-30 21:59:47.546456","2018-07-03 11:37:17.739654","2018-08-08 22:56:05.021759","2018-09-17 01:01:31.195837","2018-10-18 23:24:33.892799","2018-11-22 01:44:30.286318","2019-01-12 19:27:44.349061","2019-02-28 12:19:46.484815","2019-04-08 13:13:07.474614","2019-05-10 16:08:46.529295","2019-06-08 20:17:43.860866","2019-07-11 10:23:33.400506","2019-08-17 14:32:06.42251","2019-09-24 13:04:51.522404","2019-10-25 23:37:12.696142","2019-12-04 15:09:20.724566","2020-01-23 18:33:27.442689","2020-03-04 
15:30:56.986692","2020-04-06 14:41:41.159956","2020-05-07 21:53:29.576601","2020-06-03 15:03:54.28836","2020-06-30 20:11:09.078978","2020-07-29 16:22:40.323743","2020-08-23 23:15:29.63534","2020-09-21 18:23:02.667319","2020-10-16 17:16:41.520823","2020-11-16 14:45:19.574664","2020-12-19 19:57:49.525309","2021-01-31 17:23:54.266707","2021-03-06 21:19:47.904091","2021-04-09 18:16:52.062082","2021-05-14 09:40:17.087965","2021-06-09 17:04:11.950742","2021-07-13 18:25:19.01579","2021-08-17 15:48:40.341387","2021-09-19 15:38:57.157543","2021-10-20 21:32:48.745221","2021-11-22 14:40:46.110008","2021-12-30 22:07:43.551326","2022-02-12 18:58:35.742953","2022-03-26 16:41:17.374151","2022-04-28 16:59:51.371317","2022-05-27 18:06:27.401201","2022-06-28 10:36:37.161636","2022-07-27 20:09:36.662766","2022-08-30 18:43:07.650713","2022-09-30 17:52:38.115483","2022-10-31 10:33:42.586593","2022-11-28 23:37:37.832008"}
correlation            | 0.3581746
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
-[ RECORD 2 ]----------+----------------------------------------
schemaname             | public
tablename              | posts
attname                | title
inherited              | f
null_frac              | 0
avg_width              | 30
n_distinct             | 160484
most_common_vals       | {"Another Neighbor Joins Forum","Items for Sale","7 Words","Seeking Lost Cat","Garage Sale","10 Words for 10 Years","Yard Sale","Thank You","House Cleaning","Moving Sale","Winter Tires","Furniture for Sale","Seeking Missing Cat","Thank You!","Apartment for Rent",Firewood,"Snow Tires","Tires for Sale","6 Words","Free Couch","Snow Tires for Sale","Winter Tires for Sale","Seeking Lost Dog","another neighbor joins forum","House for Sale","Moving Boxes","Snow Plowing",Electrician,"Free Items","For Sale","Found Cat - Yours?","Free Moving Boxes","Free Piano","Lost Cat","Room for Rent",Tires,"Dog Crate","Hay for Sale","Help Wanted","House for Rent","Painter Recommended","Plumber Recommended",Raffle,"Seven Words","6 Words for 2016 Raffle","Another Neighbor Joins the Forum","Canning Jars","Car for Sale",Fireworks,"Free Furniture","Free Stuff","Girl Scout Cookies","House Cleaner","Looking for Work","Lost Dog",Re,"School Budget","Seeking Rental","Six Words","Snow Removal","Tree Removal","10 Words","7 Words for 2015","Apt for Rent","Babysitter Available","Found Cat","FPF Response",Furniture,"Housing Needed","Kayak for Sale",Kittens,"Lawn Mower","Missing Cat","More Neighbors Join Forum","Pet Sitting",Plowing,Plumber,"Postings from other neighborhoods","Scrap Metal","Seeking Housing",Taxes}
most_common_freqs      | {0.059366666,0.0013666666,0.0010666667,0.001,0.00093333336,0.00083333335,0.00083333335,0.00076666666,0.0007,0.0007,0.00066666666,0.00063333334,0.00063333334,0.0006,0.00056666665,0.00056666665,0.00056666665,0.00053333334,0.00046666668,0.00043333333,0.00043333333,0.00043333333,0.0004,0.00036666667,0.00036666667,0.00036666667,0.00036666667,0.00033333333,0.00033333333,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.0003,0.00026666667,0.00026666667,0.00026666667,0.00026666667,0.00026666667,0.00026666667,0.00026666667,0.00026666667,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.00023333334,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002,0.0002}
histogram_bounds       | {"","2006 Hyundai Elantra for Sale","2 Child Chariot Ski, Bike, and Joggng Stroller","5 Baby Chicks Available Soon!","Advance Directive Workshop","A New Playground at Callahan","Apt/Small Home to Rent for Woman and Her Cat","Australian Ballot Informational Hearing on Feb. 8","Barn Boards","Benefit Family Climbing Event Oct. 18","Black Walnuts","B&P Printshop Looking for Used Small Jars w/Lids","Burlington - Walkable 1BR For Rent","Capital City Stampede 10K Saturday","Cedar Decking","Childcare in Late March","Clean Fill Needed","Community Food Social","Cordwood / Buzz Saw 3 Pt. Hitch","Curbside Food Waste Pickup! Servicing All of Rutland!","Delta 1440 Wood Turning Lathe $250","Dog Sitter","Dropped Shades Between Putney Mountain and Banning Rd","Electric Wall Oven 30” Stainless","Exercise Balls for Sale or Trade","Fat Phone Books Needed!","Flea Market and Bake Sale","Found Beagle W No Tags","Free Bosch Dishwasher","Free Full Size Mattress","Free Parakeet/Bird Cage","Free Toilet","Full Plates Food Boxes Available in Morristown June 22","Garmin Nuvi 2699LMTHD for Sale","Good Quality Hay?","Green vs Dunwright","Hardy Boys Series and Tom Swift Series","Herbal/ Mushroom Stroll Cancelled for Saturday","Horace Greeley Writer's Guild","H&R Block Personal Tax Preparation Software for Sale","Information Re: Foriegn National Moving to U.S.","Items For Sale","Jotul Woodstove for Sale","Kings Hill Association","Large Print Books","Lemonade Stand and Food Drive Monday!","Local Issues","Looking for Companionship  and Caregiver","Looking for Rental","Lost Cat Near O'Connor Drive & 116","Mahogany Dining Room Table","McDonald's Dinner Rush Challenge","Mini Trampoline, Weighted Hula Hoop, Solid Core Doors","More Free Items","Moving Bed and Box Spring to Storage","My Brother's Keeper Basketball Event","Need to Revoke King Size Mattress Before Feb. 
16","Nice Loveseat","Office Cleaning Needed","Orchard Valley Puppet Show","Parish Players Presents an Evening of Sci Fi Short Stories!","Phone Found - Yours? ","Please Join Lincoln's Save Community School - Edit","Pre-K Storytime and Playgroup","Pullets or Young Layers","RCC Third Annual Sunday of Service","Re: Covid-19 Conscious Business","Re: Guilt by Association?","Re: Need Help Stacking Firewood, and Indoor Painting","Re: Shout Out About Ted's Kar Kare!","Re: Washer and Dryer",Roofing,"Sap Buckets Wanted","Seeking 2 Bedroom","Seeking chemically-safe housing","Seeking Fuel Company Recommendations","Seeking Kids Skis/Snowboard","Seeking Nanny","Seeking Ride from Montreal","Seeking Toddler Bed","Selling Excess Woodshop Tools/Equipment","Sick Raccoons","Snack Drive!","South Hero School Board News","Staymat for Driveway and Drainage Stone","Studio/Apartment, Private Living Space","Support Meaghan Emery for Balance and Good Governance","Temporary Library Closure",thefts,"Time To Pass It On? Donate Blue Graduation Gowns","Tractor Chains","Turn Your Lawn Into Eden! Edible Landscaping Presentation Jan. 29","Upcoming Events at Earthwalk Vermont","Vegetable Starts","Vinyl Record Collection for Sale","Wallingford Rotary's Annual Fishing Clinic and Derby","Weed Whacker Repair","Why It's Time to #Boycottboves","Women's Depends Looking for Home","Yard Clean Up + Maintenance","Zumbathon March 8"}
correlation            | 0.013592109
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
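
One thing worth checking alongside the output above: statistics for an expression index are stored under the index's name rather than the table's, so the rows for posts do not show what the planner knows about to_tsvector('simple', title). A minimal sketch of how to look, assuming the index name from the \d posts output. Whether any rows come back depends on whether the table has been analyzed since the index was created.

```sql
-- Statistics for the indexed expression appear in pg_stats with
-- tablename set to the index name (taken from the \d posts output).
SELECT attname, null_frac, n_distinct, most_common_elems, most_common_elem_freqs
FROM pg_stats
WHERE tablename = 'posts_title_simple_gin';

-- If nothing is returned, re-collecting statistics may improve the estimate.
ANALYZE posts;
```

If the first query returns no rows, the planner has no per-lexeme frequencies to work with, which would be one possible reason for a poor row estimate.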
postgresql full-text-search
1 Answer

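This performance problem arises because, when searching for a single word, the query can end up scanning a large portion of the table. In the slow plan, the filter discards 839,536 rows per process (an average over loops=3), which adds up to essentially the whole 2,518,663-row table.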

Have you tried creating a full-text search index on the title column? Use the tsvector data type for the indexed column and the tsquery type for the search query.
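
For reference, the posts table in the question already has an expression GIN index on to_tsvector('simple', title), so this suggestion amounts to the stored-column variant of the same idea. A minimal sketch, assuming PostgreSQL 12 or later for generated columns. The column name title_tsv and the index name posts_title_tsv_gin are made up for illustration.

```sql
-- Hypothetical names: title_tsv, posts_title_tsv_gin.
-- Adding a stored generated column rewrites the table, which takes time on ~2.5M rows.
ALTER TABLE posts
    ADD COLUMN title_tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('simple', coalesce(title, ''))) STORED;

CREATE INDEX posts_title_tsv_gin ON posts USING gin (title_tsv);

-- Refresh statistics so the planner has lexeme frequencies for the new column.
ANALYZE posts;

-- The search then compares the tsvector column against a tsquery.
SELECT *
FROM posts
WHERE title_tsv @@ websearch_to_tsquery('simple', 'yorkie')
ORDER BY created_at DESC
LIMIT 100;
```

A dedicated column does not by itself change how the planner weighs the created_at index against the GIN index, so it is worth comparing plans with EXPLAIN ANALYZE afterwards.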

Also look into trigram indexes.
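
A trigram index is a different tool from full-text search: it speeds up substring and similarity matching (LIKE, ILIKE) rather than lexeme matching. A minimal sketch, assuming the pg_trgm extension can be installed on the server. The index name posts_title_trgm_gin is made up for illustration.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name.
CREATE INDEX posts_title_trgm_gin ON posts USING gin (title gin_trgm_ops);

-- Substring match that can use the trigram index. Unlike the tsquery search,
-- this also matches "yorkie" inside longer words.
SELECT *
FROM posts
WHERE title ILIKE '%yorkie%'
ORDER BY created_at DESC
LIMIT 100;
```

The same ORDER BY ... LIMIT planning choice could still come up with this query, so it is not a guaranteed fix for the slow plan.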
