How to make ts_headline respect phraseto_tsquery

Question

I have a query that uses phrase search to match a whole phrase.

SELECT ts_headline(
  'simple',
  'This is my test text. My test text has many words. Well, not THAT many words.',
  phraseto_tsquery('simple', 'text has many words')
);

The result is:

This is my test <b>text</b>. My test <b>text</b> <b>has</b> <b>many</b> <b>words</b>. Well, not THAT <b>many</b> <b>words</b>.

But I expected this:

This is my test text. My test <b>text</b> <b>has</b> <b>many</b> <b>words</b>. Well, not THAT many words.

Or, ideally, even this:

This is my test text. My test <b>text has many words</b>. Well, not THAT many words.

Side note:

phraseto_tsquery('simple', 'text has many words')

is equivalent to

to_tsquery('simple', 'text <-> has <-> many <-> words')
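
To make the side note concrete, here is a minimal Python sketch that builds the equivalent <-> query text from a phrase. The helper name phrase_to_tsquery_text is hypothetical, and the sketch assumes whitespace-separated words under the 'simple' configuration (lowercasing only, no stemming or stop-word removal). The real phraseto_tsquery uses PostgreSQL's own parser, so treat this as an illustration only.

```python
def phrase_to_tsquery_text(phrase: str) -> str:
    """Join the words of a phrase with the FOLLOWED BY operator (<->).

    Hypothetical helper: assumes whitespace-separated words and the
    'simple' configuration, which lowercases but does not stem.
    """
    words = phrase.lower().split()
    return " <-> ".join(words)


print(phrase_to_tsquery_text("text has many words"))
# prints: text <-> has <-> many <-> words
```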

I'm not sure whether I'm doing something wrong, or whether ts_headline simply doesn't support this kind of highlighting.

Answer 1

phraseto_tsquery('simple', 'text has many words') generates the correct query; the problem seems to be in the ts_headline function. It appears this has already been reported as BUG #155172.

Answer 2

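I am writing an extension that improves on ts_headline so that matched phrases are highlighted correctly with a single tag and partial matches are not highlighted. The extension is available at https://github.com/thevermeer/pg_ts_semantic_headline and is intended as a drop-in replacement for ts_headline.

Usage: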

SELECT ts_semantic_headline(
  'simple',
  'This is my test text. My test text has many words. Well, not THAT many words.',
  phraseto_tsquery('simple', 'text has many words')
);

Produces:

| ts_semantic_headline |
| --- |
| This is my test text. My test <b>text has many words</b>. Well, not THAT many words. |

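The ts_semantic_headline solution uses ts_headline under the hood to generate the content fragments. It then uses text parsing and custom TSVectors, together with the included ts_fast_headline function, to perform multi-word highlighting at a minimal (5-10%) performance cost over ts_headline.

If performance is a concern, the ts_fast_headline function can also work from 2 pre-computed columns (TSPVector + TEXT[]) and deliver highlighted content 5-10 times faster than ts_headline.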
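
The idea of wrapping a whole phrase match in a single tag can also be sketched outside the database. The Python below is not the extension's implementation. highlight_phrase is a hypothetical helper that assumes the phrase words must appear adjacent in the text, separated only by non-word characters, and it uses a case-insensitive regular expression rather than tsvector positions.

```python
import re


def highlight_phrase(text: str, phrase: str,
                     start: str = "<b>", stop: str = "</b>") -> str:
    """Wrap each full occurrence of `phrase` in one start/stop tag pair.

    Hypothetical helper: partial matches (for example only the last two
    words of the phrase) are left untouched.
    """
    words = phrase.split()
    if not words:
        return text
    # Words must be adjacent; only non-word characters (spaces,
    # punctuation) may sit between them.
    pattern = r"\b" + r"\W+".join(re.escape(w) for w in words) + r"\b"
    return re.sub(pattern,
                  lambda m: start + m.group(0) + stop,
                  text,
                  flags=re.IGNORECASE)


text = ("This is my test text. My test text has many words. "
        "Well, not THAT many words.")
print(highlight_phrase(text, "text has many words"))
# prints: This is my test text. My test <b>text has many words</b>. Well, not THAT many words.
```

Because this works on the raw string, it ignores stemming and stop words, so it only approximates what a real text search configuration would match.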
