Matching a list of strings against a block of text

Question

Beginner here:

I have a block of text:

For example: 'hey this is a block of text, for an example, wow looks cool blah blah blah angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.'

And a list of words: ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

My end goal is to find the entries in the word list that match, or fuzzy-match, a string somewhere in the block of text.

What I have tried: difflib.get_close_matches
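The question does not show the exact call, so the following is only a sketch of one plausible attempt, assuming the whole block was passed as the query. It illustrates why that comes back empty: get_close_matches scores the query against each candidate as a whole, and a long block can never reach the default cutoff of 0.6 against a short phrase.

```python
from difflib import get_close_matches

text = (
    'hey this is a block of text, for an example, wow looks cool blah blah blah '
    'angiotensin enzyme looks cool okay.But what about angiotensin enzym well '
    'I dont know.'
)
words = ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

# Assumption: the entire block is used as the query string.
# The similarity ratio is bounded by 2 * len(shorter) / (len(a) + len(b)),
# which is far below the default cutoff of 0.6 for a block this long.
result = get_close_matches(text, words)
print(result)  # prints []
```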

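Desired output: 'angiotensin enzyme serum', 'angiotensin enzyme a1'

The order of the output does not matter.

For other blocks of text, other strings from the list will match. The block is not constant.

Is there a way to do this?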

python-3.x fuzzy-search
1 Answer

Use fuzzywuzzy (from PyPI):

from fuzzywuzzy import fuzz

text = 'hey this is a block of text, for an example, wow looks cool blah blah blah angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.'

words = ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

matches = [w for w in words if fuzz.partial_ratio(text, w) > 70.]

Obviously you will need to tune the threshold to suit your data, but in this example the scores are well separated:

>>> print(matches)
['angiotensin enzyme serum', 'angiotensin enzyme a1']

>>> for w in words:
...     print(w, fuzz.partial_ratio(text, w))
... 
angiotensin enzyme serum 83
some diff enzyme 56
angiotensin enzyme a1 90
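If installing a third-party package is not an option, a similar idea can be sketched with the standard library alone. The code below is not fuzzywuzzy's implementation; it is a brute-force approximation that slides a window the length of each phrase across the text and keeps the best difflib.SequenceMatcher ratio. The helper name best_window_ratio and the 0.7 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_window_ratio(text, phrase):
    """Return the highest similarity (0.0 to 1.0) between `phrase` and any
    slice of `text` that has the same length as `phrase`.

    Hypothetical helper: a brute-force stand-in for a partial match score.
    """
    n = len(phrase)
    if n == 0:
        return 0.0
    matcher = SequenceMatcher(None, autojunk=False)
    matcher.set_seq2(phrase)  # the phrase is fixed, so set it once
    # If the text is shorter than the phrase, compare against the whole text.
    last_start = max(len(text) - n, 0)
    best = 0.0
    for start in range(last_start + 1):
        matcher.set_seq1(text[start:start + n])
        best = max(best, matcher.ratio())
    return best


text = (
    'hey this is a block of text, for an example, wow looks cool blah blah blah '
    'angiotensin enzyme looks cool okay.But what about angiotensin enzym well '
    'I dont know.'
)
words = ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

# Assumed threshold; tune it for your own data.
matches = [w for w in words if best_window_ratio(text, w) > 0.7]
print(matches)
```

With the example data this keeps the two angiotensin entries and drops 'some diff enzyme'. The scan costs one comparison per character position, so for long texts or large word lists a dedicated library is likely the better choice.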