Using a generator expression with fuzzywuzzy to get the top 2 matches

Question (votes: 0, answers: 1)

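I am trying to combine a generator with fuzzywuzzy's extract method to speed up my code, but I am having trouble storing the top two matches and their scores. I was able to do this when "results" was stored as a list, but I am not sure whether the same approach works with a generator object.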

from fuzzywuzzy import fuzz, process

lookup_list = ['Pepsi Co','Steel United','ADIGAS','Consulting Group LLC','Company ABCDY','Blueberry Corp','FoodIndustries','PETCEO', 'OxChem']
vals = ['Pepsi', 'Steel Untd', 'ADIDAS','Consulting Group', 'Company ABC','Bluuberrie Cor']

results = (process.extract(val, lookup_list, scorer=fuzz.WRatio, limit=2) for val in vals)

best_match = None
best_score = 0

best_match_2 = None
best_score_2 = 0

for result in results:
    best_score, best_match = (result[0][1], result[0][0]) if result[0][1] > best_score else (best_score, best_match)
    best_score_2, best_match_2 = .....

Desired output:

best_score
95
best_match
'Consulting Group LLC'

best_score_2
92
best_match_2
'Company ABCDY'
python generator fuzzywuzzy
1 answer
0 votes

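In hindsight this was a rather silly question; my confusion came from my inexperience with generators. Thanks to @Blckknght for the suggestion: using heapq.nlargest solved almost everything.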
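
As background, here is a minimal sketch of how heapq.nlargest behaves when it is handed a generator. The (match, score) pairs are hypothetical stand-ins for what process.extract returns, so the block runs without fuzzywuzzy installed.

```python
import heapq

# Hypothetical (match, score) pairs, standing in for process.extract results.
pairs = [("alpha", 70), ("beta", 95), ("gamma", 88), ("delta", 92)]

# A generator expression: items are produced lazily and can be read only once.
scored = (pair for pair in pairs)

# nlargest consumes the generator in a single pass and returns the
# two highest-scoring pairs, best first.
top_two = heapq.nlargest(2, scored, key=lambda pair: pair[1])

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(scored)
```

Because the generator can only be read once, anything that has to be kept (here, the top two pairs) must be captured during that single pass.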

import heapq

from fuzzywuzzy import fuzz, process

lookup_list = ['Pepsi Co','Steel United','ADIGAS','Consulting Group LLC','Company ABCDY','Blueberry Corp','FoodIndustries','PETCEO', 'OxChem']
vals = ['Pepsi', 'Steel Untd', 'ADIDAS','Consulting Group', 'Company ABC','Bluuberrie Cor']

# Flatten the per-value match lists into one lazy stream of (match, score) tuples.
results = (result for val in vals for result in process.extract(val, lookup_list, scorer=fuzz.WRatio, limit=2))
# Keep only the two highest-scoring tuples while consuming the generator.
top_two = heapq.nlargest(2, results, key=lambda x: x[1])

best_match_1, best_score_1 = top_two[0][0], top_two[0][1]
best_match_2, best_score_2 = top_two[1][0], top_two[1][1]

Output:

best_score_1
95
best_match_1
'Consulting Group LLC'

best_score_2
92
best_match_2
'Company ABCDY'
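
One limitation of the flattened (match, score) tuples is that they no longer say which input value produced each match. A possible variation, sketched below, carries the query string along in each tuple. To keep the block runnable with the standard library alone, it swaps in difflib.SequenceMatcher for fuzz.WRatio; that substitution and the helper name score are assumptions for illustration, and the resulting scores will generally differ from fuzzywuzzy's.

```python
import heapq
from difflib import SequenceMatcher

lookup_list = ['Pepsi Co', 'Steel United', 'ADIGAS', 'Consulting Group LLC', 'Company ABCDY']
vals = ['Pepsi', 'Steel Untd', 'Company ABC']


def score(a, b):
    """Stand-in scorer (assumption): difflib similarity scaled to 0-100."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Lazily yield (query, candidate, score) for every query/candidate pairing.
scored = ((val, cand, score(val, cand)) for val in vals for cand in lookup_list)

# Rank on the score (index 2) and keep the two best triples.
top_two = heapq.nlargest(2, scored, key=lambda item: item[2])

for query, match, match_score in top_two:
    print(query, match, match_score)
```

With fuzzywuzzy available, the same shape should work by yielding (val, match, match_score) for each (match, match_score) pair returned by process.extract.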