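I'm trying to combine a generator with fuzzywuzzy's extract method to speed up my code, but I'm having trouble storing the top two matches and their scores. I could do this when I stored "results" as a list object, but I'm not sure whether the same thing can be done with a generator object.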
from fuzzywuzzy import fuzz, process
lookup_list = ['Pepsi Co','Steel United','ADIGAS','Consulting Group LLC','Company ABCDY','Blueberry Corp','FoodIndustries','PETCEO', 'OxChem']
vals = ['Pepsi', 'Steel Untd', 'ADIDAS','Consulting Group', 'Company ABC','Bluuberrie Cor']
results = (process.extract(val, lookup_list, scorer=fuzz.WRatio, limit=2) for val in vals)
best_match = None
best_score = 0
best_match_2 = None
best_score_2 = 0
for result in results:
    best_score, best_match = (result[0][1], result[0][0]) if result[0][1] > best_score else (best_score, best_match)
    best_score2, best_match2 = .....
Desired output:
best_score
95
best_match
'Consulting Group LLC'
best_score2
92
best_match2
'Company ABCDY'
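In hindsight this was a pretty silly question; my confusion came from my inexperience with generators. Thanks to @Blckknght for the suggestion: using heapq.nlargest solved almost everything.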
import heapq

from fuzzywuzzy import fuzz, process
lookup_list = ['Pepsi Co','Steel United','ADIGAS','Consulting Group LLC','Company ABCDY','Blueberry Corp','FoodIndustries','PETCEO', 'OxChem']
vals = ['Pepsi', 'Steel Untd', 'ADIDAS','Consulting Group', 'Company ABC','Bluuberrie Cor']
# Flatten every (match, score) pair from all lookups into one lazy stream
results = (result for val in vals for result in process.extract(val, lookup_list, scorer=fuzz.WRatio, limit=2))
top_two = heapq.nlargest(2, results, key=lambda x: x[1])
best_match_1, best_score_1 = top_two[0][0], top_two[0][1]
best_match_2, best_score_2 = top_two[1][0], top_two[1][1]
Output:
best_score_1
95
best_match_1
'Consulting Group LLC'
best_score_2
92
best_match_2
'Company ABCDY'
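For anyone who wants to try the heapq.nlargest idea without installing fuzzywuzzy, here is a minimal sketch of the same pattern using only the standard library. Note the assumptions: `similarity` and `scored_pairs` are hypothetical helpers I made up for illustration, and the scorer is difflib's `SequenceMatcher`, not `fuzz.WRatio`, so the scores will differ from the ones shown above. The shortened lists are just sample data.

```python
import heapq
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scorer (an assumption, NOT fuzz.WRatio): 0-100 similarity."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def scored_pairs(queries, choices):
    """Lazily yield a (choice, score) tuple for every query/choice pair."""
    for query in queries:
        for choice in choices:
            yield choice, similarity(query, choice)


lookup_list = ['Pepsi Co', 'Steel United', 'ADIGAS', 'Consulting Group LLC']
vals = ['Pepsi', 'Steel Untd', 'ADIDAS', 'Consulting Group']

# nlargest walks the generator exactly once and only ever holds
# the two best (choice, score) tuples in memory.
top_two = heapq.nlargest(2, scored_pairs(vals, lookup_list), key=lambda pair: pair[1])

(best_match_1, best_score_1), (best_match_2, best_score_2) = top_two
print(best_match_1, best_score_1)
print(best_match_2, best_score_2)
```

The point is that the generator never has to be turned into a list: it can only be iterated once, and heapq.nlargest does that single pass for you while keeping track of the running top two.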