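I have two datasets and need to find similar matches between them using fuzzywuzzy or any other option.
Both datasets contain the columns listed below (the columns are the same in both, but the rows differ):
SLNo | Productname | Brand
First I need to find the brand similarity; then, if the brand similarity is greater than 95, I need to check the product name similarity.
I have tried the following code: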
import pandas as pd
from fuzzywuzzy import process, fuzz
Bl=pd.read_excel(r'C:\Datas\BLRL3.xlsx')
master=pd.read_excel(r'C:\Datas\MO.xlsx')
actual_Name= []
similarity = []
brandsimilarity = []
for i in Bl.Productname:
    for j in Bl.Brand:
        brandratio = process.extract( i, master.Brand, limit=1,scorer=fuzz.token_sort_ratio)
        brandsimilarity.append(brandratio[0][1])
        if brandsimilarity > 95:
            ratio = process.extract( i, master.Productname, limit=1,scorer=fuzz.token_sort_ratio)
            actual_Productname.append(ratio[0][0])
            similarity.append(ratio[0][1])
Bl['actual_Name'] = pd.Series(actual_Name)
Bl['similarity'] = pd.Series(similarity)
Bl['brandsimilarity']=pd.Series(brandsimilarity)
Bl.to_csv("oput2503-2.csv",index = False)
Error at the line `if brandsimilarity > 95:`
TypeError: '>' not supported between instances of 'list' and 'int'
brandsimilarity.append(brandratio[0][1])
but
brandsimilarity = brandratio[0][1]
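
For reference, the TypeError comes from comparing the whole list `brandsimilarity` with the integer 95; the comparison needs the single score of the current row (for example `brandratio[0][1]`), while the list is only used to collect results. Below is a minimal sketch of the two-stage check described above, not a definitive implementation. It assumes each row can be handled as a dict with `Productname` and `Brand` keys (with pandas, something like `Bl.to_dict("records")` would produce that). The helper `best_match`, the `brand_threshold` parameter, and the sample rows are hypothetical names made up for illustration. If fuzzywuzzy is not installed, the sketch falls back to a rough `difflib` stand-in that only approximates `fuzz.token_sort_ratio`.

```python
from difflib import SequenceMatcher

try:
    # Scorer used in the question.
    from fuzzywuzzy import fuzz
    score = fuzz.token_sort_ratio
except ImportError:
    # Assumption: a rough stdlib stand-in, NOT identical to token_sort_ratio.
    def score(a, b):
        def norm(s):
            return " ".join(sorted(str(s).lower().split()))
        return round(100 * SequenceMatcher(None, norm(a), norm(b)).ratio())


def best_match(row, master_rows, brand_threshold=95):
    """Return (matched_name, name_score, best_brand_score) for one row.

    Hypothetical helper: the product name is compared only against master
    rows whose brand score exceeds brand_threshold.
    """
    best_brand = 0
    candidates = []
    # Stage 1: score this row's brand against every master brand.
    for m in master_rows:
        b = score(row["Brand"], m["Brand"])  # a single int, not a list
        best_brand = max(best_brand, b)
        if b > brand_threshold:
            candidates.append(m)
    if not candidates:
        # No sufficiently similar brand, so skip the product name check.
        return None, None, best_brand
    # Stage 2: among brand-matched rows, keep the closest product name.
    scored = [(score(row["Productname"], m["Productname"]), m) for m in candidates]
    name_score, best = max(scored, key=lambda pair: pair[0])
    return best["Productname"], name_score, best_brand


# Made-up sample rows standing in for the two Excel sheets.
master_rows = [
    {"Productname": "Cola Zero 500ml", "Brand": "Acme"},
    {"Productname": "Orange Juice 1L", "Brand": "Sunny"},
]
bl_rows = [
    {"Productname": "cola zero 500ml", "Brand": "Acme"},
    {"Productname": "Green Tea", "Brand": "Zzz"},
]

# One result tuple per input row, so the columns stay aligned with the rows.
results = [best_match(r, master_rows) for r in bl_rows]
for r, (name, name_score, brand_score) in zip(bl_rows, results):
    print(r["Productname"], "->", name, name_score, brand_score)
```

Because every input row yields exactly one result (with `None` where no brand qualifies), the three output columns keep the same length as the input, which the original loop does not guarantee when the `if` branch is skipped.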