How do I get the strings back from a BoW vector?

Question

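I generated a bag-of-words (BoW) representation for a pandas DataFrame column, tech_raw_data['Product lower'] (first snippet below). Next, to test the similarity of a string against these BoW vectors, I created a BoW for an entry of another DataFrame column, toys['ITEM NAME'].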

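from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()
smer_counts = count_vect.fit_transform(tech_raw_data['Product lower'].values.astype('U'))
# get_feature_names_out() in scikit-learn >= 1.0 (formerly get_feature_names())
smer_vocab = count_vect.get_feature_names_out()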

Checking for similarity:

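import pandas as pd

toys = pd.read_csv('toy_data.csv', engine='python')
print('-' * 80)
print(toys['ITEM NAME'].iloc[0])
print('-' * 80)
inp = [toys['ITEM NAME'].iloc[0]]

# Reuse the fitted vectorizer so the columns line up with smer_counts
cust_counts = count_vect.transform(inp)
# get_feature_names_out() in scikit-learn >= 1.0 (formerly get_feature_names())
cust_vocab = count_vect.get_feature_names_out()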

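Now, when the match ratio is 0.85 or higher, I need to print the string in tech_raw_data['Product lower'] that corresponds to the matching row of smer_counts.

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

for x in cust_counts[0].toarray():
    for y in smer_counts.toarray():
        ratio = similar(x, y)
        # print(ratio)
        if ratio >= 0.85:
            pass  # should print the string corresponding to BoW y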

Answer 1

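Enumerate the NumPy array returned by smer_counts.toarray() and, wherever ratio>=0.85, use the index to fetch the corresponding text from tech_raw_data['Product lower'].

for x in cust_counts[0].toarray():
    for i, y in enumerate(smer_counts.toarray()):
        ratio = similar(x, y)
        # print(ratio)
        if ratio >= 0.85:
            # .loc is label-based, so this assumes the default 0..n-1 index;
            # otherwise use tech_raw_data['Product lower'].iloc[i]
            print(tech_raw_data.loc[i, 'Product lower'])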

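This works because the order of records is preserved in both smer_counts and the DataFrame: row i of smer_counts was built from the i-th entry of the column.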
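
To make the index-based lookup easy to try without the original CSV files, here is a minimal standard-library sketch of the same idea. It is not the CountVectorizer pipeline itself: fit_vocab and to_counts are hypothetical stand-ins that split on whitespace only, matches is an illustrative helper name, and the products list is made-up data. The point it illustrates is that the row index is the only link between a count vector and its source string, so it has to be carried through the loop.

```python
from collections import Counter
from difflib import SequenceMatcher


def similar(a, b):
    # Same helper as in the question: similarity ratio of two sequences.
    return SequenceMatcher(None, a, b).ratio()


def fit_vocab(docs):
    # Hypothetical, simplified stand-in for fitting a vectorizer:
    # sorted unique lowercase whitespace-separated tokens.
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def to_counts(doc, vocab):
    # One count per vocabulary term, in vocabulary order (a single BoW row).
    # Tokens that are not in the vocabulary are ignored.
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocab]


def matches(query, docs, threshold=0.85):
    vocab = fit_vocab(docs)
    rows = [to_counts(doc, vocab) for doc in docs]  # rows[i] comes from docs[i]
    q = to_counts(query, vocab)
    hits = []
    for i, row in enumerate(rows):
        ratio = similar(q, row)
        if ratio >= threshold:
            # The index i maps the BoW row back to its original string.
            hits.append((i, docs[i], ratio))
    return hits


products = ["red toy car", "blue toy truck", "wooden puzzle set"]  # made-up data
for i, text, ratio in matches("red toy car", products):
    print(i, text, round(ratio, 3))
```

Returning the ratio together with the index and the string makes it easier to inspect which rows cleared the threshold and by how much.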
