I extracted this code from https://spacy.io/universe/project/spacy-sentence-bert:

import spacy_sentence_bert
# load one of the models listed at https://github.com/MartinoMensio/spacy-sentence-bert/
nlp = spacy_sentence_bert.load_model('en_roberta_large_nli_stsb_mean_tokens')
# get two documents
doc_1 = nlp('Hi there, how are you?')
doc_2 = nlp('Hello there, how are you doing today?')
# use the similarity method that is based on the vectors, on Doc, Span or Token
print(doc_1.similarity(doc_2[0:7]))
I have a dataframe with two columns containing sentences like the ones below. I am trying to compute the similarity between the two sentences in each row. I have tried several different approaches without luck, so I thought I would ask here. Thanks, everyone.
Current df:
Sentence1 | Sentence2
Another-Sentence1 | Another-Sentence2
Yet-Another-Sentence1 | Yet-Another-Sentence2
Desired output:
Sentence1 | Sentence2 | Similarity-Score-Sentence1-Sentence2
Another-Sentence1 | Another-Sentence2 | Similarity-Score-Another-Sentence1-Another-Sentence2
Yet-Another-Sentence1 | Yet-Another-Sentence2 | Similarity-Score-Yet-Another-Sentence1-Yet-Another-Sentence2
I assume your first row consists of headers and the data starts on the row after them, and that you are using pandas to read the CSV into a dataframe. The code below works in my environment.
import spacy_sentence_bert
import pandas as pd

nlp = spacy_sentence_bert.load_model('en_roberta_large_nli_stsb_mean_tokens')
df = pd.read_csv('testing.csv')

similarityValue = []
for i in range(len(df)):  # one iteration per row
    sentence_1 = nlp(df.iloc[i, 0])
    sentence_2 = nlp(df.iloc[i, 1])
    similarityValue.append(sentence_1.similarity(sentence_2))
    print(sentence_1, '|', sentence_2, '|', sentence_1.similarity(sentence_2))
df['Similarity'] = similarityValue
print(df)
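The same row-wise pattern can also be written with `DataFrame.apply` instead of an index loop. The sketch below is illustrative: the `embeddings` dict and `cosine` function are stand-ins for the spaCy model (in real use you would compute `nlp(sentence)` vectors), since `Doc.similarity` is a cosine similarity over the document vectors.

```python
import numpy as np
import pandas as pd

# Toy embeddings standing in for nlp(...).vector; in practice you would
# run each sentence through the loaded spaCy model instead.
embeddings = {
    'Hi there, how are you?': np.array([0.9, 0.1, 0.0]),
    'Hello there, how are you doing today?': np.array([0.8, 0.2, 0.1]),
    'The sky is blue.': np.array([0.0, 0.1, 0.9]),
    'Bananas are yellow.': np.array([0.1, 0.0, 0.8]),
}

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

df = pd.DataFrame({
    'Sentence1': ['Hi there, how are you?', 'The sky is blue.'],
    'Sentence2': ['Hello there, how are you doing today?', 'Bananas are yellow.'],
})

# axis=1 passes each row to the lambda, so no explicit index loop is needed
df['Similarity'] = df.apply(
    lambda row: cosine(embeddings[row['Sentence1']], embeddings[row['Sentence2']]),
    axis=1,
)
print(df)
```

With the real model you would replace the dict lookup with `nlp(row['Sentence1']).similarity(nlp(row['Sentence2']))`, at the cost of re-running the model per row.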
Input CSV:

Output: