Searching text against a bag of words in Python

Question

Suppose I have a number of keyword phrases. For example:

['profit low', 'loss increased', 'profit lowered']

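I have a PDF document from which I have extracted the full text, and now I want to find the sentences that match this bag of words.

Suppose one sentence is: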

'The profit in the month of November lowered from 5% to 3%.'

This sentence should match the bag of words, because it contains both words of 'profit lowered'.

What is the best way to solve this problem in Python?

Answer 1

# input
checking_words = ['profit low', 'loss increased', 'profit lowered']
checking_string = 'The profit in the month of November lowered from 5% to 3%.'

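# split the sentence into tokens, dropping surrounding punctuation
sentence_tokens = {tok.strip('.,;:!?') for tok in checking_string.split()}
# output: print each keyword phrase whose words all occur in the sentence
for word_bag in [st.split() for st in checking_words]:
    if all(word in sentence_tokens for word in word_bag):
        print(word_bag)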
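
Since the question starts from the full text of a PDF, the same all-words check can be applied sentence by sentence. The following is only a minimal sketch and not part of the original answer: the helper names `tokenize` and `matching_sentences` are hypothetical, the sample `text` is made up for illustration, and the sentence splitter is a naive regex that assumes sentences end in `.`, `!` or `?` followed by whitespace.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of tokens (letters, digits, %)."""
    return set(re.findall(r"[a-z0-9%]+", text.lower()))

def matching_sentences(text, phrases):
    """Return (sentence, phrase) pairs where every word of the phrase
    occurs somewhere in the sentence."""
    # Naive split (assumption): a sentence ends at ., ! or ? plus whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        for phrase in phrases:
            # Set inclusion: all words of the phrase are among the tokens.
            if tokenize(phrase) <= tokens:
                results.append((sentence, phrase))
    return results

# Illustrative input, not taken from a real document.
text = ("Revenue was stable. "
        "The profit in the month of November lowered from 5% to 3%.")
phrases = ['profit low', 'loss increased', 'profit lowered']

for sentence, phrase in matching_sentences(text, phrases):
    print(phrase, '->', sentence)
```

Using sets makes the check independent of word order and of how many words a phrase contains. Text with abbreviations or decimal numbers would probably need a proper sentence tokenizer instead of the regex split.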

Answer 2

If you want to check whether each element of a word list occurs within a long sentence:

sentence = 'The profit in the month of November lowered from 5% to 3%.'

words = ['profit','month','5%']

for element in words:
    if element in sentence:
        #do something with it
        print(element)

If you want something cleaner, you can use a one-line comprehension to collect the matched words into a list:

sentence = 'The profit in the month of November lowered from 5% to 3%.'

words = ['profit','month','5%']

# Collect the matched words with a list comprehension
matched_words = [word for word in words if word in sentence]

print(matched_words)

If each element of your list contains several space-separated words, you will want to use the split() method to handle them:

sentence = 'The profit in the month of November lowered from 5% to 3%.'

words = ['profit low','month high','5% 3%']

# Flatten the phrases into individual words
single_words = [word for w in words for word in w.split(' ')]

# Collect the matched words with a list comprehension
matched_words = [word for word in single_words if word in sentence]

print(matched_words)
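
Note that `word in sentence` is a substring test on the whole string, so 'low' is reported as a match because it occurs inside 'lowered'. If whole-word matches are wanted instead, one possible sketch uses the standard `re` module. The function name `whole_word_matches` is hypothetical, and lookarounds are used rather than `\b` so that tokens ending in a symbol such as '5%' still match.

```python
import re

def whole_word_matches(words, sentence):
    """Return the words that occur in the sentence as whole tokens,
    not merely as substrings of longer words."""
    matched = []
    for word in words:
        # (?<!\w) and (?!\w) require that no letter, digit or underscore
        # is directly adjacent to the match on either side.
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        if re.search(pattern, sentence):
            matched.append(word)
    return matched

sentence = 'The profit in the month of November lowered from 5% to 3%.'
print(whole_word_matches(['profit', 'low', 'lowered', '5%'], sentence))
```

With this sentence, 'low' is rejected while 'profit', 'lowered' and '5%' are kept.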

Answer 3

You can try the following.

Join the bag of words into a single string:

bag_of_words = ['profit low', 'loss increased', 'profit lowered']
bag_of_word_sent = ' '.join(bag_of_words)

Then define the list of sentences:

list_sents = ['The profit in the month of November lowered from 5% to 3%.']

Use the Levenshtein distance:

import distance  # third-party package, not part of the standard library
for sent in list_sents:
    dist = distance.levenshtein(bag_of_word_sent, sent)
    if dist > len(bag_of_word_sent):
        # do something
        print(dist)
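
Comparing the whole joined bag against a sentence mixes all phrases into one number, which can be hard to interpret. A per-word variant may be easier to reason about. The sketch below is an alternative illustration, not the method of the answer above: it uses a small pure-Python edit-distance function so that no third-party package is assumed, the helper `fuzzy_phrase_match` is hypothetical, and the threshold `max_dist=1` is an arbitrary choice.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_phrase_match(phrase, sentence, max_dist=1):
    """True if every word of the phrase is within max_dist edits
    of at least one word of the sentence."""
    sent_words = [w.strip('.,;:!?').lower() for w in sentence.split()]
    return all(
        any(levenshtein(p.lower(), s) <= max_dist for s in sent_words)
        for p in phrase.split()
    )

sentence = 'The profit in the month of November lowered from 5% to 3%.'
for phrase in ['profit low', 'loss increased', 'profit lowered', 'profits lowered']:
    print(phrase, fuzzy_phrase_match(phrase, sentence))
```

A small threshold tolerates minor variations such as 'profits' versus 'profit' while still rejecting 'low' against 'lowered'. A larger threshold would accept more variants at the cost of more false positives.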
