Finding the indices of differing words in a list of sentences

Question

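I am looking for an efficient algorithm that takes two or more sentences and returns the words that differ between them, together with their indices.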

For example:
sentence_a = 'The cat took the mat.'
sentence_b = 'The rat took the mat.'
sentence_c = 'The cat and rat took the mat.'

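Here I want the output to be the words that were shown in bold, that is, the words that differ between the sentences.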

output = [cat, rat, cat and rat]

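I have been researching this but have not found anything useful. I tried to write an algorithm that compares the words at each index, but as soon as one sentence contains an extra word, the logic becomes hard to model.

Any help would be appreciated. Thanks!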

Answer 1

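First of all, your algorithm does not need a machine learning model; it can be done in the classic way.

My suggestion is to start by creating a list for each sentence that stores every word in it, for example: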

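sentence_a = 'The cat took the mat.'
sentence_b = 'The rat took the mat.'
sentence_c = 'The cat and rat took the mat.'

# Parse each sentence into a list of words: drop the trailing period,
# then split on whitespace.
list_a = sentence_a.rstrip('.').split()
list_b = sentence_b.rstrip('.').split()
list_c = sentence_c.rstrip('.').split()
# list_a = ['The', 'cat', 'took', 'the', 'mat']
# list_b = ['The', 'rat', 'took', 'the', 'mat']
# list_c = ['The', 'cat', 'and', 'rat', 'took', 'the', 'mat']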

Then compare each list with the first list, word by word, and store every word that differs in a separate list, for example:

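list_differences = []  # will store the differing words
# Compare sentences a and b position by position.
# zip() stops at the shorter list, so this only works when both
# sentences have the same number of words.
for s1, s2 in zip(list_a, list_b):
    if s1 != s2:
        list_differences.append(s1)
        list_differences.append(s2)

# list_differences = ['cat', 'rat']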
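
The position-by-position comparison above runs into the problem the question describes: once one sentence has an extra word, every later position is shifted. As a possible alternative (not part of the original answer), here is a minimal sketch that aligns the two word lists with `difflib.SequenceMatcher` from the standard library. The helper names `tokenize` and `differing_words` are hypothetical, and the tokenization assumes plain sentences ending in a single period.

```python
from difflib import SequenceMatcher


def tokenize(sentence):
    # Hypothetical helper: drop the trailing period and split on whitespace.
    return sentence.rstrip('.').split()


def differing_words(reference, other):
    """Return (index, words) pairs for word runs in `other` that are
    replaced or inserted relative to `reference`."""
    ref_tokens = tokenize(reference)
    other_tokens = tokenize(other)
    matcher = SequenceMatcher(a=ref_tokens, b=other_tokens, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' both mean `other` has words at [j1:j2]
        # that do not line up with the reference.
        if tag in ('replace', 'insert'):
            diffs.append((j1, other_tokens[j1:j2]))
    return diffs


sentence_a = 'The cat took the mat.'
sentence_b = 'The rat took the mat.'
sentence_c = 'The cat and rat took the mat.'

print(differing_words(sentence_a, sentence_b))
print(differing_words(sentence_a, sentence_c))
```

Note that the result depends on which sentence is used as the reference: compared against sentence_a, sentence_c differs only by the inserted words, not by 'cat', which both sentences share.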

Answer 2

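You can build a dictionary of all the words used in the sentences and compute the inverse document frequency (idf) of each term. A word that occurs in every sentence has an idf of 0, while the other words have a higher idf. If you need the exact position of the differing words in the final result, you can keep a separate mapping from each word to its sentence index and word index.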
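
A minimal sketch of this idea, assuming the plain definition idf = log(N / df), where N is the number of sentences and df is the number of sentences containing the word. The function names `idf_scores` and `distinctive_words` are hypothetical, and the tokenization (strip the trailing period, split on whitespace, case-sensitive) is an assumption for illustration.

```python
import math
from collections import defaultdict


def idf_scores(sentences):
    """Return (idf, positions), where positions maps
    word -> {sentence_index: [word indices]}."""
    token_lists = [s.rstrip('.').split() for s in sentences]
    positions = defaultdict(dict)
    for s_idx, tokens in enumerate(token_lists):
        for w_idx, word in enumerate(tokens):
            positions[word].setdefault(s_idx, []).append(w_idx)
    n = len(sentences)
    # A word present in every sentence gets log(n / n) = 0.
    idf = {word: math.log(n / len(by_sentence))
           for word, by_sentence in positions.items()}
    return idf, positions


def distinctive_words(sentences):
    """Keep only words with idf > 0, i.e. words missing from at least
    one sentence, along with where they occur."""
    idf, positions = idf_scores(sentences)
    return {word: dict(positions[word])
            for word, score in idf.items() if score > 0}


sentences = [
    'The cat took the mat.',
    'The rat took the mat.',
    'The cat and rat took the mat.',
]
print(distinctive_words(sentences))
```

Each key in the result is a word that does not appear in every sentence, and each value records the sentence indices and word positions where it occurs.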
