Python: match phrases from dictionary values against the sentence (the dictionary key) and print output based on the match

Question

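I have a dictionary in which each key is a sentence and each value is a list of specific words or phrases taken from that sentence.

For example: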

dict1 = {'it is lovely weather and it is kind of warm':['lovely weather', 'it is kind of warm'],'and the weather is rainy and cold':['rainy and cold'],'the temperature is ok':['temperature']}

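I want to tag the output for each sentence according to whether a word or phrase appears in the dictionary value.

For this example the output would be (where 0 means not in the value and 1 means in the value):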

*
it 0
is 0
lovely weather 1 (combined because it's a phrase)
and 0
it is kind of warm 1 (combined because it's a phrase)
*
and 0
the 0
weather 0
is 0
rainy and cold 1 (combined because it's a phrase)
...(and so on)...

I can get something like this to work, but only by hard-coding the number of words in the phrase:

for k, v in dict1.items():
    for phrase in v:  # each value is a list of phrases
        words_in_val = phrase.split()
        words = k.split()
        # Case 1: the phrase is a single word
        if len(words_in_val) == 1:
            for each_word in words:
                if phrase == each_word:
                    print(each_word + '\t' + '1')
                else:
                    print(each_word + '\t' + '0')

        # Case 2: the phrase is exactly two words
        if len(words_in_val) == 2:
            for index, item in enumerate(words[:-1]):
                if words[index] == words_in_val[0]:
                    if words[index + 1] == words_in_val[1]:
                        words[index] = ' '.join(words_in_val)
                        del words[index + 1]
                        break  # the list changed, so stop scanning it
                        # ....something like this...

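My problem is that I can see this starting to get messy, and in theory the phrases I want to match can contain an unlimited number of words, although it is usually fewer than 10.

Does anyone have a better idea of how to do this?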

Answer 1

Here is how I would do it:

dict1 = {'it is lovely weather and it is kind of warm':['lovely weather', 'it is kind of warm'],'and the weather is rainy and cold':['rainy and cold'],'the temperature is ok':['temperature']}

def tag_sentences(sentence_phrases):
    # Each completed phrase match gets its own group id; 0 means "not in any phrase".
    group_id = 1
    tagged_results = []
    for sentence, phrases in sentence_phrases.items():
        words = sentence.split()
        phrases_split = [phrase.split() for phrase in phrases]
        # Maps phrase index -> number of leading words of that phrase matched so far.
        positions_keeper = {}
        sentence_results = [(word, 0) for word in words]
        for word_index, word in enumerate(words):
            for index, phrase in enumerate(phrases_split):
                position = positions_keeper.get(index, 0)
                if phrase[position] != word:
                    # Mismatch: drop the partial match and retry this word
                    # against the first word of the phrase.
                    position = 0
                    positions_keeper[index] = 0
                    if phrase[0] != word:
                        continue
                if len(phrase) > position + 1:
                    positions_keeper[index] = position + 1
                else:
                    # Last word of the phrase: tag every word of the match.
                    for i in range(len(phrase)):
                        matched_word = sentence_results[word_index - i][0]
                        sentence_results[word_index - i] = (matched_word, group_id)
                    group_id += 1
                    positions_keeper[index] = 0
        tagged_results.append(sentence_results)
    return tagged_results

def print_tagged_results(tagged_results):
    for tagged_result in tagged_results:
        memory = 0  # group id of the previous word
        memory_sentence = ""  # words of the phrase currently being collected
        for result, group_id in tagged_result:
            if memory != 0 and memory != group_id:
                # The previous phrase just ended, so print it as one line.
                print(memory_sentence + "1")
                memory_sentence = ""
            if group_id == 0:
                print(result, 0)
            else:
                memory_sentence += result + " "
            memory = group_id
        if memory != 0:
            # The sentence ended inside a phrase.
            print(memory_sentence + "1")

tagged_results = tag_sentences(dict1)
print_tagged_results(tagged_results)

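This is basically doing the following:

First, I build a list of tags in the format [(it, 0), (is, 0), (lovely, 0) ...]
In the tag list, 0 means the word is not in any group, and any other integer is a group ID (words tagged 1 belong together, words tagged 2 belong together)
I iterate over every word and track it wherever it matches the next expected word of a phrase
If it is the end of the phrase, I tag that word and all the previously matched words of the same phrase with the same ID
If it is not the end, I keep the position and move on to the next iteration.
In the end I have a tag list in the format [(it, 0), (is, 0), (lovely, 1) ... (kind, 2), (of, 2), ...]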
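
To make the tag list format concrete, here is a minimal sketch of how such a list can be collapsed into output rows. It uses itertools.groupby instead of the hand-written memory variables, and the helper name collapse_tags is hypothetical, introduced only for this illustration.

```python
from itertools import groupby

def collapse_tags(tagged):
    """Merge consecutive words that share a non-zero tag into one phrase row."""
    rows = []
    # groupby yields one run per stretch of consecutive pairs with the same tag.
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag == 0:
            # Untagged words stay on their own rows.
            rows.extend((word, 0) for word in words)
        else:
            # A tagged run becomes a single phrase flagged with 1.
            rows.append((" ".join(words), 1))
    return rows

tagged = [("it", 0), ("is", 0), ("lovely", 1), ("weather", 1), ("and", 0),
          ("it", 2), ("is", 2), ("kind", 2), ("of", 2), ("warm", 2)]
for text, flag in collapse_tags(tagged):
    print(text, flag)
```

Because every match gets a different ID, two phrases that sit directly next to each other still end up on separate rows.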

This will not work if one phrase is contained in another, but your example never said how that case should be handled.
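
If phrases can be contained in one another, one possible way to handle it is a greedy scan that takes the longest phrase starting at each word. This is a different technique from the code in this answer, shown only as a sketch; the function name tag_sentence and the "longest phrase wins" rule are assumptions, since the question does not say which phrase should win.

```python
def tag_sentence(sentence, phrases):
    """Return (text, flag) rows; flag is 1 for a matched phrase, else 0."""
    words = sentence.split()
    # Assumption: try longer phrases first so they beat shorter ones they contain.
    candidates = sorted((p.split() for p in phrases), key=len, reverse=True)
    rows = []
    i = 0
    while i < len(words):
        for cand in candidates:
            # Compare the slice of the sentence starting at i with the phrase.
            if cand and words[i:i + len(cand)] == cand:
                rows.append((" ".join(cand), 1))
                i += len(cand)
                break
        else:
            # No phrase starts at this word.
            rows.append((words[i], 0))
            i += 1
    return rows

dict1 = {'it is lovely weather and it is kind of warm': ['lovely weather', 'it is kind of warm'],
         'and the weather is rainy and cold': ['rainy and cold'],
         'the temperature is ok': ['temperature']}

for sentence, phrases in dict1.items():
    print("*")
    for text, flag in tag_sentence(sentence, phrases):
        print(text, flag)
```

Slicing the word list means the phrase length never has to be hard-coded, which was the original concern in the question.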
