Return a list of sentences with a particular subject

Question

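I am exploring a small corpus of texts, and one of the things I am doing is examining the actions associated with various subjects. I have already counted how many times, for example, "man" is the subject of a sentence in which the verb is "love": that work was done with subject-verb-object triples using Textacy.

As I work through the various statistics, I would like to be able to go back into the data and see the sentences that have a given subject in their original context. NLTK has a built-in concordance feature, but it pays no attention to part-of-speech tags. This is as far as I have gotten with code.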

What I would like to do is call

find_the_subject("noun", corpus)

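such that if I enter "man", I get back a list of the sentences that have "man" as their subject:

A man walks down the street and says why am I short in the middle?

The man came.

So far I have the following code, which grabs every sentence containing "man", not just those with "man" as the subject.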

import nltk
from nltk.tokenize import word_tokenize

def find_sentences_with_noun(subject_noun, sentences):
    # Start with two empty lists
    noun_subjects = []
    noun_sentences = []
    # Work through the sentences
    for sentence in sentences:
        words = word_tokenize(sentence)
        tagged_words = nltk.tag.pos_tag(words)
        # This works but doesn't get me the subject
        for word, tag in tagged_words:
            if tag.startswith("NN") and word == subject_noun:
                noun_subjects.append(word)
                noun_sentences.append(sentence)
                break  # record each sentence only once
    return noun_sentences

For the life of me, I cannot figure out how to grab the noun in subject position.
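
One possible direction, offered as a sketch rather than a definitive answer: part-of-speech tags alone cannot say which noun is the subject, but a dependency parse can. Since Textacy runs on top of spaCy, the example below assumes spaCy is available and treats a token as a subject when its dependency label is `nsubj` or `nsubjpass`. The helper name `sentences_with_subject`, the `SUBJECT_DEPS` set, the optional `nlp` parameter, and the choice of the `en_core_web_sm` model are all assumptions made for illustration. Here `corpus` is taken to be one raw text string, not a list of sentences.

```python
import spacy

# Dependency labels treated as "subject" in this sketch (an assumption);
# nsubjpass covers subjects of passive clauses.
SUBJECT_DEPS = {"nsubj", "nsubjpass"}


def sentences_with_subject(doc, subject_noun):
    """Return the text of every sentence in a parsed Doc whose subject is subject_noun."""
    target = subject_noun.lower()
    matches = []
    for sent in doc.sents:
        for token in sent:
            # Match on the dependency label, not just the POS tag
            if token.dep_ in SUBJECT_DEPS and token.text.lower() == target:
                matches.append(sent.text)
                break  # one hit per sentence is enough
    return matches


def find_the_subject(subject_noun, corpus, nlp=None):
    """Parse raw text and keep only sentences with subject_noun as subject."""
    if nlp is None:
        # Assumption: the small English model has been installed
        nlp = spacy.load("en_core_web_sm")
    return sentences_with_subject(nlp(corpus), subject_noun)
```

Calling `find_the_subject("man", text)` on a text containing both "The man came." and "I saw a man." should, if the parser labels the sentences as expected, keep only the first. The result depends on the quality of the parse, so long or unusual sentences may be missed or mislabeled. This matches the surface form only; comparing `token.lemma_` instead would also catch "men".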
