Removing stopwords from a list of lists with nltk.corpus

Question

I have a list of lists, where each inner list holds the individual words of one review. It looks like this:

texts = [['fine','for','a','night'],['it','was','good']]

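I want to remove all stopwords using the nltk.corpus package and put the remaining words back into a list. The end result should be a list of word lists that contain no stopwords. This is what I tried: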

import nltk
nltk.download() # to download stopwords corpus
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
words_reviews=[]

for review in texts:
    wr=[]
    for word in review:
        if word not in stopwords:
            wr.append(word)
    words_reviews.append(wr)

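This code actually worked before, but now I get the error: AttributeError: 'list' object has no attribute 'words', which refers to stopwords. I made sure all packages are installed. What could be the problem?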

Answer 1

The problem is that you redefine stopwords in your code:

from nltk.corpus import stopwords
stopwords=stopwords.words('english')

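After the first line, stopwords is a corpus reader with a words() method. After the second line, it is a list. If the second line runs again without the import being repeated first (for example, when re-running a single cell in a notebook), it calls words() on that list, which raises the AttributeError. Proceed accordingly, for instance by storing the list under a different name.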
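The rebinding can be reproduced without NLTK installed. The sketch below uses a hypothetical `FakeCorpusReader` class as a stand-in for the real corpus reader (it is not part of NLTK, and its word list is made up); the only point is to show how the name changes type.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for the object imported from nltk.corpus."""

    def words(self, language):
        # A tiny made-up word list, not the real NLTK stopword list.
        return ['for', 'a', 'it', 'was']


stopwords = FakeCorpusReader()          # like: from nltk.corpus import stopwords
stopwords = stopwords.words('english')  # the name now refers to a plain list

try:
    # Running the same assignment a second time fails,
    # because a list has no words() method.
    stopwords = stopwords.words('english')
except AttributeError as exc:
    message = str(exc)

print(message)
```

Storing the list under another name, such as `english_stopwords`, leaves `stopwords` pointing at the reader, so the line can be re-run safely.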

Also, looking things up in a list is slow, so you will get better performance if you use a set instead:

stopwords = set(stopwords.words('english'))
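Putting both points together, here is a minimal sketch of one way to filter the nested list. The names `remove_stopwords` and `english_stopwords` are illustrative only, and the small hard-coded set is a stand-in so the example runs without the corpus; with NLTK it would presumably be built as shown in the comment.

```python
def remove_stopwords(texts, stop_words):
    """Return a new list of word lists with the stopwords filtered out."""
    stop_set = set(stop_words)  # set membership tests avoid scanning a list
    return [[word for word in review if word not in stop_set]
            for review in texts]


# Stand-in for illustration. With NLTK, assuming the corpus is downloaded:
#   from nltk.corpus import stopwords
#   english_stopwords = set(stopwords.words('english'))
# Using a separate name keeps `stopwords` bound to the corpus reader.
english_stopwords = {'for', 'a', 'it', 'was'}

texts = [['fine', 'for', 'a', 'night'], ['it', 'was', 'good']]
words_reviews = remove_stopwords(texts, english_stopwords)
print(words_reviews)
```

Each review yields exactly one inner list, which matches the list-of-lists result described in the question.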

Answer 2

Instead of

[word for word in text_tokens if not word in stopwords.words()]

use

[word for word in text_tokens if not word in all_stopwords]
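The answer does not show where `all_stopwords` comes from; presumably it is built once before the comprehension. The sketch below illustrates the difference, using a hypothetical `load_stopwords()` function as a stand-in for `stopwords.words('english')` and a counter to record how often it is called.

```python
calls = 0


def load_stopwords():
    """Hypothetical stand-in for stopwords.words('english'); counts its calls."""
    global calls
    calls += 1
    return ['for', 'a', 'it', 'was']


text_tokens = ['it', 'was', 'good']

# First form: the loader is called again for every token.
slow = [word for word in text_tokens if word not in load_stopwords()]
calls_slow = calls

# Second form: load once, keep the result in a set, reuse it.
calls = 0
all_stopwords = set(load_stopwords())
fast = [word for word in text_tokens if word not in all_stopwords]
calls_fast = calls

print(slow, calls_slow)
print(fast, calls_fast)
```

Both forms give the same filtered list; only the number of loader calls differs.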
