Stop word removal dilemma

Question | Votes: 0 | Answers: 1

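I've run into a dilemma with NLTK's stop word list. I'm using NLTK to remove stop words from user-generated content collected from a social media platform. The problem is that I want to keep personal pronouns in the text, because they matter for my classification task. These include words such as "I", "you", and "we".

Unfortunately, stop word removal also strips these words, and I need them to stay. How can I work around this?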

python nlp nltk stop-words

1 Answer

0 votes

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # fetch the corpus if it is not installed yet
stop_words = stopwords.words('english')
print(type(stop_words))  # <class 'list'>
print(len(stop_words))

The output shows that stop_words is a plain Python list, so you can remove entries from it directly:

# Pronouns to keep in the text; extend this list (e.g. with 'we') as needed
personal_pronouns = ['i', 'you', 'she', 'he', 'they']
for word in personal_pronouns:
    if word in stop_words:
        stop_words.remove(word)
        print(word + ' removed from the stop word list')
print(len(stop_words))
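
Once the list has been trimmed, it still has to be applied to the tokens. The following is a minimal sketch of that filtering step, not part of the original answer. The helper names `build_stop_words` and `remove_stop_words` are hypothetical, and the short `base` list is a hand-written stand-in for `stopwords.words('english')` so the snippet runs without the NLTK corpus. It lowercases each token before the lookup, on the assumption that the stop word list is all lowercase, and uses a set so membership checks stay fast.

```python
def build_stop_words(base_stop_words, keep):
    """Return a set of stop words with the words in `keep` excluded."""
    keep_lower = {w.lower() for w in keep}
    return {w.lower() for w in base_stop_words} - keep_lower


def remove_stop_words(tokens, stop_words):
    """Drop tokens whose lowercase form is in `stop_words`."""
    return [t for t in tokens if t.lower() not in stop_words]


# Stand-in for stopwords.words('english'); an assumption for illustration only.
base = ['i', 'you', 'we', 'the', 'a', 'is', 'are', 'to']
pronouns = ['i', 'you', 'we']  # words that must survive filtering

custom = build_stop_words(base, pronouns)
tokens = 'I think we are going to the park'.split()
filtered = remove_stop_words(tokens, custom)
print(filtered)  # ['I', 'think', 'we', 'going', 'park']
```

With the real corpus, passing `stopwords.words('english')` as `base_stop_words` should work the same way.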
