Excluding negative words from NLTK stopwords

Question  Votes: 0  Answers: 1

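I want to remove the NLTK stopwords from my sentences, except for the ones that carry a negative meaning, such as no, not, can't, etc. In other words, I want to exclude negative words from the stopword list. How can I do this?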

python machine-learning nltk stop-words data-preprocessing
1 Answer
0
votes

There is no straightforward way:

negative_words = {
    'no',
    'not',
    'none',
    'neither',
    'never',
    'nobody',
    'nothing',
    'nowhere',
    "doesn't",
    "isn't",
    "wasn't",
    "shouldn't",
    "won't",
    "can't",
    "couldn't",
    "don't",
    "haven't",
    "hasn't",
    "hadn't",
    "aren't",
    "weren't",
    "wouldn't",
    "daren't",
    "needn't",
    "didn't",
    'without',
    'against',
    'negative',
    'deny',
    'reject',
    'refuse',
    'decline',
    'unhappy',
    'sad',
    'miserable',
    'hopeless',
    'worthless',
    'useless',
    'futile',
    'disagree',
    'oppose',
    'contrary',
    'contradict',
    'disapprove',
    'dissatisfied',
    'objection',
    'unsatisfactory',
    'unpleasant',
    'regret',
    'resent',
    'lament',
    'mourn',
    'grieve',
    'bemoan',
    'despise',
    'loathe',
    'detract',
    'abhor',
    'dread',
    'fear',
    'worry',
    'anxiety',
    'sorrow',
    'gloom',
    'melancholy',
    'dismay',
    'disheartened',
    'despair',
    'dislike',
    'aversion',
    'antipathy',
    'hate',
    'disdain'
}
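import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
# word_tokenize also needs the Punkt tokenizer data: nltk.download('punkt')
# (or 'punkt_tab' on recent NLTK releases)

# Take the negative words out of the stopword set so they survive filtering
stop_words = set(stopwords.words('english')) - negative_words

def remove_stopwords(sentence, stopwords_list=stop_words):
    # Note: word_tokenize splits contractions, e.g. "don't" becomes "do" and "n't"
    tokens = nltk.word_tokenize(sentence)
    filtered_tokens = [word for word in tokens if word.lower() not in stopwords_list]
    return ' '.join(filtered_tokens)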

I wrote this code myself. Maybe it will be useful to you.
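
For anyone who wants to try the idea without downloading the NLTK data first, here is a minimal sketch of the same set-difference approach. The names `SAMPLE_STOPWORDS`, `NEGATIONS` and `remove_stopwords_keep_negations` are made up for illustration, the tiny stopword set is only a stand-in for `set(stopwords.words('english'))`, and the regex tokenizer is an assumption used in place of `nltk.word_tokenize`, so results on real text will differ.

```python
import re

# Hypothetical stand-in for set(stopwords.words('english')); the real NLTK list is much longer.
SAMPLE_STOPWORDS = {"i", "do", "not", "no", "the", "this", "is", "at", "all", "don't", "it", "a", "was"}

# Negation words that should be kept even though they appear in the stopword list.
NEGATIONS = {"no", "not", "don't"}

def remove_stopwords_keep_negations(sentence, stop_words=SAMPLE_STOPWORDS, keep=NEGATIONS):
    # Set difference: anything in `keep` is no longer treated as a stopword.
    effective = set(stop_words) - set(keep)
    # Simple regex tokenizer (an assumption, not nltk.word_tokenize);
    # it keeps contractions such as "don't" as a single token.
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
    return " ".join(t for t in tokens if t.lower() not in effective)

print(remove_stopwords_keep_negations("I do not like this movie at all"))  # not like movie
print(remove_stopwords_keep_negations("I don't like it"))                  # don't like
```

With the real NLTK list you would pass `set(stopwords.words('english'))` as `stop_words`. Keep in mind that the tokenizer decides whether a contraction shows up as one token or two, so the entries in the keep set have to match whatever tokens it actually produces.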
