How to remove some words from the NLTK stop word list


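I want to add some words to the stop word list provided by nltk. I have a csv file containing the stop words I want to add to the list, but it doesn't work. Here is what I tried: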

import pandas as pd
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

#add words that aren't in the NLTK stopwords list
new_stopwords = ["stop", "pause", "replay"]
new_stopwords_list = stop_words.union(new_stopwords)

#add words that aren't in the NLTK stopwords list from a csv file
csv_stopwords = pd.read_csv("stopwords.csv", names=["stopwords"], header=None)

# split the first row of the csv into words and add them to the set
# (assumes the first row holds the words separated by single spaces)
new_stopwords_list.update(csv_stopwords["stopwords"][0].split(' '))

#remove words that are in NLTK stopwords list
not_stopwords = {"not", "don't", "never", "without"} 
final_stop_words = set([word for word in new_stopwords_list if word not in not_stopwords])

print(final_stop_words)
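
The three steps described here (start from a base list, add custom words including those from a csv file, then take some words back out) can be sketched as one small helper. This is only a minimal sketch under a few assumptions: `build_stop_words` and its parameter names are hypothetical and not part of NLTK, the csv is read with the standard `csv` module rather than pandas, and every cell of the file is assumed to hold one or more whitespace-separated words.

```python
import csv


def build_stop_words(base_words, extra_words=(), csv_path=None, keep_words=()):
    """Return base_words plus extra_words (and any words found in csv_path),
    minus keep_words. Hypothetical helper, not an NLTK function."""
    stop_words = set(base_words)
    stop_words.update(extra_words)

    if csv_path is not None:
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                for cell in row:
                    # a cell may hold a single word or several separated by spaces
                    stop_words.update(cell.split())

    # set difference removes the words that should NOT be treated as stop words
    return stop_words - set(keep_words)


# Small in-memory demo with a stand-in base list (no csv file involved)
demo = build_stop_words(
    base_words=["the", "a", "not", "never"],
    extra_words=["stop", "pause", "replay"],
    keep_words={"not", "never"},
)
print(sorted(demo))
```

With NLTK, `stopwords.words('english')` could be passed as `base_words` (the corpus has to be available first, for example via `nltk.download('stopwords')`), and `"stopwords.csv"` as `csv_path`. Working with sets throughout avoids mixing `list.extend` with `set.union`, which is an easy way to end up updating a different variable than the one printed at the end.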
python jupyter-notebook nltk sentiment-analysis stop-words