TfidfVectorizer: using my own stop-word dictionary

Question

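Is it possible to use my own stop-word dictionary with TfidfVectorizer instead of the built-in one? I have built a larger stop-word dictionary and would prefer to use it, but I am having trouble including it in the code below (which currently uses the standard list).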

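import re
import string

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def preprocessing(line):
    # Lowercase the text and replace punctuation with spaces
    line = line.lower()
    line = re.sub(r"[{}]".format(string.punctuation), " ", line)
    return line

tfidf_vectorizer = TfidfVectorizer(preprocessor=preprocessing,stop_words_='english')
tfidf = tfidf_vectorizer.fit_transform(df["0"]['Words']) # multiple dataframes

kmeans = KMeans(n_clusters=2).fit(tfidf)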

But I get the following error:

    TypeError: __init__() got an unexpected keyword argument 'stop_words_'

Suppose my dictionary is:

stopwords = ["a", "an", ... "been", "had", ...]

How do I include it?

Any help would be greatly appreciated.

Answer 1

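TfidfVectorizer has no parameter 'stop_words_'; the constructor argument is named stop_words, without the trailing underscore.

If you have a list of stop words like this: smart_stoplist = ['a', 'an', 'the']

use it like this: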

tfidf_vectorizer = TfidfVectorizer(preprocessor=preprocessing,stop_words=smart_stoplist)
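
For reference, here is a minimal end-to-end sketch of the same idea. The sample documents and the contents of smart_stoplist are made up for illustration; replace them with your own data and your larger stop-word list. Wrapping string.punctuation in re.escape is an extra precaution I added and is not part of the original code. Because the preprocessor lowercases the text, the stop words should be lowercase as well.

```python
import re
import string

from sklearn.feature_extraction.text import TfidfVectorizer


def preprocessing(line):
    # Lowercase the text and replace ASCII punctuation with spaces.
    line = line.lower()
    return re.sub(r"[{}]".format(re.escape(string.punctuation)), " ", line)


# Hypothetical custom stop-word list (lowercase, to match the preprocessor).
smart_stoplist = ["a", "an", "the", "been", "had"]

# Hypothetical sample documents standing in for df["0"]["Words"].
docs = [
    "The cat had been sleeping.",
    "A dog chased the cat!",
    "An owl had watched the dog.",
]

# Pass the list through the stop_words parameter (no trailing underscore).
tfidf_vectorizer = TfidfVectorizer(preprocessor=preprocessing, stop_words=smart_stoplist)
tfidf = tfidf_vectorizer.fit_transform(docs)

# vocabulary_ maps each kept term to its column index.
vocabulary = sorted(tfidf_vectorizer.vocabulary_)
print(vocabulary)
```

The resulting sparse matrix tfidf can then be passed to KMeans exactly as in the question.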
