AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names' -- Topic Modeling -- Latent Dirichlet Allocation


I am trying to follow the example at the link below.

https://medium.datadriveninvestor.com/trump-tweets-topic-modeling-using-latent-dirichlet-allocation-e4f93b90b6fe

All of the code up to this point works, but the snippet below does not.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Bag-of-words vectorizer for the cleaned, lemmatized questions.
vectorizer = CountVectorizer(
    analyzer='word',
    min_df=3,                         # keep words that appear in at least 3 documents
    stop_words='english',             # remove English stop words
    lowercase=True,                   # convert all words to lowercase
    token_pattern='[a-zA-Z0-9]{3,}',  # tokens of 3 or more alphanumeric characters
    max_features=5000,                # cap the vocabulary at 5000 words
)

data_matrix = vectorizer.fit_transform(df_clean['question_lemmatize_clean'])  # df_clean comes from the earlier steps

lda_model = LatentDirichletAllocation(
    n_components=10,           # number of topics
    learning_method='online',
    random_state=20,
    n_jobs=-1,                 # use all available CPUs
)

lda_output = lda_model.fit_transform(data_matrix)

# Visualize the fitted topics in the notebook.
import pyLDAvis
import pyLDAvis.sklearn
pyLDAvis.enable_notebook()
pyLDAvis.sklearn.prepare(lda_model, data_matrix, vectorizer, mds='tsne')

When I run that snippet, I get this error message.

AttributeError                            Traceback (most recent call last)
Cell In[83], line 29
     27 import pyLDAvis.sklearn
     28 pyLDAvis.enable_notebook()
---> 29 pyLDAvis.sklearn.prepare(lda_model, data_matrix, vectorizer, mds='tsne')

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:94, in prepare(lda_model, dtm, vectorizer, **kwargs)
     62 def prepare(lda_model, dtm, vectorizer, **kwargs):
     63     """Create Prepared Data from sklearn's LatentDirichletAllocation and CountVectorizer.
     64 
     65     Parameters
   (...)
     92     See `pyLDAvis.prepare` for **kwargs.
     93     """
---> 94     opts = fp.merge(_extract_data(lda_model, dtm, vectorizer), kwargs)
     95     return pyLDAvis.prepare(**opts)

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:38, in _extract_data(lda_model, dtm, vectorizer)
     37 def _extract_data(lda_model, dtm, vectorizer):
---> 38     vocab = _get_vocab(vectorizer)
     39     doc_lengths = _get_doc_lengths(dtm)
     40     term_freqs = _get_term_freqs(dtm)

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:20, in _get_vocab(vectorizer)
     19 def _get_vocab(vectorizer):
---> 20     return vectorizer.get_feature_names()

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

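I suspect that some library was not updated correctly, but I can't tell for sure, and searching Google did not turn up anything that helped me debug this. Does anyone know what is going wrong here?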
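
The traceback points at a likely cause: pyLDAvis calls `vectorizer.get_feature_names()`, a method that scikit-learn deprecated in version 1.0 and removed in 1.2 in favor of `get_feature_names_out()`. The installed pyLDAvis is therefore probably older than the installed scikit-learn expects. Below is a minimal sketch of a version-tolerant vocabulary lookup. `get_vocab` is a hypothetical helper name used for illustration, not part of either library.

```python
def get_vocab(vectorizer):
    """Return the vocabulary of a fitted vectorizer as a list of terms.

    Hypothetical helper: prefers get_feature_names_out() (available since
    scikit-learn 1.0) and falls back to the older get_feature_names()
    when only that method exists.
    """
    if hasattr(vectorizer, "get_feature_names_out"):
        # Newer scikit-learn returns a NumPy array; convert it to a list.
        return list(vectorizer.get_feature_names_out())
    # Older scikit-learn (before 1.2) still provides the original method.
    return list(vectorizer.get_feature_names())
```

Calling `get_vocab(vectorizer)` on the fitted `vectorizer` from the question should return the vocabulary on either side of the rename. As far as I know, pyLDAvis 3.4.0 and later make the equivalent change internally and rename the `pyLDAvis.sklearn` module to `pyLDAvis.lda_model`, so upgrading pyLDAvis and calling `pyLDAvis.lda_model.prepare(lda_model, data_matrix, vectorizer, mds='tsne')` is probably the simplest fix. Pinning scikit-learn below 1.2 is the alternative.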

python python-3.x topic-modeling countvectorizer latentdirichletallocation