pyLDAvis error: AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

Question

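I am doing topic modelling for one of my projects and am struggling to visualize the results. I believe the procedure is correct. Specifically, when I run these lines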

vis = pyLDAvis.sklearn.prepare(bi_lda, bigram_vectorized, bivectorizer, mds='tsne')
pyLDAvis.show(vis)

I get this error:

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

I find this strange and cannot figure it out, because the procedure is correct and I was able to create an LDA model.

This is how I create the model:

import numpy as np
import pandas as pd
from tqdm import tqdm
import string
import matplotlib.pyplot as plt
from sklearn.decomposition import NMF, LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.manifold import TSNE
import concurrent.futures
import time
import pyLDAvis.sklearn
from pylab import bone, pcolor, colorbar, plot, show, rcParams, savefig
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline
import os
#print(os.listdir("../input"))

# Plotly based imports for visualization
import chart_studio.plotly as py


from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.figure_factory as ff


# spaCy based imports
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English
!python -m spacy download it_core_news_sm

Continuing:

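# Assumed setup, not shown in the original post: load the Italian model
# downloaded above and define the punctuation characters to strip
nlp = spacy.load("it_core_news_sm")
punctuations = string.punctuation

# Create a custom stopword list
custom_stop_words = []

# Add spaCy's built-in Italian stop words to the list
custom_stop_words.extend(spacy.lang.it.stop_words.STOP_WORDS)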

def spacy_tokenizer(sentence):
    # Use the Italian model to tokenize the sentence
    mytokens = nlp(sentence)

    # Use lemmatization to lowercase, strip, and remove stop words and punctuation
    mytokens = [ word.lemma_.lower().strip() if word.lemma_ != "-PRON-" else word.lower_ for word in mytokens ]
    mytokens = [ word for word in mytokens if word not in custom_stop_words and word not in punctuations ]
    mytokens = " ".join([i for i in mytokens])

    return mytokens

tqdm.pandas()
df["processed_description"] = df["content"].progress_apply(spacy_tokenizer)

# Creating a vectorizer
# Raw string so that the regex escape \- is not treated as a string escape
vectorizer = CountVectorizer(min_df=5, max_df=0.9, stop_words=custom_stop_words, lowercase=True, token_pattern=r'[a-zA-Z\-][a-zA-Z\-]{2,}')
data_vectorized = vectorizer.fit_transform(df["processed_description"])
# Latent Dirichlet Allocation Model
NUM_TOPICS = 10
lda = LatentDirichletAllocation(n_components=NUM_TOPICS, max_iter=10, learning_method='online',verbose=True)
data_lda = lda.fit_transform(data_vectorized)

This is where I run into the problem:

pyLDAvis.enable_notebook()
dash = pyLDAvis.sklearn.prepare(lda, data_vectorized, vectorizer, mds='tsne')
dash

The output is always AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'. I also tried updating the libraries, but that did not help.

By contrast, if I plot it this way, it works:

svd_2d = TruncatedSVD(n_components=2)
data_2d = svd_2d.fit_transform(data_vectorized)
trace = go.Scattergl(
    x = data_2d[:,0],
    y = data_2d[:,1],
    mode = 'markers',
    marker = dict(
        color = '#FFBAD2',
        line = dict(width = 1)
    ),
    text = vectorizer.get_feature_names_out(),
    hovertext = vectorizer.get_feature_names_out(),
    hoverinfo = 'text' 
)
data = [trace]
iplot(data, filename='scatter-mode')
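
The plot above works because it calls get_feature_names_out, while the failing code path asks the vectorizer for get_feature_names, which scikit-learn deprecated in 1.0 and removed in 1.2. As a rough illustration only, the difference can be bridged with a small helper; feature_names and the two stand-in classes are hypothetical names made up for this sketch, not part of scikit-learn or pyLDAvis.

```python
def feature_names(vectorizer):
    """Return the vocabulary of a fitted vectorizer as a list of strings.

    Prefers the newer get_feature_names_out() and falls back to the
    older get_feature_names() when only that one exists.
    """
    getter = getattr(vectorizer, "get_feature_names_out", None)
    if getter is None:
        # Older scikit-learn releases only expose get_feature_names()
        getter = vectorizer.get_feature_names
    return list(getter())


# Hypothetical stand-ins that mimic the two vectorizer interfaces,
# so the sketch runs without scikit-learn installed.
class NewStyleVectorizer:
    def get_feature_names_out(self):
        return ["casa", "mare"]


class OldStyleVectorizer:
    def get_feature_names(self):
        return ["casa", "mare"]


print(feature_names(NewStyleVectorizer()))
print(feature_names(OldStyleVectorizer()))
```

This does not repair pyLDAvis itself, since the failing call happens inside the library; it only shows why the same fitted vectorizer works in one place and fails in the other.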

Answer 1

This is fixed in the latest release, see: https://github.com/bmabey/pyLDAvis/pull/235
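
Before and after upgrading, it can help to confirm which versions are actually installed in the active environment, since a notebook kernel may not pick up the upgrade until it is restarted. A minimal sketch using only the standard library; installed_versions is a hypothetical helper name, and the distribution names are assumed to match those on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pyLDAvis", "scikit-learn")):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

The printed versions can then be compared against the pyLDAvis release notes to see whether the fix from the pull request is included.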

Answer 2

This error occurs with newer versions of scikit-learn (>= 1.2). To fix it, replace any logic that involves

import pyLDAvis.sklearn
...
pyLDAvis.sklearn.prepare

with

import pyLDAvis.lda_model
...
pyLDAvis.lda_model.prepare

That should fix the problem. More background on this here.
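
If the same notebook has to run in environments with different pyLDAvis versions, the module swap can be made at import time. This is only a sketch under that assumption; load_first_available is a hypothetical helper name, and the two module names are the ones mentioned in this answer.

```python
import importlib


def load_first_available(module_names=("pyLDAvis.lda_model", "pyLDAvis.sklearn")):
    """Import and return the first module in module_names that can be imported.

    Returns None when none of the candidates is importable.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not present in this environment, try the next candidate
            continue
    return None


sklearn_vis = load_first_available()
if sklearn_vis is None:
    print("pyLDAvis is not installed, or neither module is available")
else:
    print("Using", sklearn_vis.__name__)
    # With the objects from the question, the call would then be:
    # dash = sklearn_vis.prepare(lda, data_vectorized, vectorizer, mds='tsne')
```

Note that falling back to pyLDAvis.sklearn only helps with an older scikit-learn; together with scikit-learn >= 1.2 the old module still raises the AttributeError from the question.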
