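I trained BERTopic and got my topics. Now I want to assign labels to these topics automatically. I came across a framework called Yake. Is there Python code for this task, or can you recommend any resources?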
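In my view, there are two ways to achieve your goal. The first is to use KeyBERT to extract keywords from each document and pass them to BERTopic as its vocabulary, so that topic representations are built only from those keywords: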
```python
from sklearn.datasets import fetch_20newsgroups
from keybert import KeyBERT

# Prepare documents
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Extract keywords: one list of (keyword, score) tuples per document
kw_model = KeyBERT()
keywords = kw_model.extract_keywords(docs)

# Create our vocabulary: flatten the keyword lists and drop duplicates
vocabulary = [k[0] for keyword in keywords for k in keyword]
vocabulary = list(set(vocabulary))
```
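To make the flattening step concrete, here is a small sketch on made-up data. The keyword lists are invented for illustration, not real KeyBERT output, and `build_vocabulary` is a hypothetical helper. It assumes, as the code above does, that `extract_keywords` on a list of documents returns one list of `(keyword, score)` tuples per document. Deduplication is required because scikit-learn's `CountVectorizer` rejects duplicate terms in `vocabulary`; using `sorted` instead of `list` is an optional tweak that keeps the vocabulary order stable between runs.

```python
# Hypothetical stand-in for kw_model.extract_keywords(docs):
# one list of (keyword, score) tuples per document.
keywords = [
    [("space", 0.61), ("nasa", 0.55)],
    [("engine", 0.48), ("space", 0.40)],
    [],  # a document for which no keywords were extracted
]


def build_vocabulary(keywords):
    """Flatten per-document keyword lists into a sorted, duplicate-free vocabulary."""
    # CountVectorizer raises a ValueError on duplicate vocabulary terms,
    # so a set removes them; sorting makes the order deterministic.
    return sorted({kw for doc_keywords in keywords for kw, _score in doc_keywords})


vocabulary = build_vocabulary(keywords)
print(vocabulary)  # ['engine', 'nasa', 'space']
```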
Then, we pass our vocabulary to BERTopic and train the model:

```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer

# Restrict the words BERTopic can use in topic representations to the KeyBERT vocabulary
vectorizer_model = CountVectorizer(vocabulary=vocabulary)
topic_model = BERTopic(vectorizer_model=vectorizer_model)
topics, probs = topic_model.fit_transform(docs)
```
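The second approach is to reduce the number of words in each topic label to the number you need (I assume mostly 1):

```python
topic_model.generate_topic_labels(nr_words=1)
```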
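To illustrate roughly what shortening the labels does, here is a pure-Python sketch that builds labels of the form "topic id + top words" from a topic-to-words mapping. It is an approximation, not BERTopic's implementation: `make_labels` is a hypothetical helper, the topic dictionary is invented (standing in for what `topic_model.get_topics()` would return), and its parameters only loosely mirror `generate_topic_labels`.

```python
# Hypothetical stand-in for topic_model.get_topics(): topic id -> ranked (word, score) list.
topics = {
    -1: [("people", 0.05), ("know", 0.04)],
    0: [("space", 0.12), ("nasa", 0.09), ("orbit", 0.07)],
    1: [("engine", 0.11), ("car", 0.08), ("oil", 0.06)],
}


def make_labels(topics, nr_words=1, topic_prefix=True, separator="_"):
    """Approximate BERTopic-style labels: the top `nr_words` words, optionally prefixed by the topic id."""
    labels = {}
    for topic_id in sorted(topics):
        # Keep only the highest-ranked words of each topic.
        words = [word for word, _score in topics[topic_id][:nr_words]]
        label = separator.join(words)
        if topic_prefix:
            label = f"{topic_id}{separator}{label}"
        labels[topic_id] = label
    return labels


print(make_labels(topics))  # {-1: '-1_people', 0: '0_space', 1: '1_engine'}
print(make_labels(topics, nr_words=2, topic_prefix=False, separator=" "))
# {-1: 'people know', 0: 'space nasa', 1: 'engine car'}
```

With a real model you would call `topic_model.generate_topic_labels(nr_words=1)` instead; recent BERTopic versions also offer `topic_model.set_topic_labels(...)` so that the shortened labels are used elsewhere, but check the documentation for the version you have installed.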