Gensim LDA gives topic IDs, but the output probabilities do not sum to 1

Question

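I have trained an LDA model with 100 topics (clusters). As far as I know, every topic should be output with some probability, and all of these probabilities should add up to 1.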
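That expectation holds for the full document-topic distribution. In gensim's variational inference, each document gets a vector of non-negative per-topic weights (gamma), and the topic probabilities are those weights divided by their sum. The sketch below mimics only that normalization step in plain Python; the `gamma` values and the helper name `normalize` are hypothetical, not taken from gensim.

```python
import math

def normalize(gamma):
    """Turn non-negative per-topic weights into a probability distribution."""
    total = sum(gamma)
    return [g / total for g in gamma]

# Hypothetical per-topic weights for one document over 5 topics.
gamma = [0.2, 7.5, 0.2, 2.1, 0.2]
theta = normalize(gamma)

print(theta)
print(math.fsum(theta))  # equals 1 up to floating-point rounding
```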

However, when I run this code, I only get 2 topics.

Please help.

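```python
import functools

# dictionary, tokenize and lda_model come from the training step (not shown):
# presumably a gensim Dictionary, a tokenizer function, and an LdaModel with 100 topics.
text = "A blood cell, also called a hematocyte, is a cell produced by hematopoiesis and normally found in blood."

# transform text into the bag-of-words space
bow_vector = dictionary.doc2bow(tokenize(text))
# topic distribution for the document, as (topic_id, probability) pairs
lda_vector = lda_model[bow_vector]
print("LDA Output: ", lda_vector)
print("\nTop Keywords from highest prob Topic: ", lda_model.print_topic(max(lda_vector, key=lambda item: item[1])[0]))
print("\n\nAddition of all the probabilities from LDA output:", functools.reduce(lambda x, y: x + y, [i[1] for i in lda_vector]))
```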

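Output:

```
LDA Output:  [(64, 0.6952628), (69, 0.18223721)]

Top Keywords from highest prob Topic:  0.042*"health" + 0.032*"medical" + 0.017*"patients" + 0.016*"cancer" + 0.015*"hospital" + 0.015*"said" + 0.015*"treatment" + 0.012*"doctors" + 0.012*"care" + 0.012*"drugs"


Addition of all the probabilities from LDA output: 0.8775
```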
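The printed numbers already show where the rest of the probability mass went. A small arithmetic check in plain Python, using math.fsum instead of functools.reduce; the two topic probabilities are copied from the output above, and the variable names are only illustrative:

```python
import math

# Topic IDs and probabilities copied from the output in the question.
lda_vector = [(64, 0.6952628), (69, 0.18223721)]
num_topics = 100  # the model was trained with 100 topics

reported = math.fsum(prob for _, prob in lda_vector)  # mass of the printed topics
missing = 1.0 - reported                              # mass held by the other topics
unreported_topics = num_topics - len(lda_vector)

print(f"reported mass:         {reported:.4f}")
print(f"missing mass:          {missing:.4f}")
print(f"avg per missing topic: {missing / unreported_topics:.5f}")
```

The two reported topics hold about 0.8775 of the mass, so roughly 0.1225 is shared among the 98 topics that were not printed, about 0.00125 each on average.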

machine-learning nlp gensim topic-modeling unsupervised-learning
Answer

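If you set the minimum_probability parameter of LdaModel to 0, the probabilities will sum to 1 (or very close to 1, due to approximation error). This parameter controls which topics are returned for a document: topics whose probability falls below the threshold (0.01 by default) are filtered out, which is why only 2 of the 100 topics appear in your output.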
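A plain-Python sketch of the filtering described above, assuming a hypothetical helper `document_topics` (not gensim's API) and a made-up 100-topic distribution in which only topics 64 and 69 use the values from the question and the remaining mass is spread evenly. The 1e-8 floor mirrors what gensim's get_document_topics appears to do in recent versions so that exact zeros are never returned; treat that detail as an assumption.

```python
import math

def document_topics(topic_dist, minimum_probability=0.01):
    """Keep (topic_id, prob) pairs whose probability reaches the threshold.

    Assumption: like gensim, a threshold of 0 is raised to a tiny floor (1e-8).
    """
    minimum_probability = max(minimum_probability, 1e-8)
    return [(topic_id, p) for topic_id, p in enumerate(topic_dist)
            if p >= minimum_probability]

# Hypothetical full distribution over 100 topics: two dominant topics,
# the leftover mass spread evenly over the other 98.
num_topics = 100
top = {64: 0.6952628, 69: 0.18223721}
rest = (1.0 - sum(top.values())) / (num_topics - len(top))
topic_dist = [top.get(i, rest) for i in range(num_topics)]

filtered = document_topics(topic_dist)                          # default cutoff 0.01
unfiltered = document_topics(topic_dist, minimum_probability=0)

print(len(filtered), math.fsum(p for _, p in filtered))
print(len(unfiltered), math.fsum(p for _, p in unfiltered))
```

With the default cutoff only the two dominant topics survive, matching the question; with a threshold of 0 all 100 topics are kept and the sum returns to about 1. In gensim itself you can pass minimum_probability=0 when constructing LdaModel, or per call with lda_model.get_document_topics(bow_vector, minimum_probability=0), which does not require retraining.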

© www.soinside.com 2019 - 2024. All rights reserved.