Gensim LDA returns topic IDs, but the output probabilities do not sum to 1

Question

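I have trained an LDA model with 100 topics. As far as I understand, every topic should be returned with some probability for a document, and those probabilities should add up to 1.

However, when I run the code below, I only get 2 topics back.

Please help.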

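import functools

# dictionary, tokenize and lda_model come from the earlier training step (not shown)
text = "A blood cell, also called a hematocyte, is a cell produced by hematopoiesis and normally found in blood."

# transform text into the bag-of-words space
bow_vector = dictionary.doc2bow(tokenize(text))
# topic distribution for this document, as (topic_id, probability) pairs
lda_vector = lda_model[bow_vector]
print("LDA Output: ", lda_vector)
# keywords of the topic with the highest probability
print("\nTop Keywords from highest prob Topic: ",lda_model.print_topic(max(lda_vector, key=lambda item: item[1])[0]))
# sum of the returned probabilities
print("\n\nAddition of all the probabilities from LDA output:",functools.reduce(lambda x,y:x+y,[i[1] for i in lda_vector]))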

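LDA Output:  [(64, 0.6952628), (69, 0.18223721)]

Top Keywords from highest prob Topic:  0.042*"health" + 0.032*"medical" + 0.017*"patient" + 0.016*"cancer" + 0.015*"hospital" + 0.015*"said" + 0.015*"treatment" + 0.012*"doctor" + 0.012*"care" + 0.012*"drug"

Addition of all the probabilities from LDA output: 0.8775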

Answer 1

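If you set the minimum_probability parameter of LdaModel to 0, the probabilities will sum to 1 (or very close to 1, because of approximation error). This parameter controls which topics are filtered out of the result returned for a document. Its default is 0.01, so any topic with a probability below that is dropped from the output, which is why only 2 of the 100 topics appear.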
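
To make the filtering concrete, here is a minimal pure-Python sketch of what the threshold does to the numbers from the question. It does not call gensim. The helper filter_topics is hypothetical, and the even spread of the leftover probability over the other 98 topics is an assumption made only for illustration; the real model will assign them different (small) values.

```python
# Illustration only: the two reported values come from the question;
# the weights of the other 98 topics are ASSUMED, not real model output.
NUM_TOPICS = 100
reported = {64: 0.6952628, 69: 0.18223721}

# Assumption: the leftover mass is spread evenly over the remaining topics.
leftover = 1.0 - sum(reported.values())
others = [t for t in range(NUM_TOPICS) if t not in reported]
full = {t: leftover / len(others) for t in others}
full.update(reported)


def filter_topics(dist, minimum_probability):
    """Hypothetical helper: keep (topic_id, prob) pairs at or above the threshold."""
    return sorted((t, p) for t, p in dist.items() if p >= minimum_probability)


default_view = filter_topics(full, 0.01)  # 0.01 is gensim's default threshold
full_view = filter_topics(full, 0.0)      # nothing is filtered out

default_total = sum(p for _, p in default_view)
full_total = sum(p for _, p in full_view)

print(len(default_view), default_total)
print(len(full_view), full_total)
```

With a trained model, the corresponding call would be lda_model.get_document_topics(bow_vector, minimum_probability=0), or you can pass minimum_probability=0 when constructing the LdaModel.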
