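I have trained an LDA model with 100 topics. As far as I understand, every topic should be returned with some probability, and those probabilities should add up to 1.
However, when I run this code, I only get 2 topics back.
Please help.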
import functools

# dictionary, tokenize and lda_model are defined earlier (not shown here)
text = "A blood cell, also called a hematocyte, is a cell produced by hematopoiesis and normally found in blood."
# transform text into the bag-of-words space
bow_vector = dictionary.doc2bow(tokenize(text))
# list of (topic_id, probability) pairs for this document
lda_vector = lda_model[bow_vector]
print("LDA Output: ", lda_vector)
print("\nTop Keywords from highest prob Topic: ", lda_model.print_topic(max(lda_vector, key=lambda item: item[1])[0]))
print("\n\nAddition of all the probabilities from LDA output:", functools.reduce(lambda x, y: x + y, [i[1] for i in lda_vector]))
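LDA Output:  [(64, 0.6952628), (69, 0.18223721)]
Top Keywords from highest prob Topic:  0.042*"health" + 0.032*"medical" + 0.017*"patient" + 0.016*"cancer" + 0.015*"hospital" + 0.015*"said" + 0.015*"treatment" + 0.012*"doctor" + 0.012*"care" + 0.012*"drug"
Addition of all the probabilities from LDA output: 0.8775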
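If you set the minimum_probability parameter of LdaModel to 0, the sum will be 1 (or close to 1 because of approximation error). This parameter controls which topics are filtered out of the result returned for a document: any topic whose probability falls below it is dropped. Its default value is 0.01, which is why only the 2 strongest of your 100 topics were returned and their probabilities sum to 0.8775 rather than 1.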
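To make the filtering concrete, here is a minimal pure-Python sketch of the thresholding idea. It does not call gensim: filter_topics is a hypothetical helper, and the 100-topic distribution is made up so that its two largest entries match the output in the question, with the remaining mass spread evenly over the other 98 topics. With a real model, the threshold can also be passed per call, for example lda_model.get_document_topics(bow_vector, minimum_probability=0).

```python
import math


def filter_topics(topic_probs, minimum_probability=0.01):
    """Hypothetical helper: keep (topic_id, prob) pairs at or above the threshold."""
    return [(i, p) for i, p in enumerate(topic_probs) if p >= minimum_probability]


# Made-up 100-topic distribution (an assumption for illustration only):
# topics 64 and 69 carry the probabilities seen in the question,
# and the remaining mass is split evenly over the other 98 topics.
num_topics = 100
topic_probs = [0.0] * num_topics
topic_probs[64] = 0.6952628
topic_probs[69] = 0.18223721
remainder = 1.0 - topic_probs[64] - topic_probs[69]
for i in range(num_topics):
    if i not in (64, 69):
        topic_probs[i] = remainder / (num_topics - 2)

# With the 0.01 threshold, only the two dominant topics survive.
filtered = filter_topics(topic_probs)
print(len(filtered), sum(p for _, p in filtered))

# With a threshold of 0, every topic is kept and the sum is (close to) 1.
unfiltered = filter_topics(topic_probs, minimum_probability=0.0)
print(len(unfiltered), sum(p for _, p in unfiltered))
```

Each of the 98 small topics here has a probability of roughly 0.00125, well below 0.01, so they all vanish from the filtered result even though together they account for about 12% of the mass.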