How to interpret the "phi values" in a gensim LDA model

Question

Going by the documentation, I want to get the per-document topic-term probabilities from a gensim LdaModel. I ended up with something like this:

from gensim.models import LdaModel

lda_model = LdaModel(corpus, id2word=dictionary, num_topics=50)


# phi relevance values for document 1 (index 2 of the returned tuple)
phi_doc1 = lda_model.get_document_topics(
    corpus[1], minimum_probability=0.05, per_word_topics=True)[2]


phi_doc1
---
[(52, [(8, 19.999924)]),
 (69, [(8, 666.9981)]),
 (241, [(8, 30.999844)]),
 (482, [(8, 0.9999151)]),
 (593, [(8, 5.9999304)])]

But I can't work out what these values mean.

I would like to know what "phi relevance" means. I read the help text but did not understand it:


help(lda_model.get_document_topics)

--
" ...
Phi relevance values, multiplied by the feature length, 
for each word-topic combination.
Each element in the list is a pair of a word's id and 
a list of the phi values between this word and each topic..."

What do the values returned by this call mean?

lda_model.get_document_topics(corpus[1], minimum_probability=0.05, per_word_topics=True)[2]

Are these the "per-document term probabilities for each topic"?

Answer 1

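My understanding is that the result you received is a list of word IDs, each paired with a list of (topic number, phi value) tuples. What you actually want is the document's probability for each topic.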

If your task is just to get the document's topic probabilities, call get_document_topics() with per_word_topics=False. That returns (topic, probability) tuples for the document. More information: https://radimrehurek.com/gensim/models/ldamodel.html

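Phi values are a relative measure over the word distribution. They tell you which words increase the probability that the document belongs to a given topic (topic 8 in your case). Have a look at this: https://miningthedetails.com/LDA_Inference_Book/lda-inference.html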
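One way to make the numbers in the question easier to read, as far as I can tell from the help text ("multiplied by the feature length"), is to divide each value by how often the word occurs in the document. The sketch below does that with the output posted in the question. The bag-of-words counts in bow_doc1 are an assumption (the question does not show corpus[1]; they are inferred by rounding the phi values), and normalize_phi is a hypothetical helper, not part of gensim.

```python
# Phi list copied from the question: (word_id, [(topic_id, scaled_phi), ...]).
phi_doc1 = [
    (52, [(8, 19.999924)]),
    (69, [(8, 666.9981)]),
    (241, [(8, 30.999844)]),
    (482, [(8, 0.9999151)]),
    (593, [(8, 5.9999304)]),
]

# ASSUMPTION: bag-of-words counts for corpus[1]. Not shown in the question;
# inferred here by rounding the phi values above.
bow_doc1 = [(52, 20), (69, 667), (241, 31), (482, 1), (593, 6)]
counts = dict(bow_doc1)


def normalize_phi(phi_values, word_counts):
    """Divide each scaled phi value by the word's count in the document."""
    return {
        word_id: {
            topic_id: value / word_counts[word_id]
            for topic_id, value in topic_phis
        }
        for word_id, topic_phis in phi_values
    }


normalized = normalize_phi(phi_doc1, counts)
for word_id, per_topic in normalized.items():
    print(word_id, per_topic)
```

Under that assumption every word comes out at close to 1.0 for topic 8. In other words, nearly all occurrences of each word in this document are attributed to topic 8, and the large raw value for word 69 mostly reflects how often that word appears.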
