top2vec - explanation of the behavior of the get_documents_topics function

Question

I need an explanation of what get_documents_topics(doc_ids, reduced=False, num_topics=1) does.

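Get document topics. The topic of each document will be returned. The corresponding original topics are returned unless reduced=True, in which case the reduced topics will be returned.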

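Returns:

topic_nums (array of int, shape(len(doc_ids), num_topics)) – The topic number(s) of the document corresponding to each doc_id.

topic_score (array of float, shape(len(doc_ids), num_topics)) – Semantic similarity of document to topic. The cosine similarity of the document and topic vector.

topics_words (array of shape(len(doc_ids), num_topics, 50)) – For each topic the top 50 words are returned, in order of semantic similarity to topic.

word_scores (array of shape(num_topics, 50)) – For each topic the cosine similarity scores of the top 50 words to the topic are returned.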
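The ranking that the docstring describes for topic_nums and topic_score can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Top2Vec's implementation: it assumes each document and each topic is an embedding vector, and that a document's topics are the topic vectors with the highest cosine similarity to it, keeping up to num_topics of them. The helper names cosine and documents_topics_sketch and the toy 2-D vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def documents_topics_sketch(doc_vectors, topic_vectors, num_topics=1):
    """Hypothetical stand-in for get_documents_topics (not Top2Vec's code).

    Ranks every topic vector by cosine similarity to each document vector
    and keeps the best num_topics. Both returned lists have shape
    (len(doc_vectors), num_topics): topic indices and their scores.
    """
    topic_nums, topic_scores = [], []
    for doc in doc_vectors:
        scored = [(cosine(doc, topic), idx) for idx, topic in enumerate(topic_vectors)]
        # Highest similarity first; sorted() is stable, so ties keep the lower index.
        scored = sorted(scored, key=lambda pair: pair[0], reverse=True)[:num_topics]
        topic_nums.append([idx for _, idx in scored])
        topic_scores.append([score for score, _ in scored])
    return topic_nums, topic_scores


# Toy 2-D embeddings (hypothetical, not taken from the BBC model).
topic_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
doc_vectors = [[0.9, 0.1], [0.1, 0.9]]

nums, scores = documents_topics_sketch(doc_vectors, topic_vectors, num_topics=2)
print(nums)  # [[0, 2], [1, 2]]
print(scores)
```

Under these assumptions each entry of topic_nums is a topic index (an ID), each row holds num_topics of them, and the matching topic_score row holds the corresponding cosine similarities. Ranking against a smaller, merged set of topic vectors would play the role of reduced=True, which is likewise an assumption here.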

This uses the news text from the BBC News Classification dataset.

document_id = 1
document = train_df.iloc[document_id]['Text']
document
---
german business confidence slides german business confidence fell in february knocking hopes of a speedy recovery in europe s largest economy....

topic_nums, topic_score, topics_words, word_scores = \
    model.get_documents_topics([document_id], reduced=False)

print(f"topic_nums:{topic_nums}, topic_score: {topic_score}")
for word, score in zip(topics_words[0][:10], word_scores[0][:10]):
    print(f"{word:20}: {score}")
-----
topic_nums:[0], topic_score: [0.3969033]
parliament          : 0.10377583652734756
politicians         : 0.10281675308942795
britain             : 0.10191775858402252
election            : 0.09515437483787537
elections           : 0.0923602283000946
no                  : 0.08872390538454056
non                 : 0.0843275785446167
voters              : 0.08393856137990952
british             : 0.08337553590536118
bbc                 : 0.08136938512325287
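A word list like the one printed above can be illustrated by ranking word vectors against a single topic vector. The sketch assumes, following the docstring's wording "in order of semantic similarity to topic", that each score is the cosine similarity between a word's embedding and the topic vector. The vocabulary, the embeddings, and the function top_topic_words are hypothetical; top_n=50 would correspond to the 50 words the docstring mentions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def top_topic_words(topic_vector, word_vectors, top_n=50):
    """Hypothetical sketch: the top_n words closest to one topic vector.

    word_vectors maps each word to its embedding. Returns (words, scores),
    best match first, ranked by cosine similarity to topic_vector.
    """
    scored = sorted(
        ((cosine(topic_vector, vec), word) for word, vec in word_vectors.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_n]
    return [word for _, word in scored], [score for score, _ in scored]


# Toy vocabulary and topic vector (hypothetical values).
vocab = {
    "election": [0.9, 0.2],
    "voters": [0.8, 0.3],
    "economy": [0.1, 0.9],
    "confidence": [0.2, 0.8],
}
politics_topic = [1.0, 0.25]

words, scores = top_topic_words(politics_topic, vocab, top_n=2)
for word, score in zip(words, scores):
    print(f"{word:20}: {score}")
```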

What is topic_nums? Is it the ID of a topic, or the number of topics related to the document (document_id = 1)?

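I believe the "topic" in the documentation is the topic vector, which is the average of a cluster of document vectors, but please correct me if that is wrong.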
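The belief stated above can be written down as a small sketch: given document vectors that have already been assigned to clusters, each topic vector is the mean of the document vectors in its cluster. How the clusters are found, and whether the mean is further normalized, is outside this sketch. The cluster labels are supplied by hand, label -1 is treated as noise (the convention HDBSCAN uses), and centroid_topic_vectors is a hypothetical name.

```python
from collections import defaultdict


def centroid_topic_vectors(doc_vectors, cluster_labels):
    """Hypothetical sketch: one topic vector per cluster, the mean of its documents.

    Label -1 is treated as noise and skipped.
    Returns a dict mapping cluster label -> mean vector.
    """
    members = defaultdict(list)
    for vec, label in zip(doc_vectors, cluster_labels):
        if label != -1:
            members[label].append(vec)
    # zip(*vecs) walks the vectors dimension by dimension.
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in sorted(members.items())
    }


# Toy document vectors with hand-assigned cluster labels (hypothetical).
docs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]
print(centroid_topic_vectors(docs, labels))
```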
