Doc2Vec.infer_vector keeps giving different results every time on a particular trained model

Question

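I am trying to follow the official Doc2Vec Gensim tutorial mentioned here - https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb

I modified the code in line 10 to determine the best matching documents for a given query, and every time I run it I get a completely different result set. My new code for line 10 of the notebook is: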

    inferred_vector = model.infer_vector(['only', 'you', 'can', 'prevent', 'forest', 'fires'])
    sims = model.docvecs.most_similar([inferred_vector], topn=len(model.docvecs))
    rank = [docid for docid, sim in sims]
    print(rank)

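Every time I run this piece of code, I get a different set of documents matching this query: "only you can prevent forest fires". The differences are stark, and the results just do not seem to match.

Is Doc2Vec not suitable for querying and information retrieval? Or is there a bug?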

Answer 1

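Looking at the code, infer_vector uses parts of the algorithm that are non-deterministic. The initialization of the word vector is deterministic (see the code of seeded_vector below), but further on, the random sampling of words and negative sampling (which updates only a sample of the word vectors per iteration) can cause non-deterministic output (thanks @gojomo).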

    def seeded_vector(self, seed_string):
        """Create one 'random' vector (but deterministic by seed_string)"""
        # Note: built-in hash() may vary by Python version or even (in Py3.x) per launch
        once = random.RandomState(self.hashfxn(seed_string) & 0xffffffff)
        return (once.rand(self.vector_size) - 0.5) / self.vector_size
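
To make the distinction above concrete, here is a minimal standard-library sketch. It is not gensim code: `toy_infer`, `VECTOR_SIZE`, and the use of `zlib.crc32` as a stand-in for `hashfxn` are all hypothetical, chosen only for illustration. It shows that a vector seeded from the input string is identical on every call, while updates driven by a shared random generator differ between calls unless that generator is reset to the same state first.

```python
import random
import zlib

VECTOR_SIZE = 4  # hypothetical toy dimensionality


def seeded_vector(seed_string, vector_size=VECTOR_SIZE):
    """Toy analogue of seeded_vector: the same string always gives the same vector."""
    # zlib.crc32 stands in for hashfxn (assumption); it is stable across launches
    once = random.Random(zlib.crc32(seed_string.encode("utf-8")) & 0xffffffff)
    return [(once.random() - 0.5) / vector_size for _ in range(vector_size)]


def toy_infer(words, rng, steps=5):
    """Toy 'inference': deterministic start, then updates driven by draws from rng."""
    vec = seeded_vector(" ".join(words))
    for _ in range(steps):
        # Stand-in for random sampling: pick a random position and nudge it
        i = rng.randrange(len(vec))
        vec[i] += 0.01 * (rng.random() - 0.5)
    return vec


query = ['only', 'you', 'can', 'prevent', 'forest', 'fires']

shared_rng = random.Random(1)
first = toy_infer(query, shared_rng)
second = toy_infer(query, shared_rng)  # generator state has advanced since `first`

shared_rng.seed(1)                     # reset the generator to its initial state
third = toy_infer(query, shared_rng)   # same state as for `first`

print(first == second)
print(first == third)
```

The same reasoning suggests why repeated inference on one trained model can disagree: the starting vector is fixed, but the sampling that follows is not, unless the generator it draws from is reseeded before each call.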
