Unable to load a Doc2vec object with gensim

Question

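I'm trying to load a pre-trained Doc2vec model with gensim and use it to map paragraphs to vectors. I'm following https://github.com/jhlau/doc2vec, and the pre-trained model I downloaded is the English Wikipedia DBOW model from the same link. However, when I load the Wikipedia Doc2vec model and infer vectors with the following code: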

import gensim.models as g
import codecs

# paths to the pre-trained model, the input documents and the output file
model = "wiki_sg/word2vec.bin"
test_docs = "test_docs.txt"
output_file = "test_vectors.txt"

# inference hyper-parameters
start_alpha = 0.01
infer_epoch = 1000

# load the test documents (one whitespace-tokenized document per line), then the model
with codecs.open(test_docs, "r", "utf-8") as f:
    test_docs = [x.strip().split() for x in f]
m = g.Doc2Vec.load(model)

# infer one vector per document and write it as a line of space-separated floats
with open(output_file, "w") as output:
    for d in test_docs:
        output.write(" ".join([str(x) for x in m.infer_vector(d, alpha=start_alpha, steps=infer_epoch)]) + "\n")

I get this error:

/Users/zhangji/Desktop/CSE547/Project/NLP/venv/lib/python2.7/site-packages/smart_open/smart_open_lib.py:402: UserWarning: This function is deprecated, use smart_open.open instead. See the migration notes for details: https://github.com/RaRe-Technologies/smart_open/blob/master/README.rst#migrating-to-the-new-open-function
  'See the migration notes for details: %s' % _MIGRATION_NOTES_URL
Traceback (most recent call last):
  File "/Users/zhangji/Desktop/CSE547/Project/NLP/AbstractMapping.py", line 19, in <module>
    output.write(" ".join([str(x) for x in m.infer_vector(d, alpha=start_alpha, steps=infer_epoch)]) + "\n")
AttributeError: 'Word2Vec' object has no attribute 'infer_vector'
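The AttributeError says that the object returned by `g.Doc2Vec.load(model)` is a `Word2Vec`, not a `Doc2Vec`. The plain-Python sketch below illustrates one plausible mechanism, under the simplifying assumption that gensim's `load()` is an inherited classmethod that just unpickles the saved file: calling it through a subclass does not convert anything, so you get back whatever class was saved. It also shows a type check that fails early instead of at `infer_vector()`. `Word2VecLike`, `Doc2VecLike` and `load_doc2vec` are hypothetical names, not gensim's API.

```python
import io
import pickle


class Word2VecLike(object):
    """Hypothetical stand-in for gensim's Word2Vec (not the real class)."""

    def save(self, fobj):
        pickle.dump(self, fobj)

    @classmethod
    def load(cls, fobj):
        # Simplified assumption about gensim's load(): unpickle and return
        # whatever object was stored. `cls` is never used to convert it.
        return pickle.load(fobj)


class Doc2VecLike(Word2VecLike):
    """Hypothetical stand-in for gensim's Doc2Vec: only it has infer_vector()."""

    def infer_vector(self, words):
        return [0.0] * len(words)


def load_doc2vec(fobj):
    """Load a model and raise immediately if it is not Doc2Vec-like."""
    model = Doc2VecLike.load(fobj)
    if not isinstance(model, Doc2VecLike):
        raise TypeError("expected a Doc2Vec model, got %s" % type(model).__name__)
    return model


buf = io.BytesIO()
Word2VecLike().save(buf)      # the saved file holds a Word2Vec-type model
buf.seek(0)
loaded = Doc2VecLike.load(buf)
print(type(loaded).__name__)             # Word2VecLike
print(hasattr(loaded, "infer_vector"))   # False
```

In other words, under this assumption the class you call `load()` on does not matter; what matters is which kind of model was saved in the file.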

I know there are several threads on Stack Overflow about infer_vector problems, but none of them solved mine. I installed the gensim package with:

pip install git+https://github.com/jhlau/gensim

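Also, after looking through the gensim source code, I found that when I call Doc2vec.load(), the Doc2vec class itself doesn't actually define a load() function. Because Doc2vec is a subclass of Word2vec, the call goes to the inherited load() in Word2vec, which then makes the model a Word2vec object. However, infer_vector() exists only in Doc2vec, not in Word2vec, so that is what causes the error. I also tried converting the model m to a Doc2vec, but got this error: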

>>> g.Doc2Vec(m)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/zhangji/Library/Python/2.7/lib/python/site-packages/gensim/models/doc2vec.py", line 599, in __init__
    self.build_vocab(documents, trim_rule=trim_rule)
  File "/Users/zhangji/Library/Python/2.7/lib/python/site-packages/gensim/models/word2vec.py", line 513, in build_vocab
    self.scan_vocab(sentences, trim_rule=trim_rule)  # initial survey
  File "/Users/zhangji/Library/Python/2.7/lib/python/site-packages/gensim/models/doc2vec.py", line 635, in scan_vocab
    for document_no, document in enumerate(documents):
  File "/Users/zhangji/Library/Python/2.7/lib/python/site-packages/gensim/models/word2vec.py", line 1367, in __getitem__
    return vstack([self.syn0[self.vocab[word].index] for word in words])
TypeError: 'int' object is not iterable
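As the traceback shows, `g.Doc2Vec(m)` does not convert anything: the constructor treats its first argument as the training corpus (`documents`) and hands it to `build_vocab()`/`scan_vocab()`, which iterate over it with `enumerate()`. The model apparently has no `__iter__` (the traceback ends in `__getitem__`), so Python falls back to calling `m[0]`, `m[1]`, ..., while that old `Word2Vec.__getitem__` expects a list of words. The plain-Python sketch below reproduces the fallback; `OldWord2VecLike` and `scan_vocab` are hypothetical, simplified stand-ins, not gensim code.

```python
class OldWord2VecLike(object):
    """Hypothetical stand-in mimicking the old Word2Vec.__getitem__ in the traceback."""

    def __init__(self):
        self.vocab = {"paper": 0, "vector": 1}   # word -> row index (simplified)
        self.syn0 = [[0.1, 0.2], [0.3, 0.4]]     # one vector per word

    def __getitem__(self, words):
        # Expects an iterable of words and returns their vectors.
        return [self.syn0[self.vocab[word]] for word in words]


def scan_vocab(documents):
    # Simplified version of the loop in Doc2Vec.scan_vocab from the traceback.
    for document_no, document in enumerate(documents):
        pass


model = OldWord2VecLike()
print(model[["paper", "vector"]])   # intended use: a list of words
try:
    # No __iter__, so iteration falls back to model[0]; iterating over 0 fails.
    scan_vocab(model)
except TypeError as e:
    print(e)
```

The second `print` shows the same `'int' object is not iterable` message as in the traceback.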

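Really, all I want from gensim right now is to convert paragraphs to vectors with a pre-trained model that works well on academic articles. For various reasons, I don't want to train a model myself. I'd be very grateful if anyone could help me solve this.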

By the way, I'm using Python 2.7, and my current gensim version is 0.12.4.

Thanks!

Answer 1

I would avoid the four-year-old, non-standard gensim fork at https://github.com/jhlau/doc2vec, as well as any four-year-old saved model that can only be loaded with such code.
