How do I load a numpy array into gensim KeyedVectors format?

Question

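After training my word embeddings, I saved them in npz format. When I try to load that file as KeyedVectors, I get an error. How can I load a numpy array into gensim.KeyedVectors format? I really need this, because I want to use functions like most_similar() and not just the raw vector values.

In model.py, with TensorFlow: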

self.verb_embeddings = tf.Variable(np.load(cfg.pretrained_target)["embeddings"],
                                   name="verb_embeddings",
                                   dtype=tf.float32,
                                   trainable=cfg.tune_emb)

In saving.py:

target_emb = sess.run(model.verb_embeddings)
np.savez_compressed("trained_target_emb.npz", embeddings=target_emb)

In main.py:

 model = KeyedVectors.load('trained_target_emb.npz')

I got:

_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

I also tried:

 model = KeyedVectors.load_word2vec_format('trained_target_emb.npz')

but got:

 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 14: invalid continuation byte

Answer 1

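A gensim KeyedVectors instance can't be loaded from a bare array: the array carries no information about which words are represented, or which index holds which word.

The plain .load() in gensim expects an object that was saved from gensim with gensim's own .save() method. An .npz archive written by numpy is not such an object, which is why unpickling fails.

Word vectors can be loaded from files in the same format used by the original Google/Mikolov word2vec.c tool. So perhaps your TensorFlow code could save them that way?

Then you would use .load_word2vec_format().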
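A minimal sketch of that route, under some assumptions: the question never shows the vocabulary, so `words` here is a hypothetical list of tokens whose order matches the rows of the saved `embeddings` array, and the helper names `write_word2vec_text` and `npz_to_keyed_vectors` are made up for illustration. The text variant of the word2vec.c format is a header line with the vocabulary size and vector dimension, followed by one line per word containing the token and its space-separated values.

```python
def write_word2vec_text(path, words, vectors):
    """Write words and their vectors in the word2vec.c *text* format.

    `words` must be in the same order as the rows of `vectors`
    (assumption: you still have the vocabulary used during training).
    Tokens must not contain whitespace, since the format is space-delimited.
    """
    rows = [list(row) for row in vectors]
    if len(words) != len(rows):
        raise ValueError("need exactly one word per embedding row")
    dim = len(rows[0]) if rows else 0
    with open(path, "w", encoding="utf-8") as f:
        # Header line: vocabulary size and vector dimensionality.
        f.write(f"{len(rows)} {dim}\n")
        for word, row in zip(words, rows):
            values = " ".join(repr(float(x)) for x in row)
            f.write(f"{word} {values}\n")


def npz_to_keyed_vectors(npz_path, words, out_path="trained_target_emb.txt"):
    """Hypothetical helper: convert the saved .npz into a KeyedVectors object."""
    # Imported lazily so the writer above has no third-party dependencies.
    import numpy as np
    from gensim.models import KeyedVectors

    # "embeddings" is the key used with np.savez_compressed in saving.py.
    embeddings = np.load(npz_path)["embeddings"]
    write_word2vec_text(out_path, words, embeddings)
    return KeyedVectors.load_word2vec_format(out_path, binary=False)
```

With a matching vocabulary list, something like `model = npz_to_keyed_vectors("trained_target_emb.npz", words)` should then give an object on which `model.most_similar(...)` works. The text format is larger and slower to load than the binary one, but it is easy to inspect when something goes wrong.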
