Gensim 4.0 new version: 'vocab' attribute error


I am trying to implement word2vec, but it gives me an error.

import numpy as np
import gensim
from gensim.models import Word2Vec, KeyedVectors

# Load the pre-trained Google News Word2Vec vectors (300-dimensional)
word2vec_path = '/content/drive/MyDrive/GoogleNews-vectors-negative300.bin'
word2vec = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)


Vectorization function

def vectorize(text):
    vectorized = []
    for sentence in text:
        sentvec = []
        # iterate over the words in the sentence
        for w in sentence:
            if w in word2vec.vocab:  # this line fails under Gensim 4.x
                sentvec.append(word2vec[w])
            else:
                # unknown word: use a 300-dimensional zero vector
                sentvec.append(np.zeros(300))
        vectorized.append(sentvec)
    return vectorized
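Under Gensim 4.x, the membership test can use the key_to_index dict that the error message recommends. The sketch below is a minimal version of the same function, assuming `kv` is a Gensim 4.x KeyedVectors object (such as `word2vec` above); it only relies on `kv.key_to_index`, `kv.vector_size` and `kv[word]`. Passing the vectors in as a parameter (instead of reading the global `word2vec`) is a choice made here for testability, so it would be called as `vectorize(x_train, word2vec)`.

```python
import numpy as np


def vectorize(text, kv):
    """Map each tokenized sentence to a list of word vectors (Gensim 4.x API).

    `kv` is assumed to be a Gensim 4.x KeyedVectors instance; only
    `kv.key_to_index`, `kv.vector_size` and `kv[word]` are used.
    """
    vectorized = []
    for sentence in text:
        sentvec = []
        for w in sentence:
            # Gensim 4.x: the key_to_index dict replaces the removed `vocab`
            if w in kv.key_to_index:
                sentvec.append(kv[w])
            else:
                # out-of-vocabulary word: zero vector sized to the model
                sentvec.append(np.zeros(kv.vector_size, dtype=np.float32))
        vectorized.append(sentvec)
    return vectorized
```

In Gensim 4.x, `w in kv` on the KeyedVectors object itself is an equivalent membership test, so either form should work; reading the dimension from `kv.vector_size` avoids hard-coding 300.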

Split into training and test sets

from sklearn.model_selection import train_test_split

# x: list of tokenized sentences, y: labels (defined earlier, not shown)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
print(len(x_train), len(y_train))
print(len(x_test), len(y_test))

Calling the vectorization function

x_train_vec = vectorize(x_train)
x_test_vec = vectorize(x_test)

Then I got this error:


AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
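The message maps the old `vocab` dict onto `key_to_index`. If the same code has to run on both Gensim 3.x and 4.x, a small helper can pick whichever attribute exists. This is a minimal sketch, assuming `kv` is a KeyedVectors object; the helper names `has_word` and `word_index` are hypothetical, and the 3.x branch assumes the old `vocab[word].index` layout.

```python
def has_word(kv, word):
    """Return True if `word` is in the vocabulary of a KeyedVectors object.

    Hypothetical helper: uses Gensim 4.x `key_to_index` when present and
    falls back to the Gensim 3.x `vocab` dict otherwise.
    """
    key_to_index = getattr(kv, "key_to_index", None)
    if key_to_index is not None:
        return word in key_to_index      # Gensim 4.x
    return word in kv.vocab              # Gensim 3.x


def word_index(kv, word):
    """Return the row of `word` in the embedding matrix (hypothetical helper)."""
    key_to_index = getattr(kv, "key_to_index", None)
    if key_to_index is not None:
        return key_to_index[word]        # Gensim 4.x: dict word -> row
    return kv.vocab[word].index          # Gensim 3.x: Vocab object with .index
```

Checking for `key_to_index` first matters: on 4.x, merely touching `kv.vocab` raises the AttributeError shown above.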

How can I fix this? I tried changing vocab to index_to_key, but it still doesn't work. Thanks in advance.
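One thing worth checking when swapping vocab for index_to_key: in Gensim 4.x, `index_to_key` is a list (row number to word) and `key_to_index` is the inverse dict (word to row number). So `w in word2vec.index_to_key` is a correct but linear scan over every key, while indexing the list with a word raises TypeError. The toy sketch below uses a three-word vocabulary standing in for the real vectors (an assumption for illustration) to show the difference.

```python
# Toy stand-ins for word2vec.index_to_key and word2vec.key_to_index
index_to_key = ["the", "cat", "sat"]                        # list: row -> word
key_to_index = {w: i for i, w in enumerate(index_to_key)}  # dict: word -> row

word = "cat"
print(word in index_to_key)    # membership via a linear scan of the list
print(word in key_to_index)    # membership via a hash lookup in the dict
print(key_to_index[word])      # row number of the word
print(index_to_key[1])         # word stored at row 1

try:
    index_to_key[word]         # a list cannot be indexed by a string
except TypeError as exc:
    print("TypeError:", exc)
```

For a vocabulary the size of the Google News model, the dict lookup is the practical choice inside a per-word loop.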

python gensim word2vec