I am trying to implement word2vec, but it gives me an error:
import gensim
import numpy as np
from gensim.models import Word2Vec, KeyedVectors

# Loading the pre-trained Word2Vec model
word2vec_path = '/content/drive/MyDrive/GoogleNews-vectors-negative300.bin'
word2vec = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)

def vectorize(text):
    vectorized = []
    for sentence in text:
        sentvec = []
        # iterate over the words
        for w in sentence:
            if w in word2vec.vocab:  # this line raises the AttributeError below
                sentvec.append(word2vec[w])
            else:
                sentvec.append(np.zeros(300))
        vectorized.append(sentvec)
    return vectorized

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
print(len(x_train), len(y_train))
print(len(x_test), len(y_test))
x_train_vec = vectorize(x_train)
x_test_vec = vectorize(x_test)
Then I got this error:
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
How can I fix this? I tried changing vocab to index_to_key, but it still doesn't work. Thanks in advance.