I am trying to implement word2vec with Gensim 4.0, but it raises the following error:
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
import gensim
import numpy as np
from gensim.models import Word2Vec, KeyedVectors

# Load the pre-trained Word2Vec model
word2vec_path = '/content/drive/MyDrive/GoogleNews-vectors-negative300.bin'
word2vec = KeyedVectors.load_word2vec_format(word2vec_path, binary=True)

def vectorize(text):
    vectorized = []
    for sentence in text:
        sentvec = []
        # iterate over the words in the sentence
        for w in sentence:
            if w in word2vec.vocab:  # this line raises the AttributeError in Gensim 4
                sentvec.append(word2vec[w])
            else:
                sentvec.append(np.zeros((300)))
        vectorized.append(sentvec)
    return vectorized

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
print(len(x_train), len(y_train))
print(len(x_test), len(y_test))
x_train_vec = vectorize(x_train)
x_test_vec = vectorize(x_test)
I tried changing vocab to index_to_key, but it still doesn't work.
How can I fix this?
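Have you tried the first thing mentioned in the error message: using .key_to_index instead of .vocab?
If you tried that and it failed, how did it fail? (If there is a new error, you can edit your question to add the full error message.)
(There is a longer guide with tips on updating older code for the Gensim 4 changes, which are often just a few changes to accessor names, at https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4. But trying the first thing in the error message will very likely fix the specific error you've hit. If you are working from older code examples, you may need other tips from that guide to fix other problems.)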
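
To make that suggestion concrete, here is a minimal sketch of the question's vectorize function with the membership test switched to .key_to_index. So that it runs without the GoogleNews file, it uses a tiny hypothetical stand-in class (TinyVectors, not part of Gensim) that exposes only the key_to_index, index_to_key and vector_size attributes and [] lookup; passing kv as a parameter is also my own change. With real Gensim 4 you would presumably pass the loaded word2vec object instead.

```python
import numpy as np


class TinyVectors:
    """Hypothetical stand-in for a Gensim 4 KeyedVectors object (illustration only)."""

    def __init__(self, words, vectors):
        self.index_to_key = list(words)
        self.key_to_index = {w: i for i, w in enumerate(self.index_to_key)}
        self.vectors = np.asarray(vectors, dtype=np.float32)
        self.vector_size = self.vectors.shape[1]

    def __getitem__(self, word):
        return self.vectors[self.key_to_index[word]]


def vectorize(text, kv):
    """Turn each tokenized sentence into a list of word vectors; unknown words get zeros."""
    vectorized = []
    for sentence in text:
        sentvec = []
        for w in sentence:
            if w in kv.key_to_index:  # Gensim 4 replacement for `w in kv.vocab`
                sentvec.append(kv[w])
            else:
                sentvec.append(np.zeros(kv.vector_size))
        vectorized.append(sentvec)
    return vectorized


# Toy data: two known words with 3-dimensional vectors; "bird" is out of vocabulary.
kv = TinyVectors(["cat", "dog"], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
result = vectorize([["cat", "bird"], ["dog"]], kv)
```

The check uses the key_to_index dict rather than the index_to_key list because dict membership is a hash lookup, while testing membership in a list scans it element by element, which is slow for a vocabulary of millions of words.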