KeyError: "word 'restriction' not in vocabulary" when generating word embedding vectors from text read from a text file

Question

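I get this error: "KeyError: word 'restriction' not in vocabulary" when I read a text file to generate word embedding vectors, even though the word "restriction" is in the text file. I'm wondering whether my code for reading the text file (a simple paragraph) is wrong.

My code is below.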

from gensim.models import Word2Vec
# define training data
with open('D:\\test.txt', 'r') as file:
    sentences = ""
    #read from textfile
    for line in file:
        for word in line.split(' '):
            sentences += word + ' '
# train model
model = Word2Vec(sentences, min_count=1)
# summarize the loaded model
print(model)
# summarize vocabulary
words = list(model.wv.vocab)
# save model
model.save('model.bin')
# load model
new_model = Word2Vec.load('model.bin')
print(new_model)
print(str(model['restriction']))

The error does not occur when I use pre-written sentences, as in the code below.

from gensim.models import Word2Vec
# define training data
sentences = [['this', 'is', 'the', 'first', 'sentence', 'for', 'word2vec'],  
                ['this', 'is', 'the', 'second', 'sentence'],  
                ['yet', 'another', 'sentence'],  
                ['one', 'more', 'sentence', 'with', 'restriction'],
                ['and', 'the', 'final', 'sentence']]
# train model
model = Word2Vec(sentences, min_count=1)
# summarize the loaded model
print(model)
# summarize vocabulary
words = list(model.wv.vocab)
print(words)
# access vector for one word
print(model['sentence'])
# save model
model.save('model.bin')
# load model
new_model = Word2Vec.load('model.bin')
print(new_model)
print('the model prints: ')
print(model['restriction'])

Answer 1

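The problem is in the code you've shown. Examine `sentences` closely after you've built it, and check whether it's in the format you expect (or anything like the `sentences` of your working case). I suspect it's not.

Also, look at the list of words the disappointing model has learned; your `words` variable should be enough for that. It probably isn't what you expect either.

Specifically, this part of your code...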

sentences = ""
for line in file:
    for word in line.split(' '):
        sentences += word + ' '

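...makes `sentences` one long string of space-separated words. If you did the same thing to the `sentences` in your working code, you would no longer have a list in which each item is a list of tokens (which is the ideal input for `Word2Vec`). Instead, you'd have one giant run-on string: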

sentences = 'this is the first sentence for word2vec this is the second sentence yet another sentence one more sentence with restriction and the final sentence'
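To see why that matters, it can help to mimic how a corpus is consumed: it is iterated as a sequence of sentences, and each sentence is iterated as a sequence of tokens. The sketch below uses plain Python only (no gensim), and `tokens_seen` is a hypothetical helper made up for illustration. It suggests that a string corpus yields only single characters as "words", which would explain why 'restriction' never reaches the vocabulary:

```python
def tokens_seen(corpus):
    """Collect every 'token' found by iterating the corpus as sentences,
    then iterating each sentence as tokens (hypothetical helper)."""
    seen = set()
    for sentence in corpus:
        for token in sentence:
            seen.add(token)
    return seen

# The broken case: one long string
as_string = "one more sentence with restriction "
# The working case: a list of token lists
as_lists = [["one", "more", "sentence", "with", "restriction"]]

print("restriction" in tokens_seen(as_string))  # False: only single characters
print("restriction" in tokens_seen(as_lists))   # True
print(sorted(tokens_seen(as_string)))           # the individual characters
```

If your `words` list is full of single letters, this is very likely what happened.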

Try this instead:

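sentences = []  # empty list
# OOPS, DON'T DO: sentences = ""
for line in file:
    # split() with no argument also drops the trailing newline,
    # so the last word of a line is not stored as 'word\n'
    sentences.append(line.split())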

...then your `sentences` will be a list-of-lists-of-strings (like the working case), rather than just a single string (like the broken case).
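Putting the pieces together, here is a minimal sketch of the file-reading step on its own. The helper name `read_sentences`, the UTF-8 encoding, and the skipping of blank lines are my assumptions, not something from your code, and the temporary file merely stands in for your `D:\test.txt`:

```python
import os
import tempfile

def read_sentences(path):
    """Return one list of tokens per non-empty line (hypothetical helper)."""
    sentences = []
    # Assumption: the file is UTF-8 encoded
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            tokens = line.split()  # no-arg split drops the newline and extra spaces
            if tokens:             # assumption: blank lines are skipped
                sentences.append(tokens)
    return sentences

# Demo on a throwaway file standing in for the real text file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("this is the first sentence\none more sentence with restriction\n")
    demo_path = tmp.name

sentences = read_sentences(demo_path)
os.remove(demo_path)
print(sentences)
```

The resulting `sentences` can then be passed to `Word2Vec(sentences, min_count=1)` just as in your working example. Note that this treats each line as one sentence; if your paragraph is a single line, you'll get a single long token list, which should still work for a small test.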
