KeyError: "word 'restriction' not in vocabulary" when generating word embedding vectors from text read from a text file

Question

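I get this error, "KeyError: word 'restriction' not in vocabulary", when I read a text file to generate word embedding vectors, even though the word "restriction" is in the text file. Is there something wrong with the code I use to read the text file (a simple paragraph)?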

My code is below.

from gensim.models import Word2Vec
# define training data
with open('D:\\test.txt', 'r') as file:
    sentences = ""
    # read from text file
    for line in file:
        for word in line.split(' '):
            sentences += word + ' '
# train model
model = Word2Vec(sentences, min_count=1)
# summarize the loaded model
print(model)
# summarize vocabulary
words = list(model.wv.vocab)
# save model
model.save('model.bin')
# load model
new_model = Word2Vec.load('model.bin')
print(new_model)
print(str(model['restriction']))

The error does not occur when I use pre-written sentences, as in the code below.

from gensim.models import Word2Vec
# define training data
sentences = [['this', 'is', 'the', 'first', 'sentence', 'for', 'word2vec'],  
                ['this', 'is', 'the', 'second', 'sentence'],  
                ['yet', 'another', 'sentence'],  
                ['one', 'more', 'sentence', 'with', 'restriction'],
                ['and', 'the', 'final', 'sentence']]
# train model
model = Word2Vec(sentences, min_count=1)
# summarize the loaded model
print(model)
# summarize vocabulary
words = list(model.wv.vocab)
print(words)
# access vector for one word
print(model['sentence'])
# save model
model.save('model.bin')
# load model
new_model = Word2Vec.load('model.bin')
print(new_model)
print('the model prints: ')
print(model['restriction'])

Tags: python deep-learning text-files word2vec

1 Answer

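The problem is visible in your code: examine `sentences` closely after you have built it, and check whether it is in the format you expect (or anything like the `sentences` in the working case). I suspect it is not.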

Also, look at the list of words the disappointing model has learned; the `words` variable should be enough. It, too, is probably not what you expect.
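
One way to run that check is a small helper that reports whether an object has the list-of-token-lists shape. This is only a minimal sketch: the name `is_list_of_token_lists` is made up for illustration and is not part of gensim.

```python
def is_list_of_token_lists(sentences):
    """Return True if `sentences` is a list whose items are lists of strings.

    Hypothetical helper for illustration; not part of gensim.
    """
    if not isinstance(sentences, list):
        return False
    return all(
        isinstance(sentence, list) and all(isinstance(tok, str) for tok in sentence)
        for sentence in sentences
    )


# The shape used in the working example from the question.
working = [['one', 'more', 'sentence', 'with', 'restriction'],
           ['and', 'the', 'final', 'sentence']]
# The shape produced by concatenating words into one string.
broken = 'one more sentence with restriction and the final sentence '

print(is_list_of_token_lists(working))  # True
print(is_list_of_token_lists(broken))   # False
```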

Specifically, this part of your code...

sentences = ""
for line in file:
    for word in line.split(' '):
        sentences += word + ' '

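...makes `sentences` one long string of many space-separated words. If you did the same to the `sentences` in your working code, you would no longer have a list in which each item is a list of tokens. (That is the format Word2Vec works well with.) Instead, you would have one giant run-on string: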

sentences = 'this is the first sentence for word2vec this is the second sentence yet another sentence one more sentence with restriction and the final sentence'
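
To see why the lookup then fails, it helps to mimic how a consumer that expects an iterable of token lists walks over a plain string. The sketch below does not call gensim. It only assumes, as described above, that each item of `sentences` is treated as one sentence and each item of a sentence as one token; `collect_vocabulary` is a hypothetical name.

```python
def collect_vocabulary(sentences):
    """Collect every token, treating each item of `sentences` as a sentence
    and each item of a sentence as a token (illustrative only, not gensim)."""
    vocabulary = set()
    for sentence in sentences:
        for token in sentence:
            vocabulary.add(token)
    return vocabulary


run_on = 'one more sentence with restriction'
as_lists = [run_on.split()]  # one sentence, given as a list of tokens

string_vocab = collect_vocabulary(run_on)
list_vocab = collect_vocabulary(as_lists)

print('restriction' in string_vocab)  # False: only single characters were seen
print('restriction' in list_vocab)    # True
print(sorted(string_vocab))
```

Iterating over a string yields characters, and iterating over a one-character string yields that same character, so the "vocabulary" built from the run-on string contains letters and spaces rather than words.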

Try this instead:

sentences = []  # empty list
# OOPS, DON'T DO: sentences = ""
for line in file:
    sentences.append(line.split())  # split() with no argument also drops the trailing newline

...then your `sentences` will be a list of lists of strings (like the working case), rather than just a string (like the broken case).
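
Putting the pieces together, a reader along the following lines produces that list-of-lists shape from a file. It is a sketch under assumptions: `read_sentences` is a hypothetical helper, one line of the file is treated as one sentence, and `split()` is called without an argument so that the trailing newline does not stay attached to the last word of each line.

```python
import os
import tempfile


def read_sentences(path):
    """Read a text file into a list of token lists, one list per non-empty line.

    Hypothetical helper; assumes whitespace-separated words, one sentence per line.
    """
    sentences = []
    with open(path, 'r', encoding='utf-8') as file:
        for line in file:
            tokens = line.split()  # no argument: splits on any whitespace, drops '\n'
            if tokens:             # skip blank lines
                sentences.append(tokens)
    return sentences


# Demo with a temporary file standing in for the asker's test.txt.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('one more sentence with restriction\n\nand the final sentence\n')
    demo_path = tmp.name

sentences = read_sentences(demo_path)
os.remove(demo_path)
print(sentences)
```

The result can then be passed to `Word2Vec(sentences, min_count=1)` as in the working example. On gensim 4.x, vectors are looked up with `model.wv['restriction']` and the vocabulary is available as `model.wv.key_to_index`; the `model['restriction']` and `model.wv.vocab` forms used in the question belong to older releases.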
