I trained an English model following this notebook (https://nbviewer.jupyter.org/github/amaiya/ktrain/blob/master/tutorials/tutorial-06-sequence-tagging.ipynb). I was able to save my pretrained model and run it without any problems.
However, I now need to run it again offline, and it doesn't work. I understand that I need to download the files and do the same thing as described here:
https://github.com/huggingface/transformers/issues/136
However, I don't know where I need to change ktrain's settings.
I run this:
ktrain.load_predictor('Functions/my_english_nermodel')
This is the error I get:
Traceback (most recent call last):
  File "Z:\Functions\NER.py", line 155, in load_bert
    reloaded_predictor = ktrain.load_predictor('Z:/Functions/my_english_nermodel')
  File "C:\Program Files\Python37\lib\site-packages\ktrain\core.py", line 1316, in load_predictor
    preproc = pickle.load(f)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 76, in __setstate__
    if self.te_model is not None: self.activate_transformer(self.te_model, layers=self.te_layers)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\ner\anago\preprocessing.py", line 100, in activate_transformer
    self.te = TransformerEmbedding(model_name, layers=layers)
  File "C:\Program Files\Python37\lib\site-packages\ktrain\text\preprocessor.py", line 1095, in __init__
    self.tokenizer = self.tokenizer_type.from_pretrained(model_name)
  File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 903, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\transformers\tokenization_utils.py", line 1008, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-uncased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'bert-base-dutch-cased' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Process finished with exit code 1
I found a solution. When ktrain runs with an internet connection, it creates a folder: `C:\Users\lemolina\.cache\torch\transformers`. I need to copy that same folder onto the machine without internet access.
More generally, pretrained models based on transformers are downloaded to <home_directory>/.cache/torch/transformers. For example, on Linux, this would be /home/<user_name>/.cache/torch/transformers.
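The default location can be computed portably with the standard library. This is a minimal sketch; the exact directory depends on the transformers version (newer releases use ~/.cache/huggingface instead):

```python
import os

# Default cache directory used by older versions of the transformers library.
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "torch", "transformers")
print(cache_dir)
```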
As indicated in the answer above, to reload a ktrain predictor on a machine without internet access (for ktrain models that use transformers models), you need to copy the model files in that folder to the same location on the new machine.