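I am trying to create a custom NER model, following these steps:
I have training data consisting of text examples, each with entity labels and their start and end character indices.
Now I run the following code: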
import spacy
from spacy.tokens import DocBin
from spacy.util import filter_spans
from tqdm import tqdm

nlp = spacy.blank("en")  # create a blank English pipeline (tokenizer only)
doc_bin = DocBin()

for training_example in tqdm(training_data):
    text = training_example['text']
    labels = training_example['entities']
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in labels:
        # char_span returns None if the offsets cannot be aligned to token boundaries
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    # doc.ents cannot hold overlapping spans, so filter them first
    filtered_ents = filter_spans(ents)
    doc.ents = filtered_ents
    doc_bin.add(doc)

doc_bin.to_disk("train.spacy")
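For reference, the loop above assumes `training_data` is a list of dicts with a `'text'` string and an `'entities'` list of `(start, end, label)` tuples. A minimal sketch of that shape, with a quick sanity check on the offsets, might look like this. The sample sentences and the helper name `find_bad_offsets` are made up for illustration and are not part of my real data or of spaCy.

```python
# Hypothetical sample data in the shape the loop above expects:
# a list of dicts with "text" and "entities" = [(start, end, label), ...].
sample_training_data = [
    {"text": "Apple was founded by Steve Jobs.",
     "entities": [(0, 5, "ORG"), (21, 31, "PERSON")]},
    {"text": "Paris is in France.",
     "entities": [(0, 5, "GPE"), (12, 18, "GPE"), (10, 40, "GPE")]},
]


def find_bad_offsets(training_data):
    """Return (example_index, entity) pairs whose offsets are empty or out of range."""
    problems = []
    for i, example in enumerate(training_data):
        text_length = len(example["text"])
        for start, end, label in example["entities"]:
            # A usable span needs 0 <= start < end <= len(text).
            if not (0 <= start < end <= text_length):
                problems.append((i, (start, end, label)))
    return problems


bad = find_bad_offsets(sample_training_data)
print(bad)
```

This only checks that offsets fall inside the text; whether they line up with token boundaries is still decided by `doc.char_span` in the loop above.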
!python -m spacy init fill-config base_config.cfg config.cfg
!python -m spacy train config.cfg --output ./ --paths.train ./train.spacy --paths.dev ./train.spacy
The output I expect is:
ℹ Using CPU
=========================== Initializing pipeline ===========================
[2022-07-01 18:31:37,021] [INFO] Set up nlp object from config
[2022-07-01 18:31:37,041] [INFO] Pipeline: ['tok2vec', 'ner']
[2022-07-01 18:31:37,047] [INFO] Created vocabulary
[2022-07-01 18:31:40,116] [INFO] Added vectors: en_core_web_lg
[2022-07-01 18:31:43,239] [INFO] Finished initializing nlp object
[2022-07-01 18:31:45,876] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    153.29    0.49    0.64    0.39    0.00
  7     200        501.32   3113.23   78.43   78.12   78.74    0.78
✔ Saved pipeline to output directory
model-last
But what I actually get is:
ℹ Saving to output directory: .
ℹ Using CPU
=========================== Initializing pipeline ===========================
[2023-03-14 02:40:38,422] [INFO] Set up nlp object from config
[2023-03-14 02:40:38,441] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-03-14 02:40:38,445] [INFO] Created vocabulary
[2023-03-14 02:42:09,609] [INFO] Added vectors: en_core_web_lg
After that, the Jupyter notebook cell simply stops executing. What could be going on here? I don't get an error message or any other output.
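Since the notebook shows no error, one way to get more information might be to launch the command through `subprocess` and look at the exit code and the combined output. This is only a sketch: `run_and_report` is a hypothetical helper name, and the command below is a harmless stand-in rather than the real training call.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so nothing is lost
        text=True,
    )
    return completed.returncode, completed.stdout


# Harmless stand-in command; for the real run, the argument list would be
# the same `spacy train` invocation used in the notebook cell.
exit_code, output = run_and_report([sys.executable, "-c", "print('ok')"])
print("exit code:", exit_code)
print(output)
```

On Linux or macOS, a negative exit code from `subprocess.run` means the process was terminated by a signal, which would at least distinguish a killed process from one that is still running.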
- spaCy version: 3.5.1
The only changes I made to the config file were setting the batch size to 80 and the number of training epochs to 300.
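To double-check what those two edits look like on disk, a small sketch like the following could read the values back. It assumes the settings in question are `max_epochs` under `[training]` and `batch_size` under `[nlp]`; that is my guess at the relevant keys, and the batcher settings under `[training.batcher]` are another candidate. It reads the file as plain INI with the standard library rather than through spaCy, so values come back as raw strings, and the demo writes a tiny stand-in config instead of using my real `config.cfg`. `read_training_settings` is a hypothetical helper name.

```python
import configparser
import os
import tempfile


def read_training_settings(config_path):
    """Read max_epochs and batch_size as raw strings from an INI-style config."""
    # interpolation=None leaves ${...} references in the file untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path, encoding="utf8")
    return {
        "max_epochs": parser.get("training", "max_epochs", fallback=None),
        "batch_size": parser.get("nlp", "batch_size", fallback=None),
    }


# Tiny stand-in config containing only the two assumed keys.
sample_config = "[nlp]\nbatch_size = 80\n\n[training]\nmax_epochs = 300\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "config.cfg")
    with open(sample_path, "w", encoding="utf8") as handle:
        handle.write(sample_config)
    settings = read_training_settings(sample_path)

print(settings)
```

Pointing `read_training_settings` at the real `config.cfg` would show whether the values ended up in the sections I think they did.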
Any help would be appreciated.