Training a custom spaCy NER model, but training never enters the training loop


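I am trying to create a custom NER model and followed these steps:

I have training data consisting of text examples, each with entity labels and their start and end character indices.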
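For readers following along, the conversion loop in this question expects `training_data` to be a list of dicts, each with a `text` string and an `entities` list of `(start, end, label)` character offsets. The sketch below shows that shape with invented sentences and labels (the actual data is not shown in this post), plus a hypothetical `check_offsets` helper that flags empty, reversed, or out-of-range spans before they reach `doc.char_span`:

```python
# Hypothetical sample in the shape the conversion loop expects;
# the texts and labels here are invented for illustration.
training_data = [
    {"text": "Alice moved to Berlin in 2019.",
     "entities": [(0, 5, "PERSON"), (15, 21, "GPE")]},
    {"text": "Acme hired Bob.",
     "entities": [(0, 4, "ORG"), (11, 14, "PERSON")]},
]


def check_offsets(examples):
    """Return (example_index, start, end, label) for every span whose
    offsets are empty, reversed, or fall outside the text."""
    problems = []
    for i, example in enumerate(examples):
        text = example["text"]
        for start, end, label in example["entities"]:
            if not (0 <= start < end <= len(text)):
                problems.append((i, start, end, label))
    return problems


bad_spans = check_offsets(training_data)
first_entity = training_data[0]["text"][0:5]
```

An empty `bad_spans` list only means the offsets are within range; it does not guarantee that they line up with spaCy's token boundaries.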

Now I run the following code:

import spacy
from spacy.tokens import DocBin
from spacy.util import filter_spans
from tqdm import tqdm

nlp = spacy.blank("en")  # create a blank English pipeline
doc_bin = DocBin()

for training_example in tqdm(training_data):
    text = training_example['text']
    labels = training_example['entities']
    doc = nlp.make_doc(text)
    ents = []
    for start, end, label in labels:
        # char_span returns None if no token span fits inside these character offsets
        span = doc.char_span(start, end, label=label, alignment_mode="contract")
        if span is None:
            print("Skipping entity")
        else:
            ents.append(span)
    # drop overlapping spans, since doc.ents cannot contain overlaps
    filtered_ents = filter_spans(ents)
    doc.ents = filtered_ents
    doc_bin.add(doc)

doc_bin.to_disk("train.spacy")

!python -m spacy init fill-config base_config.cfg config.cfg

!python -m spacy train config.cfg --output ./ --paths.train ./train.spacy --paths.dev ./train.spacy 

The output I should get is:

ℹ Using CPU

=========================== Initializing pipeline ===========================
[2022-07-01 18:31:37,021] [INFO] Set up nlp object from config
[2022-07-01 18:31:37,041] [INFO] Pipeline: ['tok2vec', 'ner']
[2022-07-01 18:31:37,047] [INFO] Created vocabulary
[2022-07-01 18:31:40,116] [INFO] Added vectors: en_core_web_lg
[2022-07-01 18:31:43,239] [INFO] Finished initializing nlp object
[2022-07-01 18:31:45,876] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    153.29    0.49    0.64    0.39    0.00
  7     200        501.32   3113.23   78.43   78.12   78.74    0.78
✔ Saved pipeline to output directory
model-last

But what I actually get is:

ℹ Saving to output directory: .
ℹ Using CPU

=========================== Initializing pipeline ===========================
[2023-03-14 02:40:38,422] [INFO] Set up nlp object from config
[2023-03-14 02:40:38,441] [INFO] Pipeline: ['tok2vec', 'ner']
[2023-03-14 02:40:38,445] [INFO] Created vocabulary
[2023-03-14 02:42:09,609] [INFO] Added vectors: en_core_web_lg

After that, the Jupyter notebook cell simply stops executing. What could be going on here? I don't get an error message or any other output.
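Since the notebook's `!` shell escape shows no error here, one way to get more information is to launch the same command from Python and inspect its exit status. The sketch below uses only the standard library; `run_and_report` and `describe` are hypothetical helper names. On POSIX systems, a negative return code from `subprocess` means the process was terminated by a signal (for example, -9 is SIGKILL, which the kernel's out-of-memory killer sends). Whether that is what happened in this case is an assumption, not something the post confirms.

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (returncode, combined stdout/stderr text)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
    )
    return result.returncode, result.stdout


def describe(returncode):
    """Turn a subprocess return code into a short human-readable note."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # POSIX: a negative value means "killed by signal -returncode"
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with error code {returncode}"


# Same command and paths as in the question, run with the current interpreter.
train_cmd = [
    sys.executable, "-m", "spacy", "train", "config.cfg",
    "--output", "./",
    "--paths.train", "./train.spacy",
    "--paths.dev", "./train.spacy",
]
```

Calling `code, output = run_and_report(train_cmd)` in a notebook cell and then printing `output` and `describe(code)` would show whether the training process exited on its own or was killed from outside.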

Your environment

- spaCy version: 3.5.1
- Platform: Linux-4.14.304-226.531.amzn2.x86_64-x86_64-with-glibc2.31
- Python version: 3.10.6
- Pipelines: en_core_web_sm (3.5.0), en_core_web_lg (3.5.0)

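The only changes I made to the config file were setting the batch size to 80 and the number of training epochs to 300.

Any help would be appreciated.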

python spacy named-entity-recognition