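I'm trying to train a spaCy textcat_multilabel model. I thought I had set everything up correctly, but I still get a config validation error.
Here is the labels part of my config: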
[components.textcat_multilabel]
factory = "textcat_multilabel"
scorer = {"@scorers": "spacy.textcat_multilabel_scorer.v2"}
threshold = 0.5
labels = ["Operational (Frontline)", "Certified/Technical", "Administrative (General)", "Corporate (HR/Finance/Procurement)", "Digital (Applications)", "Digital (ICT)", "Communication and Engagement", "Environmental/Scientific", "Leadership/Management/Coaching/Mentoring", "Policy/Legislation/Regulatory", "Cultural Capability", "Project Management", "Workplace Health and Safety", "Analytical (Data/GIS/Modelling)", "Other"]
This command
python -m spacy train .\config.cfg --output ..\output --paths.train .\train.spacy --paths.dev .\dev.spacy
throws this error:
=========================== Initializing pipeline ===========================
✘ Config validation error
textcat_multilabel -> labels extra fields not permitted
{'nlp': <spacy.lang.en.English object at 0x00000210C3758B10>, 'name': 'textcat_multilabel', 'labels': ['Operational (Frontline)', 'Certified/Technical', 'Administrative (General)', 'Corporate (HR/Finance/Procurement)', 'Digital (Applications)', 'Digital (ICT)', 'Communication and Engagement', 'Environmental/Scientific', 'Leadership/Management/Coaching/Mentoring', 'Policy/Legislation/Regulatory', 'Cultural Capability', 'Project Management', 'Workplace Health and Safety', 'Analytical (Data/GIS/Modelling)', 'Other'], 'model': {'@architectures': 'spacy.TextCatBOW.v2', 'exclusive_classes': False, 'ngram_size': 1, 'no_output_layer': False, 'nO': None}, 'scorer': {'@scorers': 'spacy.textcat_multilabel_scorer.v2'}, 'threshold': 0.5, '@factories': 'textcat_multilabel'}
Am I going about this completely wrong? Is there another way to specify the labels in the config?
Thanks!
I fixed it myself. I just used spaCy's quickstart tool to generate a base_config.cfg, and that helped a lot.
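For reference, here is a minimal sketch of where the labels can go instead, assuming spaCy v3's config layout. The validation error says `labels` is not an accepted argument of the `textcat_multilabel` factory, so the key is dropped from `[components.textcat_multilabel]`. If you want to fix the label set explicitly, it can be passed to the component's initialization step through `[initialize.components.textcat_multilabel]`. If that section is left out, the labels are collected from the training data when the pipeline is initialized. Only the relevant sections are shown; a base_config.cfg from the quickstart is normally completed into a full config with `python -m spacy init fill-config base_config.cfg config.cfg`.

```ini
[components.textcat_multilabel]
factory = "textcat_multilabel"
scorer = {"@scorers": "spacy.textcat_multilabel_scorer.v2"}
threshold = 0.5
# No "labels" key here: the textcat_multilabel factory does not accept it.

[initialize.components.textcat_multilabel]
# Optional: set the label set explicitly at initialization time.
# Without this section, labels are read from the training examples instead.
labels = ["Operational (Frontline)", "Certified/Technical", "Administrative (General)", "Corporate (HR/Finance/Procurement)", "Digital (Applications)", "Digital (ICT)", "Communication and Engagement", "Environmental/Scientific", "Leadership/Management/Coaching/Mentoring", "Policy/Legislation/Regulatory", "Cultural Capability", "Project Management", "Workplace Health and Safety", "Analytical (Data/GIS/Modelling)", "Other"]
```

Keeping the labels in `[initialize]` rather than in the component block matters because the component block only holds arguments for constructing the pipe, while the label set is something the pipe learns or is given when it is initialized.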