Custom Named Entity Recognition

Question

I have the following sentence:

text="The weather is extremely severe in England"

I want to run a custom Named Entity Recognition (NER) procedure.

First, the standard NER pipeline tags England with the label GPE:

pip install spacy

!python -m spacy download en_core_web_lg

import spacy
nlp = spacy.load('en_core_web_lg')

doc = nlp(text)

for ent in doc.ents:
    print(ent.text+' - '+ent.label_+' - '+str(spacy.explain(ent.label_)))

Result: England - GPE - Countries, cities, states

However, I want the whole sentence to carry the label High-Severity.

So I am running the following code:

from spacy.strings import StringStore

new_hash = StringStore([u'High_Severity']) # <-- match id
nlp.vocab.strings.add('High_Severity')

from spacy.tokens import Span

# Get the hash value of the High_Severity entity label
High_Severity = doc.vocab.strings[u'High_Severity']  

# Create a Span for the new entity
new_ent = Span(doc, 0, 7, label=High_Severity)

# Add the entity to the existing Doc object
doc.ents = list(doc.ents) + [new_ent]

I get the following error:

ValueError: [E1010] Unable to set entity information for token 6 which is included in more than one span in entities, blocked, missing or outside.

As I understand it, this happens because NER has already tagged England as GPE, and a second label cannot be added on top of an existing one.

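I also tried running the custom NER code on its own (that is, without running the standard NER code first), but that did not solve the problem.

Any ideas on how to solve this?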

Answer 1

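Indeed, NER does not allow overlapping entities, and that is your problem: the second part of your code tries to create an entity that contains another entity, so it fails. See: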

https://github.com/explosion/spaCy/discussions/10885
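If only one label per token is needed, one possible workaround is to resolve the overlap before assigning `doc.ents`. The sketch below is a minimal illustration, not the asker's exact setup: it assumes spaCy v3, uses a blank English pipeline so it runs without downloading a model, and sets the GPE entity by hand to stand in for what `en_core_web_lg` predicts. It relies on `spacy.util.filter_spans`, which prefers the longest span when candidates overlap.

```python
import spacy
from spacy.tokens import Span
from spacy.util import filter_spans

# Assumption: a blank pipeline stands in for en_core_web_lg, and the
# GPE entity is added manually to mimic the model's prediction.
nlp = spacy.blank("en")
doc = nlp("The weather is extremely severe in England")
doc.ents = [Span(doc, 6, 7, label="GPE")]

# The sentence-wide span overlaps the existing GPE entity on token 6.
new_ent = Span(doc, 0, len(doc), label="High_Severity")
candidates = list(doc.ents) + [new_ent]

# Assigning overlapping spans to doc.ents raises ValueError.
try:
    doc.ents = candidates
    overlap_rejected = False
except ValueError:
    overlap_rejected = True

# filter_spans keeps the longest span among overlapping candidates,
# so the shorter GPE entity is dropped here.
doc.ents = filter_spans(candidates)
labels = [(ent.text, ent.label_) for ent in doc.ents]
print(overlap_rejected, labels)
```

The trade-off is that the GPE label on England is lost, because `doc.ents` can hold only one entity per token.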

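For overlapping or nested spans, spaCy provides span categorization (spancat).

I have not found a way to label predefined spans (ones that do not come from a trained model).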
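One option that may cover hand-defined spans is the `doc.spans` container (spaCy v3), which stores named groups of spans and, unlike `doc.ents`, accepts overlaps. The following is a sketch under assumptions: the group key `"severity"` is an arbitrary name chosen for illustration, and a blank pipeline with a manually set GPE entity replaces the trained model.

```python
import spacy
from spacy.tokens import Span

# Assumption: blank pipeline plus a hand-set GPE entity, standing in
# for the output of en_core_web_lg.
nlp = spacy.blank("en")
doc = nlp("The weather is extremely severe in England")
doc.ents = [Span(doc, 6, 7, label="GPE")]

# "severity" is a hypothetical group name; any string key works.
# Spans stored here may overlap each other and the entities in doc.ents.
doc.spans["severity"] = [Span(doc, 0, len(doc), label="High_Severity")]

ent_labels = [(ent.text, ent.label_) for ent in doc.ents]
span_labels = [(span.text, span.label_) for span in doc.spans["severity"]]
print(ent_labels)
print(span_labels)
```

With this layout the GPE entity stays in `doc.ents` while the sentence-level label lives in its own span group, so neither has to be discarded.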

Answer 2

Why do you need a new hash in the string store? Is it because of the underscore?
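For what it is worth, the separate `StringStore` does not appear to be needed: in recent spaCy versions `Span` accepts the label as a plain string and registers it in the doc's vocabulary. A minimal sketch, assuming spaCy v3 and a blank English pipeline:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("The weather is extremely severe in England")

# Pass the label as a string; no manual StringStore handling is required.
span = Span(doc, 0, len(doc), label="High_Severity")

# The label string is now known to the shared vocabulary, and the
# integer stored on the span is its hash.
label_hash = doc.vocab.strings["High_Severity"]
label_known = "High_Severity" in doc.vocab.strings
print(span.label_, span.label == label_hash, label_known)
```

A standalone `StringStore([...])` is independent of `nlp.vocab`, so the `new_hash` object in the question is never used by the doc.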
