Get the start and end positions of found named entities


I'm new to ML and spaCy. I'm trying to display the named entities found in an input text.

This is my method:

import spacy
from collections import defaultdict
from pprint import pprint


def run():
    nlp = spacy.load('en_core_web_sm')
    sentence = "Hi my name is Oliver!"
    doc = nlp(sentence)

    # Threshold for the confidence scores.
    threshold = 0.2
    beams = nlp.entity.beam_parse(
        [doc], beam_width=16, beam_density=0.0001)

    # Sum the score of every beam parse that contains a given entity.
    # Keys are (start_token, end_token, label).
    entity_scores = defaultdict(float)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                entity_scores[(start, end, label)] += score

    # Create a dict to store the output.
    ners = defaultdict(list)
    ners['text'] = str(sentence)

    for key in entity_scores:
        start, end, label = key
        score = entity_scores[key]
        if score > threshold:
            ners['extractions'].append({
                "label": str(label),
                "text": str(doc[start:end]),
                "confidence": round(score, 2)
            })

    pprint(ners)

The method above works fine and prints something like:

defaultdict(<class 'list'>,
            {'extractions': [{'confidence': 1.0,
                              'label': 'PERSON',
                              'text': 'Oliver'}],
             'text': 'Hi my name is Oliver'})

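So far so good. Now I'm trying to get the actual position of the named entity that was found, in this case "Oliver".

Looking at the documentation, ent.start_char and ent.end_char are available, but if I use them like this: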

"start_position": doc.start_char,
"end_position": doc.end_char

I get the following error:

AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'start_char'

Can someone point me in the right direction?

python-3.x nlp spacy named-entity-recognition
1 Answer

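So I actually found the answer right after posting this question (typical).

I realized that I don't need to save the information into entity_scores; I can simply iterate over the entities that were actually found.

I ended up adding for ent in doc.ents: instead, which gives me access to all the standard spaCy attributes. See below: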

ners = defaultdict(list)
ners['text'] = str(sentence)
for beam in beams:
    for score, ents in nlp.entity.moves.get_beam_parses(beam):
        for ent in doc.ents:
            if (score > threshold):
                ners['extractions'].append({
                    "label": str(ent.label_),
                    "text": str(ent.text),
                    "confidence": round(score, 2),
                    "start_position": ent.start_char,
                    "end_position": ent.end_char
                })

My whole method ended up looking like this:

import spacy
from collections import defaultdict


def run():
    nlp = spacy.load('en_core_web_sm')
    sentence = "Hi my name is Oliver!"
    doc = nlp(sentence)

    # Threshold for the confidence scores.
    threshold = 0.2
    beams = nlp.entity.beam_parse(
        [doc], beam_width=16, beam_density=0.0001)

    ners = defaultdict(list)
    ners['text'] = str(sentence)
    for beam in beams:
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            # doc.ents yields Span objects, which expose start_char/end_char.
            for ent in doc.ents:
                if score > threshold:
                    ners['extractions'].append({
                        "label": str(ent.label_),
                        "text": str(ent.text),
                        "confidence": round(score, 2),
                        "start_position": ent.start_char,
                        "end_position": ent.end_char
                    })

    # Hand the collected result back to the caller.
    return ners
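A note on why the original attempt failed: start_char and end_char live on a Span, not on the Doc, and slicing the Doc by token indices (doc[start:end]) gives you a Span. As far as I can tell, the loop above also pairs each beam parse's score with every entity in doc.ents, so an entity can be appended once per parse. Below is a minimal sketch of an alternative that keeps the per-entity entity_scores dict from the question and only adds the character offsets. The names build_extractions, StubSpan and StubDoc are my own, hypothetical helpers. StubDoc is just a regex-based stand-in so the sketch runs without downloading a model; with spaCy you would pass the real doc instead.

```python
import re
from collections import namedtuple


def build_extractions(doc, entity_scores, threshold=0.2):
    """Turn {(start_token, end_token, label): score} into dicts with char offsets.

    `doc` is assumed to behave like a spaCy Doc: slicing it by token index
    returns a span exposing `text`, `start_char` and `end_char`.
    """
    extractions = []
    for (start, end, label), score in entity_scores.items():
        if score <= threshold:
            continue
        span = doc[start:end]  # token indices -> span with character offsets
        extractions.append({
            "label": str(label),
            "text": span.text,
            "confidence": round(score, 2),
            "start_position": span.start_char,
            "end_position": span.end_char,
        })
    return extractions


# Hypothetical stand-ins (NOT spaCy classes), only so the sketch is runnable.
StubSpan = namedtuple("StubSpan", ["text", "start_char", "end_char"])


class StubDoc:
    def __init__(self, text):
        self.text = text
        # Character (start, end) offsets of each word or punctuation token.
        self.offsets = [m.span() for m in re.finditer(r"\w+|[^\w\s]", text)]

    def __getitem__(self, token_slice):
        # Only non-empty slice access is supported in this stand-in.
        covered = self.offsets[token_slice]
        start_char, end_char = covered[0][0], covered[-1][1]
        return StubSpan(self.text[start_char:end_char], start_char, end_char)


if __name__ == "__main__":
    doc = StubDoc("Hi my name is Oliver!")
    # Scores keyed the same way as entity_scores in the question.
    scores = {(4, 5, "PERSON"): 1.0, (0, 1, "ORG"): 0.05}
    print(build_extractions(doc, scores))
```

Each (start, end, label) key appears once in entity_scores, so each entity ends up in the output once, with its own summed score rather than the score of whichever parse happened to be iterated.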