How do I get a description for each spaCy NER entity label?


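I am using a spaCy NER model to extract named entities relevant to my problem from text, such as DATE, TIME, GPE, and so on.

For example, I need to recognize the time zone in the following sentence: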

"Australian Central Time"

Using the spaCy model en_core_web_lg, I get the following result:

doc = nlp("Australian Central Time")
print([(ent.label_, ent.text) for ent in doc.ents])
    
>> [('NORP', 'Australian')]

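My problem is that I don't know the exact meaning of the NORP entity, or more generally the exact meaning of each spaCy NER entity label (beyond what the names intuitively suggest).

I found the following snippet to get the full list of entity labels, but I'm stuck after that: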

import spacy
nlp = spacy.load("en_core_web_lg")
nlp.get_pipe("ner").labels

I'm new to spaCy and couldn't find what I was looking for in the official documentation, so any help would be appreciated!

By the way, I'm using spaCy version 3.2.1.

spacy named-entity-recognition spacy-3
3 Answers
7 votes

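Most labels have a definition, which you can access with spacy.explain(label).

For NORP: "Nationalities or religious or political groups"

For more details, you will need to look at the annotation guidelines for the resources listed in the model documentation under https://spacy.io/models/.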
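As a minimal sketch of the lookup described above: spacy.explain does not need a loaded model, so it can be called directly on a label string. The helper name describe and the placeholder text are assumptions made for this illustration; they are not part of spaCy.

```python
import spacy


def describe(label: str) -> str:
    """Return spaCy's glossary text for a label, or a placeholder if there is none.

    `describe` is a hypothetical helper for this example, not a spaCy function.
    """
    description = spacy.explain(label)  # returns None for labels without a glossary entry
    return description if description is not None else "(no description available)"


# A few of the labels mentioned in the question.
for label in ("NORP", "GPE", "DATE", "TIME"):
    print(f"{label}: {describe(label)}")
```

Labels from a custom-trained NER component will usually have no glossary entry, which is why the helper falls back to a placeholder instead of printing None.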


6 votes

The full list is below. As of February 2023, the English models have 18 labels.

PERSON:      People, including fictional.
NORP:        Nationalities or religious or political groups.
FAC:         Buildings, airports, highways, bridges, etc.
ORG:         Companies, agencies, institutions, etc.
GPE:         Countries, cities, states.
LOC:         Non-GPE locations, mountain ranges, bodies of water.
PRODUCT:     Objects, vehicles, foods, etc. (Not services.)
EVENT:       Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART: Titles of books, songs, etc.
LAW:         Named documents made into laws.
LANGUAGE:    Any named language.
DATE:        Absolute or relative dates or periods.
TIME:        Times smaller than a day.
PERCENT:     Percentage, including "%".
MONEY:       Monetary values, including unit.
QUANTITY:    Measurements, as of weight or distance.
ORDINAL:     “first”, “second”, etc.
CARDINAL:    Numerals that do not fall under another type.

Source: Mikael Davidsson on Medium.


0 votes
This prints each label together with its description:

import spacy

# Only the NER component is needed here, so the other components are disabled.
nlp = spacy.load("en_core_web_trf", disable=["tagger", "parser", "attribute_ruler", "lemmatizer"])
for label in nlp.get_pipe("ner").labels:
    print(f"{label}: {spacy.explain(label)}")

Output:

CARDINAL: Numerals that do not fall under another type
DATE: Absolute or relative dates or periods
EVENT: Named hurricanes, battles, wars, sports events, etc.
FAC: Buildings, airports, highways, bridges, etc.
GPE: Countries, cities, states
LANGUAGE: Any named language
LAW: Named documents made into laws.
LOC: Non-GPE locations, mountain ranges, bodies of water
MONEY: Monetary values, including unit
NORP: Nationalities or religious or political groups
ORDINAL: "first", "second", etc.
ORG: Companies, agencies, institutions, etc.
PERCENT: Percentage, including "%"
PERSON: People, including fictional
PRODUCT: Objects, vehicles, foods, etc. (not services)
QUANTITY: Measurements, as of weight or distance
TIME: Times smaller than a day
WORK_OF_ART: Titles of books, songs, etc.
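The same lookup can be attached to the entities found in a document, which is closer to what the question asks. The sketch below is only illustrative: entities_with_descriptions is a hypothetical helper, and to stay runnable without downloading a model it uses a blank English pipeline and assigns the NORP entity by hand, mirroring the result reported in the question. With a trained pipeline such as en_core_web_lg, doc.ents would be filled in by the NER component instead.

```python
import spacy
from spacy.tokens import Span


def entities_with_descriptions(doc):
    """Return (text, label, description) for every entity in a Doc.

    Hypothetical helper for this example; the description is None when
    spaCy's glossary has no entry for the label.
    """
    return [(ent.text, ent.label_, spacy.explain(ent.label_)) for ent in doc.ents]


# Assumption: a blank pipeline stands in for a trained model so that the
# example runs without a model download. It tokenizes but predicts no entities.
nlp = spacy.blank("en")
doc = nlp("Australian Central Time")

# Manually mark the first token as NORP, as en_core_web_lg did in the question.
doc.ents = [Span(doc, 0, 1, label="NORP")]

for text, label, description in entities_with_descriptions(doc):
    print(f"{text} -> {label}: {description}")
```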
    