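I'm using a spaCy NER model to extract named entities relevant to my problem from text, such as dates, times, and GPEs.
For example, I need to identify the time zone in the following sentence: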
"Australian Central Time"
Using the spaCy model en_core_web_lg, I get the following result:
import spacy
nlp = spacy.load("en_core_web_lg")
doc = nlp("Australian Central Time")
print([(ent.label_, ent.text) for ent in doc.ents])
# Output: [('NORP', 'Australian')]
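My question is: I'm not clear on exactly what the NORP entity means, or more generally, what each spaCy NER entity label means (beyond the intuitive reading, of course).
I found the following snippet to get the complete list of entity labels, but I'm stuck after that: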
import spacy
nlp = spacy.load("en_core_web_lg")
print(nlp.get_pipe("ner").labels)
I'm fairly new to spaCy and couldn't find what I was looking for in the official documentation, so any help would be appreciated!
By the way, I'm using spaCy version 3.2.1.
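Most labels have definitions, which you can access with spacy.explain(label).
For NORP: "Nationalities or religious or political groups"
For more details, you'll need to consult the annotation guidelines for the resources listed in the model documentation under https://spacy.io/models/.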
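As a quick illustration of the lookup described above, here is a minimal sketch. The helper name describe_label and the fallback string are hypothetical and only for illustration; the sketch relies on spacy.explain returning None for a term that has no glossary entry, and it silences the warning that newer spaCy versions emit in that case.

```python
import warnings

import spacy


def describe_label(label: str) -> str:
    """Return spaCy's glossary text for a label, or a fallback string.

    describe_label is a hypothetical helper, not part of spaCy.
    """
    with warnings.catch_warnings():
        # Newer spaCy versions warn when a term has no glossary entry.
        warnings.simplefilter("ignore")
        description = spacy.explain(label)
    return description if description is not None else "(no glossary entry)"


print(describe_label("NORP"))
print(describe_label("NOT_A_LABEL"))
```

No model needs to be loaded for this, since the glossary ships with the spaCy package itself.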
The full list is below. As of February 2023, the English models have 18 labels.
PERSON: People, including fictional.
NORP: Nationalities or religious or political groups.
FAC: Buildings, airports, highways, bridges, etc.
ORG: Companies, agencies, institutions, etc.
GPE: Countries, cities, states.
LOC: Non-GPE locations, mountain ranges, bodies of water.
PRODUCT: Objects, vehicles, foods, etc. (Not services.)
EVENT: Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART: Titles of books, songs, etc.
LAW: Named documents made into laws.
LANGUAGE: Any named language.
DATE: Absolute or relative dates or periods.
TIME: Times smaller than a day.
PERCENT: Percentage, including "%".
MONEY: Monetary values, including unit.
QUANTITY: Measurements, as of weight or distance.
ORDINAL: “first”, “second”, etc.
CARDINAL: Numerals that do not fall under another type.
import spacy

nlp = spacy.load("en_core_web_trf", disable=["tagger", "parser", "attribute_ruler", "lemmatizer"])
# Print each NER label together with its glossary description
for label in nlp.get_pipe("ner").labels:
    print(f"{label}: {spacy.explain(label)}")
Returns:
CARDINAL: Numerals that do not fall under another type
DATE: Absolute or relative dates or periods
EVENT: Named hurricanes, battles, wars, sports events, etc.
FAC: Buildings, airports, highways, bridges, etc.
GPE: Countries, cities, states
LANGUAGE: Any named language
LAW: Named documents made into laws.
LOC: Non-GPE locations, mountain ranges, bodies of water
MONEY: Monetary values, including unit
NORP: Nationalities or religious or political groups
ORDINAL: "first", "second", etc.
ORG: Companies, agencies, institutions, etc.
PERCENT: Percentage, including "%"
PERSON: People, including fictional
PRODUCT: Objects, vehicles, foods, etc. (not services)
QUANTITY: Measurements, as of weight or distance
TIME: Times smaller than a day
WORK_OF_ART: Titles of books, songs, etc.
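The same lookup can be applied to the entities found in a document, so each one is shown next to its description. This is only a sketch under stated assumptions: it uses a blank English pipeline and sets the entity by hand to mirror the NORP result from the question, so that it runs without downloading a model. With a trained pipeline such as en_core_web_lg you would simply iterate over doc.ents as returned by the model.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no trained NER component (assumption
# made so the example runs without a model download).
nlp = spacy.blank("en")
doc = nlp("Australian Central Time")

# Hand-set the entity that en_core_web_lg reported in the question.
doc.ents = [Span(doc, 0, 1, label="NORP")]

# Pair every entity with its label and the glossary description.
rows = [(ent.text, ent.label_, spacy.explain(ent.label_)) for ent in doc.ents]
for text, label, description in rows:
    print(f"{text} -> {label}: {description}")
```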