Named entity recognition (NER) with NLTK and spaCy

Question

I tried named entity recognition (NER) with both NLTK and spaCy on the following sentence. The results are below:

"Zoni I want to find a pencil, a eraser and a sharpener"

I ran the following code on Google Colab.

import nltk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# Tokenizer and POS-tagger resources (resource names can differ between NLTK versions)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

ex = "Zoni I want to find a pencil, a eraser and a sharpener"

def preprocess(sent):
    # Split the sentence into tokens, then attach a part-of-speech tag to each one
    tokens = word_tokenize(sent)
    return pos_tag(tokens)

sent = preprocess(ex)
sent

#Output:
[('Zoni', 'NNP'),
 ('I', 'PRP'),
 ('want', 'VBP'),
 ('to', 'TO'),
 ('find', 'VB'),
 ('a', 'DT'),
 ('pencil', 'NN'),
 (',', ','),
 ('a', 'DT'),
 ('eraser', 'NN'),
 ('and', 'CC'),
 ('a', 'DT'),
 ('sharpener', 'NN')]
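
The list above contains part-of-speech tags (NNP, PRP, NN and so on), not named entities; in NLTK the tagged tokens are only the input to the entity chunker. Below is a minimal sketch of that extra step, assuming the chunker resources are available. The helper names `extract_entities` and `nltk_entities` are illustrative and not part of NLTK, and the resource names in the comments are assumptions that vary between NLTK versions.

```python
import nltk
from nltk import Tree


def extract_entities(tree):
    """Collect (label, text) pairs from a chunk tree such as the one nltk.ne_chunk returns."""
    entities = []
    for node in tree:
        # Named-entity chunks are subtrees; ordinary tokens are (word, tag) tuples
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities


def nltk_entities(sentence):
    """Tokenize, POS-tag and NE-chunk a sentence (requires the NLTK resources below)."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return extract_entities(nltk.ne_chunk(tagged))


# Usage, after downloading the chunker resources
# (assumed names; newer NLTK releases use different ones):
# nltk.download('maxent_ne_chunker')
# nltk.download('words')
# print(nltk_entities("Zoni I want to find a pencil, a eraser and a sharpener"))
```

Which entities NLTK reports for the example sentence depends on the chunker model, so no output is shown here.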

But when I run spaCy on the same text, it returns no entities:

import spacy
from spacy import displacy
from collections import Counter
import en_core_web_sm
nlp = en_core_web_sm.load()

text = "Zoni I want to find a pencil, a eraser and a sharpener"

doc = nlp(text)
doc.ents

#Output:
()

It only works for some sentences:

import spacy
from spacy import displacy
from collections import Counter
import en_core_web_sm
nlp = en_core_web_sm.load()

# text = "Zoni I want to find a pencil, a eraser and a sharpener"

text = 'European authorities fined Google a record $5.1 billion on Wednesday for abusing its power in the mobile phone market and ordered the company to alter its practices'

doc = nlp(text)
doc.ents

#Output:
(European, Google, $5.1 billion, Wednesday)

Please let me know if I am doing something wrong.

python-3.x nlp nltk spacy named-entity-recognition
1 Answer

spaCy's models are statistical, so the named entities they can recognize depend on the data the models were trained on.

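According to the spaCy documentation, a named entity is a "real-world object" that is assigned a name, for example a person, a country, a product or a book title.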
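
To see what each entity label is meant to cover, spaCy's built-in glossary can be queried with `spacy.explain`. This is a small sketch; the labels listed are just a sample chosen for this question, and no trained model is needed to run it.

```python
import spacy

# A few labels from the English models that are relevant to this question
labels = ("PERSON", "PRODUCT", "GPE", "MONEY", "DATE")

# spacy.explain looks each label up in spaCy's built-in glossary
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:8} {description}")
```

With a loaded pipeline, `nlp.get_pipe("ner").labels` lists the labels that particular model can predict.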

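For example, the name Zoni is uncommon, so the model does not recognize it as a named entity (PERSON). If I change the name Zoni to William in your sentence, spaCy recognizes William as a person.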

import spacy

nlp = spacy.load('en_core_web_lg')
doc = nlp('William I want to find a pencil, a eraser and a sharpener')

for entity in doc.ents:
    print(entity.label_, ' | ', entity.text)

#output
PERSON  |  William

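One might assume that pencil, eraser and sharpener are objects and could therefore be classified as products, since the spaCy documentation describes "objects" as products. That does not seem to be the case for the three objects in your sentence, most likely because they are common nouns rather than names of specific products.

I also noticed that when no named entities are found in the input text, the output is empty.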

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp('Zoni I want to find a pencil, a eraser and a sharpener')

if not doc.ents:
    print('No named entities were recognized in the input text.')
else:
    for entity in doc.ents:
        print(entity.label_, ' | ', entity.text)
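
If these particular words do need to be tagged, one option is to add rules on top of (or instead of) the statistical model. The sketch below uses spaCy's rule-based `entity_ruler` component with the spaCy v3 API on a blank English pipeline, so it needs no model download. The patterns and the choice of labels are illustrative assumptions, not something the trained models provide.

```python
import spacy

# A blank English pipeline: tokenizer only, no statistical NER
nlp = spacy.blank("en")

# Rule-based entity matcher; these patterns are hand-written for this example
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Zoni"},
    {"label": "PRODUCT", "pattern": "pencil"},
    {"label": "PRODUCT", "pattern": "eraser"},
    {"label": "PRODUCT", "pattern": "sharpener"},
])

doc = nlp("Zoni I want to find a pencil, a eraser and a sharpener")
entities = [(ent.label_, ent.text) for ent in doc.ents]

for label, text in entities:
    print(label, " | ", text)
```

To combine the rules with a trained model, the same component can be added to a loaded pipeline, for example with `nlp.add_pipe("entity_ruler", before="ner")`, so that the rule matches are set before the statistical component runs.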
