CV parser name matching

Question

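I am using NLP with Python to find names in a string. If the string contains a full name (first name and last name), my code finds it, but when the string contains only a first name, my code does not recognize it as a PERSON. My code is below.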

import re
import nltk
from nltk.corpus import stopwords
stop = stopwords.words('english')

string = """
Sriram is working as a python developer 
"""


def ie_preprocess(document):
    document = ' '.join([i for i in document.split() if i not in stop])
    sentences = nltk.sent_tokenize(document)
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    return sentences

def extract_names(document):
    names = []
    sentences = ie_preprocess(document)
    for tagged_sentence in sentences:
        for chunk in nltk.ne_chunk(tagged_sentence):
            # ne_chunk yields Tree nodes for recognized entities and plain (word, tag) tuples otherwise
            if isinstance(chunk, nltk.tree.Tree) and chunk.label() == 'PERSON':
                print("In Side ", chunk)
                names.append(' '.join(c[0] for c in chunk))
    return names

if __name__ == '__main__':
    names = extract_names(string)
    print(names) 
python machine-learning nlp data-science named-entity-recognition
1 Answer

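My suggestion is to use StanfordNLP or spaCy for NER; working with NLTK's ne_chunk is a bit cumbersome. StanfordNLP is more commonly used by researchers, but spaCy is easier to use. Here is an example that uses spaCy to print the text and type of each named entity: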

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> text = 'Sriram is working as a python developer'
>>> doc = nlp(text)
>>> for ent in doc.ents:
...     print(ent.text, ent.label_)
...
Sriram ORG

Note that it classifies Sriram as an organization, probably because it is not a common English name and spaCy was trained on English corpora. Good luck!
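Since a statistical model can mislabel or miss a first name it has rarely seen, one possible workaround is to add a simple lookup against a list of known given names. The sketch below is not part of NLTK or spaCy; it uses only the standard library, and `KNOWN_FIRST_NAMES` and `find_first_names` are hypothetical names introduced for illustration. In practice the set would presumably be loaded from a much larger name list.

```python
import re

# Hypothetical gazetteer of given names (an assumption for this sketch);
# a real CV parser would load a much larger list from a file.
KNOWN_FIRST_NAMES = {"sriram", "priya", "john"}

# Matches runs of ASCII letters; other scripts would need a broader pattern.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def find_first_names(text, known_names=KNOWN_FIRST_NAMES):
    """Return capitalized tokens whose lowercase form is in known_names."""
    matches = []
    for token in TOKEN_RE.findall(text):
        # Require an initial capital to reduce false positives on common words.
        if token[0].isupper() and token.lower() in known_names:
            matches.append(token)
    return matches


if __name__ == "__main__":
    print(find_first_names("Sriram is working as a python developer"))
    # prints ['Sriram']
```

A lookup like this could be combined with the NER output, for example by keeping any entity labeled PERSON and adding tokens that the lookup finds. It only recognizes names that are already in the list, so it complements the model rather than replacing it.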
