CV parser name matching

Question

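I am using NLP with Python to find names in a string. When the string contains a full name (first and last name), I can find it, but when it contains only a first name, my code does not recognize it as a PERSON. My code is below.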

import re
import nltk
from nltk.corpus import stopwords
stop = stopwords.words('english')

string = """
Sriram is working as a python developer 
"""


def ie_preprocess(document):
    document = ' '.join([i for i in document.split() if i not in stop])
    sentences = nltk.sent_tokenize(document)
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    return sentences

def extract_names(document):
    names = []
    sentences = ie_preprocess(document)
    #print(sentences)
    for tagged_sentence in sentences:
        for chunk in nltk.ne_chunk(tagged_sentence):
            #print("Out Side ",chunk)
            # ne_chunk yields Tree nodes for entities and plain (word, tag) tuples otherwise
            if isinstance(chunk, nltk.Tree) and chunk.label() == 'PERSON':
                print("In Side ", chunk)
                names.append(' '.join(c[0] for c in chunk))
    return names

if __name__ == '__main__':
    names = extract_names(string)
    print(names)

Answer 1

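My suggestion is to use StanfordNLP or spaCy for NER; NLTK's ne_chunk is somewhat cumbersome. StanfordNLP is more commonly used by researchers, but spaCy is easier to use. Here is an example that uses spaCy to print the text and type of each named entity: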

>>> import spacy
>>> nlp = spacy.load('en_core_web_sm')
>>> text = 'Sriram is working as a python developer'
>>> doc = nlp(text)
>>> for ent in doc.ents:
...     print(ent.text, ent.label_)
...
Sriram ORG

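Note that it classifies Sriram as an organization (ORG), probably because it is not a common English name and spaCy's model was trained on English corpora. Good luck!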
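Because a statistical model can miss or mislabel a lone first name, one possible fallback is a simple lookup against a list of known first names. The following is only a minimal sketch using the standard library: `KNOWN_FIRST_NAMES` and `find_first_names` are hypothetical names made up for illustration, and a real CV parser would load a much larger name list.

```python
import re

# Hypothetical gazetteer (assumption): in practice, load a larger list of first names.
KNOWN_FIRST_NAMES = {"sriram", "priya", "john"}


def find_first_names(text, known_names=KNOWN_FIRST_NAMES):
    """Return capitalized tokens in `text` whose lowercase form is in `known_names`."""
    # Split the text into alphabetic tokens; punctuation and digits are ignored.
    tokens = re.findall(r"[A-Za-z]+", text)
    # Keep only tokens that start with an uppercase letter and appear in the name list.
    return [tok for tok in tokens if tok[0].isupper() and tok.lower() in known_names]


print(find_first_names("Sriram is working as a python developer"))
# prints ['Sriram']
```

A lookup like this only finds names that are already in the list, so it would complement the NER output rather than replace it.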
