In spaCy, when we ask for nouns, the grammatical articles are returned as well (e.g. "the", "one", "a"):
import spacy
nlp_en = spacy.load('en_core_web_sm') # v3.7.1
doc = nlp_en('The man has cars, houses and one dog')
nouns = [chunk.text for chunk in doc.noun_chunks]
print(nouns) # ['The man', 'cars', 'houses', 'one dog']
Is there a way to get
['man', 'cars', 'houses', 'dog']
instead?
It should work for every language, so simply removing specific words by hand is not a solution.
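You can iterate over the noun chunks and extract the root text of each chunk. The root is the head token of the chunk, so this relies on the dependency parse rather than on a list of words to strip: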
def get_root_text(chunk):
    # chunk.root is the syntactic head token of the noun chunk
    return chunk.root.text
nouns = [get_root_text(chunk) for chunk in doc.noun_chunks]
print(nouns)
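To see why the root drops the article in any language, it may help to look at what "root of a chunk" means. The sketch below is a simplified pure-Python illustration, not spaCy's actual implementation: it treats the root as the token in a span whose head lies outside that span. The `Tok` class, the `span_root` helper, and the hand-written head indices and chunk boundaries are all assumptions made up for this example; in practice spaCy's parser supplies them.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    text: str
    head: int  # index of the syntactic head; the sentence root points to itself


def span_root(tokens, start, end):
    """Return the token in tokens[start:end] whose head is outside the span.

    Simplified stand-in for the idea behind a chunk's root (hypothetical helper).
    """
    for i in range(start, end):
        head = tokens[i].head
        if head == i or not (start <= head < end):
            return tokens[i]
    raise ValueError("empty span")


# Hand-written toy parse of the example sentence (assumed, not parser output).
tokens = [
    Tok("The", 1), Tok("man", 2), Tok("has", 2), Tok("cars", 2), Tok(",", 3),
    Tok("houses", 3), Tok("and", 5), Tok("one", 8), Tok("dog", 5),
]

# Assumed noun-chunk boundaries as (start, end) token offsets, end exclusive.
chunks = [(0, 2), (3, 4), (5, 6), (7, 9)]

roots = [span_root(tokens, start, end).text for start, end in chunks]
print(roots)  # ['man', 'cars', 'houses', 'dog']
```

Determiners and numerals such as "The" and "one" attach to the noun inside the chunk, so they are never selected as the root. Nothing here depends on a word list, although noun chunks themselves are only available for languages whose spaCy pipeline supports them.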