spaCy Matcher finds no matches for any country

Question

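I'm trying to build a matcher in spaCy that extracts country names, including abbreviations. For example, Kenya, KE, and KEN should all match as Kenya. I built a simple matcher, but it returns nothing.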

I tried the following simple code in a Jupyter notebook:

import spacy
import pycountry
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

for country in pycountry.countries:
    name = country.name
    pattern1 = [{'LOWER': name}]
    pattern2 = [{'LOWER': country.alpha_2}]
    pattern3 = [{'LOWER': country.alpha_3}]
    patterns = [pattern1, pattern2, pattern3]
    matcher.add(name, patterns)
doc = nlp(u"Kenya is a beautiful country. It is next to Somalia. KEN is in Africa. China is making investments there. It is near the UAE and SAU")
found_matches  = matcher(doc)
print(found_matches)
python jupyter spacy
1 Answer

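It looks like you never initialized the Matcher object before using it. You need to create a Matcher from the pipeline's vocabulary and then add the patterns to it. The values in the LOWER patterns also need to be lowercased, because LOWER is compared against the lowercased token text, so a value such as "Kenya" or "KEN" can never match.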
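To see why the case of the pattern value matters, here is a minimal sketch. It assumes spaCy v3 and uses a blank English pipeline (tokenizer only) purely to avoid a model download; the helper `count_matches` is a hypothetical name introduced for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a tokenizer-only pipeline is enough, since the pattern
# only looks at the token text.
nlp = spacy.blank("en")
doc = nlp("Kenya is a beautiful country.")


def count_matches(value):
    """Count matches for a single-token LOWER pattern with the given value."""
    matcher = Matcher(nlp.vocab)
    matcher.add("KENYA", [[{"LOWER": value}]])
    return len(matcher(doc))


mixed_case = count_matches("Kenya")  # pattern value keeps its capital letter
lower_case = count_matches("kenya")  # pattern value lowercased first
print(mixed_case, lower_case)
```

The mixed-case value finds nothing, while the lowercased value matches the token "Kenya".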

Try this:

import spacy
import pycountry
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)  # Initialize the Matcher object

for country in pycountry.countries:
    name = country.name
    # LOWER is compared against the lowercased token text, so lowercase the values too
    pattern1 = [{'LOWER': name.lower()}]
    pattern2 = [{'LOWER': country.alpha_2.lower()}]
    pattern3 = [{'LOWER': country.alpha_3.lower()}]
    patterns = [pattern1, pattern2, pattern3]
    matcher.add(name, patterns)

doc = nlp(u"Kenya is a beautiful country. It is next to Somalia. KEN is in Africa. China is making investments there. It is near the UAE and SAU")
found_matches = matcher(doc)
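for match_id, start, end in found_matches:
    # match_id is the hash of the key passed to matcher.add, i.e. the country name
    print(doc[start:end].text, '->', nlp.vocab.strings[match_id])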
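One caveat with the loop above: lowercased codes collide with ordinary English words. In the sample sentence, "is" (Iceland), "It" (Italy), "to" (Tonga), "in" (India) and "and" (Andorra) all match, and a multi-word name such as "Saudi Arabia" never matches because one pattern dict describes exactly one token. The sketch below is one possible way around both issues, not the only one. It assumes spaCy v3, a blank English pipeline, and a small hand-written `COUNTRIES` table standing in for `pycountry.countries` so that it runs without extra data.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: tokenizer-only pipeline; token-level patterns need no trained model.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Illustrative stand-in for pycountry.countries: (name, alpha_2, alpha_3).
COUNTRIES = [
    ("Kenya", "KE", "KEN"),
    ("Somalia", "SO", "SOM"),
    ("China", "CN", "CHN"),
    ("Saudi Arabia", "SA", "SAU"),
    ("Italy", "IT", "ITA"),
]

for name, alpha_2, alpha_3 in COUNTRIES:
    # Tokenize the name with the same tokenizer and emit one dict per token,
    # so multi-word names are matched as a token sequence.
    name_pattern = [{"LOWER": token.lower_} for token in nlp.make_doc(name)]
    # ORTH compares the exact token text: "IT" matches, the pronoun "It" does not.
    patterns = [name_pattern, [{"ORTH": alpha_2}], [{"ORTH": alpha_3}]]
    matcher.add(name, patterns)

doc = nlp(
    "Kenya is a beautiful country. It is next to Somalia. KEN is in Africa. "
    "China is making investments there. It is near the UAE and SAU"
)

# Map every match back to the country name used as the key in matcher.add.
results = []
for match_id, start, end in matcher(doc):
    results.append((doc[start:end].text, nlp.vocab.strings[match_id]))
print(results)
```

With exact-case codes, "KEN" and "SAU" resolve to Kenya and Saudi Arabia, while "It" and "is" are left alone. The trade-off is that lowercase codes such as "ken" are no longer recognized. "UAE" stays unmatched in both versions because the ISO alpha-3 code for the United Arab Emirates is ARE.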