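I am trying to create a matcher in spaCy to extract country names, including abbreviations. For example, Kenya, KE, and KEN should all match as Kenya. I built a simple matcher, but it returns nothing.
I tried the following simple code in a Jupyter notebook: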
import spacy
import pycountry
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
for country in pycountry.countries:
    name = country.name
    pattern1 = [{'LOWER': name}]
    pattern2 = [{'LOWER': country.alpha_2}]
    pattern3 = [{'LOWER': country.alpha_3}]
    patterns = [pattern1, pattern2, pattern3]
    matcher.add(name, patterns)
doc = nlp(u"Kenya is a beautiful country. It is next to Somalia. KEN is in Africa. China is making investments there. It is near the UAE and SAU")
found_matches = matcher(doc)
print(found_matches)
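It looks like you never initialized the Matcher object before using it. You need to create a Matcher from the pipeline's vocab and then add the patterns to it. Note also that LOWER compares against each token's lowercased text, so the pattern values must be lowercased as well; a pattern such as {'LOWER': 'Kenya'} can never match, which is why nothing comes back.
Try this: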
import spacy
import pycountry
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab) # Initialize the Matcher object
for country in pycountry.countries:
    name = country.name
    pattern1 = [{'LOWER': name.lower()}]
    pattern2 = [{'LOWER': country.alpha_2.lower()}]
    pattern3 = [{'LOWER': country.alpha_3.lower()}]
    patterns = [pattern1, pattern2, pattern3]
    matcher.add(name, patterns)
doc = nlp(u"Kenya is a beautiful country. It is next to Somalia. KEN is in Africa. China is making investments there. It is near the UAE and SAU")
found_matches = matcher(doc)
for match_id, start, end in found_matches:
    print(doc[start:end])
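One possible refinement, offered as a sketch rather than a definitive fix: lowercasing the two-letter codes makes ordinary words such as "is", "it", and "in" match Iceland, Italy, and India, and a single LOWER token can never match a multi-word name like "Saudi Arabia". The version below assumes spaCy v3, matches the codes case-sensitively with TEXT, builds one token pattern per word of the name, and looks up the match key so every variant is reported under the country name. COUNTRIES is a small hand-written sample standing in for pycountry.countries, and spacy.blank("en") is used only because the Matcher needs a tokenizer, not a trained model.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only English pipeline (assumption: no trained model is needed here).
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical sample standing in for pycountry.countries: (name, alpha_2, alpha_3).
COUNTRIES = [
    ("Kenya", "KE", "KEN"),
    ("Somalia", "SO", "SOM"),
    ("China", "CN", "CHN"),
    ("United Arab Emirates", "AE", "ARE"),
    ("Saudi Arabia", "SA", "SAU"),
    ("Iceland", "IS", "ISL"),
]

for name, alpha_2, alpha_3 in COUNTRIES:
    patterns = [
        # One token dict per token of the name, so multi-word names can match.
        [{"LOWER": token.lower_} for token in nlp.make_doc(name)],
        # Codes are compared case-sensitively, so "is" is not read as Iceland.
        [{"TEXT": alpha_2}],
        [{"TEXT": alpha_3}],
    ]
    # The country name is the match key shared by all three variants.
    matcher.add(name, patterns)

doc = nlp("Kenya is a beautiful country. KEN is next to Somalia and near Saudi Arabia.")

# Resolve each match_id back to its key (the country name) via the string store.
found = [
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
]
print(found)
```

To run this against the full list, the sample could be replaced with (c.name, c.alpha_2, c.alpha_3) for each c in pycountry.countries. Case-sensitive codes trade recall for precision: "ken" in lowercase would no longer match, which may or may not suit your text.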