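I'm trying to write a pattern in spaCy that matches "black" but not "black beans".
I tried the code below, but it seems to match "black" together with the token after it, as long as that token isn't "bean". How can I change it so that it matches only "black"?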
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# Earlier attempt:
#pattern = [{"LOWER": "black"}, {"LEMMA": {"NOT_IN": ["bean", "beans"]}}]
pattern = [{"LOWER": "black"}, {"LEMMA": "bean", "OP": "!"}]
matcher.add("blackbeans", [pattern])
doc = nlp("I liked the black beans, but the avocado was black making the whole meal blackish-looking and not good.")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
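There's no way to do that directly: the Matcher returns every token that the pattern describes, so a two-token pattern always produces a two-token span. A negated token also doesn't match a non-token (the position after the end of the doc), so if "black" is the last token in the sentence, your pattern won't match at all.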
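A minimal sketch of both effects, assuming spaCy v3. It uses `spacy.blank("en")` (tokenizer only, no model download) and matches on `LOWER` instead of `LEMMA`, because a blank pipeline doesn't set lemmas. The match key `"BLACK_NOT_BEAN"` and the helper `matched_texts` are illustrative names, not part of the original code.

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, so no trained model is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "black" followed by exactly one token that is NOT "bean"/"beans".
# The "!" token still has to be a real token, so it can't match past the end of the doc.
pattern = [{"LOWER": "black"}, {"LOWER": {"IN": ["bean", "beans"]}, "OP": "!"}]
matcher.add("BLACK_NOT_BEAN", [pattern])


def matched_texts(text):
    """Return the text of every span the matcher finds in `text`."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


print(matched_texts("the avocado was black and soft"))  # ['black and']
print(matched_texts("the avocado was black"))  # []
```

The first call shows the extra token being pulled into the span; the second shows that a sentence-final "black" is missed entirely.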
There are a few ways to work around this:
# Making the second token optional ("?") lets "black" match at the end of a sentence too.
pattern = [{"LOWER": "black"}, {"LOWER": {"NOT_IN": ["bean", "beans"]}, "OP": "?"}]
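Because the second token in the optional-token pattern can be skipped, it also returns a bare "black" match inside "black beans", so its results still need filtering. One way to get only the "black" token, sketched here as an illustration rather than the only approach, is to match "black" on its own and then drop matches whose next token is "bean" or "beans". This assumes spaCy v3 and `spacy.blank("en")`; `EXCLUDED_NEXT` and `black_not_bean` are hypothetical names.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Single-token pattern: the returned span is always just the "black" token.
matcher.add("BLACK", [[{"LOWER": "black"}]])

# Hypothetical exclusion list: words that make "black" part of a phrase we don't want.
EXCLUDED_NEXT = {"bean", "beans"}


def black_not_bean(doc):
    """Return (start, end) for each "black" token not followed by an excluded word."""
    results = []
    for _, start, end in matcher(doc):
        # doc[end] is the token right after the match; at the end of the doc there is none.
        if end < len(doc) and doc[end].lower_ in EXCLUDED_NEXT:
            continue
        results.append((start, end))
    return results


doc = nlp("I liked the black beans, but the avocado was black")
for start, end in black_not_bean(doc):
    print(start, end, doc[start:end].text)  # 10 11 black
```

The check on `end < len(doc)` is what makes a sentence-final "black" count as a match, which the negated-token pattern can't do.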