I'm running into trouble passing a dataframe column through SpaCy's DependencyMatcher. I tried adapting the solution from the earlier question "Spacy dependency parsing with Pandas dataframe", but without success.
import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN
nlp = spacy.load("en_core_web_lg")
text = 'REPAIRED CONNECTOR ON J3 SMS. REPLACED THE PRIMARY COMPUTER.'.lower()
dep_matcher = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {"LEMMA": {"IN": ["reseat", "cycle", "replace", "repair", "reinstall", "clean", "treat", "splice", "swap", "read", "inspect", "installed"]}}
    },
    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP": {"IN": ["dobj"]}},
    }]
dep_matcher.add('maint_action', patterns=[dep_pattern])
dep_matches = dep_matcher(doc)
for match in dep_matches:
    dep_pattern = match[0]
    matches = match[1]
    verb, subject = matches[0], matches[1]
    print(nlp.vocab[dep_pattern].text, '\t', doc[verb], doc[subject])
>>>maint_action repaired connector
>>>maint_action replaced computer
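For reference, `DependencyMatcher` returns a list of `(match_id, token_ids)` tuples, and `token_ids` follow the order of the pattern's `RIGHT_ID` entries ("action" first, "component" second) — the unpacking above relies on that ordering. A minimal stand-in, with illustrative data in place of a real spaCy doc and match list:

```python
# Illustrative stand-in for DependencyMatcher output: a list of
# (match_id, token_ids) tuples; token_ids are ordered the same way
# as the pattern's RIGHT_IDs ("action" first, "component" second).
fake_matches = [(12345, [0, 1]), (12345, [3, 5])]
# Stand-in for doc: index into it like doc[token_id].
fake_doc = ["repaired", "connector", "on", "replaced", "the", "computer"]

pairs = []
for match_id, token_ids in fake_matches:
    verb, component = token_ids[0], token_ids[1]
    pairs.append((fake_doc[verb], fake_doc[component]))

print(pairs)  # → [('repaired', 'connector'), ('replaced', 'computer')]
```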
The above works perfectly when I pass a string. But when I try to pass the DF, the new column comes back blank.
Here is the function for the DF:
import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN
nlp = spacy.load("en_core_web_lg")
data = {'new': ['repaired computer and replaced connector.', 'spliced wire on connector.', 'cycled power and reseated connectors and replaced computer on transmitter.']}
df = pd.DataFrame(data)
dep_matcher = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {"LEMMA": {"IN": ["reseat", "cycle", "replace", "repair", "reinstall", "clean", "treat", "splice", "swap", "read", "inspect", "installed"]}}
    },
    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP": {"IN": ["dobj"]}},
    }]
dep_matcher.add('maint_action', patterns=[dep_pattern])
dep_matches = dep_matcher(doc)
def find_matches(text):
    doc = nlp(text)
    rule3_pairs = []
    for match in dep_matches:
        dep_pattern = match[0]
        matches = match[1]
        verb, subject = matches[0], matches[1]
        A = (nlp.vocab[dep_pattern].text, '\t', doc[verb], doc[subject])
        rule3_pairs.append(A)
    return rule3_pairs
df['three_tuples'] = df['new'].apply(find_matches)
I'm trying to get each row that matches the pattern to output the corresponding verb and noun combinations, e.g.:
|three_tuples|
|---|
|maint_action repaired computer replaced connector|
|maint_action spliced wire|
|maint_action cycled power reseated connectors replaced computer|
I ran your code (the second example) exactly as written, and it produced the result you want (image below).
There was a small issue in your first code example: you never did
doc = nlp(text)
but I don't think that's what's causing the problem. If you're using Jupyter, perhaps try restarting the kernel.
def find_matches(text):
    doc = nlp(text)
    dep_matches = dep_matcher(doc)
    rule3_pairs = []
    for match in dep_matches:
        dep_pattern = match[0]
        matches = match[1]
        verb, subject = matches[0], matches[1]
        A = (nlp.vocab[dep_pattern].text, doc[verb], doc[subject])
        rule3_pairs.append(A)
    return rule3_pairs
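For completeness, the `.apply` step itself works the same way regardless of what the function does, so here is a runnable sketch of how the corrected function populates the column — using a toy stand-in for the spaCy matcher (so it runs without loading `en_core_web_lg`; the `ACTIONS` set and the pair-with-next-word logic are illustrative only, not the real dependency matching):

```python
import pandas as pd

# Toy stand-in for the spaCy-based find_matches: pairs each known
# action verb with the word immediately after it. Illustrative only —
# the real function calls nlp(text) and dep_matcher(doc) instead.
ACTIONS = {"repaired", "replaced", "spliced", "cycled", "reseated"}

def find_matches_stub(text):
    words = text.rstrip('.').split()
    return [("maint_action", w, words[i + 1])
            for i, w in enumerate(words[:-1]) if w in ACTIONS]

df = pd.DataFrame({"new": ["repaired computer and replaced connector.",
                           "spliced wire on connector."]})
# Same apply pattern as in the question; each row gets its own list of tuples.
df["three_tuples"] = df["new"].apply(find_matches_stub)
print(df["three_tuples"].tolist())
# → [[('maint_action', 'repaired', 'computer'), ('maint_action', 'replaced', 'connector')],
#    [('maint_action', 'spliced', 'wire')]]
```

The key point is that the matcher must be run inside the applied function, on the `doc` built from that row's text — with `doc = nlp(text)` and `dep_matches = dep_matcher(doc)` both inside the function, each row is matched against its own parse instead of a stale module-level result.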