SpaCy DependencyMatcher parsing with a Pandas DataFrame


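I am having trouble passing a DataFrame column through the SpaCy DependencyMatcher. I tried adapting the solution from the earlier question "Spacy Dependency Parsing with Pandas dataframe", but without success.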

import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN

nlp = spacy.load("en_core_web_lg")
text = 'REPAIRED CONNECTOR ON J3 SMS. REPLACED THE PRIMARY COMPUTER.'.lower()
dep_matcher  = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {'LEMMA' : {"IN": ["reseat", "cycle", 'replace' , 'repair', 'reinstall' , 'clean', ' treat', 'splice', 'swap', 'read', 'inspect','installed' ]}}
    },

    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP":{"IN": [ 'dobj']}},     
    }]

dep_matcher.add('maint_action' , patterns = [dep_pattern])
dep_matches = dep_matcher(doc)

for match in dep_matches:
    dep_pattern = match[0]
    matches = match[1]
    verb , subject = matches[0], matches[1] 
    print (nlp.vocab[dep_pattern].text, '\t' ,doc[verb] , doc[subject])
>>>maint_action   repaired connector
>>>maint_action   replaced computer 

When I pass a single string, the code above works perfectly. But when I try to pass a DataFrame, the new column comes back empty.

Here is the function for the DataFrame:

import pandas as pd
    import spacy
    from spacy import displacy
    from spacy.matcher import DependencyMatcher
    from spacy.symbols import nsubj, VERB, dobj, NOUN

nlp = spacy.load("en_core_web_lg")
data = {'new':  ['repaired computer and replaced connector.', 'spliced wire on connector.', 'cycled power and reseated connectors and replaced computer on transmitter.']}

df = pd.DataFrame(data)    

dep_matcher  = DependencyMatcher(vocab = nlp.vocab)
    dep_pattern = [
        {
            "RIGHT_ID": "action",
            "RIGHT_ATTRS": {'LEMMA' : {"IN": ["reseat", "cycle", 'replace' , 'repair', 'reinstall' , 'clean', ' treat', 'splice', 'swap', 'read', 'inspect','installed' ]}}
        },
    
        {
            "LEFT_ID": "action",
            "REL_OP": ">",
            "RIGHT_ID": "component",
            "RIGHT_ATTRS": {"DEP":{"IN": [ 'dobj']}},     
        }]
    
    dep_matcher.add('maint_action' , patterns = [dep_pattern])
    dep_matches = dep_matcher(doc)
def find_matches(text):
        doc = nlp(text)
        rule3_pairs = []
        for match in dep_matches:
            dep_pattern = match[0]
            matches = match[1]
            verb , subject = matches[0], matches[1] 
            A = (nlp.vocab[dep_pattern].text, '\t' ,doc[verb] , doc[subject])
            rule3_pairs.append(A)
            return rule3_pairs
      
df['three_tuples'] = df['new'].apply(find_matches) 

I am trying to get each row that matches the pattern to output the corresponding verb and noun combinations, for example:

|three_tuples|
|maint_action    repaired computer  replaced connector|
|maint_action    spliced wire|
|maint_action    cycled power  reseated connectors  replaced computer|
python dataframe dependencies spacy matcher
1 Answer

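I ran your code exactly as posted (the second example), and it already produces the result you want.
There is a small problem in your first code example: you never run
doc = nlp(text)
However, I don't think that is the cause of the problem. If you are using Jupyter, try restarting the kernel.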

Update

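After your edit, I noticed that the code has many indentation errors; please fix those.
Also, you call dep_matcher outside the function rather than inside it, so the matcher never runs on the doc built for each row. That is why it does not work.
Finally, your return statement sits inside the for loop, so the function exits after the first match. To get all the results, move the return out of the loop.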
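The effect of the misplaced return can be seen without spaCy at all. The following is only a toy sketch: the function names and the list of pairs are made up for illustration and stand in for the real matcher output.

```python
def collect_early_return(matches):
    """Mimics the question's loop: the return is indented inside the for."""
    pairs = []
    for match in matches:
        pairs.append(match)
        return pairs  # exits on the first iteration


def collect_all(matches):
    """Mimics the corrected loop: the return comes after the for."""
    pairs = []
    for match in matches:
        pairs.append(match)
    return pairs  # every match is kept


# Hypothetical stand-in for what a matcher might yield for one sentence.
matches = [("repaired", "computer"), ("replaced", "connector")]

print(collect_early_return(matches))  # only the first pair
print(collect_all(matches))           # both pairs
print(collect_early_return([]))       # None: the loop body never runs
```

Note the last call: with no matches, the early-return version falls off the end of the function and returns None rather than an empty list.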
Here is the code that worked for me:
def find_matches(text):
    doc = nlp(text)
    dep_matches = dep_matcher(doc)
    rule3_pairs = []
    for match in dep_matches:
        dep_pattern = match[0]
        matches = match[1]
        verb , subject = matches[0], matches[1]
        A = (nlp.vocab[dep_pattern].text, doc[verb] , doc[subject])
        rule3_pairs.append(A)
    return rule3_pairs
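The function above returns a list of tuples per row, not the single-line layout shown in the question. If that layout is what you need, a small helper along these lines might do it. This is a sketch under assumptions: format_pairs is a hypothetical name, and it relies on each tuple item rendering as its text under str(), which should hold for spaCy tokens as well as for the plain strings used here.

```python
def format_pairs(pairs):
    """Join (label, verb, noun) tuples into one line: label, tab, then phrases."""
    if not pairs:
        return ""  # no matches for this row
    label = str(pairs[0][0])
    # f-strings call str() on each item, so tokens or plain strings both work.
    phrases = [f"{verb} {noun}" for _, verb, noun in pairs]
    return label + "\t" + "  ".join(phrases)


# Plain strings stand in for the tokens that find_matches would return.
row = [
    ("maint_action", "repaired", "computer"),
    ("maint_action", "replaced", "connector"),
]
print(format_pairs(row))
```

It could then be chained after the matching step, for example df['new'].apply(find_matches).apply(format_pairs), assuming find_matches is defined as above.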