I'm running into trouble passing a dataframe column through SpaCy's DependencyMatcher. I tried adapting the solution from the earlier question "Spacy dependency parsing with Pandas dataframe", but without success.
import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN
nlp = spacy.load("en_core_web_lg")
text = 'REPAIRED CONNECTOR ON J3 SMS. REPLACED THE PRIMARY COMPUTER.'.lower()
dep_matcher = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {"LEMMA": {"IN": ["reseat", "cycle", "replace", "repair", "reinstall", "clean", "treat", "splice", "swap", "read", "inspect", "installed"]}}
    },
    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP": {"IN": ["dobj"]}},
    }]
dep_matcher.add('maint_action', patterns=[dep_pattern])
dep_matches = dep_matcher(doc)
for match in dep_matches:
    dep_pattern = match[0]
    matches = match[1]
    verb, subject = matches[0], matches[1]
    print(nlp.vocab[dep_pattern].text, '\t', doc[verb], doc[subject])
>>>maint_action repaired connector
>>>maint_action replaced computer
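For reference, `DependencyMatcher` returns a list of `(match_id, token_ids)` tuples, and `token_ids` follow the order of the pattern's `RIGHT_ID` entries ("action" first, "component" second) — the unpacking above relies on that ordering. A minimal stand-in, with illustrative data in place of a real spaCy doc and match list:

```python
# Illustrative stand-in for DependencyMatcher output: a list of
# (match_id, token_ids) tuples; token_ids are ordered the same way
# as the pattern's RIGHT_IDs ("action" first, "component" second).
fake_matches = [(12345, [0, 1]), (12345, [3, 5])]
# Stand-in for doc: index into it like doc[token_id].
fake_doc = ["repaired", "connector", "on", "replaced", "the", "computer"]

pairs = []
for match_id, token_ids in fake_matches:
    verb, component = token_ids[0], token_ids[1]
    pairs.append((fake_doc[verb], fake_doc[component]))

print(pairs)  # → [('repaired', 'connector'), ('replaced', 'computer')]
```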
The above works perfectly when I pass a string. But when I try to pass the DF, the new column comes back blank.
Here is the function for the DF:
import pandas as pd
import spacy
from spacy import displacy
from spacy.matcher import DependencyMatcher
from spacy.symbols import nsubj, VERB, dobj, NOUN
nlp = spacy.load("en_core_web_lg")
data = {'new': ['repaired computer and replaced connector.', 'spliced wire on connector.', 'cycled power and reseated connectors and replaced computer on transmitter.']}
df = pd.DataFrame(data)
dep_matcher = DependencyMatcher(vocab = nlp.vocab)
dep_pattern = [
    {
        "RIGHT_ID": "action",
        "RIGHT_ATTRS": {"LEMMA": {"IN": ["reseat", "cycle", "replace", "repair", "reinstall", "clean", "treat", "splice", "swap", "read", "inspect", "installed"]}}
    },
    {
        "LEFT_ID": "action",
        "REL_OP": ">",
        "RIGHT_ID": "component",
        "RIGHT_ATTRS": {"DEP": {"IN": ["dobj"]}},
    }]
dep_matcher.add('maint_action', patterns=[dep_pattern])
dep_matches = dep_matcher(doc)
def find_matches(text):
    doc = nlp(text)
    rule3_pairs = []
    for match in dep_matches:
        dep_pattern = match[0]
        matches = match[1]
        verb, subject = matches[0], matches[1]
        A = (nlp.vocab[dep_pattern].text, '\t', doc[verb], doc[subject])
        rule3_pairs.append(A)
    return rule3_pairs
df['three_tuples'] = df['new'].apply(find_matches)
I'm trying to get each row that matches the pattern to output the corresponding verb and noun combinations, e.g.:
|three_tuples|
|---|
|maint_action repaired computer replaced connector|
|maint_action spliced wire|
|maint_action cycled power reseated connectors replaced computer|
I ran your code (the second example) exactly as written, and it produced the result you want (image below).
There was a small issue in your first code example: you never did
doc = nlp(text)
but I don't think that's what's causing the problem. If you're using Jupyter, perhaps try restarting the kernel.
def find_matches(text):
    doc = nlp(text)
    dep_matches = dep_matcher(doc)
    rule3_pairs = []
    for match in dep_matches:
        dep_pattern = match[0]
        matches = match[1]
        verb, subject = matches[0], matches[1]
        A = (nlp.vocab[dep_pattern].text, doc[verb], doc[subject])
        rule3_pairs.append(A)
    return rule3_pairs
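For completeness, the `.apply` step itself works the same way regardless of what the function does, so here is a runnable sketch of how the corrected function populates the column — using a toy stand-in for the spaCy matcher (so it runs without loading `en_core_web_lg`; the `ACTIONS` set and the pair-with-next-word logic are illustrative only, not the real dependency matching):

```python
import pandas as pd

# Toy stand-in for the spaCy-based find_matches: pairs each known
# action verb with the word immediately after it. Illustrative only —
# the real function calls nlp(text) and dep_matcher(doc) instead.
ACTIONS = {"repaired", "replaced", "spliced", "cycled", "reseated"}

def find_matches_stub(text):
    words = text.rstrip('.').split()
    return [("maint_action", w, words[i + 1])
            for i, w in enumerate(words[:-1]) if w in ACTIONS]

df = pd.DataFrame({"new": ["repaired computer and replaced connector.",
                           "spliced wire on connector."]})
# Same apply pattern as in the question; each row gets its own list of tuples.
df["three_tuples"] = df["new"].apply(find_matches_stub)
print(df["three_tuples"].tolist())
# → [[('maint_action', 'repaired', 'computer'), ('maint_action', 'replaced', 'connector')],
#    [('maint_action', 'spliced', 'wire')]]
```

The key point is that the matcher must be run inside the applied function, on the `doc` built from that row's text — with `doc = nlp(text)` and `dep_matches = dep_matcher(doc)` both inside the function, each row is matched against its own parse instead of a stale module-level result.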