如何从依赖解析器的输出中创建树?

问题描述 投票:2回答:2

我试图从依赖解析器的输出中创建一个树(嵌套字典)。这句话是“我在睡梦中拍了一头大象”。我能够获得链接上描述的输出:How do I do dependency parsing in NLTK?

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)

要将此元组列表转换为嵌套字典,我使用以下链接:How to convert python list of tuples into tree?

def build_tree(list_of_tuples):
    all_nodes = {n[2]:((n[0], n[1]),{}) for n in list_of_tuples}
    root = {}    
    print all_nodes
    for item in list_of_tuples:
        rel, gov,dep = item
        if gov is not 'ROOT':
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root

这给出了如下输出:

{'shot': (('ROOT', 'ROOT'),
  {'I': (('nsubj', 'shot'), {}),
   'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
   'sleep': (('nmod', 'shot'),
    {'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}

为了找到root到leaf路径,我使用了以下链接:Return root to specific leaf from a nested dictionary tree

[制作树并找到路径是两个不同的东西]第二个目标是找到根到叶节点路径,如完成Return root to specific leaf from a nested dictionary tree。但我想得到root-to-leaf(依赖关系路径)所以,例如,当我将调用recurse_category(categories,'an')时,其中categories是嵌套的树结构,'an'是树中的单词,我应该得到ROOT-nsubj-dobj(依赖关系直到root)作为输出。

python dictionary nlp nltk stanford-nlp
2个回答
0
投票

这会将输出转换为嵌套字典表单。如果我能找到路径,我会告诉你更新。也许这个,很有帮助。

list_of_tuples = [('ROOT','ROOT', 'shot'),('nsubj','shot', 'I'),('det','elephant', 'an'),('dobj','shot', 'elephant'),('case','sleep', 'in'),('nmod:poss','sleep', 'my'),('nmod','shot', 'sleep')]

nodes={}

for i in list_of_tuples:
    rel,parent,child=i
    nodes[child]={'Name':child,'Relationship':rel}

forest=[]

for i in list_of_tuples:
    rel,parent,child=i
    node=nodes[child]

    if parent=='ROOT':# this should be the Root Node
            forest.append(node)
    else:
        parent=nodes[parent]
        if not 'children' in parent:
            parent['children']=[]
        children=parent['children']
        children.append(node)

print forest

输出是嵌套字典,

[{'Name': 'shot', 'Relationship': 'ROOT', 'children': [{'Name': 'I', 'Relationship': 'nsubj'}, {'Name': 'elephant', 'Relationship': 'dobj', 'children': [{'Name': 'an', 'Relationship': 'det'}]}, {'Name': 'sleep', 'Relationship': 'nmod', 'children': [{'Name': 'in', 'Relationship': 'case'}, {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]

以下函数可以帮助您找到root-to-leaf路径:

def recurse_category(categories,to_find):
    for category in categories: 
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []

1
投票

首先,如果您只是使用预先训练的Stanford CoreNLP依赖解析器模型,您应该使用CoreNLPDependencyParser中的nltk.parse.corenlp并避免使用旧的nltk.parse.stanford接口。

Stanford Parser and NLTK

在终端中下载并运行Java服务器后,在Python中:

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>

现在我们看到解析是来自DependencyGraph nltk.parse.dependencygraphhttps://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36类型

要通过简单地执行DependencyGraphnltk.tree.Tree转换为DependencyGraph.tree()对象:

>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])

>>> parses[0].tree().pretty_print()
          shot                  
  _________|____________         
 |   |  elephant      banana    
 |   |     |       _____|_____   
 I   .     an    with         a 

要将其转换为括号内的解析格式:

>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)

如果你正在寻找依赖三胞胎:

>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]

>>> for governor, dep, dependent in parses[0].triples():
...     print(governor, dep, dependent)
... 
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')

以CONLL格式:

>>> print(parses[0].to_conll(style=10))
1   I   I   PRP PRP _   2   nsubj   _   _
2   shot    shoot   VBD VBD _   0   ROOT    _   _
3   an  a   DT  DT  _   4   det _   _
4   elephant    elephant    NN  NN  _   2   dobj    _   _
5   with    with    IN  IN  _   7   case    _   _
6   a   a   DT  DT  _   7   det _   _
7   banana  banana  NN  NN  _   2   nmod    _   _
8   .   .   .   .   _   2   punct   _   _
© www.soinside.com 2019 - 2024. All rights reserved.