How to read a constituency-based parse tree

Question (votes: 0, answers: 3)

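I have a corpus of sentences preprocessed by Stanford's CoreNLP system. One of the things it provides is each sentence's parse tree (constituency-based). I can understand a parse tree when it is drawn as a tree, but I am not sure how to read it in this format:

For example: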

    (ROOT
      (FRAG
        (NP (NN sent28))
        (: :)
        (S
          (NP (NNP Rome))
          (VP (VBZ is)
            (PP (IN in)
              (NP
                (NP (NNP Lazio) (NN province))
                (CC and)
                (NP
                  (NP (NNP Naples))
                  (PP (IN in)
                    (NP (NNP Campania))))))))
        (. .)))

The original sentence is:

sent28: Rome is in Lazio province and Naples in Campania .

How should I read this tree, or is there code (in Python) that does it properly? Thanks.

python parsing nlp parse-tree
3 Answers
11
votes

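NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is called fromstring. Once the tree is loaded, you can iterate over its subtrees, leaves, and so on.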
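A minimal sketch of that approach, assuming the nltk package is installed (no corpora or model downloads are needed for this). The variable names are illustrative; the tree is the one from the question.

```python
from nltk.tree import Tree

# The bracketed tree from the question; whitespace and line breaks do not matter.
bracketed = """
(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))
"""

tree = Tree.fromstring(bracketed)

# The label of the top node and the words in sentence order.
print(tree.label())
print(" ".join(tree.leaves()))

# (word, POS tag) pairs taken from the preterminal nodes.
print(tree.pos())

# Walk every constituent: its label and the words it spans.
for subtree in tree.subtrees():
    print(subtree.label(), subtree.leaves())
```

Each opening parenthesis starts a constituent, the first token after it is the constituent's label, and everything up to the matching closing parenthesis is its content (either words or nested constituents).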

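As an aside: you probably want to remove the "sent28:" part before parsing, because it confuses the parser (it is not part of the sentence anyway). That is why you are not getting a full parse tree but only a sentence fragment (the FRAG node).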
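One way to do that clean-up before sending text to the parser is sketched below. It assumes every identifier looks like "sent" followed by digits and a colon, as in the question; the helper name split_id is hypothetical, and the pattern would need adjusting for other ID schemes.

```python
import re

# Assumption: IDs have the form "sent<digits>:" at the start of the line.
ID_PATTERN = re.compile(r"^\s*(?P<id>sent\d+)\s*:\s*(?P<text>.*)$")


def split_id(line):
    """Return (sentence_id, sentence_text); the ID is None if no prefix is found."""
    match = ID_PATTERN.match(line)
    if match is None:
        return None, line.strip()
    return match.group("id"), match.group("text").strip()


sent_id, text = split_id("sent28: Rome is in Lazio province and Naples in Campania .")
print(sent_id)
print(text)
```

Keeping the ID separately lets you parse only the sentence text and still map each tree back to its source line.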


0
votes

I know this post is quite old, but I believe my solution may be relevant to others as well.

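I wrote a library called Constituent Treelib that offers a convenient way to parse sentences into constituent trees, modify them according to their structure, and visualize and export them to various file formats. In addition, you can extract phrases by phrasal category (for use as features in various NLP tasks, for example), validate already-parsed sentences in bracket notation, or convert them back into sentences. The latter is what the OP asked for. Here are the steps to achieve this:

First, install the library: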

pip install constituent-treelib

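Next, load the relevant components from the library and create the constituent tree of the given sentence from its bracketed tree representation: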

from constituent_treelib import ConstituentTree, BracketedTree, Language

# Define the language for the sentence as well as for the spaCy and benepar models
language = Language.English

# Define which specific SpaCy model should be used (default is Medium)
spacy_model_size = ConstituentTree.SpacyModelSize.Medium

# Create the pipeline (note, the required models will be downloaded and installed automatically)
nlp = ConstituentTree.create_pipeline(language, spacy_model_size)

# Your sentence
bracketed_tree_string = """(ROOT
(FRAG
(NP (NN sent28))
(: :)
(S
(NP (NNP Rome))
(VP (VBZ is)
(PP (IN in)
(NP
(NP (NNP Lazio) (NN province))
(CC and)
(NP
(NP (NNP Naples))
(PP (IN in)
(NP (NNP Campania))))))))
(. .)))""".splitlines()

bracketed_tree_string = " ".join(bracketed_tree_string)
sentence = BracketedTree(bracketed_tree_string)

# Create the tree from which we are going to recover the original sentence
tree = ConstituentTree(sentence, nlp)

Finally, we recover the original sentence from the constituent tree with:

tree.leaves(tree.nltk_tree, ConstituentTree.NodeContent.Text)

Result:

'sent28 : Rome is in Lazio province and Naples in Campania .'
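For comparison, grouping phrases by category can also be sketched with plain NLTK, without downloading any spaCy or benepar models. This is not Constituent Treelib's API; it is a swapped-in illustration that assumes only the nltk package, and the names bracketed and phrases_by_label are made up for the example.

```python
from collections import defaultdict

from nltk.tree import Tree

# The question's tree, written on one line.
bracketed = (
    "(ROOT (FRAG (NP (NN sent28)) (: :) (S (NP (NNP Rome)) (VP (VBZ is) "
    "(PP (IN in) (NP (NP (NNP Lazio) (NN province)) (CC and) "
    "(NP (NP (NNP Naples)) (PP (IN in) (NP (NNP Campania)))))))) (. .)))"
)
tree = Tree.fromstring(bracketed)

# Collect the text of every phrase-level node, keyed by its label.
# Nodes of height 2 are part-of-speech tags directly above a word, so skip them.
phrases_by_label = defaultdict(list)
for subtree in tree.subtrees(lambda t: t.height() > 2):
    phrases_by_label[subtree.label()].append(" ".join(subtree.leaves()))

for label, phrases in phrases_by_label.items():
    print(label, phrases)
```

The lists preserve the order in which NLTK visits the subtrees (parents before their children), so nested phrases appear after the phrase that contains them.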

-3
votes

You can use the Stanford parser, for example:

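# Assumes `parser` is an already-configured NLTK Stanford parser object; its setup is not shown here.
# raw_parse() would take a single string; parse_sents() takes sentences that are already tokenized.
sentences = parser.raw_parse_sents(["Hello, My name is Melroy.", "What is your name?"])
for line in sentences:
    for sentence in line:
        sentence.draw()  # opens a window that draws the parse tree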
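Note that draw() opens a Tk window, so it needs a graphical display. As a hedged alternative for terminals or remote sessions, NLTK trees can also be rendered as text. The sketch below assumes only the nltk package and uses a shortened piece of the question's tree for illustration.

```python
from nltk.tree import Tree

# A shortened fragment of the question's tree, used only for illustration.
tree = Tree.fromstring(
    "(S (NP (NNP Rome)) (VP (VBZ is) (PP (IN in) (NP (NNP Lazio) (NN province)))))"
)

# Draw the tree as ASCII art on standard output (no GUI required).
tree.pretty_print()

# Serialize the tree back to bracket notation; a large margin keeps it on one line.
one_line = tree.pformat(margin=1000)
print(one_line)
```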