I have a corpus of sentences preprocessed with Stanford's CoreNLP system. One of the things it provides is the (constituency-based) parse tree of each sentence. While I can understand a parse tree when it is drawn (as a tree), I am not sure how to read it in this format:
For example:
(ROOT
(FRAG
(NP (NN sent28))
(: :)
(S
(NP (NNP Rome))
(VP (VBZ is)
(PP (IN in)
(NP
(NP (NNP Lazio) (NN province))
(CC and)
(NP
(NP (NNP Naples))
(PP (IN in)
(NP (NNP Campania))))))))
(. .)))
The original sentence is:
sent28: Rome is in Lazio province and Naples in Campania .
How should I read this tree, or is there code (in Python) that can do it properly? Thanks.
NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is called fromstring. You can then iterate over its subtrees, leaves, and so on.
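As a minimal sketch (assuming NLTK is installed, e.g. via pip install nltk), you can parse the bracketed string from the question and walk the resulting tree:

```python
from nltk.tree import Tree

# Parse the bracketed string produced by CoreNLP
tree = Tree.fromstring("""(ROOT
  (FRAG
    (NP (NN sent28))
    (: :)
    (S
      (NP (NNP Rome))
      (VP (VBZ is)
        (PP (IN in)
          (NP
            (NP (NNP Lazio) (NN province))
            (CC and)
            (NP
              (NP (NNP Naples))
              (PP (IN in)
                (NP (NNP Campania))))))))
    (. .)))""")

# The tokens of the sentence, in order
print(tree.leaves())
# The (word, POS-tag) pairs
print(tree.pos())
# Draw the tree as ASCII art in the terminal
tree.pretty_print()
```

tree.leaves() gives the surface tokens, tree.pos() pairs each token with its part-of-speech tag, and tree.subtrees() lets you iterate over every constituent.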
By the way: you may want to remove the "sent28 :" part, since it confuses the parser (it is also not part of the sentence). Note that you did not get a full parse tree, only a sentence fragment (FRAG).
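To drop the "sent28 :" fragment programmatically, one option (a sketch using NLTK, with the labels taken from the tree above) is to pull out the embedded S subtree and read only its leaves:

```python
from nltk.tree import Tree

tree = Tree.fromstring(
    "(ROOT (FRAG (NP (NN sent28)) (: :) "
    "(S (NP (NNP Rome)) (VP (VBZ is) (PP (IN in) (NP (NP (NNP Lazio) (NN province)) "
    "(CC and) (NP (NP (NNP Naples)) (PP (IN in) (NP (NNP Campania)))))))) (. .)))"
)

# Find the first S node inside the FRAG and keep only its tokens
s_node = next(tree.subtrees(filter=lambda t: t.label() == "S"))
print(" ".join(s_node.leaves()))
```

This skips both the "sent28 :" prefix and the trailing period, leaving just the actual sentence content.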
I know this post is old, but I believe my solution may be relevant to others as well.
I wrote a library called Constituent Treelib that provides a convenient way to parse sentences into constituent trees, modify them according to their structure, visualize them, and export them to various file formats. In addition, you can extract phrases by phrase category (which can, for example, be used as features for various NLP tasks), validate already-parsed sentences in bracket notation, or convert them back into sentences. The latter is what the OP asked for. Here are the steps to achieve this:
First, install the library via:
pip install constituent-treelib
Next, load the necessary components from the library and create the constituent tree for the given sentence from its bracketed tree representation:
from constituent_treelib import ConstituentTree, BracketedTree, Language
# Define the language for the sentence as well as for the spaCy and benepar models
language = Language.English
# Define which specific SpaCy model should be used (default is Medium)
spacy_model_size = ConstituentTree.SpacyModelSize.Medium
# Create the pipeline (note, the required models will be downloaded and installed automatically)
nlp = ConstituentTree.create_pipeline(language, spacy_model_size)
# Your sentence
bracketed_tree_string = """(ROOT
(FRAG
(NP (NN sent28))
(: :)
(S
(NP (NNP Rome))
(VP (VBZ is)
(PP (IN in)
(NP
(NP (NNP Lazio) (NN province))
(CC and)
(NP
(NP (NNP Naples))
(PP (IN in)
(NP (NNP Campania))))))))
(. .)))""".splitlines()
bracketed_tree_string = " ".join(bracketed_tree_string)
sentence = BracketedTree(bracketed_tree_string)
# Create the tree from where we are going to extract the desired noun phrases
tree = ConstituentTree(sentence, nlp)
Finally, we recover the original sentence from the constituent tree with:
tree.leaves(tree.nltk_tree, ConstituentTree.NodeContent.Text)
Result:
'sent28 : Rome is in Lazio province and Naples in Campania .'
You can use the Stanford parser, for example:
# raw_parse_sents takes a list of raw sentence strings; use raw_parse for a
# single string, or parse_sents for sentences that are already tokenized
sentences = parser.raw_parse_sents(["Hello, My name is Melroy.", "What is your name?"])
for line in sentences:
    for sentence in line:
        sentence.draw()