How do I get a parse tree using Python NLTK?


Given the following sentence:

The old oak tree from India fell down.

How can I use Python NLTK to get the following parse tree representation of the sentence?

(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))

I need a complete example, which I could not find anywhere on the web!


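Edit

I have read the book chapter on parsing with NLTK, but the problem is that I need a grammar to parse a sentence or phrase, and I do not have one. I found this Stack Overflow post, which also asks about a grammar for parsing, but it has no convincing answer.

So I am looking for a complete answer that gives me the parse tree of a sentence.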

python nltk
5 Answers
10 votes

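Here is an alternative solution that uses StanfordCoreNLP instead of nltk. There are a few libraries built on top of StanfordCoreNLP; I personally use pycorenlp to parse the sentence.

First, you have to download the stanford-corenlp-full folder, which contains the *.jar files, and then run the server from inside that folder (the default port is 9000).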

export CLASSPATH="`find . -name '*.jar'`"
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000  # run the server; -port is optional and defaults to 9000

Then, in Python, you can run the following to parse the sentence.

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

text = "The old oak tree from India fell down."

output = nlp.annotate(text, properties={
  'annotators': 'parse',
  'outputFormat': 'json'
})

print(output['sentences'][0]['parse'])  # bracketed parse tree of the first sentence
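
The 'parse' value printed above is a plain bracketed string, so it can be turned into a nested structure without extra dependencies. The following is only a minimal sketch: parse_bracketed and leaves are hypothetical helper names, not part of pycorenlp or CoreNLP, and the input is the tree from the question rather than real server output. If nltk is installed, nltk.Tree.fromstring is the more usual way to read such a string.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves are plain strings. Hypothetical helper for illustration only.
    """
    # Split into "(", ")" and runs of other non-space characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def leaves(node):
    """Return the words of a parsed tree in sentence order."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [word for child in children for word in leaves(child)]


# The tree from the question; a server response would have the same shape,
# possibly spread over several indented lines.
parse = ("(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) "
         "(PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))")

tree = parse_bracketed(parse)
print(tree[0])                 # prints: ROOT
print(" ".join(leaves(tree)))  # prints: The old oak tree from India fell down
```

Because the tokenizer ignores whitespace, the same function should also accept a tree that is indented over multiple lines.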

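2 votes

This is an older question, but you can use nltk together with bllipparser. Here is a longer example from nltk. After some fiddling, I ended up using the following myself: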

Install (with nltk already installed):

sudo python3 -m nltk.downloader bllip_wsj_no_aux
pip3 install bllipparser

Usage:

from nltk.data import find
from bllipparser import RerankingParser

model_dir = find('models/bllip_wsj_no_aux').path
parser = RerankingParser.from_unified_model_dir(model_dir)

best = parser.parse("The old oak tree from India fell down.")

print(best.get_reranker_best())
print(best.get_parser_best())

Output:

-80.435259246021 -23.831876011253 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down))) (. .)))
-79.703612178593 -24.505514522222 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (ADVP (RB down))) (. .)))
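
Each printed line above consists of two scores followed by the tree. A small sketch for splitting such a line and comparing the two candidates is shown here. It assumes the first number is the parser score and the second the reranker score; that ordering is inferred from the output above and is not checked against the bllipparser documentation. ScoredLine and split_scored_line are hypothetical names.

```python
from typing import NamedTuple


class ScoredLine(NamedTuple):
    parser_score: float    # assumed meaning of the first number
    reranker_score: float  # assumed meaning of the second number
    tree: str


def split_scored_line(line):
    """Split 'score score (tree ...)' into its three parts."""
    first, second, tree = line.strip().split(None, 2)
    return ScoredLine(float(first), float(second), tree)


# The two lines printed by the answer above.
reranker_best = split_scored_line(
    "-80.435259246021 -23.831876011253 (S1 (S (NP (NP (DT The) (JJ old) "
    "(NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) "
    "(VP (VBD fell) (PRT (RP down))) (. .)))"
)
parser_best = split_scored_line(
    "-79.703612178593 -24.505514522222 (S1 (S (NP (NP (DT The) (JJ old) "
    "(NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) "
    "(VP (VBD fell) (ADVP (RB down))) (. .)))"
)

# Each candidate wins on the score it was selected by.
print(parser_best.parser_score > reranker_best.parser_score)      # prints: True
print(reranker_best.reranker_score > parser_best.reranker_score)  # prints: True

# The trees differ only in how "down" is attached.
print("(PRT (RP down))" in reranker_best.tree)   # prints: True
print("(ADVP (RB down))" in parser_best.tree)    # prints: True
```

The reranker's choice, with "down" as a particle (PRT), is the one that matches the tree asked for in the question.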

1 vote

To get a parse tree with the nltk library, you can use the following code:

# Import the required libraries
import nltk
from nltk import pos_tag, word_tokenize, RegexpParser

# Download the tokenizer and tagger data (newer NLTK releases may ask for
# 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

# Example text (the sentence from the question)
sample_text = "The old oak tree from India fell down."

# Tag every token with its part of speech
tagged = pos_tag(word_tokenize(sample_text))

# Chunk grammar: each rule groups a sequence of tags under a new label
chunker = RegexpParser("""
                    NP: {<DT>?<JJ>*<NN>} #To extract Noun Phrases
                    P: {<IN>}            #To extract Prepositions
                    V: {<V.*>}           #To extract Verbs
                    PP: {<P> <NP>}       #To extract Prepositional Phrases
                    VP: {<V> <NP|PP>*}   #To extract Verb Phrases
                    """)

# Chunk the tagged sentence and print the resulting tree
output = chunker.parse(tagged)
print("After Extracting\n", output)

# The output looks something like this:
# (S
#   (NP The/DT old/JJ oak/NN)
#   (NP tree/NN)
#   (P from/IN)
#   India/NNP
#   (VP (V fell/VBD))
#   down/RB
#   ./.)
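
To see what a rule such as NP: {<DT>?<JJ>*<NN>} does, it can help to apply the same idea with the standard library only. This is a rough sketch of the principle and not NLTK's actual implementation: the tags are written out as one string and an ordinary regular expression is matched against it. find_chunks is a hypothetical helper, and the tagged list is copied from the output shown above instead of being produced by a tagger.

```python
import re

# Tagged tokens as they appear in the output above.
tagged = [("The", "DT"), ("old", "JJ"), ("oak", "NN"), ("tree", "NN"),
          ("from", "IN"), ("India", "NNP"), ("fell", "VBD"), ("down", "RB"),
          (".", ".")]


def find_chunks(tagged_tokens, tag_regex):
    """Return the word groups whose tag sequence matches tag_regex."""
    # Encode the tag sequence as "<DT><JJ><NN>..." so a regex can scan it.
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for match in re.finditer(tag_regex, tag_string):
        # The number of "<" before a position equals the token index there.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        chunks.append([word for word, _ in tagged_tokens[start:end]])
    return chunks


# Optional determiner, any number of adjectives, then exactly one NN.
NP_PATTERN = r"(?:<DT>)?(?:<JJ>)*<NN>"
# Any single tag that starts with V.
V_PATTERN = r"<V[^<>]*>"

print(find_chunks(tagged, NP_PATTERN))  # prints: [['The', 'old', 'oak'], ['tree']]
print(find_chunks(tagged, V_PATTERN))   # prints: [['fell']]
```

This also shows why "oak tree" ends up in two separate NP chunks and why India/NNP stays outside any chunk: the rule accepts exactly one NN per chunk and does not cover NNP. Changing the last element of the rule to <NN.*>+ would be one way to address both.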

You can also draw this tree as a diagram:

# To draw the parse tree
output.draw()

The output diagram looks like this:

[Figure: the drawn parse tree]
0 votes

Another solution to the OP's question is to use the Constituent-Treelib library, which can be installed as follows:

pip install constituent-treelib

Then you only need to perform the following steps:

from constituent_treelib import ConstituentTree

# First, we have to provide a sentence that should be parsed
sentence = "The way to get started is to quit talking and begin doing."

# Then, we define the language that should be considered with respect to the underlying models 
language = ConstituentTree.Language.English

# You can also specify the desired model for the language ("Small" is selected by default)
spacy_model_size = ConstituentTree.SpacyModelSize.Medium

# Next, we must create the necessary NLP pipeline.
# If you wish, you can instruct the library to download and install the models automatically
nlp = ConstituentTree.create_pipeline(language, spacy_model_size) #, download_models=True

# Now, we can instantiate a ConstituentTree object and pass it the sentence and the NLP pipeline
tree = ConstituentTree(sentence, nlp)

# Finally, we can print the parsed tree
print(tree)

Result:

(S
  (NP
    (NP (DT The) (NN way))
    (SBAR (S (VP (TO to) (VP (VB get) (VP (VBN started)))))))
  (VP
    (VBZ is)
    (S
      (VP
        (TO to)
        (VP
          (VP (VB quit) (NP (VBG talking)))
          (CC and)
          (VP (VB begin) (S (VP (VBG doing))))))))
  (. .))
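
Once a tree is available as a bracketed string like the one above, individual phrases can be pulled out with a simple stack scan. The sketch below uses only the standard library and does not rely on any Constituent-Treelib API; constituents is a hypothetical helper name, and the input is the printed result copied from above.

```python
import re

tree_string = """
(S
  (NP
    (NP (DT The) (NN way))
    (SBAR (S (VP (TO to) (VP (VB get) (VP (VBN started)))))))
  (VP
    (VBZ is)
    (S
      (VP
        (TO to)
        (VP
          (VP (VB quit) (NP (VBG talking)))
          (CC and)
          (VP (VB begin) (S (VP (VBG doing))))))))
  (. .))
"""


def constituents(text, label):
    """Return the words covered by every constituent with the given label."""
    found = []
    open_positions = []  # indexes of "(" that are still unclosed
    for index, char in enumerate(text):
        if char == "(":
            open_positions.append(index)
        elif char == ")":
            if not open_positions:
                raise ValueError("unbalanced parentheses")
            start = open_positions.pop()
            subtree = text[start:index + 1]
            # The label is the first token after the opening parenthesis.
            if subtree[1:].split(None, 1)[0] == label:
                # Words sit in the innermost "(TAG word)" pairs.
                words = re.findall(r"\([^\s()]+\s+([^\s()]+)\)", subtree)
                found.append(" ".join(words))
    return found


# Phrases are reported in the order in which their brackets close.
print(constituents(tree_string, "NP"))
# prints: ['The way', 'The way to get started', 'talking']
```

Passing "VP" or "SBAR" as the label works the same way.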

0 votes

You can also use the more advanced "Constituency Parsing with a Self-Attentive Encoder" (benepar), which is available as a spaCy component:

import benepar, spacy
nlp = spacy.load('en_core_web_md')
nlp.add_pipe('benepar', config={'model': 'benepar_en3'})
doc = nlp('The time for action is now. It is never too late to do something.')
sent = list(doc.sents)[0]
print(sent._.parse_string)
# (S (NP (NP (DT The) (NN time)) (PP (IN for) (NP (NN action)))) (VP (VBZ is) (ADVP (RB now))) (. .))
print(sent._.labels)
# ('S',)
print(list(sent._.children)[0])
# The time for action

More information: Berkeley Neural Parser
