Stanford Parser for Python: output format

Question

I am currently using the Python interface to the Stanford Parser.

    from nltk.parse.stanford import StanfordParser
    import os

    os.environ['STANFORD_PARSER'] ='/Users/au571533/Downloads/stanford-parser-full-2016-10-31'
    os.environ['STANFORD_MODELS'] = '/Users/au571533/Downloads/stanford-parser-full-2016-10-31'
    parser=StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

    new=list(parser.raw_parse("The young man who boarded his usual train that Sunday afternoon was twenty-four years old and fat. "))
    print(new)

The output I get looks like this:

    [Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['young']), Tree('NN', ['man'])]), Tree('SBAR', [Tree('WHNP', [Tree('WP', ['who'])]), Tree('S', [Tree('VP', [Tree('VBD', ['boarded']), Tree('NP', [Tree('PRP$', ['his']), Tree('JJ', ['usual']), Tree('NN', ['train'])]), Tree('NP', [Tree('DT', ['that']), Tree('NNP', ['Sunday'])])])])])]), Tree('NP', [Tree('NN', ['afternoon'])]), Tree('VP', [Tree('VBD', ['was']), Tree('NP', [Tree('NP', [Tree('JJ', ['twenty-four']), Tree('NNS', ['years'])]), Tree('ADJP', [Tree('JJ', ['old']), Tree('CC', ['and']), Tree('JJ', ['fat'])])])]), Tree('.', ['.'])])])]

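However, I only need the part-of-speech tags, so I would like the output in word/tag format.

In Java you can specify -outputFormat 'wordsAndTags' and it gives exactly what I want. Any hints on how to do this in Python?

Any help would be much appreciated. Thanks!

PS: I tried the Stanford POSTagger, but it is not accurate enough on some of the words I am interested in.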

Answer 1

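If you look at the NLTK class for the Stanford Parser, you can see that the raw_parse_sents() method does not send the -outputFormat wordsAndTags option you want; it sends -outputFormat Penn instead. If you derive your own class from StanfordParser, you can override this method and specify the wordsAndTags format.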

    from nltk.parse.stanford import StanfordParser

    class MyParser(StanfordParser):

        def raw_parse_sents(self, sentences, verbose=False):
            """
            Use StanfordParser to parse multiple sentences. Takes multiple sentences as a
            list of strings.
            Each sentence will be automatically tokenized and tagged by the Stanford Parser.
            The output format is `wordsAndTags`.

            :param sentences: Input sentences to parse
            :type sentences: list(str)
            :rtype: iter(iter(Tree))
            """
            # Same command as the parent class, except for the output format.
            cmd = [
                self._MAIN_CLASS,
                '-model', self.model_path,
                '-sentences', 'newline',
                '-outputFormat', 'wordsAndTags',
            ]
            return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
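
If subclassing feels too heavy, another option is to leave the parser's output format alone and read the tags off the Tree objects it already returns. nltk's Tree has a pos() method that returns (word, tag) pairs for the leaves. A minimal sketch, where words_and_tags is a hypothetical helper name and the tree is a shortened stand-in built by hand rather than real parser output:

```python
from nltk.tree import Tree


def words_and_tags(tree):
    """Format one parse tree as a 'word/tag word/tag ...' string."""
    # Tree.pos() yields (word, tag) tuples for the leaves, left to right.
    return " ".join("{}/{}".format(word, tag) for word, tag in tree.pos())


# Hand-written stand-in for one tree from list(parser.raw_parse(...)).
example = Tree.fromstring(
    "(ROOT (S (NP (DT The) (JJ young) (NN man)) "
    "(VP (VBD was) (ADJP (JJ fat))) (. .)))"
)

print(words_and_tags(example))
```

With the list from the question, the same helper could be applied to each element, e.g. [words_and_tags(t) for t in new].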

Answer 2

StanfordParser is about to be deprecated. Any idea how to change the output format in nltk.parse.corenlp.CoreNLPParser?

    <stdin>:1: DeprecationWarning: The StanfordParser will be deprecated
    Please use nltk.parse.corenlp.CoreNLPParser instead.
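
Not a direct answer for CoreNLPParser, but whichever wrapper is used, a Penn-style bracketed parse can be flattened to word/tag text after the fact. A rough sketch using only the standard library, assuming the parse is available as a bracketed string (for example from str() on an nltk Tree). penn_to_words_and_tags is a hypothetical helper, not part of NLTK or CoreNLP, and the sample string is hand-written:

```python
import re

# A preterminal in Penn bracket notation looks like "(TAG word)":
# an opening bracket, a tag, whitespace, a word, a closing bracket.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def penn_to_words_and_tags(penn):
    """Flatten a Penn bracketed parse into 'word/tag word/tag ...'."""
    # Phrase-level nodes such as "(NP (DT ..." are skipped because the
    # token after their label starts with "(", which the pattern rejects.
    return " ".join(
        "{}/{}".format(word, tag) for tag, word in PRETERMINAL.findall(penn)
    )


sample = (
    "(ROOT (S (NP (DT The) (JJ young) (NN man)) "
    "(VP (VBD was) (ADJP (JJ fat))) (. .)))"
)

print(penn_to_words_and_tags(sample))
```

This relies on words never containing literal parentheses or whitespace, so treat it as a starting point rather than a robust converter.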
