How do I match integers in an NLTK CFG?

Question

If I want to define a grammar in which one of the tokens matches any integer, how can I do that with NLTK's string-based CFG?

For example:

S -> SK SO FK
SK -> 'SELECT'
SO -> '\d+'
FK -> 'FROM'

Answer 1

Create a number phrase like this:

import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '10'
""")

sent = 'I shot 3 elephants'.split()
parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    print(tree)

[out]:

(S (NP I) (VP (V shot) (NP (NUM 3) (N elephants))))

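Note, however, that this only handles the numbers listed explicitly in the grammar (0 through 10 here). So let's collapse every integer into a single token type, e.g. '#NUM#', before parsing: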

import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")

sent = 'I shot 333 elephants'.split()
sent = ['#NUM#' if i.isdigit() else i for i in sent]

parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    print(tree)

[out]:

(S (NP I) (VP (V shot) (NP (NUM #NUM#) (N elephants))))

To put the numbers back, try:

import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I' | NUM N
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas' | 'elephants'
V -> 'shot'
P -> 'in'
NUM -> '#NUM#'
""")

original_sent = 'I shot 333 elephants'.split()
sent = ['#NUM#' if i.isdigit() else i for i in original_sent]
numbers = [i for i in original_sent if i.isdigit()]

parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    treestr = str(tree)
    for n in numbers:
        treestr = treestr.replace('#NUM#', n, 1)
    print(treestr)

[out]:

(S (NP I) (VP (V shot) (NP (NUM 333) (N elephants))))
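
As a variation on the string replacement above, the numbers can also be restored on the tree itself, which keeps the result as an nltk Tree rather than a string. This is only a sketch: the helper name parse_with_placeholder and the trimmed-down grammar are made up for illustration, and it assumes the leaves of each parse come back in sentence order, so leaf i corresponds to token i.

```python
import nltk

# Trimmed-down version of the grammar above (illustrative only).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'I' | NUM N
VP -> V NP
N -> 'elephants'
V -> 'shot'
NUM -> '#NUM#'
""")

def parse_with_placeholder(grammar, tokens, placeholder='#NUM#'):
    """Parse with integers masked, then write the original tokens back into the leaves."""
    masked = [placeholder if t.isdigit() else t for t in tokens]
    parser = nltk.ChartParser(grammar)
    restored = []
    for tree in parser.parse(masked):
        # Work on a deep copy so subtrees shared between parses are not mutated.
        tree = tree.copy(deep=True)
        # Leaf positions are returned left to right, matching the token order.
        for i, pos in enumerate(tree.treepositions('leaves')):
            tree[pos] = tokens[i]
        restored.append(tree)
    return restored

trees = parse_with_placeholder(grammar, 'I shot 333 elephants'.split())
for tree in trees:
    print(tree)
```

Because the leaves are overwritten by position, this avoids relying on the order of textual '#NUM#' matches in the printed tree.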

Answer 2

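A simple solution is to define a function that builds a parser from a given sentence and grammar. On every call, the grammar is extended with productions for the integers found in that sentence, which takes care of the integer problem. Here is an example function: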

import nltk

def name_parser(G, sent):
    # Collect the integer tokens that appear in this sentence.
    ints = [i for i in sent if i.isdigit()]
    # Copy the grammar's productions and add one INT -> '<n>' rule per integer.
    lproductions = list(G.productions())
    lproductions.extend([nltk.grammar.Production(nltk.grammar.Nonterminal('INT'), [i]) for i in ints])
    lgrammar = nltk.grammar.CFG(G.start(), lproductions)
    parser = nltk.ChartParser(lgrammar)
    for tree in parser.parse(sent):
        print(tree)
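
To connect this back to the grammar in the question, here is a hedged usage sketch. It assumes the SO rule is rewritten to point at an INT nonterminal instead of the literal '\d+' (NLTK treats quoted terminals as plain strings, not regular expressions). The function name parse_with_ints is hypothetical; it follows the same idea as the function above but returns the trees and adds each distinct integer only once.

```python
import nltk
from nltk.grammar import CFG, Nonterminal, Production

def parse_with_ints(grammar, tokens):
    """Extend the grammar with INT -> '<n>' for each integer token, then parse."""
    # Use a set so a repeated number adds only one production.
    ints = sorted({t for t in tokens if t.isdigit()})
    productions = list(grammar.productions())
    productions.extend(Production(Nonterminal('INT'), [t]) for t in ints)
    extended = CFG(grammar.start(), productions)
    return list(nltk.ChartParser(extended).parse(tokens))

# The question's grammar, with SO delegating to INT (an assumption of this sketch).
sql_grammar = CFG.fromstring("""
S -> SK SO FK
SK -> 'SELECT'
SO -> INT
FK -> 'FROM'
""")

trees = parse_with_ints(sql_grammar, 'SELECT 42 FROM'.split())
for tree in trees:
    print(tree)
```

The base grammar has no INT productions of its own, so it is only usable after being extended for a specific sentence.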
