I'm trying to parse some commands (for example, string or file content) into a Dict output, and I came across pyparsing.
Let's say I have the following input:
str = "p1 start a, {alias = b, for : 30}; c, d stop e"
To parse it, I'm using this:
import pyparsing as pp
grammar = pp.Forward()
SEP = pp.one_of(", ;")
EQ = pp.Suppress(pp.one_of(': ='))
LBRACE, RBRACE = map(pp.Suppress,"{}")
CMD_KEYWORD = (pp.CaselessKeyword("start") | pp.CaselessKeyword("stop") | pp.CaselessKeyword("resume"))
platform = pp.one_of("p1 p2 p3")("platform")
alias = pp.Word(pp.alphanums)
prop = pp.Word(pp.alphanums)
value = pp.Word(pp.alphanums)
prop_value = pp.Dict(pp.Group(prop + EQ + value))
task_config = LBRACE + pp.delimitedList(prop_value, delim = SEP) + RBRACE
command = CMD_KEYWORD + pp.Group(pp.delimitedList(task_config | alias, delim = SEP))("tasks")
expr = platform + command[1, ...]("commands")
grammar <<= expr
res = grammar.parse_string(str)
print(res.as_dict())
print(res.as_list())
which produces the following dict and list:
{'platform': 'p1', 'tasks': ['e'], 'commands': ['start', {'alias': 'b', 'for': '30'}, 'stop', ['e']]}
['p1', 'start', ['a', ['alias', 'b'], ['for', '30'], 'c', 'd'], 'stop', ['e']]
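The nested part of this output comes from `prop_value = pp.Dict(pp.Group(prop + EQ + value))`: each `Group` keeps one `[key, value]` pair together, and `Dict` turns the first element of each pair into a results name. A minimal sketch isolating just that piece, assuming pyparsing 3 (where `delimited_list` is the PEP 8 name for `delimitedList`); `word` is an illustrative stand-in for `prop`/`value`:

```python
import pyparsing as pp

SEP = pp.one_of(", ;")
EQ = pp.Suppress(pp.one_of(": ="))
word = pp.Word(pp.alphanums)  # illustrative stand-in for prop / value

# Group keeps each [key, value] pair together; Dict makes the key a results name
prop_value = pp.Dict(pp.Group(word + EQ + word))
props = pp.delimited_list(prop_value, delim=SEP)

res = props.parse_string("alias = b, for : 30")
print(res.as_list())  # [['alias', 'b'], ['for', '30']]
print(res.as_dict())  # {'alias': 'b', 'for': '30'}
```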
What I'm (still) trying to achieve, though, is output in a specific Dict format, like this:
{
[
{
'platform': 'p1',
'commands': [
{'cmd': 'start', 'tasks': [{'alias': 'a'}, {'alias': 'b', 'for': '30'}, {'alias': 'c'}, {'alias': 'd'}]},
{'cmd': 'stop', 'tasks': [{'alias': 'e'}]}
],
}
]
}
EDIT:
After some trial and error, and a few changes to the parser grammar, I managed to (almost) achieve my goal:
import pyparsing as pp
def set_alias(t):
    return {"alias": t[0]}
grammar = pp.Forward()
SEP = pp.one_of(", ;")
EQ = pp.Suppress(pp.one_of(': ='))
LBRACE, RBRACE = map(pp.Suppress,"{}")
OPT_SEP = pp.Suppress(pp.Opt(SEP))
CMD_KEYWORD = (pp.CaselessKeyword("start") | pp.CaselessKeyword("stop") | pp.CaselessKeyword("resume"))("cmd")
platform = pp.one_of("p1 p2 p3")("platform")
alias = ~(CMD_KEYWORD | platform) + pp.Word(pp.alphanums)
prop = pp.Word(pp.alphanums)
value = pp.Word(pp.alphanums)
prop_value = pp.Dict(pp.Group(prop + EQ + value))
task_config = LBRACE + pp.Group(pp.delimitedList(prop_value, delim = SEP)) + RBRACE
command = pp.Group(CMD_KEYWORD + pp.Group(pp.OneOrMore((task_config | alias.set_parse_action(set_alias)) + OPT_SEP))("tasks"))
expr = platform + command[1, ...]("commands")
grammar <<= pp.OneOrMore(expr + OPT_SEP)
res = grammar.parse_string(str)
print(res.as_dict())
print(res.as_list())
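Here `set_alias` is a parse action: whatever it returns replaces the matched tokens, which is why bare aliases now appear as `{'alias': ...}` dicts. Note that `set_parse_action` modifies `alias` in place (and returns it), so the action stays attached to `alias` wherever it is used. A minimal sketch of the mechanism on its own, with an illustrative space-separated input:

```python
import pyparsing as pp

def set_alias(t):
    # The returned value replaces the matched token(s) in the results
    return {"alias": t[0]}

alias = pp.Word(pp.alphanums).set_parse_action(set_alias)
res = alias[1, ...].parse_string("a c d")
print(res.as_list())  # [{'alias': 'a'}, {'alias': 'c'}, {'alias': 'd'}]
```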
But when I test it with the following input:
p1 start a, {alias = b, for : 30}; c, d stop e p2 resume f
I get:
{'platform': 'p2', 'commands': [{'cmd': 'resume', 'tasks': [{'alias': 'f'}]}]}
['p1', ['start', [{'alias': 'a'}, [['alias', 'b'], ['for', '30']], {'alias': 'c'}, {'alias': 'd'}]], ['stop', [{'alias': 'e'}]], 'p2', ['resume', [{'alias': 'f'}]]]
As you can see, res.as_list() returns all the expected tokens, but res.as_dict() returns only the 'platform': 'p2' part and is missing the 'platform': 'p1' one, and I don't understand why.
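A likely explanation, sketched below: a results name such as `"platform"` behaves like a dictionary key at its nesting level, so when the same name matches more than once in a single `ParseResults`, `as_dict()` reports only the last match, while `as_list()` still contains every token. Wrapping each repetition in `pp.Group` gives each match its own nested result. The `name` field and input here are hypothetical:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Same results name matched three times at one level: the last match wins
flat = word("name")[1, ...]
res = flat.parse_string("a b c")
print(res.as_list())  # ['a', 'b', 'c']
print(res.as_dict())  # {'name': 'c'}

# Group each match so every "name" lives in its own sub-result
grouped = pp.Group(word("name"))[1, ...]
res2 = grouped.parse_string("a b c")
print([g.as_dict() for g in res2])  # [{'name': 'a'}, {'name': 'b'}, {'name': 'c'}]
```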
EDIT 2:
I've now solved this by changing the last part to:
expr = pp.Dict(pp.Group(platform + command[1, ...]("commands")))
grammar <<= pp.OneOrMore(expr + OPT_SEP)
and I get the following Dict as output:
{'p1': {'platform': 'p1', 'commands': [{'cmd': 'start', 'tasks': [{'alias': 'a'}, {'alias': 'b', 'for': '30'}, {'alias': 'c'}, {'alias': 'd'}]}, {'cmd': 'stop', 'tasks': [{'alias': 'e'}]}]},
'p2': {'platform': 'p2', 'commands': [{'cmd': 'resume', 'tasks': [{'alias': 'f'}]}]}}
[['p1', ['start', [{'alias': 'a'}, [['alias', 'b'], ['for', '30']], {'alias': 'c'}, {'alias': 'd'}]], ['stop', [{'alias': 'e'}]]], ['p2', ['resume', [{'alias': 'f'}]]]]
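I'll probably answer my own question and mark it as solved later, because I realized I need to update my parser to return a List of Dictionaries instead of one big Dict: the order in which the parser output is processed matters a lot.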
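The `Dict` approach has one more catch worth checking: `pp.Dict` uses the first token of each group as the key, so if the same platform appeared twice in the input, the later block would replace the earlier one in `as_dict()`. A minimal sketch with a hypothetical, simplified `entry` grammar (a platform followed by one word):

```python
import pyparsing as pp

# Hypothetical simplified grammar: a platform name followed by a single word
entry = pp.Dict(pp.Group(pp.one_of("p1 p2 p3") + pp.Word(pp.alphas)))
res = entry[1, ...].parse_string("p1 a p2 b p1 c")
print(res.as_list())  # [['p1', 'a'], ['p2', 'b'], ['p1', 'c']]
print(res.as_dict())  # {'p1': 'c', 'p2': 'b'}  (first 'p1' block is lost)
```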
So I'll answer my own question, tested on the following input string:
p1 start a, {alias = b, for : 30}; c, d stop e; p2 resume f p3 start {name = g, at = 5}
import pyparsing as pp
def set_alias(t):
    return {"alias": t[0]}
def set_expr(t):
    for expre in t.as_dict().values():
        result.append(expre)
str = "p1 start a, {alias = b, for : 30}; c, d stop e; p2 resume f p3 start {name = g, at = 5}"
# result will represent the output as a List of Dictionaries
result = []
grammar = pp.Forward()
SEP = pp.one_of(", ;")
EQ = pp.Suppress(pp.one_of(': ='))
LBRACE, RBRACE = map(pp.Suppress,"{}")
OPT_SEP = pp.Suppress(pp.Opt(SEP))
CMD_KEYWORD = (pp.CaselessKeyword("start") | pp.CaselessKeyword("stop") | pp.CaselessKeyword("resume"))("cmd")
platform = pp.one_of("p1 p2 p3")("platform")
alias = ~(CMD_KEYWORD | platform) + pp.Word(pp.alphanums)
prop = pp.Word(pp.alphanums)
value = pp.Word(pp.alphanums)
prop_value = pp.Dict(pp.Group(prop + EQ + value))
task_config = LBRACE + pp.Group(pp.delimitedList(prop_value, delim = SEP)) + RBRACE
command = pp.Group(CMD_KEYWORD + pp.Group(pp.OneOrMore((task_config | alias.set_parse_action(set_alias)) + OPT_SEP))("tasks"))
expr = pp.Dict(pp.Group(platform + command[1, ...]("commands"))).setParseAction(set_expr)
grammar <<= pp.OneOrMore(expr + OPT_SEP)
res = grammar.parse_string(str)
print('\nDict = ', res.as_dict())
print('\n List of Dict = ', result)
print('\nList of Tokens =', res.as_list())
The result is:
Dict = {'p1': {'platform': 'p1', 'commands': [{'cmd': 'start', 'tasks': [{'alias': 'a'}, {'alias': 'b', 'for': '30'}, {'alias': 'c'}, {'alias': 'd'}]}, {'cmd': 'stop', 'tasks': [{'alias': 'e'}]}]}, 'p2': {'platform': 'p2', 'commands': [{'cmd': 'resume', 'tasks': [{'alias': 'f'}]}]}, 'p3': {'platform': 'p3', 'commands': [{'cmd': 'start', 'tasks': [{'name': 'g', 'at': '5'}]}]}}
List of Dict = [{'platform': 'p1', 'commands': [{'cmd': 'start', 'tasks': [{'alias': 'a'}, {'alias': 'b', 'for': '30'}, {'alias': 'c'}, {'alias': 'd'}]}, {'cmd': 'stop', 'tasks': [{'alias': 'e'}]}]}, {'platform': 'p2', 'commands': [{'cmd': 'resume', 'tasks': [{'alias': 'f'}]}]}, {'platform': 'p3', 'commands': [{'cmd': 'start', 'tasks': [{'name': 'g', 'at': '5'}]}]}]
List of Tokens = [['p1', ['start', [{'alias': 'a'}, [['alias', 'b'], ['for', '30']], {'alias': 'c'}, {'alias': 'd'}]], ['stop', [{'alias': 'e'}]]], ['p2', ['resume', [{'alias': 'f'}]]], ['p3', ['start', [[['name', 'g'], ['at', '5']]]]]]
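A possible variant, offered as a sketch rather than a tested replacement: the same list of dictionaries can be built without the module-level `result` list and the `set_expr` side effect by wrapping each platform block in a plain `pp.Group` (no `pp.Dict`) and calling `as_dict()` on each group in order. Blocks keep their input order, a repeated platform stays a separate entry, and parsing twice does not append to stale state. It reuses the grammar above, renames `str` to `text` to avoid shadowing the builtin, and uses `delimited_list` (the PEP 8 name for `delimitedList`) plus `parse_all=True`:

```python
import pyparsing as pp

def set_alias(t):
    return {"alias": t[0]}

SEP = pp.one_of(", ;")
EQ = pp.Suppress(pp.one_of(": ="))
LBRACE, RBRACE = map(pp.Suppress, "{}")
OPT_SEP = pp.Suppress(pp.Opt(SEP))
CMD_KEYWORD = (pp.CaselessKeyword("start") | pp.CaselessKeyword("stop") | pp.CaselessKeyword("resume"))("cmd")
platform = pp.one_of("p1 p2 p3")("platform")
# A bare alias is any word that is not a command keyword or a platform name
alias = (~(CMD_KEYWORD | platform) + pp.Word(pp.alphanums)).set_parse_action(set_alias)
prop = pp.Word(pp.alphanums)
value = pp.Word(pp.alphanums)
prop_value = pp.Dict(pp.Group(prop + EQ + value))
task_config = LBRACE + pp.Group(pp.delimited_list(prop_value, delim=SEP)) + RBRACE
tasks = pp.Group(pp.OneOrMore((task_config | alias) + OPT_SEP))("tasks")
command = pp.Group(CMD_KEYWORD + tasks)
# Plain Group (no Dict): one sub-result per platform block, in input order
block = pp.Group(platform + command[1, ...]("commands"))
grammar = pp.OneOrMore(block + OPT_SEP)

text = "p1 start a, {alias = b, for : 30}; c, d stop e; p2 resume f p3 start {name = g, at = 5}"
res = grammar.parse_string(text, parse_all=True)
# Build the list of dicts after parsing instead of inside a parse action
result = [blk.as_dict() for blk in res]
print(result)
```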