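I am working on a parser that should process an input string, extract its components, validate them, and then build a SQLAlchemy query from them. Right now I am working on the first part of the parser and have run into a problem. I want to define an exception that reports whether a filter is valid.
Filter definition: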
filter_term = Combine(Optional(space) + Word(alphas) + Optional(space)).set_results_name("filter").set_parse_action(
filter_validator).set_name("filter")
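I want to add extra validation for the filter. Only specific words may be used as filters, and they will be defined as a dictionary of aliases, for example: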
{
    "animal": "animal",
    "dog": "animal",
    "cat": "animal",
    "pet": "animal"
}
In the code provided, I use a simple check to see whether the filter equals "w" and, if it does, raise an exception.
if t[0] == "w":
    raise FilterException("Invalid filter")
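However, this is not what happens at the moment: my parser does raise an exception, but it has nothing to do with the filter validation.
ParseException: Expected end of text, found 'and'  (at char 15), (line:1, col:16)
FAIL: Expected end of text, found 'and'  (at char 15), (line:1, col:16)
Could you please help me solve this?
Parser: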
from pyparsing import Word, Combine, Optional, DelimitedList, alphanums, Suppress, Group, one_of, alphas, \
CaselessLiteral, infix_notation, opAssoc, OneOrMore, Keyword, CaselessKeyword, pyparsing_common, Forward, \
ParseException, ParseSyntaxException, ZeroOrMore
class OrOperation:
    def __init__(self, instring, loc, toks):
        raise ParseException(instring, loc, "invalid OR given")
class AndOperation:
    def __init__(self, instring, loc, toks):
        raise ParseException(instring, loc, "invalid AND given")
class FilterException(ParseException):
    def __init__(self, pstr):
        super().__init__(pstr)
def filter_validator(s, l, t):
    if t[0] == "w":
        raise FilterException("Invalid filter")
# utils:
comma = Suppress(",")
space = Suppress(" ")
lbrace = Suppress("(")
rbrace = Suppress(")")
and_operator = Suppress(CaselessKeyword("AND"))
or_operator = CaselessKeyword("OR")
search_parser = Forward().set_name("search_expression")
literal_value = Forward().set_name("literal_value").set_results_name("literal_value")
delimited_list_delim = Optional(comma + Optional(space))
delimited_list = DelimitedList(literal_value, delim=delimited_list_delim).set_parse_action(
lambda tokens: ", ".join(tokens))
string_literal = Word(alphanums + "_")
wildcard_literal = Combine(string_literal + "*").set_parse_action(lambda tokens: tokens[0].replace("*", "?"))
delimited_list_literal = lbrace + delimited_list + rbrace
filter_term = Combine(Optional(space) + Word(alphas) + Optional(space)).set_results_name("filter").set_parse_action(
filter_validator).set_name("filter")
literal_value <<= delimited_list_literal | wildcard_literal | string_literal
equals_operator = one_of("= :")
comparison_operator = one_of("> >= < <= ")
not_equals_operator = CaselessLiteral("!=")
contains_operator = CaselessLiteral("~").set_parse_action(lambda tokens: "LIKE")
not_contains_operator = CaselessLiteral("!~").set_parse_action(lambda tokens: "NOT LIKE")
operator = equals_operator | not_equals_operator | contains_operator | not_contains_operator | comparison_operator
operator_term = Combine(Optional(space) + operator + Optional(space)).set_results_name("operator")
expression_term = Group(filter_term + operator_term + literal_value).set_parse_action(filter_validator) | Group(
literal_value)
search_parser <<= infix_notation(expression_term,
[
(and_operator, 2, opAssoc.LEFT,
lambda instring, loc, toks: AndOperation(instring, loc, toks)),
(or_operator, 2, opAssoc.LEFT,
lambda instring, loc, toks: OrOperation(instring, loc, toks))
])
try:
    result = search_parser.parse_string("w~(a, b c, d)")
    print(result.dump())
except FilterException as e:
    print("Filter failed:", e)
search_parser.run_tests('''
asas
was*
(as, b,c d)
((as, b,c d))
w=a
w=a*
w=(a, b c, d)
w:(a, b c, d)
w!=(a, b c, d)
w~(a, b c, d)
w!~(a, b c, d)
w>=(a, b c, d)
a>=(a, b c, d) and a=(a, b c, d)
w>=(a, b c, d) and w=(a, b c, d) and w=(a, b c, d)
w>=(a, b c, d) or (w=(a, b c, d) and w=(a, b c, d))
(w>=(a, b c, d) or w!~(a, b c, d)) or (w=(a, b c, d) and w=(a, b c, d))
w>=(a, b c, d) or w!~(a, b c, d) or (w=(a, b c, d) and w=(a, b c, d))
w>=(a, b c, d) or w!~(a, b c, d) or w=(a, b c, d) and w=(a, b c, d)
a>=(a, b c, d) and w!~(a, b c, d) or w=(a, b c, d) and w!=(a, b c, d)
''')
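Pyparsing's internal logic makes heavy use of ParseExceptions as it works its way through the parser structure of nested ParserElements. Since FilterException extends ParseException, it gets swept up with all the rest of the internal exception raising and handling that drives pyparsing's trying and retrying of alternatives. In your parser, the failed filter match is simply treated as a failed alternative, so pyparsing falls back to Group(literal_value) and later reports an unrelated error.
I changed your exception to extend Exception instead, which I think will get things closer to what you expect: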
class FilterException(Exception):
    def __init__(self, pstr):
        self.msg = pstr
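The difference between the two base classes can be seen in a small, self-contained sketch. The names SoftFilterError, HardFilterError, and the two tiny parsers are hypothetical and exist only for this illustration; they are not part of the original parser. The sketch assumes pyparsing 3.x.

```python
from pyparsing import Word, alphas, ParseException


class SoftFilterError(ParseException):
    """Hypothetical: extends ParseException, so pyparsing treats it as a normal match failure."""


class HardFilterError(Exception):
    """Hypothetical: a plain Exception, which pyparsing does not catch."""

    def __init__(self, msg):
        super().__init__(msg)
        self.msg = msg


def soft_validator(s, loc, toks):
    if toks[0] == "w":
        raise SoftFilterError(s, loc, "Invalid filter")


def hard_validator(s, loc, toks):
    if toks[0] == "w":
        raise HardFilterError("Invalid filter")


# Each parser has a validated alternative followed by an unvalidated fallback,
# similar to `Group(filter_term + ...) | Group(literal_value)` in the question.
soft_parser = Word(alphas).set_parse_action(soft_validator) | Word(alphas)
hard_parser = Word(alphas).set_parse_action(hard_validator) | Word(alphas)

# The ParseException subclass is swallowed and the fallback alternative matches.
soft_result = soft_parser.parse_string("w")

# The plain Exception escapes from parse_string() and reaches the caller.
try:
    hard_parser.parse_string("w")
    hard_outcome = "no exception"
except HardFilterError as err:
    hard_outcome = err.msg

print(soft_result.as_list(), hard_outcome)
```

In the first case the validation error never reaches the caller, which is the behavior described in the question.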
Some other notes on your parser:
Optional(space) will not work well for whitespace skipping, since pyparsing already skips whitespace implicitly. Instead, try:
filter_term = Word(alphas, as_keyword=True).set_results_name("filter").set_parse_action(
filter_validator).set_name("filter")
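As a quick check of the implicit whitespace skipping, here is a minimal sketch assuming pyparsing 3.x. The expression `term` is a simplified stand-in for the question's expression_term, not the original definition.

```python
from pyparsing import Word, alphas, alphanums, one_of

# No explicit whitespace handling anywhere in this expression.
term = Word(alphas)("filter") + one_of("= :")("operator") + Word(alphanums)("value")

compact = term.parse_string("w=a").as_list()
spaced = term.parse_string("  w   =   a").as_list()

# Both inputs produce the same tokens, because leading whitespace is
# skipped before each sub-expression is matched.
print(compact, spaced)
```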
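AndOperation and OrOperation have constructor signatures that already line up with the parse action signature, so the classes themselves can be passed to infix_notation, like this: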
search_parser <<= infix_notation(expression_term,
[
(and_operator, 2, opAssoc.LEFT, AndOperation),
(or_operator, 2, opAssoc.LEFT, OrOperation)
])
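To illustrate what passing a class as a parse action does, here is a hedged sketch in which the classes build tree nodes instead of raising. AndNode, OrNode, and the simplified `operand` are hypothetical names for this example only, and the sketch assumes pyparsing 3.x.

```python
from pyparsing import Word, alphas, CaselessKeyword, Suppress, infix_notation, opAssoc


class BinaryNode:
    """Hypothetical base class: stores the operands of one infix group."""
    label = "?"

    def __init__(self, instring, loc, toks):
        # infix_notation wraps each matched level in a Group, so toks[0]
        # holds the operands (the operators are suppressed below).
        self.operands = list(toks[0])

    def __repr__(self):
        return f"{self.label}(" + ", ".join(map(repr, self.operands)) + ")"


class AndNode(BinaryNode):
    label = "AND"


class OrNode(BinaryNode):
    label = "OR"


and_operator = Suppress(CaselessKeyword("AND"))
or_operator = Suppress(CaselessKeyword("OR"))
operand = Word(alphas)  # simplified stand-in for expression_term

expr = infix_notation(operand,
                      [
                          (and_operator, 2, opAssoc.LEFT, AndNode),
                          (or_operator, 2, opAssoc.LEFT, OrNode),
                      ])

# pyparsing calls AndNode(instring, loc, toks) / OrNode(instring, loc, toks)
# and replaces the matched tokens with the resulting instance.
tree = expr.parse_string("a and b or c")[0]
print(repr(tree))
```

Because AND is listed before OR, it binds more tightly, so the OR node ends up at the top of the tree.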
expression_term.run_tests('''
asas
was*
(as, b,c d)
((as, b,c d))
w=a
w=a*
w=(a, b c, d)
w:(a, b c, d)
w!=(a, b c, d)
w~(a, b c, d)
w!~(a, b c, d)
w>=(a, b c, d)
''')
search_parser.run_tests('''
a>=(a, b c, d) and a=(a, b c, d)
w>=(a, b c, d) and w=(a, b c, d) and w=(a, b c, d)
w>=(a, b c, d) or (w=(a, b c, d) and w=(a, b c, d))
(w>=(a, b c, d) or w!~(a, b c, d)) or (w=(a, b c, d) and w=(a, b c, d))
w>=(a, b c, d) or w!~(a, b c, d) or (w=(a, b c, d) and w=(a, b c, d))
w>=(a, b c, d) or w!~(a, b c, d) or w=(a, b c, d) and w=(a, b c, d)
a>=(a, b c, d) and w!~(a, b c, d) or w=(a, b c, d) and w!=(a, b c, d)
''')
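Coming back to the alias dictionary from the question, the "w" check could be replaced with a lookup along the following lines. This is only a sketch under assumptions: the dictionary name FILTER_ALIASES and the choice to return the canonical name from the parse action are hypothetical, and pyparsing 3.x is assumed.

```python
from pyparsing import Word, alphas

# Hypothetical name for the alias dictionary shown in the question.
FILTER_ALIASES = {
    "animal": "animal",
    "dog": "animal",
    "cat": "animal",
    "pet": "animal",
}


class FilterException(Exception):
    def __init__(self, pstr):
        super().__init__(pstr)
        self.msg = pstr


def filter_validator(s, l, t):
    name = t[0]
    if name not in FILTER_ALIASES:
        raise FilterException(f"Invalid filter: {name!r}")
    # Returning a value from a parse action replaces the matched token,
    # so the alias is normalized to its canonical name.
    return FILTER_ALIASES[name]


filter_term = Word(alphas).set_parse_action(filter_validator).set_name("filter")

canonical = filter_term.parse_string("dog")[0]

try:
    filter_term.parse_string("w")
    error_message = None
except FilterException as e:
    error_message = e.msg

print(canonical, error_message)
```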