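I want to define a grammar expression of the form:
=expr + #native(...)
Here expr can be any valid expression in our grammar, and that part already works. However, #native(...) holds a valid SQL statement in a target dialect (possibly one of several), which means we have no control over it: it could be SQL Server, Oracle, MySQL, Postgres, and so on. We basically just want to "capture" whatever has been typed into that construct.
We have run into a problem that I think makes this ambiguous: when does a ) character terminate the native expression? We can parse strings such as '...' or "..." and ignore any characters inside them, but the escape convention is itself ambiguous. Sometimes an embedded quote is written by doubling the quote character, as in "He said ""hello"".", and sometimes with a backslash, as in "He said \"Hello\".". In other words, it is hard to know in general when a value is quoted, or sits inside a comment, a query hint, and so on.
Other than passing the expression as a string literal with our own defined escape character, such as =expr + #native("..."), is there a relatively elegant way to handle this potential ambiguity?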
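To make the ambiguity concrete, here is a minimal Java sketch (the class and method names are hypothetical, not part of the grammar in question) that looks for the closing quote of the same text under the two escape conventions. It assumes the escape is given as the full sequence that stands for one embedded quote, and it does not handle an escaped backslash in front of a quote.

```java
public class EscapeAmbiguity {
    /**
     * Returns the index of the quote that closes the literal opened at {@code open},
     * or -1 if the literal is unterminated. {@code escape} is the full sequence that
     * represents an embedded quote, e.g. "\"\"" (doubling) or "\\\"" (backslash).
     */
    static int closingQuote(String text, int open, char quote, String escape) {
        int i = open + 1;
        while (i < text.length()) {
            if (text.startsWith(escape, i)) {
                i += escape.length();   // escaped quote: still inside the literal
            } else if (text.charAt(i) == quote) {
                return i;               // unescaped quote: the literal ends here
            } else {
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "\"He said \\\"Hello\\\".\"";  // the 20 characters "He said \"Hello\"."
        System.out.println(closingQuote(text, 0, '"', "\\\""));  // backslash convention: prints 19
        System.out.println(closingQuote(text, 0, '"', "\"\""));  // doubling convention: prints 10
    }
}
```

Under the backslash convention the literal runs to the final quote; under the doubling convention it already ends at the first embedded quote, so everything after it (including any parentheses) would be treated as SQL outside a string.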
Idea: maybe for each target dialect we store the quote, comment, and escape characters that dialect accepts? For example, a map like this:
{
  "postgres": {
    "comments": [
      "--"
    ],
    "blocks": [
      {"start": "/*", "end": "*/"}
    ],
    "quotes": [
      {"value": "'", "escape": "''"},
      {"value": "\"", "escape": "\"\""}
    ]
  },
  "mysql": {
    "comments": [
      "#"
    ],
    "blocks": [
      {"start": "/*", "end": "*/"}
    ],
    "quotes": [
      {"value": "'", "escape": "\\"},
      {"value": "\"", "escape": "\\"},
      {"value": "`", "escape": "``"}
    ]
  }
}
We would then replace native with the name of the dialect, for example =expr + #postgres(...). For instance:
=1 + #postgres(SELECT (ARRAY[(2),'-3' /*)(*/, (((3)))])[1])
--> 3
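As a rough illustration of the table-driven idea, the following Java sketch scans for the ) that closes the construct while skipping line comments, block comments, and quoted literals listed for a dialect. All names here (NativeScanner, Dialect, findClose, skipOpaque) are hypothetical, and escapes are assumed to be given as the full sequence for one embedded quote. Dialect-specific forms such as PostgreSQL dollar-quoted strings are not covered.

```java
import java.util.List;

public class NativeScanner {
    static final class Quote {
        final String start, escape;
        Quote(String start, String escape) { this.start = start; this.escape = escape; }
    }
    static final class Block {
        final String start, end;
        Block(String start, String end) { this.start = start; this.end = end; }
    }
    static final class Dialect {
        final List<String> lineComments; final List<Block> blocks; final List<Quote> quotes;
        Dialect(List<String> lineComments, List<Block> blocks, List<Quote> quotes) {
            this.lineComments = lineComments; this.blocks = blocks; this.quotes = quotes;
        }
    }

    /** Returns the index of the ')' that closes the '(' at openParen, or -1 if none. */
    static int findClose(String src, int openParen, Dialect d) {
        int depth = 0;
        int i = openParen;
        while (i < src.length()) {
            int next = skipOpaque(src, i, d);
            if (next != i) { i = next; continue; }   // skipped a comment or literal
            char c = src.charAt(i);
            if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return i;
            i++;
        }
        return -1;
    }

    /** If a comment or quoted literal starts at i, returns the index just past it; otherwise i. */
    static int skipOpaque(String src, int i, Dialect d) {
        for (String lc : d.lineComments) {
            if (src.startsWith(lc, i)) {
                int nl = src.indexOf('\n', i);
                return nl < 0 ? src.length() : nl + 1;
            }
        }
        for (Block b : d.blocks) {
            if (src.startsWith(b.start, i)) {
                int end = src.indexOf(b.end, i + b.start.length());
                return end < 0 ? src.length() : end + b.end.length();
            }
        }
        for (Quote q : d.quotes) {
            if (src.startsWith(q.start, i)) {
                int j = i + q.start.length();
                while (j < src.length()) {
                    if (src.startsWith(q.escape, j)) j += q.escape.length();
                    else if (src.startsWith(q.start, j)) return j + q.start.length();
                    else j++;
                }
                return src.length();   // unterminated literal: swallow the rest
            }
        }
        return i;
    }

    public static void main(String[] args) {
        Dialect postgres = new Dialect(
            List.of("--"),
            List.of(new Block("/*", "*/")),
            List.of(new Quote("'", "''"), new Quote("\"", "\"\"")));
        String src = "=1 + #postgres(SELECT (ARRAY[(2),'-3' /*)(*/, (((3)))])[1])";
        int open = src.indexOf('(');
        int close = findClose(src, open, postgres);
        // prints: SELECT (ARRAY[(2),'-3' /*)(*/, (((3)))])[1]
        System.out.println(src.substring(open + 1, close));
    }
}
```

The parentheses inside the block comment are ignored because the comment is skipped as a unit before any parenthesis counting happens.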
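Does this sound like a valid and reasonable approach? Finally, if the lexing itself depends on these characters, how would I "extract" the #native(...) component? Would I need to add a preprocessor in front of the lexer? (I am currently using ANTLR for lexing/parsing.)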
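You can do this in the lexer by adding some custom code and using its more advanced features (lexical modes and semantic predicates).
The gist is this: when the lexer encounters #name(, it enters a "native" mode. Inside that mode, predicates consult the dialect table to detect the start of a block comment or a quoted literal and switch to a dedicated mode until its end is found.
A small Java demo: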
// DynamicLexer.g4
lexer grammar DynamicLexer;

@members {
  private java.util.Map<String, SqlDialect> dialects;
  private String dialect = null;  // name taken from the most recent #name( token
  private Block block = null;     // block comment currently being lexed
  private Quote quote = null;     // quoted literal currently being lexed

  public DynamicLexer(java.util.Map<String, SqlDialect> dialects, CharStream input) {
    this(input);
    this.dialects = dialects;
  }

  // Strips the '#' and '(' from a token such as "#postgres(".
  private void setDialect(String token) {
    this.dialect = token.replaceAll("[#(]", "");
  }

  private SqlDialect getDialect() {
    SqlDialect sqlDialect = this.dialects.get(this.dialect);
    if (sqlDialect == null) {
      throw new RuntimeException("Unknown dialect: '" + this.dialect + "'");
    }
    return sqlDialect;
  }

  private boolean blockStartAhead() {
    SqlDialect sqlDialect = this.getDialect();
    for (Block b : sqlDialect.blocks) {
      if (this.ahead(b.start)) {
        this.consume(b.start);
        this.block = b;
        return true;
      }
    }
    return false;
  }

  private boolean blockEndAhead() {
    if (this.ahead(this.block.end)) {
      this.consume(this.block.end);
      return true;
    }
    return false;
  }

  private boolean quoteStartAhead() {
    SqlDialect sqlDialect = this.getDialect();
    for (Quote q : sqlDialect.quotes) {
      if (this.ahead(q.start)) {
        this.consume(q.start);
        this.quote = q;
        return true;
      }
    }
    return false;
  }

  // A quoted literal is closed by the same delimiter that opened it.
  private boolean quoteEndAhead() {
    if (this.ahead(this.quote.start)) {
      this.consume(this.quote.start);
      return true;
    }
    return false;
  }

  private boolean quoteEscapeAhead(boolean consume) {
    if (this.ahead(this.quote.escape)) {
      if (consume) {
        this.consume(this.quote.escape);
      }
      return true;
    }
    return false;
  }

  // True if the upcoming characters of the input match text exactly.
  private boolean ahead(String text) {
    for (int i = 1; i <= text.length(); i++) {
      if (this._input.LA(i) != text.charAt(i - 1)) {
        return false;
      }
    }
    return true;
  }

  // Consumes all but the last character of text. The '.' that follows each
  // predicate in the lexer rules then matches that last character.
  private void consume(String text) {
    for (int i = 1; i < text.length(); i++) {
      this._input.consume();
    }
  }
}

SPACE  : [ \t\r\n] -> skip;
EQUAL  : '=';
ADD    : '+';
INT    : [0-9]+;
NATIVE : '#' [a-zA-Z]+ '(' {setDialect(getText());} -> pushMode(NATIVE_MODE);

mode NATIVE_MODE;

BLOCK_START : {blockStartAhead()}? . -> pushMode(BLOCK_MODE);
QUOTE_START : {quoteStartAhead()}? . -> pushMode(QUOTE_MODE);
// Note the naming: LPAR matches ')' and RPAR matches '('.
// The parser grammar and the output below rely on these names.
LPAR        : ')' -> popMode;
RPAR        : '(' -> pushMode(NATIVE_MODE);
NATIVE_ATOM : [a-zA-Z0-9]+ | ~[a-zA-Z0-9];

mode BLOCK_MODE;

BLOCK_END  : {blockEndAhead()}? . -> popMode;
BLOCK_ATOM : . ;

mode QUOTE_MODE;

ESCAPE     : {quoteEscapeAhead(true)}? .;
QUOTE_END  : {!quoteEscapeAhead(false) && quoteEndAhead()}? . -> popMode;
QUOTE_ATOM : .;
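The way NATIVE_MODE balances parentheses can be pictured with a small stand-alone simulation. In ANTLR, pushMode saves the current mode on a stack before switching and popMode restores the saved one, so the construct ends at the ) whose pop returns to the default mode. The sketch below is only a model of that bookkeeping (the class and method names are hypothetical), and it deliberately ignores quotes and comments.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ModeStackTrace {
    /**
     * Simulates the mode stack for the text following "#name(" and returns the
     * index of the ')' that pops back to the default mode, or -1 if none does.
     */
    static int endOfNative(String src, int afterOpen) {
        Deque<String> saved = new ArrayDeque<>();
        saved.push("DEFAULT");                    // "#name(" pushed NATIVE_MODE from the default mode
        for (int i = afterOpen; i < src.length(); i++) {
            char c = src.charAt(i);
            if (c == '(') {
                saved.push("NATIVE_MODE");        // nested '(': pushMode(NATIVE_MODE) from NATIVE_MODE
            } else if (c == ')') {
                if (saved.pop().equals("DEFAULT")) {
                    return i;                     // popMode restored the default mode
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "#pg(a(b)c)+1";
        System.out.println(endOfNative(src, 4));  // prints 9
    }
}
```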
The lexer above can be used by the following parser:
// DynamicParser.g4
parser grammar DynamicParser;

options {
  tokenVocab=DynamicLexer;
}

parse
 : EQUAL expr EOF
 ;

expr
 : expr ADD expr
 | native
 | INT
 ;

// LPAR is the ')' token in DynamicLexer, so it closes the construct.
native
 : NATIVE native_atom* LPAR
 ;

native_atom
 : NATIVE_ATOM
 | LPAR
 | RPAR
 | native_block
 | native_quote
 ;

native_block
 : BLOCK_START BLOCK_ATOM* BLOCK_END
 ;

native_quote
 : QUOTE_START ( ESCAPE | QUOTE_ATOM )* QUOTE_END
 ;
After generating the lexer and parser classes, test them with the following class:
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.tree.ParseTree;

public class Main {
    public static void main(String[] args) {
        Map<String, SqlDialect> dialects = new HashMap<>();
        dialects.put("postgres", new SqlDialect("--",
            new Block[]{ new Block("/*", "*/") },
            new Quote[]{ new Quote("'", "''"), new Quote("\"", "\"\"") }));
        dialects.put("mysql", new SqlDialect("#",
            new Block[]{ new Block("/*", "*/") },
            new Quote[]{ new Quote("'", "\\'"), new Quote("\"", "\\\""), new Quote("`", "``") }));

        String source = "=1 + #postgres(SELECT (ARRAY[(2),'-3' /*)(*/, (((3)))])[1])";

        // First pass: print every token the lexer produces.
        DynamicLexer lexer = new DynamicLexer(dialects, CharStreams.fromString(source));
        CommonTokenStream stream = new CommonTokenStream(lexer);
        stream.fill();
        for (Token t : stream.getTokens()) {
            System.out.printf("%-20s '%s'%n",
                DynamicLexer.VOCABULARY.getSymbolicName(t.getType()),
                t.getText().replace("\n", "\\n"));
        }

        // Second pass: parse the same input and print the parse tree.
        lexer = new DynamicLexer(dialects, CharStreams.fromString(source));
        DynamicParser parser = new DynamicParser(new CommonTokenStream(lexer));
        ParseTree root = parser.parse();
        System.out.println(root.toStringTree(parser));
    }
}

class SqlDialect {
    public final String commentStart;  // stored, but not used by the demo lexer
    public final List<Block> blocks;
    public final List<Quote> quotes;

    public SqlDialect(String commentStart, Block[] blocks, Quote[] quotes) {
        this.commentStart = commentStart;
        this.blocks = Arrays.asList(blocks);
        this.quotes = Arrays.asList(quotes);
    }
}

class Block {
    public final String start;
    public final String end;

    public Block(String start, String end) {
        this.start = start;
        this.end = end;
    }
}

class Quote {
    public final String start;   // delimiter that both opens and closes the literal
    public final String escape;  // full sequence that stands for one embedded delimiter

    public Quote(String start, String escape) {
        this.start = start;
        this.escape = escape;
    }
}
After running the Main class, you will see the following printed to the console:
EQUAL                '='
INT                  '1'
ADD                  '+'
NATIVE               '#postgres('
NATIVE_ATOM          'SELECT'
NATIVE_ATOM          ' '
RPAR                 '('
NATIVE_ATOM          'ARRAY'
NATIVE_ATOM          '['
RPAR                 '('
NATIVE_ATOM          '2'
LPAR                 ')'
NATIVE_ATOM          ','
QUOTE_START          '''
QUOTE_ATOM           '-'
QUOTE_ATOM           '3'
QUOTE_END            '''
NATIVE_ATOM          ' '
BLOCK_START          '/*'
BLOCK_ATOM           ')'
BLOCK_ATOM           '('
BLOCK_END            '*/'
NATIVE_ATOM          ','
NATIVE_ATOM          ' '
RPAR                 '('
RPAR                 '('
RPAR                 '('
NATIVE_ATOM          '3'
LPAR                 ')'
LPAR                 ')'
LPAR                 ')'
NATIVE_ATOM          ']'
LPAR                 ')'
NATIVE_ATOM          '['
NATIVE_ATOM          '1'
NATIVE_ATOM          ']'
LPAR                 ')'
EOF                  '<EOF>'
(parse =
  (expr
    (expr 1)
    +
    (expr
      (native #postgres(
        (native_atom SELECT)
        (native_atom )
        (native_atom ()
        (native_atom ARRAY)
        (native_atom [)
        (native_atom ()
        (native_atom 2)
        (native_atom ))
        (native_atom ,)
        (native_atom
          (native_quote ' - 3 '))
        (native_atom )
        (native_atom
          (native_block /* ) ( */))
        (native_atom ,)
        (native_atom )
        (native_atom ()
        (native_atom ()
        (native_atom ()
        (native_atom 3)
        (native_atom ))
        (native_atom ))
        (native_atom ))
        (native_atom ])
        (native_atom ))
        (native_atom [)
        (native_atom 1)
        (native_atom ]) ))))
  <EOF>)
(I indented the parse tree by hand; when you run the Java code it is printed on a single line.)