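I have an Excel function parser that should handle function overloads based on the types of the arguments. The problem is that whether a column is a number or a string depends on external context, so the parser has to pick the appropriate function from that column context. For example: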
input expression: IFNA(123, 234) -> correctly parsed as number_function
input expression: IFNA("foo", "bar") -> correctly parsed as string_function
But we run into a problem when columns are used:
columnsContext = {
    column1: Type.String,
    column2: Type.String
}
input expression: IFNA(column1, column2)
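Based on the column types, the expression above should be parsed as a string function, but it is recognized as a number function because number_function is declared first in the grammar.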
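To make the intended behaviour concrete, here is a minimal plain-Java sketch of the choice the parser is expected to make. It does not use ANTLR, and the names OverloadChoiceSketch, typeOf and chooseIfna are hypothetical, not part of the grammar. It assumes every argument is a numeric literal, a quoted string, or a column present in the context.

```java
import java.util.List;
import java.util.Map;

public class OverloadChoiceSketch {
    // Hypothetical stand-in for the column types in columnsContext.
    enum Type { NUMBER, STRING }

    // Quoted text is a string, all-digit text is a number,
    // anything else is looked up as a column in the context.
    static Type typeOf(String arg, Map<String, Type> columnsContext) {
        if (arg.startsWith("\"")) return Type.STRING;
        if (!arg.isEmpty() && arg.chars().allMatch(Character::isDigit)) return Type.NUMBER;
        Type type = columnsContext.get(arg);
        if (type == null) throw new IllegalArgumentException("unknown column: " + arg);
        return type;
    }

    // Picks the IFNA overload from the argument types, not from declaration order.
    static String chooseIfna(List<String> args, Map<String, Type> columnsContext) {
        if (args.stream().allMatch(a -> typeOf(a, columnsContext) == Type.NUMBER)) {
            return "number_function";
        }
        if (args.stream().allMatch(a -> typeOf(a, columnsContext) == Type.STRING)) {
            return "string_function";
        }
        throw new IllegalArgumentException("mixed argument types: " + args);
    }

    public static void main(String[] args) {
        Map<String, Type> columnsContext =
                Map.of("column1", Type.STRING, "column2", Type.STRING);
        System.out.println(chooseIfna(List.of("123", "234"), columnsContext));
        System.out.println(chooseIfna(List.of("\"foo\"", "\"bar\""), columnsContext));
        // prints string_function, which is the result wanted for IFNA(column1, column2)
        System.out.println(chooseIfna(List.of("column1", "column2"), columnsContext));
    }
}
```

The point of the sketch is only that the decision depends on the column context rather than on which alternative comes first in the grammar.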
Grammar:
grammar ExcelLikeFunctionsGrammar;
expression : number | string;
number_function:
IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
... other functions;
string_function:
IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
... other functions;
number : NUMBER_CONSTANT | number_function | number_column ;
string : STRING_CONSTANT | string_function | string_column ;
number_column: ALPHANUMERIC;
string_column: ALPHANUMERIC;
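I tried to handle this with semantic predicates, putting the logic in a custom parser. However, when the parser reaches isNumber() it throws a noViableAlt exception: it has already recognized IFNA as a number function, but the column is of string type, so the predicate returns false.
Grammar: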
grammar ExcelLikeFunctionsGrammar;
// these defaults are overridden in the custom parser
@members {
    protected boolean isNumber() {
        return true;
    }
    protected boolean isString() {
        return true;
    }
}
expression : number | string;
number_function:
IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
... other functions;
string_function:
IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
... other functions;
number : NUMBER_CONSTANT | number_function | {isNumber()}? number_column ;
string : STRING_CONSTANT | string_function | {isString()}? string_column ;
number_column: ALPHANUMERIC;
string_column: ALPHANUMERIC;
Parser:
public class ExcelLikeFunctionsGrammarCustomParser extends ExcelLikeFunctionsGrammarParser {
    private final ReferenceContext referenceContext;

    public ExcelLikeFunctionsGrammarCustomParser(TokenStream input, ReferenceContext referenceContext) {
        super(input);
        this.referenceContext = referenceContext;
    }

    // Overrides the grammar default; evaluated by the {isNumber()}? predicate.
    @Override
    protected final boolean isNumber() {
        return checkColumnTokenType(TableColumnType.DECIMAL);
    }

    // Overrides the grammar default; evaluated by the {isString()}? predicate.
    @Override
    protected final boolean isString() {
        return checkColumnTokenType(TableColumnType.STRING);
    }

    private boolean checkColumnTokenType(TableColumnType columnType) {
        return checkTypeLogic(...);
    }
}
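I solved this by resolving the ambiguity at the lexical level: I added type-specific column tokens, then built a custom lexer that changes the token's type according to the type of the referenced column inside the checkType() action.
With this approach, the tokens are already typed by the time the input is parsed, so the parser can go into the correct function.
Grammar: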
grammar ExcelLikeFunctionsGrammar;
@lexer::members {
protected void checkType(String text) {}
}
expression : number | string;
number_function:
IFNA LEFT_PAREN number COMMA number RIGHT_PAREN |
... other functions;
string_function:
IFNA LEFT_PAREN string COMMA string RIGHT_PAREN |
... other functions;
number : NUMBER_CONSTANT | number_function | number_column ;
string : STRING_CONSTANT | string_function | string_column ;
number_column: NUMBER_COLUMN;
string_column: STRING_COLUMN;
// lexer
ALPHANUMERIC : (LOWERCASE | UPPERCASE | DIGIT | UNDERSCORE) (LETTER | ' ')* (LOWERCASE | UPPERCASE | DIGIT | UNDERSCORE) {checkType(getText());} ;
NUMBER_COLUMN : ALPHANUMERIC ;
STRING_COLUMN : ALPHANUMERIC ;
Custom lexer:
public class ExcelLikeFunctionsGrammarCustomLexer extends ExcelLikeFunctionsGrammarLexer {
    private final ReferenceContext referenceContext;

    public ExcelLikeFunctionsGrammarCustomLexer(CharStream input, ReferenceContext referenceContext) {
        super(input);
        this.referenceContext = referenceContext;
    }

    @Override
    protected void checkType(String text) {
        // Look up the column by name in the current schema.
        final var column = referenceContext.getCurrentSchema().getColumns().stream()
                .filter(c -> c.getName().equals(text))
                .toList();
        if (!column.isEmpty()) {
            // Retype the token according to the column's declared type.
            switch (column.get(0).getType()) {
                case DECIMAL, BIGINT -> setType(ExcelLikeFunctionsGrammarParser.NUMBER_COLUMN);
                case STRING -> setType(ExcelLikeFunctionsGrammarParser.STRING_COLUMN);
                case BOOLEAN -> setType(ExcelLikeFunctionsGrammarParser.LOGICAL_COLUMN);
                case DATETIME -> setType(ExcelLikeFunctionsGrammarParser.DATE_COLUMN);
                default -> setType(ExcelLikeFunctionsGrammarParser.ALPHANUMERIC);
            }
        } else {
            // Names that are not columns in the schema keep the generic type.
            setType(ExcelLikeFunctionsGrammarParser.ALPHANUMERIC);
        }
    }
}
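The retyping idea above can also be shown without ANTLR. The following is only a rough sketch under stated assumptions: ColumnRetypingSketch, Kind, ColumnType, Token, tokenize and classify are hypothetical names, the hand-written tokenizer stands in for the generated lexer, and a plain Map stands in for ReferenceContext. It shows that once identifiers are classified during tokenization, the token stream itself tells the two IFNA overloads apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ColumnRetypingSketch {
    // Hypothetical token kinds, mirroring the token names used in the grammar.
    enum Kind { IFNA, LEFT_PAREN, RIGHT_PAREN, COMMA, NUMBER_CONSTANT,
                STRING_CONSTANT, NUMBER_COLUMN, STRING_COLUMN, ALPHANUMERIC }

    // Hypothetical stand-in for the column types held by the reference context.
    enum ColumnType { DECIMAL, STRING }

    record Token(Kind kind, String text) {}

    // Same role as checkType(): decide the token kind from the column context.
    static Kind classify(String text, Map<String, ColumnType> columns) {
        if (text.equals("IFNA")) return Kind.IFNA;
        ColumnType type = columns.get(text);
        if (type == null) return Kind.ALPHANUMERIC; // not a known column
        return switch (type) {
            case DECIMAL -> Kind.NUMBER_COLUMN;
            case STRING -> Kind.STRING_COLUMN;
        };
    }

    static List<Token> tokenize(String input, Map<String, ColumnType> columns) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(') {
                tokens.add(new Token(Kind.LEFT_PAREN, "("));
                i++;
            } else if (c == ')') {
                tokens.add(new Token(Kind.RIGHT_PAREN, ")"));
                i++;
            } else if (c == ',') {
                tokens.add(new Token(Kind.COMMA, ","));
                i++;
            } else if (c == '"') {
                int end = input.indexOf('"', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated string");
                tokens.add(new Token(Kind.STRING_CONSTANT, input.substring(i, end + 1)));
                i = end + 1;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER_CONSTANT, input.substring(start, i)));
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                String text = input.substring(start, i);
                tokens.add(new Token(classify(text, columns), text));
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Map<String, ColumnType> columns =
                Map.of("column1", ColumnType.STRING, "column2", ColumnType.STRING);
        for (Token token : tokenize("IFNA(column1, column2)", columns)) {
            System.out.println(token.kind() + " " + token.text());
        }
    }
}
```

With both columns declared as STRING, the two identifiers come out as STRING_COLUMN tokens, so only the string alternative of IFNA can match them.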