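Generally speaking, I am trying to create a Java-based application in which I can compile a dictionary of terms that supports simple regular expressions. The dictionary will then be used to create a simple entity tagger that marks the recognized terms in a text. I thought ANTLR could meet my needs. I am trying to build a Java application that does not depend on pre-compiled grammar and lexer files, because the grammar has to be updated at runtime every few minutes.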
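Because the grammar has to be regenerated from the dictionary every few minutes, one option is to build the lexer grammar text in Java and pass it to new LexerGrammar(...). The sketch below is a hypothetical helper (the class DictionaryGrammar, its methods and the term names are illustrative, not part of ANTLR). It only handles plain terms, turning each letter into a case-insensitive set such as [aA]; translating the dictionary's regular expressions is not covered. It also appends a whitespace rule with -> skip and a FILL_TOKEN catch-all, in the style of the grammar used below.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds ANTLR lexer grammar text from a term dictionary.
class DictionaryGrammar {

    // Turns a plain term such as "abc" into a case-insensitive lexer
    // expression: [aA] [bB] [cC]. Characters without case variants become
    // quoted literals; ' and \ are escaped for ANTLR.
    static String caseInsensitive(String term) {
        StringBuilder sb = new StringBuilder();
        for (char ch : term.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            char lo = Character.toLowerCase(ch);
            char up = Character.toUpperCase(ch);
            if (lo != up) {
                sb.append('[').append(lo).append(up).append(']');
            } else if (ch == '\'' || ch == '\\') {
                sb.append("'\\").append(ch).append('\'');
            } else {
                sb.append('\'').append(ch).append('\'');
            }
        }
        return sb.toString();
    }

    // Builds the whole lexer grammar. Rules are emitted in map order; ANTLR
    // prefers the longest match and, for equal lengths, the rule listed first.
    static String buildLexerGrammar(String name, Map<String, String> terms) {
        StringBuilder g = new StringBuilder("lexer grammar " + name + ";\n");
        for (Map.Entry<String, String> e : terms.entrySet()) {
            g.append(e.getKey()).append(" : ")
             .append(caseInsensitive(e.getValue())).append(" ;\n");
        }
        g.append("WS : [ \\t\\r\\n]+ -> skip ;\n"); // skip is allowed: WS is a lexer rule
        g.append("FILL_TOKEN : . ;\n");             // catch-all for any other character
        return g.toString();
    }

    public static void main(String[] args) {
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("T_ABCD", "abcd"); // lexer rule names must start with an uppercase letter
        terms.put("T_ABC", "abc");
        System.out.print(buildLexerGrammar("L", terms));
    }
}
```

A LinkedHashMap keeps the dictionary's insertion order, so the rule order in the generated grammar stays predictable whenever two terms could match input of the same length.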
Here is my simple "Hello World" application:
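import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.LexerInterpreter;
import org.antlr.v4.runtime.ParserInterpreter;
import org.antlr.v4.runtime.tree.ParseTree;
import org.antlr.v4.tool.Grammar;
import org.antlr.v4.tool.LexerGrammar;
import org.antlr.v4.tool.Rule;

// Lexer grammar compiled from a string at runtime (no generated classes)
LexerGrammar lg = new LexerGrammar(
    "lexer grammar L;\n" +
    "A : ('a'|'A');\n" +
    "B : ('b'|'B');\n" +
    "C : ('c'|'C');\n" +
    "D : ('d'|'D');\n" +
    "FILL_TOKEN : (.);\n");
// Parser grammar that uses the token types defined by lg
Grammar g = new Grammar(
    "parser grammar T;\n" +
    "t_abc : A FILL_TOKEN? B FILL_TOKEN? C;\n" +
    "t_abcd : A FILL_TOKEN? B FILL_TOKEN? C FILL_TOKEN? D;\n" +
    "rule0 : t_abcd|t_abc;\n" +
    "ws : '.' -> skip ;\n",   // this line leads to the NullPointerException below
    lg);
// The interpreters evaluate the grammars directly instead of generated code
LexerInterpreter lexEngine =
    lg.createLexerInterpreter(new ANTLRInputStream("Test A BCD"));
CommonTokenStream tokens = new CommonTokenStream(lexEngine);
ParserInterpreter parser = g.createParserInterpreter(tokens);
Rule rule = g.rules.get("rule0");
ParseTree t = parser.parse(rule.index);
System.out.println(t.getText());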
When I try to run the application, I get the following error:
Exception in thread "main" java.lang.NullPointerException
at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:73)
at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:601)
at org.antlr.v4.runtime.atn.ATNSerializer.getSerializedAsChars(ATNSerializer.java:605)
at org.antlr.v4.tool.Grammar.createParserInterpreter(Grammar.java:1337)
at main.OnTheFly.main(OnTheFly.java:98)
When I comment out the "ws : '.' -> skip ;\n", part of the grammar, the program runs, but it complains that Test is unknown.
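The complaint about Test follows from how the lexer splits the input. An ANTLR lexer picks the rule with the longest match and, on a tie, the rule defined first. The sketch below imitates that rule with java.util.regex for the single-character rules of grammar L. It is an approximation written for illustration (MaxMunchDemo is a hypothetical name), not ANTLR's actual lexing algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MaxMunchDemo {

    // Lexer rules of grammar L in definition order, hand-translated to Java regexes.
    static final String[][] RULES = {
        {"A", "[aA]"}, {"B", "[bB]"}, {"C", "[cC]"}, {"D", "[dD]"},
        {"FILL_TOKEN", "."}
    };

    // At each position keep the longest match; a later rule only wins if it is
    // strictly longer, so on a tie the rule defined first is kept.
    static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (String[] rule : RULES) {
                // DOTALL so that '.' also matches line breaks, like ANTLR's '.'
                Matcher m = Pattern.compile(rule[1], Pattern.DOTALL).matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule[0];
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                throw new IllegalStateException("no rule matches at " + pos);
            }
            types.add(bestType);
            pos += bestLen;
        }
        return types;
    }

    public static void main(String[] args) {
        // prints [FILL_TOKEN, FILL_TOKEN, FILL_TOKEN, FILL_TOKEN, FILL_TOKEN, A, FILL_TOKEN, B, C, D]
        System.out.println(tokenize("Test A BCD"));
    }
}
```

"Test " becomes five FILL_TOKENs before the first A. Both t_abc and t_abcd start with A, so rule0 cannot accept the leading FILL_TOKEN produced by "T", which fits the parser's complaint about Test.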
Am I doing something wrong, or does a parser grammar simply not support the skip command? I am using ANTLR 4.7.2 and Java 1.8.0_131.
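Found the answer: only lexer rules support the skip command. Moreover, the lexer alone is all I need. The matches can be retrieved by looking at the resulting tokens: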
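import org.antlr.v4.runtime.Token;
...
// using the code from above with the grammar part, including the skip rule.
// In addition, all tokens have to be defined in
// ...
// fill() is required so that the lexer processes the whole input stream
tokens.fill();
for (Token token : tokens.getTokens()) {
    int typeId = token.getType();
    if (typeId == Token.EOF) {  // EOF has type -1
        break;
    }
    // Look the name up via the vocabulary: getRuleNames()[typeId - 1] only
    // lines up while no fragment rules shift the rule indices
    String ruleName = lexEngine.getVocabulary().getSymbolicName(typeId);
    System.out.println("Token: " + token.getText() + " - " + ruleName);
}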
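For the entity tagger mentioned at the start, the matches from the token loop can be turned into marked-up text. Each Token exposes getStartIndex() and getStopIndex() (inclusive) into the input; the sketch below copies those values, together with the rule name, into a hypothetical Match class so that it runs without ANTLR on the classpath. The [text](RULE) markup and the T_ABCD match used in main are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

class EntityTagger {

    // Hypothetical holder for what the token loop can read from each Token:
    // start index, inclusive stop index, and the rule name.
    static final class Match {
        final int start;
        final int stop;
        final String ruleName;

        Match(int start, int stop, String ruleName) {
            this.start = start;
            this.stop = stop;
            this.ruleName = ruleName;
        }
    }

    // Wraps every recognized term in "[text](RULE)" and copies the rest of the
    // input unchanged. FILL_TOKEN matches are not terms, so they are ignored.
    // Assumes matches are sorted by start index and do not overlap.
    static String tag(String input, List<Match> matches) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Match m : matches) {
            if ("FILL_TOKEN".equals(m.ruleName)) {
                continue;
            }
            out.append(input, pos, m.start)
               .append('[').append(input, m.start, m.stop + 1)
               .append("](").append(m.ruleName).append(')');
            pos = m.stop + 1;
        }
        return out.append(input.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String input = "Test A BCD";
        // Assumed output of a lexer-only grammar with a T_ABCD rule
        List<Match> matches = Arrays.asList(
            new Match(0, 0, "FILL_TOKEN"),
            new Match(5, 9, "T_ABCD"));
        System.out.println(tag(input, matches)); // prints Test [A BCD](T_ABCD)
    }
}
```

Skipped tokens never reach the token stream, so working from character offsets rather than concatenating token texts keeps the original whitespace intact.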
More information on lexer rules and parser rules can be found here:
https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md
https://github.com/antlr/antlr4/blob/master/doc/parser-rules.md