ANTLR4 grammar does not correctly match escaped quotes in strings

Question

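I am trying to create a grammar for a language that uses double quotes for strings and allows quotes to be escaped with a backslash. I am using ANTLR4 to parse the input.

I have defined the following rule to match strings: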

STRING:
    '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;
fragment
ESC_SEQ
    :   '\\'
        (   // The standard escaped character set such as tab, newline, etc.
            [btnfr"'\\]
        |   // A Java style Unicode escape sequence
            UNICODE_ESC
        |   // Invalid escape
            .
        |   // Invalid escape at end of file
            EOF
        )
    ;

fragment
UNICODE_ESC
    :   'u' (HEX_DIGIT (HEX_DIGIT (HEX_DIGIT HEX_DIGIT?)?)?)?
;

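However, this rule does not seem to correctly match strings that contain an escaped quote at the end of the string. For example, the string

"test \"string\" that works"

is parsed correctly, but when I have a string like

"test string that does \"not work\""

the rule does not work. The same applies to other escaped characters.

(I am expecting to see

"test string that "works""

as output.)

I tried modifying the rule to handle the backslash-escaped quote explicitly, as follows: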

STRING:
    '"' ( ESC_SEQ | ~('\\'|'"') )* '"' | ('\\' '"')
    ;
fragment
ESC_SEQ
    :   '\\'
        (   // The standard escaped character set such as tab, newline, etc.
            [btnfr"'\\]
        |   // A Java style Unicode escape sequence
            UNICODE_ESC
        |   // Invalid escape
            .
        |   // Invalid escape at end of file
            EOF
        )
    ;

fragment
UNICODE_ESC
    :   'u' (HEX_DIGIT (HEX_DIGIT (HEX_DIGIT HEX_DIGIT?)?)?)?
    ;

But this still does not work.

What am I doing wrong? How can I modify my grammar to correctly match strings with escaped quotes?

Answer 1

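ESC_SEQ does not "unescape" the sequence. Your rule matches \" so that is exactly what you get in the output: the token text is the raw matched input, backslashes included.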

See https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md for how to rewrite, skip, and otherwise act on tokens in order to fix it.
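
As an alternative to rewriting the token inside the lexer, the unescaping can be done in the host language once the STRING token has been matched. The following is only a minimal sketch, written in Python for illustration. The helper name unescape_string_literal is hypothetical and not part of ANTLR. It assumes the token text still includes the surrounding double quotes, and it mirrors the escapes listed in ESC_SEQ. For an unrecognized escape it keeps the character after the backslash, which is an assumption rather than anything the grammar prescribes.

```python
# Hypothetical helper (not an ANTLR API): converts the raw text of a STRING
# token, surrounding quotes included, into the string value it denotes.

# Single-character escapes, mirroring [btnfr"'\\] in ESC_SEQ.
SIMPLE_ESCAPES = {
    "b": "\b", "t": "\t", "n": "\n", "f": "\f", "r": "\r",
    '"': '"', "'": "'", "\\": "\\",
}
HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape_string_literal(token_text: str) -> str:
    body = token_text[1:-1]  # drop the opening and closing double quote
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            out.append("\\")  # dangling backslash: keep it as-is
            break
        esc = body[i]
        i += 1
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
        elif esc == "u":
            # UNICODE_ESC accepts 'u' followed by zero to four hex digits.
            j = i
            while j < len(body) and j - i < 4 and body[j] in HEX_DIGITS:
                j += 1
            if j > i:
                out.append(chr(int(body[i:j], 16)))
            else:
                out.append("u")  # assumption: a bare \u yields a plain 'u'
            i = j
        else:
            out.append(esc)  # assumption: unknown escape keeps the character
    return "".join(out)
```

For example, calling unescape_string_literal(r'"test string that does \"not work\""') returns test string that does "not work". A natural place to call such a helper would be wherever the parse tree is processed and the token text is read.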
