Parsing rules for a Decaf grammar in ANTLR4


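I am writing parser and lexer rules in ANTLR4 for the Decaf programming language. When I try to parse a test file I keep getting an error. Something in the grammar must be wrong, but I can't figure out what.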

The test file looks like this:

class Program {
  int i[10];
}

The error is: line 2:8 mismatched input '10' expecting INT_LITERAL

Here is the complete Decaf.g4 grammar file:

grammar Decaf;


/*
  LEXER RULES
  -----------
  Lexer rules define the basic syntax of individual words and symbols of a
  valid Decaf program. Lexer rules follow regular expression syntax.
  Complete the lexer rules following the Decaf Language Specification.
*/



CLASS : 'class';

INT : 'int';

RETURN : 'return';

VOID : 'void';

IF : 'if';

ELSE : 'else';

FOR : 'for';

BREAK : 'break';

CONTINUE : 'continue';

CALLOUT : 'callout';

TRUE : 'True' ;

FALSE : 'False' ;

BOOLEAN : 'boolean';

LCURLY : '{';

RCURLY : '}';

LBRACE : '(';

RBRACE : ')';


LSQUARE : '[';

RSQUARE : ']';
ADD : '+';

SUB : '-';

MUL : '*';

DIV : '/';

EQ : '=';

SEMI : ';';

COMMA : ',';

AND : '&&';

LESS : '<';

GREATER : '>';

LESSEQUAL : '<=' ;

GREATEREQUAL : '>=' ;

EQUALTO : '==' ;

NOTEQUAL : '!=' ;

EXCLAMATION : '!';



fragment CHAR : (' '..'!') | ('#'..'&') | ('('..'[') | (']'..'~') | ('\\'[']) | ('\\"') | ('\\') | ('\t') | ('\n');

CHAR_LITERAL : '\'' CHAR '\'';

//STRING_LITERAL : '"' CHAR+ '"' ;


HEXMARK : '0x';

fragment HEXA : [a-fA-F];

fragment HEXDIGIT : DIGIT | HEXA ;

HEX_LITERAL : HEXMARK HEXDIGIT+;


STRING : '"' (ESC|.)*? '"';

fragment ESC : '\\"' | '\\\\';




fragment DIGIT : [0-9];

DECIMAL_LITERAL : DIGIT(DIGIT)*;



COMMENT : '//' ~('\n')* '\n' -> skip;

WS : (' ' | '\n' | '\t' | '\r') + -> skip;

fragment ALPHA : [a-zA-Z] | '_';

fragment ALPHA_NUM : ALPHA | DIGIT;



ID : ALPHA ALPHA_NUM*;

INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;

BOOL_LITERAL : TRUE | FALSE;

/*
  PARSER RULES
  ------------
  Parser rules are all lower case, and make use of lexer rules defined above
  and other parser rules defined below. Parser rules also follow regular
  expression syntax. Complete the parser rules following the Decaf Language
  Specification.
*/




program : CLASS ID LCURLY field_decl* method_decl* RCURLY EOF;

field_name : ID | ID LSQUARE INT_LITERAL RSQUARE;

field_decl : datatype field_name (COMMA field_name)* SEMI;

method_decl : (datatype | VOID) ID LBRACE ((datatype ID) (COMMA datatype ID)*)? RBRACE block;

block : LCURLY var_decl* statement* RCURLY;

var_decl : datatype ID (COMMA ID)* SEMI;


datatype : INT | BOOLEAN;

statement : location assign_op expr SEMI
        | method_call SEMI
        | IF LBRACE expr RBRACE block (ELSE block)?
        | FOR ID EQ expr COMMA expr block
        | RETURN (expr)? SEMI
        | BREAK SEMI
        | CONTINUE SEMI
        | block;

assign_op : EQ
          | ADD EQ
          | SUB EQ;


method_call : method_name LBRACE (expr (COMMA expr)*)? RBRACE
            | CALLOUT LBRACE STRING(COMMA callout_arg (COMMA callout_arg)*) RBRACE;


method_name : ID;

location : ID | ID LSQUARE expr RSQUARE;


expr : location
     | method_call
     | literal
     | expr bin_op expr
     | SUB expr
     | EXCLAMATION expr
     | LBRACE expr RBRACE;

 callout_arg : expr
            | STRING ;

bin_op : arith_op
      | rel_op
      | eq_op
      | cond_op;


arith_op : ADD | SUB | MUL | DIV | '%' ;

rel_op : LESS | GREATER | LESSEQUAL | GREATEREQUAL ;

eq_op : EQUALTO | NOTEQUAL ;

cond_op : AND | '||' ;

literal : INT_LITERAL | CHAR_LITERAL | BOOL_LITERAL ;
python parsing antlr antlr4 lexer
1 Answer

Whenever two or more lexer rules match the same characters, the rule that is defined first wins. In your case, both of these rules match 10:

DECIMAL_LITERAL : DIGIT(DIGIT)*;

INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;

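Because INT_LITERAL is defined after DECIMAL_LITERAL, the lexer will never create an INT_LITERAL token: the input 10 is always tokenized as DECIMAL_LITERAL. If you then use INT_LITERAL in a parser rule, you get the error message you posted.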
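
To make the tie-breaking behaviour concrete, here is a minimal Python sketch. It is not ANTLR itself, only a hand-written imitation of the "longest match, then first-defined rule" policy. The `RULES` table and the `next_token` helper are hypothetical names, and the regular expressions are simplified stand-ins for three of the lexer rules in the question.

```python
import re

# Hypothetical, simplified stand-ins for three lexer rules from the question,
# listed in the same order in which they are defined in the grammar.
RULES = [
    ("HEX_LITERAL", r"0x[0-9a-fA-F]+"),
    ("DECIMAL_LITERAL", r"[0-9]+"),
    ("INT_LITERAL", r"0x[0-9a-fA-F]+|[0-9]+"),
]

def next_token(text, pos=0):
    """Return (rule_name, lexeme): longest match wins, earliest rule wins ties."""
    best = None
    for name, pattern in RULES:
        m = re.compile(pattern).match(text, pos)
        # Strictly-greater comparison keeps the earlier rule when lengths tie.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("10"))
print(next_token("0x1F"))
```

In this imitation, INT_LITERAL matches exactly the same text as one of the two earlier rules for every input, so it always loses the tie and is never returned.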

Solution: remove INT_LITERAL from the lexer and create a parser rule instead:

int_literal : DECIMAL_LITERAL | HEX_LITERAL;

Then use int_literal in your parser rules wherever INT_LITERAL was used.
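
The effect of moving the alternative from the lexer to the parser can be sketched in Python as follows. This is only an illustration under assumptions, not ANTLR-generated code: `TOKEN_RULES`, `tokenize`, `is_int_literal` and `parse_array_field` are hypothetical names, and only the handful of tokens needed for an array field declaration are modelled.

```python
import re

# Hypothetical, simplified token rules after the fix: there is no INT_LITERAL
# token any more, only DECIMAL_LITERAL and HEX_LITERAL.
TOKEN_RULES = [
    ("INT", r"int"),
    ("LSQUARE", r"\["),
    ("RSQUARE", r"\]"),
    ("SEMI", r";"),
    ("HEX_LITERAL", r"0x[0-9a-fA-F]+"),
    ("DECIMAL_LITERAL", r"[0-9]+"),
    ("ID", r"[a-zA-Z_][a-zA-Z_0-9]*"),
]

def tokenize(text):
    """Split text into (rule_name, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for name, pattern in TOKEN_RULES:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

def is_int_literal(token):
    # Mirrors the parser rule: int_literal : DECIMAL_LITERAL | HEX_LITERAL;
    return token[0] in ("DECIMAL_LITERAL", "HEX_LITERAL")

def parse_array_field(text):
    """Accept `int ID [ int_literal ] ;` and return (name, size lexeme)."""
    toks = tokenize(text)
    kinds = [kind for kind, _ in toks]
    if (len(toks) == 6 and kinds[0] == "INT" and kinds[1] == "ID"
            and kinds[2] == "LSQUARE" and is_int_literal(toks[3])
            and kinds[4] == "RSQUARE" and kinds[5] == "SEMI"):
        return toks[1][1], toks[3][1]
    raise SyntaxError("expected: int ID [ int_literal ] ;")

print(parse_array_field("int i[10];"))
print(parse_array_field("int buf[0x1F];"))
```

Because the choice between decimal and hexadecimal is now made at the parser level, both token types are accepted as an array size. The same reasoning appears to apply to BOOL_LITERAL in the grammar above, which is defined after TRUE and FALSE and matches exactly the same text.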
