我想使用flex和bison为迷你C语言编写一个编译器。我的语言示例如下所示:
/* This is an example uC program. */
int fac(int n)
{
if (n < 2)
return n;
return n * fac(n - 1);
}
int sum(int n, int a[])
{
int i;
int s;
i = 0;
s = 0;
while (i < n) {
s = s + a[i];
i = i + 1;
}
return s;
}
int main(void)
{
int a[2];
a[0] = fac(5);
a[1] = 27;
return 0;
}
这是我解析这种语言的语法:
program ::= topdec_list
topdec_list ::= /empty/ | topdec topdec_list
topdec ::= vardec ";"
| funtype ident "(" formals ")" funbody
vardec ::= scalardec | arraydec
scalardec ::= typename ident
arraydec ::= typename ident "[" intconst "]"
typename ::= "int" | "char"
funtype ::= typename | "void"
funbody ::= "{" locals stmts "}" | ";"
formals ::= "void" | formal_list
formal_list ::= formaldec | formaldec "," formal_list
formaldec ::= scalardec | typename ident "[" "]"
locals ::= /empty/ | vardec ";" locals
stmts ::= /empty/ | stmt stmts
stmt ::= expr ";"
| "return" expr ";" | "return" ";"
| "while" condition stmt
| "if" condition stmt else_part
| "{" stmts "}"
| ";"
else_part ::= /empty/ | "else" stmt
condition ::= "(" expr ")"
expr ::= intconst
| ident | ident "[" expr "]"
| unop expr
| expr binop expr
| ident "(" actuals ")"
| "(" expr ")"
unop ::= "-" | "!"
binop ::= "+" | "-" | "*" | "/"
| "<" | ">" | "<=" | ">=" | "!=" | "=="
| "&&"
| "="
actuals ::= /empty/ | expr_list
expr_list ::= expr | expr "," expr_list
flex模块工作正常,并根据需要返回值,但是bison不断给我一些奇怪的错误,表明我的语法错误(不是)。这是我的野牛文件:
%{
#include <stdio.h>
#include <stdlib.h>
extern int yylex();
extern int yyparse();
FILE* yyin;
extern int lineNumber;
void yyerror(const char* s);
%}
%union {
int intvalue;
}
%token<intvalue> INTCONST
%token IDENT
%token CHARK
%token ELSEK
%token IFK
%token INTK
%token RETURNK
%token VOIDK
%token WHILEK
%token NOTK
%token ANDK
%token COMMAK
%token DIVIDEK
%token MULTIPLYK
%token MINUSK
%token PLUSK
%token SEMICOLONK
%token NEQUALK
%token EQUALK
%token ASSIGNMENTK
%token RECOMPARATORK
%token LECOMPARATORK
%token RCOMPARATORK
%token LCOMPARATORK
%token RPARANTESESK
%token LPARANTESESK
%token RBRACKETK
%token LBRACKETK
%token RCURLY
%token LCURLY
%right ASSIGNMENTK
%left ANDK
%left EQUALK NEQUALK
%left LCOMPARATORK RCOMPARATORK LECOMPARATORK RECOMPARATORK
%left PLUSK
%left MULTIPLYK DIVIDEK
%left MINUSK NOTK
%start program
%%
program: topdec_list
;
topdec_list: /*empty*/
| topdec topdec_list
;
topdec: vardec SEMICOLONK
| funtype IDENT LPARANTESESK formals RPARANTESESK funbody
;
vardec: scalardec
| arraydec
;
scalardec: typename IDENT
;
arraydec: typename IDENT LBRACKETK INTCONST RBRACKETK
;
typename: INTK
| CHARK
;
funtype: typename
| VOIDK
;
funbody: LCURLY locals stmts RCURLY
| SEMICOLONK
;
formals: VOIDK
| formal_list
;
formal_list: formaldec
| formaldec COMMAK formal_list
;
formaldec: scalardec
| typename IDENT LBRACKETK RBRACKETK
;
locals: /*empty*/
| vardec SEMICOLONK locals
;
stmts: /*empty*/
| stmt stmts
;
stmt: expr SEMICOLONK
| RETURNK expr SEMICOLONK
| RETURNK SEMICOLONK
| WHILEK condition stmt
| IFK condition stmt else_part
| LCURLY stmts RCURLY
| SEMICOLONK
;
else_part: /*empty*/ | ELSEK stmt
;
condition: LPARANTESESK expr RPARANTESESK
;
expr: INTCONST
| IDENT
| IDENT LBRACKETK expr RBRACKETK
| unop expr
| expr binop expr
| IDENT LPARANTESESK actuals RPARANTESESK
| LPARANTESESK expr RPARANTESESK
;
unop: MINUSK | NOTK
;
binop: PLUSK
| MINUSK
| MULTIPLYK
| DIVIDEK
| LCOMPARATORK
| RCOMPARATORK
| LECOMPARATORK
| RECOMPARATORK
| NEQUALK
| EQUALK
| ANDK
| ASSIGNMENTK
;
actuals: /*empty*/
| expr_list
;
expr_list: expr
| expr COMMAK expr_list
;
%%
int main(int argc, char **argv)
{
yyin = fopen("input.c", "r");
do
{
yyparse();
} while (!feof(yyin));
fclose(yyin);
return 0;
}
void yyerror(const char* error)
{
fprintf(stderr, "Parse error in line %d: %s\n", lineNumber, error);
}
例如,对于此输入:
int i;
i = 0;
我得到一个错误,在它区分第二个i
之后发生语法错误(我在我的flex文件中打印令牌,所以我知道它没有问题,直到它到达赋值字符)。
或者作为传递此行的另一个例子:
int fac(int n);
我在第一个paranteses之后得到了相同的语法错误(正好是Parse error in line 1: syntax error
),这意味着它将第二个int
视为语法错误,它不应该因为我的语法看起来很好。
野牛产生的警告如下(flex和gcc很好):
semantic_analyzer.y: warning: 26 shift/reduce conflicts [-Wconflicts-sr]
semantic_analyzer.y:78.10-17: warning: rule useless in parser due to conflicts [-Wother]
funtype: typename
^^^^^^^^
任何建议或更正表示赞赏:)提前感谢。
int i;
i = 0;
不是一个有效的C程序,你的语法正确拒绝它。 (正如你的语法所示,程序是一系列声明,i = 10;
不是声明。)
你提到的第二个问题是由于你的语法导致int
中的第一个int fac(int n)
必须减少到funtype
,而int
中的int ;
是typename
引起的移位减少冲突。在解析器需要决定是否移动IDENT
或将typename
减少到funtype
时,它无法知道要采取哪个动作,因为它只能看到一个先行令牌 - IDENT
- 并且决定取决于第二个下一个预见标记(可能是也可能不是开括号)。默认情况下,yacc / bison通过移位来解决shift-reduce冲突,这意味着int fac
不能成为topdec
第二次生成的前缀;因此,(
将触发错误。