考虑这个语法:
expr ::= LP expr RP
| expr PLUS|MINUS expr
| expr STAR|SLASH expr
| term
term ::= INTEGER|FLOAT
上下文无关文法被定义为G = ( V, Σ, R, S )
,其中(在这种情况下):
V = { expr, term }
Σ = { LP, RP, PLUS, MINUS, STAR, SLASH, INTEGER, FLOAT }
R = //was defined above
S = expr
现在让我们定义一个小的类叫做Parser
其定义是(在C中提供的代码示例++):
class Parser
{
public:
Parser();
void Parse();
private:
void parseRecursive(vector<string> rules, int ruleIndex, int startingTokenIndex, string prevRule);
bool isTerm(string token); //returns true if token is in upper case
vector<string> split(...); //input: string; output: vector of words splitted by delim
map<string, vector<string>> ruleNames; //contains grammar definition
vector<int> tokenList; //our input set of tokens
};
为了便于规则之间去,每个语法规则被分成两个部分:一键(::=
前)和规则(::=
之后),所以从下面的地图上面我的语法发生:
std::map<string, vector<string>> ruleNames =
{
{ "expr", {
"LP expr RP",
"expr PLUS|MINUS expr",
"expr STAR|SLASH expr",
"term"
}
},
{ "term", { "INTEGER", "FLOAT" } }
};
出于测试目的,串(2 + 3) * 4
已经被符号化,以下面的一组
{ TK_LP, TK_INTEGER, TK_PLUS, TK_INTEGER, TK_RP, TK_STAR, TK_INTEGER }
和被用作Parser
输入数据。
现在最难的部分:算法。据我了解,我在想这个问题:
1)从起始符号向量(LP expr RP
)以第一规则,并将它分割成单词。
2)检查在规则第一个字是终端。
虽然我在这个算法我不知道,我还在努力让和执行它(当然,不成功的):
1)外qazxsw POI功能发起递归:
Parse
2)递归函数:
void Parser::Parse()
{
int startingTokenIndex = 0;
string word = this->startingSymbol;
for (int ruleIndex = 0; ruleIndex < this->ruleNames[word].size(); ruleIndex++)
{
this->parseRecursive(this->ruleNames[word], ruleIndex, startingTokenIndex, "");
}
}
我应该执行什么样的变化,以我的算法,使其工作?
根据你想实现一个时间解析器或编译器编译什么,使用了不同的方法。编译器的编译器主要用于LR,用于手动执行LL的。基本上,对于LL,手动实现使用递归下降(对于每个非末端,一个功能被创建实现它)。例如,对于语法:
void Parser::parseRecursive(vector<string> rules, unsigned ruleIndex, unsigned startingTokenIndex, string prevRule)
{
printf("%s - %s\n", rules[ruleIndex].c_str(), this->tokenNames[this->tokenList[startingTokenIndex]].c_str());
vector<string> temp = this->split(rules[ruleIndex], ' ');
vector<vector<string>> ruleWords;
bool breakOutter = false;
for (unsigned wordIndex = 0; wordIndex < temp.size(); wordIndex++)
{
ruleWords.push_back(this->split(temp[wordIndex], '|'));
}
for (unsigned wordIndex = 0; wordIndex < ruleWords.size(); wordIndex++)
{
breakOutter = false;
for (unsigned subWordIndex = 0; subWordIndex < ruleWords[wordIndex].size(); subWordIndex++)
{
string word = ruleWords[wordIndex][subWordIndex];
if (this->isTerm(word))
{
if (this->tokenNames[this->tokenList[startingTokenIndex]] == this->makeToken(word))
{
printf("%s ", word.c_str());
startingTokenIndex++;
} else {
breakOutter = true;
break;
}
} else {
if (prevRule != word)
{
startingTokenIndex++;
this->parseRecursive(this->ruleNames[word], 0, startingTokenIndex, word);
prevRule = word;
}
}
}
if (breakOutter)
break;
}
}
让我们杀左递归和左因子(LL文法不与左递归工作):
S -> S + A | A
A -> a | b
这样的实现是可能的:
S -> As
s -> + As | epsilon
A -> a | b
如果你想添加SDD,则继承的属性通过参数传递,并将合成的属性的输出值。
评论:一次不收集所有的代币,让他们根据需要:get_next_token()。