-
Notifications
You must be signed in to change notification settings - Fork 2
AFParser Creation Algorithm
In this section, we discuss the creation of the AFParser Data Structure. Recall, the AFParser Data Structure is a graph that consists of different types of nodes, i.e. BaseParser objects. We discuss the creation of these nodes by discussing the creation of the following groups:
- Formal Definitions
- DEF – Nonterminal Formal Definition
- TOK - Terminal Formal Definition
- Instances
- NON – Nonterminal Instance
- TOK – Terminal Instance
- Sequences
- SEQ - Ordered group of grammar symbols to match
- Alternations
- ALT - Alternations of groups
- Semantic Actions
- ACT
Each of the above groups represent the semantic meaning of each type of node. That is, a DEF node can only represent the formal definition of a nonterminal and a TOK node could represent the formal definition or instance of a terminal. Further, an ACT node could only represent a semantic action, and so on.
Formal definitions for (non)terminals are created when the (non)terminal is declared. Their properties are as follows:
- A terminal’s formal definition holds the terminal's token code and is never altered after its creation.
- Terminals created with the
Token()
constructor do not have a formal definition. - A nonterminal's formal definition maintains the productions associated with the nonterminal and can be altered after its creation.
- We add productions to a nonterminal's formal definition by using the assignment operator (=).
- We add flow variables to a nonterminal's formal definition by using the () and >> operators on the LHS of the nonterminal's productions.
Example:
Parser<> term(tok_code); // term is the formal definition of the created terminal
// term maintains an integer token code
Parser<> non; // non is the formal definition of a newly created nonterminal
// non maintains its productions, none defined so far...
int z; // dummy flow variable
// Formal definition altered to have in and out-flow variable z
non(z)>>z = Token(‘a’); // Formal definition altered, 1 production stored
non(z)>>z = Token(‘b’) & Token(‘c’); // Formal definition altered, 2 productions stored
non(z)>>z = Token(‘d’) | Token(‘e’); // Formal definition altered, 3 productions stored
Each (non)terminal instance maintains the flow variables associated with it and maintains a pointer to the (non)terminal's formal definition. (Non)terminals are created when the () and >> operators are invoked on its formal definition; however, its formal definition is not altered but a newly created instance is returned.
// flow variables
int a, b;
// all instances have a pointer to their formal definition
non(a); // instance of non with a pointer to in-flow variable a
non>>b; // instance of non with a pointer to out-flow variable b
non(a)>>b // instance of non with a pointer to in-flow variable a and pointer to out-flow variable b
term(a); // instance of term with a pointer to in-flow variable a
term>>b; // instance of term with a pointer to out-flow variable b
term(a)>>b; // instance of term with a pointer to in-flow variable a and pointer to out-flow variable b
Sequences are an ordered group of grammar symbols, i.e. (non)terminals and semantic actions. To create a sequence, one would use the & operator where the operands are BaseParser objects. Each SEQ node has a member variable, arg_
, that is a std::vector of BaseParser pointers that represent this ordered group of grammar symbols. Consider the example:
Token(‘H’) & Token(‘e’) & Token(‘l’) & Token(‘l’) & Token(‘o’);
The above example would return a SEQ node whose arg_
member data contains 5 elements that are all pointers to the above TOK nodes. A SEQ node is matched if all elements pointed to in its arg_
member data are matched.
Alternations represent different groups of ordered grammar symbols that can be matched. To create an alternation, one would use | operator where the operands are BaseParser objects. Each ALT node has a member variable, arg_
, that maintains pointers to different groups of grammar symbols that will potentially matched when parsing. Consider the example:
Parser<> A;
A = Token(‘H’) & Token(‘i’)
| Token(‘G’);
The nonterminal A
would consist of an ALT node whose arg_
member data contains two elements that are pointers to a SEQ node and a TOK node. An ALT node is matched if one element in it arg_
member data is matched.