Skip to content

AFParser Creation Algorithm

Ronald Franco edited this page Dec 26, 2018 · 2 revisions

In this section, we discuss the creation of the AFParser Data Structure. Recall, the AFParser Data Structure is a graph that consists of different types of nodes, i.e. BaseParser objects. We discuss the creation of these nodes by discussing the creation of the following groups:

  • Formal Definitions
    • DEF – Nonterminal Formal Definition
    • TOK - Terminal Formal Definition
  • Instances
    • NON – Nonterminal Instance
    • TOK – Terminal Instance
  • Sequences
    • SEQ - Ordered group of grammar symbols to match
  • Alternations
    • ALT - Alternations of groups
  • Semantic Actions
    • ACT

Each of the above groups represent the semantic meaning of each type of node. That is, a DEF node can only represent the formal definition of a nonterminal and a TOK node could represent the formal definition or instance of a terminal. Further, an ACT node could only represent a semantic action, and so on.

Formal Definitions

Formal definitions for (non)terminals are created when the (non)terminal is declared. Their properties are as follows:

  • A terminal’s formal definition holds the terminal's token code and is never altered after its creation.
  • Terminals created with the Token() constructor do not have a formal definition.
  • A nonterminal's formal definition maintains the productions associated with the nonterminal and can be altered after its creation.
  • We add productions to a nonterminal's formal definition by using the assignment operator (=).
  • We add flow variables to a nonterminal's formal definition by using the () and >> operators on the LHS of the nonterminal's productions.

Example:

Parser<> term(tok_code); // term is the formal definition of the created terminal
                         // term maintains an integer token code

Parser<> non;            // non is the formal definition of a newly created nonterminal
                         // non maintains its productions, none defined so far...

int z; // dummy flow variable

// Formal definition altered to have in and out-flow variable z 
non(z)>>z = Token(‘a’);              // Formal definition altered, 1 production stored
non(z)>>z = Token(‘b’) & Token(‘c’); // Formal definition altered, 2 productions stored
non(z)>>z = Token(‘d’) | Token(‘e’); // Formal definition altered, 3 productions stored

Instances

Each (non)terminal instance maintains the flow variables associated with it and maintains a pointer to the (non)terminal's formal definition. (Non)terminals are created when the () and >> operators are invoked on its formal definition; however, its formal definition is not altered but a newly created instance is returned.

// flow variables
int a, b;

// all instances have a pointer to their formal definition
non(a); // instance of non with a pointer to in-flow variable a
non>>b; // instance of non with a pointer to out-flow variable b
non(a)>>b // instance of non with a pointer to in-flow variable a and pointer to out-flow variable b
term(a); // instance of term with a pointer to in-flow variable a
term>>b; // instance of term with a pointer to out-flow variable b
term(a)>>b; // instance of term with a pointer to in-flow variable a and pointer to out-flow variable b

Sequences

Sequences are an ordered group of grammar symbols, i.e. (non)terminals and semantic actions. To create a sequence, one would use the & operator where the operands are BaseParser objects. Each SEQ node has a member variable, arg_, that is a std::vector of BaseParser pointers that represent this ordered group of grammar symbols. Consider the example:

Token(‘H’) & Token(‘e’) & Token(‘l’) & Token(‘l’) & Token(‘o’);

The above example would return a SEQ node whose arg_ member data contains 5 elements that are all pointers to the above TOK nodes. A SEQ node is matched if all elements pointed to in its arg_ member data are matched.

Alternations

Alternations represent different groups of ordered grammar symbols that can be matched. To create an alternation, one would use | operator where the operands are BaseParser objects. Each ALT node has a member variable, arg_, that maintains pointers to different groups of grammar symbols that will potentially matched when parsing. Consider the example:

Parser<> A;
A = Token(‘H’) & Token(‘i’)
  | Token(‘G’);

The nonterminal A would consist of an ALT node whose arg_ member data contains two elements that are pointers to a SEQ node and a TOK node. An ALT node is matched if one element in it arg_ member data is matched.

Clone this wiki locally