
Scanning

Ronald Franco edited this page Dec 26, 2018 · 3 revisions

The created LL(k) parser expects its input as a sequence of tokens stored in a Tokenizer-derived object.

Tokenizer Class

The Tokenizer class, defined in tokenizer.h, maintains a std::vector of Token objects. It is designed as a base class: a user-defined derived class implements a scanning method and stores the tokens it produces.

Token Structure

The Token structure maintains the following information:

int code; // a unique integer to represent this type of token
std::string text; // contains lexeme of token
size_t line_no; // contains the line number of the lexeme
size_t columno; // contains the column number of the lexeme
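
As a rough illustration, the four fields could be declared and filled as shown below. This is only a sketch: the actual declaration in tokenizer.h may differ (for example, it may have constructors or additional members), and both the code value 42 and the make_example_token helper are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the Token structure described above; the real definition
// lives in tokenizer.h and may differ.
struct Token
{
  int code;             // unique integer identifying the token type
  std::string text;     // lexeme of the token
  std::size_t line_no;  // line number of the lexeme
  std::size_t columno;  // column number of the lexeme
};

// Hypothetical helper: builds a token for the lexeme "while"
// found at line 3, column 5, using the made-up token code 42.
inline Token make_example_token()
{
  return Token{42, "while", 3, 5};
}
```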

Derived Classes

The Tokenizer class provides functions for basic maintenance of a std::vector of Token structures and is intended to be used as a base class. Users create derived classes that implement a particular scanning method. The FlexTokenizer class, defined in flextokenizer.h, is an example of a Tokenizer-derived class; it scans and tokenizes input using Flex.

Below is another example of a Tokenizer-derived class, which tokenizes the bits of a binary number:

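#include <string>
#include "tokenizer.h"

// Tokenizes a string of binary digits, one token per bit.
class BinaryTokenizer : public Tokenizer
{
  public:
    BinaryTokenizer(const std::string& str)
    {
      for (std::size_t i = 0; i < str.length(); i++)
        if (str[i] == '0')
          emplace_back('0',"0",1,0,0);  // token code is the character '0'
        else if (str[i] == '1')
          emplace_back('1',"1",1,0,0);  // token code is the character '1'
        else
          break;  // stop scanning at the first character that is not a bit
    }
};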
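
For scanners that need to report source positions, a derived class can track line and column numbers while it scans. The sketch below shows one way this might look. It is self-contained and does not use the library's headers: the Token and Tokenizer definitions are simplified stand-ins, and the add_token helper, the tokens accessor, and the PositionedBinaryTokenizer class are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-ins so the sketch compiles on its own; the real Token and Tokenizer
// come from tokenizer.h and may differ.
struct Token
{
  int code;
  std::string text;
  std::size_t line_no;
  std::size_t columno;
};

class Tokenizer
{
  public:
    // Hypothetical read-only accessor for the stored tokens.
    const std::vector<Token>& tokens() const { return tokens_; }

  protected:
    // Hypothetical helper that appends one token to the stored vector.
    void add_token(int code, std::string text, std::size_t line, std::size_t column)
    {
      tokens_.push_back(Token{code, std::move(text), line, column});
    }

  private:
    std::vector<Token> tokens_;
};

// Tokenizes bits spread over several lines, recording 1-based positions.
class PositionedBinaryTokenizer : public Tokenizer
{
  public:
    explicit PositionedBinaryTokenizer(const std::string& str)
    {
      std::size_t line = 1, column = 1;
      for (char c : str)
      {
        if (c == '\n')
        {
          ++line;       // a newline starts the next line
          column = 1;   // and resets the column counter
          continue;
        }
        if (c != '0' && c != '1')
          break;        // stop at the first character that is not a bit
        add_token(c, std::string(1, c), line, column);
        ++column;
      }
    }
};
```

Constructing PositionedBinaryTokenizer with the input "10\n1" would store three tokens in this sketch: two on line 1 (columns 1 and 2) and one on line 2 (column 1).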

Other examples of Tokenizer-derived classes can be seen in the Examples wiki page.
