Skip to content

Example: Definite Clause Grammars

Ronald Franco edited this page Dec 22, 2018 · 1 revision

Definite clause grammars are a way of expressing a grammar in a logic programming language such as Prolog. Though DCGs can be used for formal and natural languages, it is primarily used to parse natural languages due to its ability to represent linguistic features fairly concisely. Consider the following set of DCG rules:

sentence --> pronoun(subject), verb_phrase.
verb_phrase --> verb, pronoun(object).
pronoun(subject) --> [he].
pronoun(subject) --> [she].
pronoun(object) --> [him].
pronoun(object) --> [her].
verb --> [likes].

The above grammar accepts sentences such as “he likes her” and “he likes him” but does not accept sentences like “her likes he” and “him likes him”. The following is an AFG equivalent of the above definite clause grammar.

#include <cstring>
#include <set>
#include "parser.h"
#include "tokenizer.h"
#include "tokenstream.h"

#define WORD_TOK_CODE -1

/* sentenceTokenizer tokenizes words based on spaces */
class sentenceTokenizer : public Tokenizer
{
  public:
  sentenceTokenizer(std::string str)
  {
    /* make copy of input string */
    char * input = nullptr;
    input = strdup(str.c_str());

    /* get token */
    char * tok = std::strtok(input," ");
    
    /* check if token was found */
    while (tok != NULL)
    {
      /* token is a word, add token */
      emplace_back(WORD_TOK_CODE,tok,strlen(tok),0,0);

      /* get next token */ 
      tok = std::strtok(NULL,", ");
    }
    free(input);
  }
};

/* Overload extraction operator of TokenStream class to allow for word type
 * checking */
TokenStream<std::set<std::string>>&
    operator>>(TokenStream<std::set<std::string>> & in, int& out)
{
  (void) out;

  /* Check if word is of correct type */
  if (in.get_in()->find(in.get_text()) == in.get_in()->end())
    throw parsing_error("Invalid word");
   
  return in;
}

int main()
{
  int dummy = 0;

  /* sets for different types of words */
  std::set<std::string> subject_words;
  std::set<std::string> object_words;
  std::set<std::string> verb_words;

  /* add subject words */
  subject_words.insert("he");
  subject_words.insert("she");

  /* add object words */
  object_words.insert("him");
  object_words.insert("her");
  
  /* add verbs */
  verb_words.insert("likes");

  /* declare nonterminal start and terminal double_ */
  Parser<> sentence, verb_phrase;
  Parser<std::set<std::string>,int> word(-1);

  /* grammar */
  sentence = word(subject_words)>>dummy & verb_phrase;

  verb_phrase = word(verb_words)>>dummy & word(object_words)>>dummy;

  /* begin user prompt */
  std::cout << "==============================================\n\n";
  std::cout << "\tA simple sentence parser...\n\n";
  std::cout << "\tFormat: subject verb object \n\n";
  std::cout << "==============================================\n\n";

  std::cout << "Give me a sentence.\n";
  std::cout << "Type [q or Q] to quit: ";

  std::string str;
  while (getline(std::cin, str))
  {
    /* quit */
    if (str.empty() || str[0] == 'q' || str[0] == 'Q')
      break;

    /* tokenize input string */
    sentenceTokenizer tokens(str);

    /* parse using sentence as starting nonterminal */
    if (sentence.parse(&tokens))
    {
      std::cout << "-------------------------\n";
      std::cout << "Parsing succeeded\n";
      std::cout << str << " parses OK.\n" << std::endl;
    }
    else
    {
      std::cout << "-------------------------\n";
      std::cout << "Parsing failed\n" << std::endl;
    }

    /* prompt user */
    std::cout << "Give me a sentence.\n";
    std::cout << "Type [q or Q] to quit: ";

  }

  std::cout << "\nBye." << std::endl;
  return 0;
}
Clone this wiki locally