
Feature request: provide Template Haskell quasiquoter that generates lexer definition in-place #108

Open
sergv opened this issue Jun 17, 2017 · 1 comment

Comments

@sergv
Contributor

sergv commented Jun 17, 2017

Currently alex is a command-line tool that takes files in its own format and relies on support from Cabal-the-library to invoke the alex command, which preprocesses .x files before they are passed to ghc. This works reasonably well, but I think there's something to be gained if the alex package provided a quasiquoter that takes a lexer definition in the currently used format but produces TH expressions instead of generating a new file.

E.g. take the tokens_scan_user.x test from the alex test suite. Instead of

{
module Main (main) where
import System.Exit
}

%wrapper "basic" -- Defines: AlexInput, alexGetByte, alexPrevChar

$digit = 0-9
$alpha = [a-zA-Z]
$ws    = [\ \t\n]

tokens :-

  5 / {\ u _ibt _l _iat -> u == FiveIsMagic} { \s -> TFive (head s) }
  $digit { \s -> TDigit (head s) }
  $alpha { \s -> TAlpha (head s) }
  $ws    { \s -> TWSpace (head s) }

{

data Token = TDigit Char
           | TAlpha Char
           | TWSpace Char
           | TFive Char -- Predicated only
           | TLexError
    deriving (Eq,Show)

data UserLexerMode = NormalMode
                   | FiveIsMagic
    deriving Eq

main | test1 /= result1 = exitFailure
     | test2 /= result2 = exitFailure
     -- all succeeded
     | otherwise        = exitWith ExitSuccess

run_lexer :: UserLexerMode -> String -> [Token]
run_lexer m s = go ('\n', [], s)
    where go i@(_,_,s') = case alexScanUser m i 0 of
                     AlexEOF             -> []
                     AlexError  _i       -> [TLexError]
                     AlexSkip   i' _len  ->                   go i'
                     AlexToken  i' len t -> t (take len s') : go i'

test1 = run_lexer FiveIsMagic "5 x"
result1 = [TFive '5',TWSpace ' ',TAlpha 'x']

test2 = run_lexer NormalMode "5 x"
result2 = [TDigit '5',TWSpace ' ',TAlpha 'x']
}

I'd like to write a TokensScanUser.hs file that looks like:

{-# LANGUAGE QuasiQuotes, TemplateHaskell #-}

module Main (main) where
import System.Exit

import Alex.TH

genLexer defaultLexer [alex|
%wrapper "basic" -- Defines: AlexInput, alexGetByte, alexPrevChar

$digit = 0-9
$alpha = [a-zA-Z]
$ws    = [\ \t\n]

tokens :-

  5 / {\u _ibt _l _iat -> u == FiveIsMagic} { \s -> TFive (head s) }
  $digit { \s -> TDigit (head s) }
  $alpha { \s -> TAlpha (head s) }
  $ws    { \s -> TWSpace (head s) }
|]

data Token = TDigit Char
           | TAlpha Char
           | TWSpace Char
           | TFive Char -- Predicated only
           | TLexError
    deriving (Eq,Show)

data UserLexerMode = NormalMode
                   | FiveIsMagic
    deriving Eq

main | test1 /= result1 = exitFailure
     | test2 /= result2 = exitFailure
     -- all succeeded
     | otherwise        = exitWith ExitSuccess

run_lexer :: UserLexerMode -> String -> [Token]
run_lexer m s = go ('\n', [], s)
    where go i@(_,_,s') = case alexScanUser m i 0 of
                     AlexEOF             -> []
                     AlexError  _i       -> [TLexError]
                     AlexSkip   i' _len  ->                   go i'
                     AlexToken  i' len t -> t (take len s') : go i'

test1 = run_lexer FiveIsMagic "5 x"
result1 = [TFive '5',TWSpace ' ',TAlpha 'x']

test2 = run_lexer NormalMode "5 x"
result2 = [TDigit '5',TWSpace ' ',TAlpha 'x']
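
To make the shape of the request more concrete, here is a minimal sketch of what the library side could look like. It is only an illustration under assumptions: `alex`, `genLexer` and `defaultLexer` are the names proposed above, while `LexerOptions`, `tableName` and `macroNames` are hypothetical and not part of alex. The quasiquoter just captures the definition as a string, and `genLexer` is a stand-in that emits a single binding listing the character-set macros rather than real lexer tables.

```haskell
{-# LANGUAGE TemplateHaskell #-}
-- Hypothetical sketch only: none of these definitions exist in the alex package.

import Data.Char (isAlphaNum, isSpace)
import Language.Haskell.TH (Dec, Q, listE, litE, mkName, normalB, sigD, stringL, valD, varP)
import Language.Haskell.TH.Quote (QuasiQuoter (..))

-- | Hypothetical options record; a real one might select wrappers, encodings, etc.
data LexerOptions = LexerOptions { tableName :: String }

defaultLexer :: LexerOptions
defaultLexer = LexerOptions { tableName = "alexMacros" }

-- | Expression quasiquoter that captures the lexer definition as a String literal.
alex :: QuasiQuoter
alex = QuasiQuoter
  { quoteExp  = litE . stringL
  , quotePat  = unsupported "pattern"
  , quoteType = unsupported "type"
  , quoteDec  = unsupported "declaration"
  }
  where
    unsupported ctx _ =
      fail ("alex quasiquoter cannot be used in a " ++ ctx ++ " context")

-- | Stand-in for real parsing: collect macro names from lines like "$name = ...".
macroNames :: String -> [String]
macroNames spec =
  [ name
  | line <- lines spec
  , ('$' : rest) <- [dropWhile isSpace line]
  , let (name, after) = span isAlphaNum rest
  , not (null name)
  , ('=' : _) <- [dropWhile isSpace after]
  ]

-- | Stand-in for real code generation: emits one top-level binding
-- (named by 'tableName') that lists the macro names found in the definition.
genLexer :: LexerOptions -> String -> Q [Dec]
genLexer opts spec = do
  let name = mkName (tableName opts)
  sig  <- sigD name [t| [String] |]
  body <- valD (varP name) (normalB (listE (map (litE . stringL) (macroNames spec)))) []
  pure [sig, body]

main :: IO ()
main = print (macroNames "$digit = 0-9\n$ws = [ ]\ntokens :-\n  $digit { f }\n")
-- prints ["digit","ws"]
```

Because of the Template Haskell stage restriction, `alex` and `genLexer` would have to live in a separate module from the one that uses them. A client module with `QuasiQuotes` and `TemplateHaskell` enabled could then write `genLexer defaultLexer [alex| ... |]` at top level, and in this sketch that would splice in an `alexMacros :: [String]` binding. A real implementation would generate the DFA tables and `alexScanUser` there instead.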

Having such a quasiquoter would provide the following benefits:

  • This will provide an option to use alex independently of the system's preprocessor if, say, clang starts to behave funny
  • This can help end the confusion of text editors by overly long lines, as mentioned in "Get rid of overly long lines in generated .hs file" #84
  • There will be no LINE annotations in the generated files (because there will be no generated files) that may annoy someone, see "GHC-generated LINE annotation might cause non-deterministic build" #105
  • Users will be able to just start ghci in their project and load the lexer definition to play with it. There will be no need to add the dist/build/ directory (or a different directory, depending on the build target and the build tool: cabal has one prefix here, stack has another depending on the snapshot) to the ghci path any more
  • Indexing with e.g. tag generators should also improve, because these programs will just skip the quasiquoter part and index all user-defined functions.
@simonmar
Member

Yes, this would be a great feature to have. Other people have done it before, e.g. I just found this: http://hackage.haskell.org/package/alex-meta

There are probably others. Maybe there already exists a good starting point?
