
Major refactoring and new API (among others for compat with Julia v0.5 and above) #24

Closed
wants to merge 53 commits into from

53 commits
c9e2077
added .gitignore
vonDonnerstein Dec 26, 2016
29a91b7
cleaning README.md Ex3 now without warnings
vonDonnerstein Dec 26, 2016
ce2f0c7
cleaning Node.jl
vonDonnerstein Dec 26, 2016
4b85549
all includes at one place
vonDonnerstein Dec 26, 2016
965dd06
Avoiding clashes with ParseError in Base
vonDonnerstein Dec 26, 2016
906e8a1
removed useless creator because its defined by default
vonDonnerstein Dec 29, 2016
17f6505
decrease creator redundancy
vonDonnerstein Dec 29, 2016
1bb9589
making show(grammar) more legible
vonDonnerstein Dec 29, 2016
635628f
added documentation
vonDonnerstein Dec 29, 2016
56d5ccd
removing methods that are always overwritten
vonDonnerstein Dec 29, 2016
fec2dff
further increasing legibility of grammar
vonDonnerstein Dec 29, 2016
904f443
removing redundand creators
vonDonnerstein Dec 29, 2016
5ceb495
removing redundant exports and imports
vonDonnerstein Dec 29, 2016
db32d27
fixing 904f443
vonDonnerstein Dec 29, 2016
fffd08f
uncached_parse => parse_newcachekey (because even uncached_parse woul…
vonDonnerstein Dec 30, 2016
0b73b0d
extracting parsing functions from module file
vonDonnerstein Dec 30, 2016
815fed6
removing unused helper fcn
vonDonnerstein Dec 30, 2016
04fc079
updating show() for some rules
vonDonnerstein Dec 30, 2016
9e97458
replacing string_matches with one-liner
vonDonnerstein Dec 30, 2016
1a0ce67
documentation fix
vonDonnerstein Dec 30, 2016
4bbaae9
only symbols for direct use by the regular user should be exported
vonDonnerstein Dec 30, 2016
30c191d
shuffling lines where they logically belong
vonDonnerstein Dec 30, 2016
3fd6145
fixing 9e97458
vonDonnerstein Dec 30, 2016
236b098
updating tests
vonDonnerstein Dec 30, 2016
56da3c9
shuffling lines to the correct positions
vonDonnerstein Dec 30, 2016
f06ec83
Inner Constructors => Outer Constructors (actions are created by dire…
vonDonnerstein Dec 30, 2016
a375dd5
extracting show() rules to end of file
vonDonnerstein Dec 30, 2016
380bd8f
trimming show() rules
vonDonnerstein Dec 30, 2016
fc9d5bc
bringing all Base conflicting names to one place
vonDonnerstein Dec 30, 2016
ae0b451
building rule constructors by meta-code (less verbose)
vonDonnerstein Dec 30, 2016
45f297b
added fcn call wrappers for consistency
vonDonnerstein Jan 4, 2017
1c5f4e2
fixed regex matching
vonDonnerstein Jan 4, 2017
efbba95
updated and-rule io
vonDonnerstein Jan 4, 2017
e19c657
starting work on a new grammar framework
vonDonnerstein Jan 4, 2017
fc10859
increased legibility in newgrammar, suppressing terminals and ignorin…
vonDonnerstein Jan 4, 2017
dd7e564
OR-rule added to newgrammar
vonDonnerstein Jan 4, 2017
9446e9d
added TERM-rule to newgrammar
vonDonnerstein Jan 4, 2017
7e6d4d4
quantifiers added to newgrammar
vonDonnerstein Jan 4, 2017
2cfae2a
cleaner structure allows for actions on quantifiers
vonDonnerstein Jan 4, 2017
fa21e7c
added REGEX-rule to newgrammar
vonDonnerstein Jan 4, 2017
bb9b270
updated show fcns
vonDonnerstein Jan 5, 2017
d0f9791
the newgrammar framework is now almost done. still need gg_string
vonDonnerstein Jan 5, 2017
e522fc5
the grammar is self-replicating now. World domination soon to come...
vonDonnerstein Jan 7, 2017
1e0d0f6
added comparisons and corrected differences between grammargrammar an…
vonDonnerstein Jan 7, 2017
fb71b42
cleanup
vonDonnerstein Jan 7, 2017
bde0fde
updated license file and travis
vonDonnerstein Jan 8, 2017
348fd23
updated readme (written by hand not sphinx)
vonDonnerstein Jan 8, 2017
94123e0
added standardrules and related functions
vonDonnerstein Jan 8, 2017
0b0907d
minor fixes and renamings
vonDonnerstein Jan 8, 2017
ff22b3b
added more standardactions
vonDonnerstein Jan 8, 2017
41094a8
updated examples
vonDonnerstein Jan 8, 2017
b1fd128
removed test for old API and inserted end-to-end-test for new API
vonDonnerstein Jan 8, 2017
d9f2aae
minor fixes in readme
vonDonnerstein Jan 8, 2017
27 changes: 27 additions & 0 deletions .gitignore
@@ -0,0 +1,27 @@
# Code Coverage
*.jl.cov
*.jl.*.cov
*.jl.*.html
*.jl.mem
amber.png
emerald.png
glass.png
ruby.png
snow.png
updown.png
gcov.css
index-sort-f.html
index-sort-l.html
index.html
lcov.info
test/coverage_run.out

# VIM
*.swo
*.swp

# ctags
tags

# Git
*.orig
27 changes: 12 additions & 15 deletions .travis.yml
@@ -1,17 +1,14 @@
language: julia
os:
  - linux
  - osx
git:
  depth: 3
julia:
  - release
  - nightly
matrix:
  allow_failures:
    - julia: nightly
notifications:
  email: false
6 changes: 3 additions & 3 deletions LICENSE.md
@@ -1,7 +1,7 @@
PEGParser.jl is licensed under the MIT License:

> Copyright (c) since 2014: <Henry Schurkus, Abe Schneider,
> and other contributors.>

> Permission is hereby granted, free of charge, to any person obtaining
> a copy of this software and associated documentation files (the
266 changes: 121 additions & 145 deletions README.md
@@ -2,209 +2,185 @@

# PEGParser

PEGParser is a parsing library for Parsing Expression Grammars (PEG) in Julia. It was inspired by pyparsing, parsimonious, boost::spirit, and several others. The original design was created by Abe Schneider in 2014. As of January 2017, Henry Schurkus has reworked major parts of the library design. The redesign also brought a new API for easier and less error-prone use. Below we describe the new design, which takes grammar declarations in the form of (multiline) strings. The previous design relied heavily on the specific parsing logic of the Julia language and should therefore be considered deprecated.

# Super Quick Tutorial For The Very Busy
A parser takes a string and a grammar specification and turns the former into a computable structure. PEGParser does this by first parsing the string into an Abstract Syntax Tree (AST) with `parse(grammar, string)` and then transforming this AST into the required structure with `transform(function, ast)`.
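To make the two-stage idea concrete, here is a self-contained toy sketch of that pipeline. Note that `TNode`, `fold`, and `calc` are made up for illustration and are not PEGParser's actual types; the hand-built tree stands in for the output of `parse(grammar, string)`:

```julia
# Toy stand-in for an AST node (PEGParser's real Node type differs).
struct TNode
    name::Symbol
    value::String
    children::Vector{TNode}
end

# Apply `fn` bottom-up: transform the children first, then the node itself.
fold(fn, n::TNode) = fn(n, [fold(fn, c) for c in n.children])

# Hand-built tree for "4+5", standing in for the result of the parse stage:
ast = TNode(:start, "", [
    TNode(:number, "4", TNode[]),
    TNode(:op, "+", TNode[]),
    TNode(:number, "5", TNode[]),
])

# The user-supplied transformation: numbers become Ints, the op becomes a
# function, and the start node applies the op to its two number children.
calc(n, kids) = n.name == :number ? Base.parse(Int, n.value) :
                n.name == :op     ? (+) :
                kids[2](kids[1], kids[3])

result = fold(calc, ast)   # → 9
```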

## Defining a grammar

A grammar can be defined as:

```julia
Grammar("""
rule1 => ...
rule2 => ...
...
""")
```

where the following rules can be used:

* Terminals: `'a'` (must match literally)
* Or: `rule1 | rule2 | rule3` (the first rule that matches wins)
* And: `rule1 & rule2 & rule3` (the rules are matched one after the other; `&` binds more tightly than `|`)
* Grouping: `(rule1 & rule2) | (rule3 & rule4)`
* Optional: `?(rule1)` (is matched if possible, but counts as matched either way)
* One or more: `+(rule1)` (is matched as often as possible, but has to match at least once)
* Zero or more: `*(rule1)` (is matched as often as possible, and counts as matched even if it never matches)
* Regular expressions: `r([a-zA-Z]+)r` (matches whatever the regex between `r(` and `)r` matches)
* Suppression: `-(rule1)` (the rule has to be matched but adds no node to the AST)
* Semantic action: `rule{ expr }` (uses `expr` to create the node instead of the default `no_action`; see below for more information)

The argument to `Grammar()` is a string in which line ends or semicolons (`;`) separate the rules.
All grammars use `start` as the starting rule by default. You can specify a different starting rule in the `parse` function if you desire.

### Example 1
Note: All these examples and more can be found in the examples folder of PEGParser.

Let's start by creating a simple calculator that can take two numbers and an operator to give a result.

We first define the grammar:
```julia
calc1 = Grammar("""
start => (number & op & number)

op => plus | minus
number => (-(space) & r([0-9]+)r)
plus => (-(space) & '+')
minus => (-(space) & '-')
space => r([ \\t\\n\\r]*)r
""")
```

The starting rule is composed of two other rules: `number` and `op`. For this calculator, we only allow `+` and `-`.

The `number` rule just matches any digit between 0 and 9. You'll note that `space` appears in front of all terminals. This is because PEGs don't handle whitespace automatically.

## Parsing
`parse(grammar,string)` allows the construction of the AST of `string` according to `grammar`.

### Example 1 continued
Now we can run this grammar with some input:

```julia
(ast, pos, error) = parse(calc1, "4+5")
println(ast)
```

resulting in the following output:

```
node() {PEGParser.AndRule}
  1: node() {PEGParser.AndRule}
    1: node() {'4',PEGParser.RegexRule}
  2: node() {PEGParser.AndRule}
    1: node() {'+',PEGParser.Terminal}
  3: node() {PEGParser.AndRule}
    1: node() {'5',PEGParser.RegexRule}
```

## Transformation

Finally, one transforms the AST into the desired data structure by defining an appropriately overloaded actuator function and applying it recursively to the AST via `transform(function, ast)`.

### Example 1 continued
We now have the desired AST for "4+5". For our calculator we do not want to put everything into a data structure; we want to fold it all up directly into the final result.

For the transformation, an actuator function needs to be defined which dispatches on the names of the nodes. So we first need to give names to the parsed nodes:

```julia
calc1 = Grammar("""
start => (number & op & number) {"start"}

op => (plus | minus) {"op"}
number => (-(space) & r([0-9]+)r) {"number"}
plus => (-(space) & '+') {"plus"}
minus => (-(space) & '-') {"minus"}
space => r([ \\t\\n\\r]*)r
""")
```
leading to the following AST:
```
node(start) {PEGParser.AndRule}
  1: node(number) {PEGParser.AndRule}
    1: node() {'4',PEGParser.RegexRule}
  2: node(plus) {PEGParser.AndRule}
    1: node() {'+',PEGParser.Terminal}
  3: node(number) {PEGParser.AndRule}
    1: node() {'5',PEGParser.RegexRule}
```

We can now define the actuator function as
```julia
toresult(node,children,::MatchRule{:default}) = node.value
toresult(node,children,::MatchRule{:number}) = parse(Int,node.value)
toresult(node,children,::MatchRule{:plus}) = +
toresult(node,children,::MatchRule{:minus}) = -
toresult(node,children,::MatchRule{:start}) = children[2](children[1],children[3])
```
and recursively apply it to our AST
```julia
transform(toresult,ast)
```
to obtain the correct result, `9`.
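The dispatch trick behind this pattern can be shown in a self-contained way: the symbol parameter of a value type selects which method Julia calls. The following sketch uses a simplified stand-in for `MatchRule` and passes the node's value directly instead of a node object, since it does not load PEGParser:

```julia
# Simplified stand-in for the MatchRule value type used by transform:
# the symbol parameter picks which method Julia dispatches to.
struct MatchRule{T} end

toresult(value, children, ::MatchRule{:number}) = Base.parse(Int, value)
toresult(value, children, ::MatchRule{:plus})   = +
toresult(value, children, ::MatchRule{:start})  = children[2](children[1], children[3])

# Folding "4+5" by hand, bottom-up, the way transform would:
n1 = toresult("4", [], MatchRule{:number}())
op = toresult("+", [], MatchRule{:plus}())
n2 = toresult("5", [], MatchRule{:number}())
result = toresult("", [n1, op, n2], MatchRule{:start}())   # → 9
```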

## Actions

One does not always want to create every node directly as a basic `Node` type. Actions allow acting directly on parts of the AST during its construction. An action is specified by `{ action }` following any rule. Generally, a function (anonymous or named) has to be specified which takes the arguments `(rule, value, firstpos, lastpos, childnodes)` and may return anything that nodes higher up in the AST can work with.

As a shorthand, specifying just a string, e.g. `"name"`, results in a normal node with the specified name set. This is how we did the naming in Example 1 above. As a side note: the action `liftchild` simply takes the child of the node and returns it on the current level. This is the default action for `|`-rules: whichever child gets matched is returned in place of the `|`-rule, as if we had explicitly specified
```julia
myOrRule = (rule1 | rule2) {liftchild}
```

### Where do actions apply?
Actions always apply to the single token preceding them, so in
* `rule1 {action} & rule2`, `action` applies to `rule1`
* `rule1 & rule2 {action}`, `action` applies to `rule2`
* `(rule1 & rule2) {action}`, `action` applies to the `&`-rule joining `rule1` and `rule2`.

For another example, in
* `*(rule) {action}`, `action` applies to the `*`-rule
* `*(rule {action})`, `action` applies to `rule`.

### Example 2
Since our calculator is really very simple, we could have skipped building a named AST and transforming it, and instead parsed the string directly into the final result by means of actions:
```julia
calc2 = Grammar("""
start => (number & op & number){(r,v,f,l,c) -> c[2](c[1],c[3])}

op => plus | minus
number => (-(space) & r([0-9]+)r) {(r,v,f,l,c) -> parse(Int,c[1].value)}
plus => (-(space) & '+'){(a...) -> +}
minus => (-(space) & '-'){(a...) -> -}
space => r([ \\t\\n\\r]*)r
""")
```
which would directly result in `9` when parsing `parse(calc2, "4+5")`.

### Example 3
The best example of how to parse things can be found in the source code itself. In `grammarparsing.jl` we give the grammar used to parse the grammar specifications written by the user. While it is not actually live code, its consistency with what really happens is ensured by making it a test in the test suite. Look there if you ever wonder about any specifics of grammar specification.

# An In-Depth Guide To The Library

* The entry point to the library is of course the file `PEGParser.jl` which handles all `import`/`export`ing and includes the other files in order.
* `rules.jl` defines `Rule` and all its subtypes. These typically consist of a `name` (which, when constructed with the default constructor, is simply `""`), a type-specific `value`, and an `action`.
* `grammar.jl` defines the `Grammar`-type as a dictionary mapping symbols to rules.
* `comparison.jl` defines comparison functions so that it is possible to check for example if two grammars are the same.
* `standardactions.jl` defines some utility actions which are often needed, e.g. the above mentioned `liftchild`.

*After these files are read, it is already possible to specify any grammar in the most Julia-native way: by manually stacking constructors into one another.*
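As a self-contained illustration of what stacking constructors by hand means, here is a toy sketch. The `Toy*` types are made-up stand-ins defined below, not the real constructors from `rules.jl`, whose signatures may differ:

```julia
# Toy stand-ins for PEGParser's rule types (real ones live in rules.jl).
abstract type ToyRule end
struct ToyTerminal <: ToyRule
    name::String
    value::String
end
struct ToyAndRule <: ToyRule
    name::String
    values::Vector{ToyRule}
end

# A rule matching "4+4", built by nesting constructors into one another
# instead of parsing a grammar string:
number = ToyTerminal("number", "4")
plus   = ToyTerminal("plus", "+")
start  = ToyAndRule("start", ToyRule[number, plus, number])
```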

* `node.jl` defines the `Node` type which makes up any AST. An AST is really just a top-level node and all its (recursive) children.
* `parse.jl` defines the generic `parse` function and its helper `parse_newcachekey`, which is specialized for each `Rule` subtype to handle the recursive parsing of a string by a given grammar.
* `transform.jl` defines the `transform` function mentioned above.

*After these files are additionally read, it is possible to parse and transform a string into a data structure according to a grammar built by the manual stacking process discussed above.*

* Since all functionality is now in principle available, `grammarparsing.jl` defines a grammar to parse grammars built by the stacking process, allowing the end user to simply specify his or her grammar as a string. Note that some grammar functionality is still only available via direct construction; the consistent definition of such a grammar-grammar becomes exponentially more difficult with the number of grammar features.
* `standardrules.jl` defines a grammar `standardrules` consisting only of commonly used rules like `space`, `float`, etc., so that they do not have to be defined by the end user every single time. Instead, the end user can simply join these rules into his or her definition by constructing the grammar as `Grammar("...", standardrules)`.