Add a LALR grammar for Rust with testing support #21452

bleibig · 2015-01-21T04:06:58Z

This adds a new lexer/parser combo for the entire Rust language that can be generated with flex and bison, taken from my project at https://github.com/bleibig/rust-grammar. There is also a testing script that runs the generated parser on all *.rs files in the repository (except for tests in compile-fail or ones that are marked as "ignore-test" or "ignore-lexer-test"). If you have flex and bison installed, you can run these tests using the new "check-grammar" make target.

This does not depend on or interact with the existing testing code in the grammar, which only provides and tests a lexer specification.

OS X users should note that the version of bison that comes with the Xcode toolchain (2.3) is too old to work with this grammar; they need to download and install version 3.0 or later.

The parser builds up an S-expression-based AST, which can be displayed by giving the "-v" argument to parser-lalr (normally it only gives output on error). It is only a rough approximation of what is parsed and doesn't capture every detail and nuance of the program.

Hopefully this should be sufficient for issue #2234, or at least a good starting point.
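
The file-selection rule described above (only *.rs files, nothing under compile-fail, nothing marked "ignore-test" or "ignore-lexer-test") could be sketched roughly as follows. This is an illustration only, not the testing script from the PR: the function name `should_run_parser` and the example paths are hypothetical, and the sketch assumes a marker counts if it appears anywhere in the file text.

```rust
use std::path::Path;

// Hypothetical helper: decide whether a source file should be fed to the
// generated parser, following the rule stated in the description.
fn should_run_parser(path: &Path, contents: &str) -> bool {
    // Only Rust source files are candidates.
    if path.extension().and_then(|e| e.to_str()) != Some("rs") {
        return false;
    }
    // Files under a `compile-fail` directory are expected to be rejected
    // somewhere in the compiler, so they are not used as positive tests.
    if path
        .components()
        .any(|c| c.as_os_str().to_str() == Some("compile-fail"))
    {
        return false;
    }
    // Assumption: an opt-out marker may appear anywhere in the file text.
    !(contents.contains("ignore-test") || contents.contains("ignore-lexer-test"))
}

fn main() {
    // The paths below are made up for the example.
    let cases = [
        ("src/test/run-pass/hello.rs", "fn main() {}"),
        ("src/test/compile-fail/bad.rs", "fn main() {}"),
        ("src/libfoo/lib.rs", "// ignore-lexer-test\nfn f() {}"),
    ];
    for (path, contents) in cases.iter() {
        println!("{} -> {}", path, should_run_parser(Path::new(path), contents));
    }
}
```

Keeping the decision in a pure function of path and contents makes it easy to extend later, for example to route rejected files into a separate negative-test set.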

rust-highfive · 2015-01-21T04:07:11Z

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

steveklabnik · 2015-01-21T04:48:03Z

😍

emberian · 2015-01-21T13:09:01Z

+10000, so glad to finally see this work start to move in-tree.

erickt · 2015-01-21T15:44:30Z

Wow, this is awesome. Did any of these tools find any ambiguity in the grammar?

hawkw · 2015-01-21T16:03:55Z

+1

aturon · 2015-01-21T16:29:26Z

Whoa...!

zwarich · 2015-01-21T18:09:51Z

Strictly speaking, this isn't a LALR grammar for Rust, because it relies on the use of a push_back function in semantic actions to push tokens back onto the token stream.

brson · 2015-01-21T19:11:11Z

This looks pretty awesome. What are the prospects for testing that files the Rust parser rejects are also rejected by the grammar?

jbclements · 2015-01-21T19:32:34Z

I love where this is headed!

bleibig · 2015-01-23T03:32:20Z

The grammar as it stands has a number of S/R conflicts, but they are all resolved through use of the precedence features in bison to (hopefully) match how the production rust parser works in these situations.

As far as testing goes, the testing script does not do negative tests for programs that should fail to parse, but that feature can easily be added to the script. We can do that using programs in the compile-fail directory; however, not all files there fail to parse, as some are meant to fail in a later stage of compilation. We can first check whether a file is supposed to parse with rustc -Z parse-only, but it might be a better idea to split the compile-fail directory so that files that fail to parse are in a new "parse-fail" directory.

nikomatsakis · 2015-01-24T20:38:37Z

@bors r+ f39297f

bors · 2015-01-24T22:14:15Z

⌛ Testing commit f39297f with merge 4e4e8cf...


bors · 2015-01-25T00:49:40Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-32-nopt-t, auto-win-32-opt, auto-win-64-nopt-t, auto-win-64-opt

sanxiyn · 2015-01-27T14:58:44Z

Where should one send changes to grammar now? bleibig/rust-grammar or rust-lang/rust?

steveklabnik · 2015-01-27T15:41:33Z

rust-lang/rust

keleshev · 2015-02-12T19:54:09Z

This grammar is likely not LALR(1). When an ambiguity exists in an LALR or LR grammar, it can be resolved in one of two ways:

- rewriting the grammar to remove the ambiguity, or
- employing heuristics, for example deciding whether to shift or reduce based on operator precedence.

In the first case, the grammar is guaranteed to be (LA)LR, but in the second it may or may not be. This Rust grammar resolves ambiguities with precedence, so it might not be (LA)LR.

Example
expr:
 | expr "+" expr
 | expr "*" expr
 | NUMBER
It is ambiguous because 1 + 2 * 3 could be parsed as either (1 + 2) * 3 or 1 + (2 * 3), so it is not (LA)LR.

To resolve the ambiguity, you can rewrite this grammar with one rule per precedence level:
expr:
 | sum

sum:
 | sum "+" product
 | product

product:
 | product "*" NUMBER
 | NUMBER
Now it is LALR(1), and it will parse the expression only as 1 + (2 * 3).

But, practically speaking, most (LA)LR parser generators allow you to resolve grammar ambiguities with precedence, so this is probably not a big deal.
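
As a rough illustration of the grammar-rewriting approach, here is a minimal Rust sketch of a hand-written parser in which each precedence level gets its own function. It is not code from the PR: the names `Parser`, `sum`, `product`, and `parse` are hypothetical, tokens are assumed to be single digits, `+`, and `*`, and the result is printed as an S-expression so the grouping is visible.

```rust
// Hypothetical sketch: one function per precedence level.
//   sum     := product ("+" product)*
//   product := NUMBER ("*" NUMBER)*
struct Parser {
    tokens: Vec<char>,
    pos: usize,
}

impl Parser {
    fn new(src: &str) -> Parser {
        // Drop whitespace; every remaining character is treated as one token.
        let tokens = src.chars().filter(|c| !c.is_whitespace()).collect();
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.tokens.get(self.pos).copied()
    }

    // Lowest precedence: a left-associative chain of products joined by '+'.
    fn sum(&mut self) -> Result<String, String> {
        let mut left = self.product()?;
        while self.peek() == Some('+') {
            self.pos += 1;
            let right = self.product()?;
            left = format!("(+ {} {})", left, right);
        }
        Ok(left)
    }

    // Higher precedence: a left-associative chain of numbers joined by '*'.
    fn product(&mut self) -> Result<String, String> {
        let mut left = self.number()?;
        while self.peek() == Some('*') {
            self.pos += 1;
            let right = self.number()?;
            left = format!("(* {} {})", left, right);
        }
        Ok(left)
    }

    fn number(&mut self) -> Result<String, String> {
        match self.peek() {
            Some(c) if c.is_ascii_digit() => {
                self.pos += 1;
                Ok(c.to_string())
            }
            other => Err(format!("expected a digit, found {:?}", other)),
        }
    }
}

fn parse(src: &str) -> Result<String, String> {
    let mut p = Parser::new(src);
    let tree = p.sum()?;
    if p.pos != p.tokens.len() {
        return Err(format!("unexpected trailing input at token {}", p.pos));
    }
    Ok(tree)
}

fn main() {
    println!("{}", parse("1 + 2 * 3").unwrap()); // prints "(+ 1 (* 2 3))"
}
```

Because `product` is called from inside `sum`, multiplication always binds tighter than addition without any precedence declarations, which is the same effect the layered grammar achieves.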

keleshev · 2015-02-13T10:11:06Z

This grammar seems to define assignment and compound assignment operators as left-associative (which corresponds to the description in the reference); however, this example shows that = is right-associative:

fn main() {
  let mut u: ();
  let mut a: u8;

  u = (a = 2);  // right associativity
//(u = a) = 2;     left associativity, doesn't work
  u = a = 2;    // this works, so it must be right-associative

  print!("{} {}", a, u == ());
}

keleshev · 2015-02-13T10:12:45Z

src/grammar/parser-lalr.y

+// prefix_exprs
+%precedence RETURN
+
+%left '=' SHLEQ SHREQ MINUSEQ ANDEQ OREQ PLUSEQ STAREQ SLASHEQ CARETEQ PERCENTEQ


This is where the erroneous left-associativity of the assignment operators is defined.
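
To make the difference concrete, here is a minimal Rust sketch (an illustration, not code from the PR) that groups a chain of `=` operators under either associativity and renders the result as an S-expression. `Assoc` and `group_assignments` are hypothetical names, and the operands are assumed to be already split into strings, so no real tokenizing happens.

```rust
// Hypothetical illustration of how associativity changes the grouping of
// `u = a = 2`. Operands are given pre-split; no real parsing is done.
#[derive(Clone, Copy)]
enum Assoc {
    Left,
    Right,
}

fn group_assignments(operands: &[&str], assoc: Assoc) -> String {
    match operands {
        [] => String::new(),
        [only] => only.to_string(),
        _ => match assoc {
            // Left-associative: everything but the last operand groups first.
            Assoc::Left => {
                let (last, rest) = operands.split_last().unwrap();
                format!("(= {} {})", group_assignments(rest, assoc), last)
            }
            // Right-associative: everything but the first operand groups first.
            Assoc::Right => {
                let (first, rest) = operands.split_first().unwrap();
                format!("(= {} {})", first, group_assignments(rest, assoc))
            }
        },
    }
}

fn main() {
    let chain = ["u", "a", "2"];
    // prints "(= (= u a) 2)": this grouping corresponds to `(u = a) = 2`.
    println!("{}", group_assignments(&chain, Assoc::Left));
    // prints "(= u (= a 2))": this grouping corresponds to `u = (a = 2)`.
    println!("{}", group_assignments(&chain, Assoc::Right));
}
```

In bison terms, the two groupings correspond to declaring the operators with `%left` and `%right` respectively.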

Add a LALR grammar for Rust with testing support

f39297f

rust-highfive assigned nikomatsakis Jan 21, 2015

steveklabnik mentioned this pull request Jan 21, 2015

Grammar tests #2234

Closed

bors merged commit f39297f into rust-lang:master Jan 25, 2015

This was referenced Feb 4, 2015

Model lexer is still wrong #15883

Closed

Move compile-fail tests that are rejected by the parser to parse-fail #22011

Merged

keleshev reviewed Feb 13, 2015