Use Typescript to infer types of grammar rules #98

Merged
merged 6 commits into from
Apr 10, 2023

@siefkenj siefkenj commented Apr 3, 2023

This PR is the big one! It merges https://github.com/siefkenj/peggy-to-ts/ into ts-pegjs.

This PR depends on #97

This PR adds a large amount of code to the src/lib directory. It introduces a TypeExtractor object that takes a grammar AST and creates a TypeScript type for each rule, using ts-morph to call TypeScript for type inference.

Key features:

  • If a Peggy action has a TypeScript return type, this code uses it.
  • If a user specifies a type for a rule in returnTypes, it overrides whatever is computed.
  • Just like in TS, if TS cannot infer the type of a rule, the rule is given type any.
  • Type names are generated automatically in CamelCase from the grammar rule names. This can be turned off via an option.
  • An onlyGenerateGrammarTypes option was added to output only the grammar types. This can be used if someone wants to split their parser into a types file and a parser file.
  • Circularly defined types are detected and eliminated. It is possible that this results in a type that is different from what the parser generates, but most of the time, I think it will be correct...
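
The precedence described in the list above (user override, then inferred type, then any) and the CamelCase naming can be sketched as follows. This is only an illustration: toCamelCase and resolveRuleType are hypothetical helper names, not functions from ts-pegjs, and the exact naming rules the PR uses may differ.

```typescript
// Hypothetical helpers for illustration only; not the ts-pegjs source.

/** Convert a rule name such as "member_expression" to CamelCase ("MemberExpression"). */
function toCamelCase(ruleName: string): string {
  return ruleName
    .split(/[^A-Za-z0-9]+/)               // split on underscores and other separators
    .filter((part) => part.length > 0)    // drop empty pieces from leading/trailing separators
    .map((part) => part[0].toUpperCase() + part.slice(1))
    .join("");
}

/**
 * Pick the type text for a rule:
 *   1. a user-supplied entry in returnTypes wins,
 *   2. otherwise the type inferred for the rule is used,
 *   3. otherwise the rule falls back to `any`.
 */
function resolveRuleType(
  ruleName: string,
  inferred: Record<string, string | undefined>,
  returnTypes: Record<string, string> = {}
): string {
  return returnTypes[ruleName] ?? inferred[ruleName] ?? "any";
}

// Example: build one type declaration per rule (rule names and types are made up).
const inferredTypes: Record<string, string | undefined> = {
  start: "string[]",
  member_expression: undefined, // inference failed for this rule
};
const userReturnTypes = { start: "MyAst" };

const declarations = Object.keys(inferredTypes).map(
  (rule) =>
    `type ${toCamelCase(rule)} = ${resolveRuleType(rule, inferredTypes, userReturnTypes)};`
);
```

Here the override replaces the inferred type of start, while member_expression falls back to any because nothing was inferred and no override was given.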

Circularly defined types

If a grammar is defined like

A = "a" / B
B = "b" / "(" @A ")"

the generator will try to create the following types:

type A = "a" | B
type B = "b" | A

which is a circular type definition. The parser cannot get stuck in an infinite loop on this grammar, but the type system has no way of knowing that. Ideally, someone would rewrite the grammar as

A = Base / "(" @Base ")"
Base = A_base / B_base
A_base = "a"
B_base = "b"

Rather than make users rewrite their grammar, the algorithm in prune-circular-references.ts "snips" the circular reference by inserting void for one of the types. That is,

A = "a" / B
B = "b" / "(" @A ")"

turns into

type A = "a" | B
type B = "b" | void

This means the type of B is incorrect, but the type of A is correct. Assuming the rule closer to the top of the file is the start rule, everything should be fine.
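
One way to picture the "snipping" is a depth-first walk over the rule types that replaces any reference pointing back to a rule still being visited with void. The sketch below is an assumption about the general technique, not the code in prune-circular-references.ts; the Member and TypeTable shapes and the function names are hypothetical.

```typescript
// Illustrative sketch only; the real prune-circular-references.ts may work differently.

// A rule type is modelled as a union of string literals, references to other rules, or void.
type Member =
  | { kind: "literal"; text: string }
  | { kind: "ref"; rule: string }
  | { kind: "void" };

type TypeTable = Record<string, Member[]>;

/** Return a copy of the table in which every reference that closes a cycle is replaced by void. */
function pruneCircularReferences(table: TypeTable): TypeTable {
  // Prototype-free objects so rule names like "constructor" cannot collide with built-ins.
  const result: TypeTable = Object.create(null);
  for (const name of Object.keys(table)) result[name] = table[name].slice();

  const inProgress: Record<string, boolean> = Object.create(null);
  const done: Record<string, boolean> = Object.create(null);

  const visit = (name: string): void => {
    if (done[name] || !(name in result)) return;
    inProgress[name] = true;
    const members = result[name];
    members.forEach((member, i) => {
      if (member.kind !== "ref") return;
      if (inProgress[member.rule]) {
        // The referenced rule is still being visited: this reference closes a cycle, so snip it.
        members[i] = { kind: "void" };
      } else {
        visit(member.rule);
      }
    });
    inProgress[name] = false;
    done[name] = true;
  };

  // Rules are visited in file order, so rules nearer the top keep their references.
  Object.keys(result).forEach((name) => visit(name));
  return result;
}

/** Render one rule as a TypeScript type alias. */
function renderType(name: string, members: Member[]): string {
  const parts = members.map((m) =>
    m.kind === "literal" ? JSON.stringify(m.text) : m.kind === "ref" ? m.rule : "void"
  );
  return `type ${name} = ${parts.join(" | ")}`;
}

// The grammar from the example: A = "a" / B, B = "b" / "(" @A ")"
const example: TypeTable = {
  A: [{ kind: "literal", text: "a" }, { kind: "ref", rule: "B" }],
  B: [{ kind: "literal", text: "b" }, { kind: "ref", rule: "A" }],
};
const pruned = pruneCircularReferences(example);
const rendered = Object.keys(pruned).map((name) => renderType(name, pruned[name]));
```

Because the walk starts from the first rule in the file, the reference from B back to A is the one that gets snipped, which matches the assumption that the start rule is the one nearest the top.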

This actually comes up in practice in the st and javascript .pegjs files.

Currently no warning is emitted, but this could be easily changed.

Dependencies

This PR adds dependencies on ts-morph and prettier. The prettier dependency can be removed without changing the meaning of the generated types, but it makes them look a lot nicer (and removes many excess parentheses that are added when creating types).


@pjmolina pjmolina left a comment


looks good! thank you for the good work!

@pjmolina pjmolina merged commit 0255b92 into metadevpro:master Apr 10, 2023
@pjmolina

Thank you very much for a great work @siefkenj
Published as ts-pegjs@4.0.0
