Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial regex source generator #59186

Merged
merged 16 commits into from
Sep 22, 2021

Conversation

stephentoub
Copy link
Member

This adds the [RegexGenerator(...)] attribute and a new System.Text.RegularExpressions.Generator source generator for Regex. Given a method like:

[RegexGenerator("abc")]
private static partial Regex Abc();

it will generate C# code to implement Abc() that returns a Regex instance with code like you'd have gotten with RegexOptions.Compiled, but generated at build time in C# rather than at run time with reflection emit in MSIL. The generator is largely a port of RegexCompiler.cs; ideally we wouldn't duplicate all of the emit logic, once for MSIL and once for C#, and we should subsequently look for ways to avoid this duplication. The generator includes source from System.Text.RegularExpressions, in particular in support of parsing and analyzing regexes.

image

It includes a test project for the generator focused on testing the mechanics of the generator as well as integrating the source generator into the regex test suite, rolled out into many but not all of the tests so far.

There are a bunch of follow-ups to this for while I'll be opening issues, including rolling this out through the rest of the tests (and finding ways to make it go faster, as it's currently significantly increasing the time for the regex test suite), fixing a bunch of TODOs (e.g. figuring out what to do about culture), and improving perf further (it's now really easy to see opportunities for improved code generation).

Fixes #58880
Contributes to #44676

cc: @danmoseley, @pgovind, @tannergooding, @eerhardt, @veanes, @olsaarik

@ghost
Copy link

ghost commented Sep 15, 2021

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

This adds the [RegexGenerator(...)] attribute and a new System.Text.RegularExpressions.Generator source generator for Regex. Given a method like:

[RegexGenerator("abc")]
private static partial Regex Abc();

it will generate C# code to implement Abc() that returns a Regex instance with code like you'd have gotten with RegexOptions.Compiled, but generated at build time in C# rather than at run time with reflection emit in MSIL. The generator is largely a port of RegexCompiler.cs; ideally we wouldn't duplicate all of the emit logic, once for MSIL and once for C#, and we should subsequently look for ways to avoid this duplication. The generator includes source from System.Text.RegularExpressions, in particular in support of parsing and analyzing regexes.

image

It includes a test project for the generator focused on testing the mechanics of the generator as well as integrating the source generator into the regex test suite, rolled out into many but not all of the tests so far.

There are a bunch of follow-ups to this for while I'll be opening issues, including rolling this out through the rest of the tests (and finding ways to make it go faster, as it's currently significantly increasing the time for the regex test suite), fixing a bunch of TODOs (e.g. figuring out what to do about culture), and improving perf further (it's now really easy to see opportunities for improved code generation).

Fixes #58880
Contributes to #44676

cc: @danmoseley, @pgovind, @tannergooding, @eerhardt, @veanes, @olsaarik

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: 7.0.0

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@stephentoub
Copy link
Member Author

@jaredpar, FYI.

@@ -0,0 +1,318 @@
<?xml version="1.0" encoding="utf-8"?>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not reuse the existing resx?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How? i.e. is there a mechanism for including an additional resx file and having them merged, or something like that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant just use the existing one - of course not all the strings apply. Perhaps that would get complex given this one is localized.
No I don't know a way to include one in another.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant just use the existing one

And add the generator specific strings into the existing one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's less wasteful than the duplication. Feel free to ignore.

@danmoseley
Copy link
Member

Will actually be interesting now we can look at the C# generated for different regexes to more easily see how it works.

@stephentoub stephentoub force-pushed the regexsourcegenerator branch 2 times, most recently from 7993493 to 771edaf Compare September 16, 2021 01:09
@Szer
Copy link

Szer commented Sep 16, 2021

Will there be any warning if that library code (supposed to be language agnostic?) would be called NOT from Roslyn?

@stephentoub
Copy link
Member Author

Will there be any warning if that library code (supposed to be language agnostic?) would be called NOT from Roslyn?

I don't understand the question. Regex is usable from all .NET languages. If you try to use RegexGenerator with something other than C#, you'll get errors because no code will be generated. Can you clarify?

Adds the new RegexGenerator attribute that's a signal to the regex
generator to generate code for the specified regex.
Add a source generator for generating C# code for Regex.  This is
primarily a port of RegexCompiler.cs, generating C# code instead of
MSIL.
Adds tests dedicated to the mechanics of the source generator, e.g. that
appropriate diagnostics are issued for improper use of RegexGenerator.
Start integrating the source generator into the regex test suite, so
that many existing tests also validate the generated code.
Changing the generator to not collect all regexes together means we don't need to reprocess/regenerate all regexes every time any one of them is changed.
To better test the appropriate ctor usage.
Also fixed one place where the IL we were generating wasn't as good as the reflection emit code.
Also clean up generated code in a few places to make it more readable / concise.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[API Proposal]: [RegexGenerator(...)] attribute
6 participants