Add RegexOptions.NonBacktracking #60607

Merged: 1 commit merged into dotnet:main on Oct 19, 2021

@stephentoub commented Oct 19, 2021

The simple API addition (a new RegexOptions.NonBacktracking enum member) is the tip of the iceberg for a new engine in System.Text.RegularExpressions. Setting this option now opts the regex processing into a mode that can guarantee linear-time matching in the length of the input (doing so disables certain features of Regex and regex patterns and impacts observable behavior for what's matched).
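
For readers who want to see the option in use, here is a minimal sketch. It is not part of this PR and assumes .NET 7 or later, where the option shipped; behavior at this exact commit may differ. The class and method names (NonBacktrackingSketch, IsMatchLinear, IsSupported) are hypothetical helpers. It shows a nested-quantifier pattern running against a long non-matching input, and a backreference pattern being rejected at construction time.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class NonBacktrackingSketch
{
    // Matches input against pattern using the non-backtracking engine.
    public static bool IsMatchLinear(string pattern, string input)
    {
        var regex = new Regex(pattern, RegexOptions.NonBacktracking);
        return regex.IsMatch(input);
    }

    // Returns false when the pattern uses a construct the engine rejects
    // (for example a backreference), which surfaces as NotSupportedException
    // from the Regex constructor in .NET 7+.
    public static bool IsSupported(string pattern)
    {
        try
        {
            _ = new Regex(pattern, RegexOptions.NonBacktracking);
            return true;
        }
        catch (NotSupportedException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Nested quantifiers are a classic source of catastrophic backtracking.
        // The trailing '!' guarantees the overall match fails.
        string input = new string('a', 2000) + "!";
        Console.WriteLine(IsMatchLinear(@"^(a+)+$", input));

        // Backreferences are among the disabled pattern features.
        Console.WriteLine(IsSupported(@"(a)\1"));
    }
}
```

The try/catch probe is only one way to detect unsupported constructs; callers that control their patterns would normally just avoid backreferences and lookarounds when opting into this mode.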

This commit is the initial implementation ported over from dotnet/runtimelab, which was in turn seeded from the MSR Symbolic Regex Matcher project. There is still further work required to make it ready to ship, and such remaining work will happen in main.

Fixes #57891
Fixes #18614
Contributes to #1349

cc: @veanes, @olsaarik, @danmoseley, @jeffhandley, @eerhardt, @tannergooding

Co-authored-by: Olli Saarikivi <olsaarik@microsoft.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Dan Moseley <danmose@microsoft.com>
@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@stephentoub
Member Author

Changes have all been vetted by relevant reviewers in dotnet/runtimelab. Merging.

@stephentoub merged commit 8676d97 into dotnet:main Oct 19, 2021
@stephentoub deleted the regexnonbacktracking branch October 19, 2021 13:54
Comment on lines 385 to +389
  protected void InitializeReferences()
  {
-     if (_refsInitialized)
-     {
-         ThrowHelper.ThrowNotSupportedException(ExceptionResource.OnlyAllowedOnce);
-     }
-
-     _replref = new WeakReference<RegexReplacement?>(null);
-     _refsInitialized = true;
+     // This method no longer has anything to initialize. It continues to exist
+     // purely for API compat, as it was originally shipped as protected, with
+     // assemblies generated by Regex.CompileToAssembly calling it.
Member

Do we ever mark defunct protected methods as [Obsolete]?

Member Author

We could. I don't know that it would help much. In general the only folks that use the protected surface area of Regex/RegexRunner are ourselves, as part of CompileToAssembly, RegexOptions.Compiled, and now the source generator. Regardless of [Obsolete], if someone is using this surface area directly, they're probably doing something wrong :)

Member

Some of these APIs have "This API supports the product infrastructure and is not intended to be used directly from your code." in the docs. E.g. https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regexrunner.charinclass

Member

"This API supports the product infrastructure and is not intended to be used directly from your code."

Should all such APIs be annotated so our analyzer flags them on use? I know there are quite a few APIs we own that have that blurb.
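
As a rough illustration of the annotation being discussed (a sketch, not what this PR does), the following marks a defunct protected method with [Obsolete] so the compiler warns on use, and with [EditorBrowsable(EditorBrowsableState.Never)] so editors hide it. LegacyRegexBase and ObsoleteSketch are hypothetical names; the real Regex type is not modified here.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Hypothetical stand-in for a type with legacy protected surface area.
public class LegacyRegexBase
{
    // Callers compiled against this member get warning CS0618 when they use it.
    [Obsolete("This API supports the product infrastructure and is not intended to be used directly from your code.")]
    [EditorBrowsable(EditorBrowsableState.Never)]
    protected void InitializeReferences()
    {
        // Intentionally empty: kept only so previously compiled callers still bind.
    }
}

public static class ObsoleteSketch
{
    // Uses reflection to check whether a method carries [Obsolete].
    public static bool IsMarkedObsolete(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(
            methodName,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        return method != null && method.GetCustomAttribute<ObsoleteAttribute>() != null;
    }

    public static void Main()
    {
        Console.WriteLine(IsMarkedObsolete(typeof(LegacyRegexBase), "InitializeReferences"));
    }
}
```

Whether a warning is worth the churn is the judgment call raised above: if only generated code calls these members, the attribute mostly documents intent.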

@ghost ghost locked as resolved and limited conversation to collaborators Nov 20, 2021
Development

Successfully merging this pull request may close these issues.

[API Proposal]: RegexOptions.Constrained Use a non-backtracking NFA/DFA regex engine where possible