compiler: fix RegexSet bug #369

Merged
merged 1 commit into from
May 20, 2017

BurntSushi
Member

When compiling a RegexSet, it was possible for the jump locations to
become incorrect if the last regex in the set had a starting location
that didn't correspond to the beginning of its program. This can happen
in simple cases, such as a set consisting of the regexes `a` and `β`.
In particular, the program for `β` (UTF-8 bytes `\xCE\xB2`) is:

    0: Bytes(\xB2) (goto 2)
    1: Bytes(\xCE) (goto 0)
    2: MATCH

Here, the entry point is `1` instead of `0`. To fix this, we compile a
set of regexes similarly to how we compile `a|β`, where we handle the
holes produced by sub-expressions correctly.
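The failure mode can be reproduced with a toy compiler. This is a minimal sketch under stated assumptions, not the regex crate's internals: `Inst`, `compile_literal`, `compile_set`, `set_matches` and the `buggy` flag are hypothetical names. Literals are laid out last-byte-first so that `β` gets the same three-instruction layout as the listing above, with its entry point at index 1. With `buggy = true`, the last sub-program's jump target is taken to be its first instruction. Otherwise, the recorded entry point is used when filling the `Split` holes, the way `a|β` would be compiled.

```rust
// Hypothetical toy instruction set, loosely modeled on the listing above.
#[derive(Debug, Clone, Copy)]
enum Inst {
    Byte(u8, usize),     // match one byte, then jump to the given pc
    Split(usize, usize), // try the first branch, then the second
    Match(usize),        // report a match for regex number `usize`
}

const HOLE: usize = usize::MAX; // jump target not known yet

struct Compiled {
    insts: Vec<Inst>,
    entry: usize, // entry point; NOT necessarily 0
}

// Assumption: bytes are emitted last-to-first, reproducing the `β` layout.
fn compile_literal(bytes: &[u8], id: usize) -> Compiled {
    let n = bytes.len();
    assert!(n > 0, "empty literals are not handled in this sketch");
    let mut insts = Vec::with_capacity(n + 1);
    for k in 0..n {
        let goto = if k == 0 { n } else { k - 1 };
        insts.push(Inst::Byte(bytes[n - 1 - k], goto));
    }
    insts.push(Inst::Match(id));
    Compiled { insts, entry: n - 1 }
}

// Shift a sub-program's jump targets to its position in the full program.
fn relocate(inst: Inst, base: usize) -> Inst {
    match inst {
        Inst::Byte(b, next) => Inst::Byte(b, next + base),
        Inst::Split(x, y) => Inst::Split(x + base, y + base),
        Inst::Match(id) => Inst::Match(id),
    }
}

fn compile_set(pats: &[&str], buggy: bool) -> (Vec<Inst>, usize) {
    let mut prog: Vec<Inst> = Vec::new();
    let mut start = 0;
    let mut hole: Option<usize> = None; // Split whose second branch is unfilled
    for (i, pat) in pats.iter().enumerate() {
        let last = i + 1 == pats.len();
        // Every regex but the last is guarded by a Split, as in `a|β`.
        let split = if last {
            None
        } else {
            prog.push(Inst::Split(HOLE, HOLE));
            Some(prog.len() - 1)
        };
        let sub = compile_literal(pat.as_bytes(), i);
        let base = prog.len();
        prog.extend(sub.insts.into_iter().map(|inst| relocate(inst, base)));
        // The bug: assuming the last sub-program starts at its entry point.
        let entry = if buggy && last { base } else { base + sub.entry };
        let here = split.unwrap_or(entry); // start of "this regex and the rest"
        match hole {
            Some(pc) => {
                if let Inst::Split(_, second) = &mut prog[pc] {
                    *second = here;
                }
            }
            None => start = here,
        }
        if let Some(pc) = split {
            if let Inst::Split(first, _) = &mut prog[pc] {
                *first = entry;
            }
        }
        hole = split;
    }
    (prog, start)
}

// Anchored backtracking run; collects every regex id whose Match is reached.
fn run(prog: &[Inst], pc: usize, input: &[u8], pos: usize, out: &mut Vec<usize>) {
    match prog[pc] {
        Inst::Byte(b, next) => {
            if input.get(pos) == Some(&b) {
                run(prog, next, input, pos + 1, out);
            }
        }
        Inst::Split(x, y) => {
            run(prog, x, input, pos, out);
            run(prog, y, input, pos, out);
        }
        Inst::Match(id) => out.push(id),
    }
}

fn set_matches(pats: &[&str], buggy: bool, text: &str) -> Vec<usize> {
    let (prog, start) = compile_set(pats, buggy);
    let mut out = Vec::new();
    run(&prog, start, text.as_bytes(), 0, &mut out);
    out.sort_unstable();
    out.dedup();
    out
}

fn main() {
    let pats = ["a", "β"];
    println!("buggy: {:?}", set_matches(&pats, true, "β"));
    println!("fixed: {:?}", set_matches(&pats, false, "β"));
}
```

For the set `["a", "β"]` and the input `"β"`, the buggy layout jumps straight to `Bytes(\xB2)`, which cannot match the leading `\xCE`, so it reports no match. The patched layout jumps to `Bytes(\xCE)` and reports regex 1.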

Fixes #353

@BurntSushi
Member Author

@bors r+

@bors
Contributor

bors commented May 20, 2017

📌 Commit cd8f6eb has been approved by BurntSushi

@bors
Contributor

bors commented May 20, 2017

⌛ Testing commit cd8f6eb with merge 8a1b2bb...

bors added a commit that referenced this pull request May 20, 2017
compiler: fix RegexSet bug

@bors
Contributor

bors commented May 20, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: BurntSushi
Pushing 8a1b2bb to master...

bors merged commit cd8f6eb into master May 20, 2017
BurntSushi deleted the ag-fix-353 branch July 5, 2023 12:54
Development

Successfully merging this pull request may close these issues.

RegexSet misbehave with unicode
2 participants