\u{00ad} character is sometimes parsed twice #77417

Closed
bugadani opened this issue Oct 1, 2020 · 1 comment
Labels
A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments


bugadani commented Oct 1, 2020

See here: https://godbolt.org/z/sdv7a3

The `println!` output is "f\u{ad}\u{ad}cali". This isn't a `println!` issue: the string literal itself contains two \u{ad} characters.

Same example, but a little more visible: the following assert is supposed to pass:

assert_eq!(6, "f\u{AD}­cali".chars().count());
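A minimal sketch for checking what the assert actually counts. Both literals below are written with explicit escapes so nothing is hidden. `dump_code_points` is a hypothetical helper for illustration, not part of std. The second literal models an escaped soft hyphen followed by a pasted, invisible one.

```rust
/// Hypothetical helper: render each char of `s` as its code point so that
/// invisible characters such as U+00AD SOFT HYPHEN become visible.
fn dump_code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

fn main() {
    // Only the escape: f, SHY, c, a, l, i -> 6 chars.
    let intended = "f\u{AD}cali";
    // The escape plus a second SHY (written as an escape here so it is
    // visible in this source) -> 7 chars.
    let actual = "f\u{AD}\u{AD}cali";

    assert_eq!(6, intended.chars().count());
    assert_eq!(7, actual.chars().count());

    // Print the code points of the "actual" literal, one SHY per U+00AD.
    println!("{}", dump_code_points(actual).join(" "));
}
```

Dumping code points (or using `char::escape_unicode`) is a quick way to spot characters an editor renders as nothing.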

The issue has existed since Rust 1.0.

@bugadani bugadani added the C-bug Category: This is a bug. label Oct 1, 2020
@jonas-schievink jonas-schievink added A-parser Area: The parsing of Rust source code to an AST. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 1, 2020

bugadani commented Oct 1, 2020

Error code ID10T. I had managed to paste the soft hyphen (SHY) character into the text, so it doesn't show up in the source, but it IS there. Wow.

@bugadani bugadani closed this as completed Oct 1, 2020
bors added a commit to rust-lang/rust-clippy that referenced this issue Oct 2, 2020
Lint for invisible Unicode characters other than ZWSP

This PR extends the existing `zero_width_space` lint to look for other invisible characters as well (in this case, the `\u{ad}` soft hyphen).

I feel this lint is the logical place to add the check. I also realize the lint name is not particularly flexible, though I understand it shouldn't be renamed for compatibility reasons.

Open questions:
 - What other characters should trigger the lint?
 - What should be done with the lint name?
 - How to indicate the change in functionality?

Motivation behind this PR: rust-lang/rust#77417. I managed to shoot myself in the foot with an invisible character pasted into my test case.

changelog: rename [`zero_width_space`] to [`invisible_characters`] and add SHY and WJ to the list.
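A minimal sketch of the kind of check the PR describes, assuming a plain scan of source text against a fixed set of code points. This is not Clippy's actual implementation: `INVISIBLE` and `find_invisible_chars` are hypothetical names, and the list covers only ZWSP, SHY and WJ as named above.

```rust
/// Hypothetical list of invisible characters: ZWSP (the original target of
/// `zero_width_space`) plus SHY and WJ from the changelog.
const INVISIBLE: &[(char, &str)] = &[
    ('\u{200B}', "ZERO WIDTH SPACE"),
    ('\u{00AD}', "SOFT HYPHEN"),
    ('\u{2060}', "WORD JOINER"),
];

/// Returns (byte offset, char, name) for each listed invisible char in `src`.
fn find_invisible_chars(src: &str) -> Vec<(usize, char, &'static str)> {
    src.char_indices()
        .filter_map(|(i, c)| {
            INVISIBLE
                .iter()
                .find(|&&(inv, _)| inv == c)
                .map(|&(_, name)| (i, c, name))
        })
        .collect()
}

fn main() {
    // Source line as it might appear in a file: `\\u{AD}` is the escape as
    // plain text, while `\u{AD}` is a real (normally invisible) SHY.
    let src = "assert_eq!(6, \"f\\u{AD}\u{AD}cali\".chars().count());";
    for (offset, c, name) in find_invisible_chars(src) {
        println!("byte {}: U+{:04X} {}", offset, c as u32, name);
    }
}
```

Only the real SHY is reported; the escape sequence is ordinary ASCII text and is not flagged, which matches the situation in this issue.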