Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint for invisible Unicode characters other than ZWSP #6105

Merged
merged 4 commits into from
Oct 2, 2020

Conversation

bugadani
Copy link
Contributor

@bugadani bugadani commented Oct 2, 2020

This PR extends the existing zero_width_space lint to look for other invisible characters as well (in this case, \\u{ad} soft hyphen.

I feel like this lint is the logical place to add the check, but I also realize the lint name is not particularly flexible, but I also understand that it shouldn't be renamed for compatibility reasons.

Open questions:

  • What other characters should trigger the lint?
  • What should be done with the lint name?
  • How to indicate the change in functionality?

Motivation behind this PR: rust-lang/rust#77417 - I managed to shoot myself in the foot by an invisible character pasted into my test case.

changelog: rename [zero_width_space] to [invisible_characters] and add SHY and WJ to the list.

@rust-highfive
Copy link

r? @ebroto

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Oct 2, 2020
Copy link
Member

@ebroto ebroto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the open questions:

What other characters should trigger the lint?

I'm really not an expert in this field. Taking a look at this Wikipedia entry, there are some other invisible characters (second table) which could be candidates.
My main concern would be including characters that could be legitimately used in languages I'm not familiar with, so I would prefer to be conservative here. Maybe the WORD JOINER one could be a candidate? If we are not sure I would prefer not to include others.

What should be done with the lint name?

There's a mechanism for renaming lints. You can find an example in #3554. It's a bit old PR, but self-contained, except for the test about renaming lints, you can omit that.

How to indicate the change in functionality?

A new name for the lint should suffice. What about invisible_characters that you used when linting?

clippy_lints/src/unicode.rs Outdated Show resolved Hide resolved
@ebroto ebroto added S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status) and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties labels Oct 2, 2020
@bugadani
Copy link
Contributor Author

bugadani commented Oct 2, 2020

I'm really not an expert in this field. Taking a look at this Wikipedia entry, there are some other invisible characters (second table) which could be candidates.

Oh, I intended this question to a broader audience, not placing the burden of figuring out on you. Just in case somebody would have a stronger opinion here.

My main concern would be including characters that could be legitimately used in languages I'm not familiar with, so I would prefer to be conservative here. Maybe the WORD JOINER one could be a candidate? If we are not sure I would prefer not to include others.

I've added WJ, that seems obvious enough. I don't think language support is an issue here - if a character is invisible, I think people should prefer writing out the escape code instead in any language. But I may have wrong assumptions and preferences here.

There's a mechanism for renaming lints. You can find an example in #3554. It's a bit old PR, but self-contained, except for the test about renaming lints, you can omit that.

Thanks, I've renamed the lint to invisible_characters.

@bugadani bugadani requested a review from ebroto October 2, 2020 22:13
Copy link
Member

@ebroto ebroto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's wait for CI and we can merge it.

Oh, I intended this question to a broader audience, not placing the burden of figuring out on you

Of course! Sorry if that sounded harsh, it was not the intention.

I don't think language support is an issue here - if a character is invisible, I think people should prefer writing out the escape code instead in any language

That makes sense, but I would prefer to be cautious. I think adding those two is a good improvement in any case.

@ebroto
Copy link
Member

ebroto commented Oct 2, 2020

@bors r+

Thanks!

@bors
Copy link
Collaborator

bors commented Oct 2, 2020

📌 Commit 572e4c4 has been approved by ebroto

@bors
Copy link
Collaborator

bors commented Oct 2, 2020

⌛ Testing commit 572e4c4 with merge 15a7950...

bors added a commit that referenced this pull request Oct 2, 2020
Lint for invisible Unicode characters other than ZWSP

This PR extends the existing `zero_width_space` lint to look for other invisible characters as well (in this case, `\\u{ad}` soft hyphen.

I feel like this lint is the logical place to add the check, but I also realize the lint name is not particularly flexible, but I also understand that it shouldn't be renamed for compatibility reasons.

Open questions:
 - What other characters should trigger the lint?
 - What should be done with the lint name?
 - How to indicate the change in functionality?

Motivation behind this PR: rust-lang/rust#77417 - I managed to shoot myself in the foot by an invisible character pasted into my test case.
@bugadani
Copy link
Contributor Author

bugadani commented Oct 2, 2020

Of course! Sorry if that sounded harsh, it was not the intention.

Not at all, no worries. :) Thanks for the review!

@bors
Copy link
Collaborator

bors commented Oct 2, 2020

💔 Test failed - checks-action_test

@ebroto
Copy link
Member

ebroto commented Oct 2, 2020

@bors retry (missing changelog)

@bors
Copy link
Collaborator

bors commented Oct 2, 2020

⌛ Testing commit 572e4c4 with merge 9408c68...

@bors
Copy link
Collaborator

bors commented Oct 2, 2020

☀️ Test successful - checks-action_dev_test, checks-action_remark_test, checks-action_test
Approved by: ebroto
Pushing 9408c68 to master...

@bors bors merged commit 9408c68 into rust-lang:master Oct 2, 2020
@bugadani bugadani deleted the sus-char branch October 2, 2020 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action from the author. (Use `@rustbot ready` to update this status)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants