Extend whitespace to include NEL, LS, PS, LRM, RLM, and maybe ALM #74

Open
tahonermann opened this issue May 27, 2022 · 1 comment
Labels
help wanted (Extra attention is needed), paper needed (A paper proposing a specific solution is needed)

Comments

@tahonermann
Member

Unicode paper L2/22-072R: Proposal for amendments to UAX#9 and UAX#31, adopted for the upcoming Unicode 15 release, demonstrates the utility of allowing U+200E LEFT-TO-RIGHT MARK (LRM) and U+200F RIGHT-TO-LEFT MARK (RLM) to appear in whitespace, but not to constitute whitespace in isolation. The intent is to allow these marks to be inserted in whitespace to restore character directionality that might have been altered by characters in the preceding token.
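
To make the "in whitespace, but not whitespace in isolation" rule concrete, here is a rough sketch of how a lexer might measure a whitespace run. The function names are hypothetical and not taken from any paper or implementation. The sketch also assumes that a mark may appear anywhere in the run (including at its start) as long as at least one ordinary whitespace character is present; the actual wording would need to decide that.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: the two directional marks discussed in this issue.
constexpr bool is_directional_mark(char32_t c) {
    return c == 0x200E /* LRM */ || c == 0x200F /* RLM */;
}

// Hypothetical helper: ordinary whitespace (space, and U+0009..U+000D).
constexpr bool is_basic_whitespace(char32_t c) {
    return c == 0x0020 || (c >= 0x0009 && c <= 0x000D);
}

// Returns the length of the whitespace run starting at `pos`, or 0 if there
// is none. LRM/RLM are accepted inside the run, but a run made up only of
// marks does not count as whitespace (assumption: marks may lead the run).
constexpr std::size_t whitespace_run_length(std::u32string_view s,
                                            std::size_t pos = 0) {
    std::size_t i = pos;
    bool saw_real_whitespace = false;
    while (i < s.size()) {
        if (is_basic_whitespace(s[i])) {
            saw_real_whitespace = true;
        } else if (!is_directional_mark(s[i])) {
            break;  // first character that cannot be part of the run
        }
        ++i;
    }
    return saw_real_whitespace ? i - pos : 0;
}
```

With this sketch, a space followed by LRM and another space forms a run of length 3, while an LRM placed directly between two non-whitespace characters yields 0 and would be rejected by whatever consumes the result.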

@tahonermann added the "help wanted" and "paper needed" labels May 27, 2022
@tahonermann changed the title from "Allow LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK in whitespace" to "Extend whitespace to include NEL, LS, PS, LRM, RLM, and maybe ALM." May 28, 2022
@tahonermann
Member Author

I updated the issue title to widen the scope of this issue to cover the inclusion of all of the following characters in whitespace. This would suffice for C++ to meet the Pattern_White_Space requirements of UAX31-R3.

  • U+0085 NEXT LINE (NEL)
  • U+200E LEFT-TO-RIGHT MARK (LRM)
  • U+200F RIGHT-TO-LEFT MARK (RLM)
  • U+2028 LINE SEPARATOR (LS)
  • U+2029 PARAGRAPH SEPARATOR (PS)

Additionally, inclusion of ALM should be considered, since it is conceptually similar to LRM and RLM. It is not a member of the Pattern_White_Space property, however, and cannot be added because that property is immutable. Including this character in whitespace would therefore require the specification of a profile in [uaxid.pattern] for conformance with UAX31-R3.

  • U+061C ARABIC LETTER MARK (ALM)
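
For reference, the resulting character set can be sketched as a pair of predicates. This is only an illustration: `is_pattern_white_space` lists the Pattern_White_Space code points from UAX #31 (the five above plus U+0009..U+000D and U+0020), while `AlmPolicy` and `may_appear_in_whitespace` are hypothetical names standing in for whatever profile the wording would end up specifying.

```cpp
// Pattern_White_Space per UAX #31:
// U+0009..U+000D, U+0020, U+0085, U+200E, U+200F, U+2028, U+2029.
constexpr bool is_pattern_white_space(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x0085 ||
           c == 0x200E || c == 0x200F || c == 0x2028 || c == 0x2029;
}

// Hypothetical switch modelling the "maybe ALM" part of this issue.
enum class AlmPolicy { exclude, include };

// True if `c` may appear in whitespace. ALM (U+061C) is outside
// Pattern_White_Space, so accepting it amounts to defining a profile.
constexpr bool may_appear_in_whitespace(char32_t c,
                                        AlmPolicy alm = AlmPolicy::exclude) {
    return is_pattern_white_space(c) ||
           (alm == AlmPolicy::include && c == 0x061C);
}
```

Keeping ALM behind an explicit switch mirrors the point above: the base set matches the immutable property exactly, and the extension is visible as a deliberate deviation.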

@tahonermann changed the title from "Extend whitespace to include NEL, LS, PS, LRM, RLM, and maybe ALM." to "Extend whitespace to include NEL, LS, PS, LRM, RLM, and maybe ALM" May 28, 2022