UnicodeSet parser does not support all code points #3893

Open
skius opened this issue Aug 18, 2023 · 0 comments
Labels
C-transliterator (Component: transliterator)
good first issue (Good for newcomers)
help wanted (Issue needs an assignee)
T-bug (Type: Bad behavior, security, privacy)

Comments

@skius
Member

skius commented Aug 18, 2023

The unescaping code in `icu_unicodeset_parser` only works for scalar values (Rust `char`s), but it should support all code points: any `u32` less than or equal to `char::MAX`, including the surrogates U+D800..U+DFFF. This should be relatively straightforward to fix: replace the `char`s with `u32`s, and in `parse_escaped_char` replace the `char::try_from` conversion with a `val <= char::MAX as u32` check.
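A minimal sketch of the proposed check, assuming a hypothetical standalone helper `parse_hex_code_point` that receives the hex digits of an escape such as `\uD800`. It is not the crate's actual `parse_escaped_char`, whose signature is not shown in this issue. It uses `char::from_u32`, which, like `char::try_from`, rejects surrogates, only to illustrate why the current conversion fails.

```rust
/// Hypothetical helper (not the real `icu_unicodeset_parser` API):
/// parses the hex digits of an escape like `\uD800` into a code point.
fn parse_hex_code_point(hex: &str) -> Result<u32, String> {
    let val = u32::from_str_radix(hex, 16)
        .map_err(|e| format!("invalid hex {hex:?}: {e}"))?;
    // Accept every code point up to U+10FFFF, including surrogates
    // (U+D800..=U+DFFF), which a `char` conversion would reject.
    if val <= char::MAX as u32 {
        Ok(val)
    } else {
        Err(format!("code point {val:#X} is out of range"))
    }
}

fn main() {
    // A lone surrogate is a valid code point but not a Rust `char`.
    assert!(char::from_u32(0xD800).is_none());
    // The `u32`-based check accepts it.
    assert_eq!(parse_hex_code_point("D800"), Ok(0xD800));
    assert_eq!(parse_hex_code_point("10FFFF"), Ok(0x10FFFF));
    // Values above U+10FFFF are still rejected.
    assert!(parse_hex_code_point("110000").is_err());
}
```

Keeping the upper-bound check means the change only widens the accepted set by the surrogate range; everything above U+10FFFF is still an error.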

This currently fails, but should pass, because `\uD800` is a surrogate (a code point but not a scalar value): `icu_unicodeset_parser::parse(r"[^\uD800-\uE0FF]")`
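For reference, the set this pattern should denote can be worked out with `u32` bounds. The sketch below is a hypothetical `complement` function over inclusive ranges, not part of any ICU4X API; it assumes a single non-empty input range within U+0000..=U+10FFFF and only illustrates why the endpoints need to be `u32` rather than `char`.

```rust
use std::ops::RangeInclusive;

/// Hypothetical sketch: complements one inclusive range of code points
/// within the full code point space U+0000..=U+10FFFF.
/// Assumes `start <= end <= 0x10FFFF`.
fn complement(range: RangeInclusive<u32>) -> Vec<RangeInclusive<u32>> {
    const MAX: u32 = char::MAX as u32;
    let (start, end) = (*range.start(), *range.end());
    let mut out = Vec::new();
    // Code points below the range, if any.
    if start > 0 {
        out.push(0..=start - 1);
    }
    // Code points above the range, if any.
    if end < MAX {
        out.push(end + 1..=MAX);
    }
    out
}

fn main() {
    // `[^\uD800-\uE0FF]`: the range starts at a surrogate, so its bounds
    // can be stored as `u32` but not as `char`.
    let ranges = complement(0xD800..=0xE0FF);
    assert_eq!(ranges, vec![0x0000..=0xD7FF, 0xE100..=0x10FFFF]);
    for r in &ranges {
        println!("U+{:04X}..=U+{:04X}", r.start(), r.end());
    }
}
```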

skius added the labels T-bug (Type: Bad behavior, security, privacy), good first issue (Good for newcomers), help wanted (Issue needs an assignee), and C-unicode (Component: Props, sets, tries) on Aug 18, 2023
sffc added the label C-transliterator (Component: transliterator) and removed the label C-unicode (Component: Props, sets, tries) on Oct 5, 2023
sffc added this to the Backlog ⟨P4⟩ milestone on Oct 5, 2023
Projects: None yet
Development: No branches or pull requests
Participants: 2