UnicodeSet parser does not support all code points #3893

skius · 2023-08-18T19:37:51Z

The unescaping code in icu_unicodeset_parser only works for Unicode scalar values (Rust chars), but all code points should be supported (any u32 less than or equal to char::MAX, surrogates included). This should be relatively straightforward to fix by replacing chars with u32s and using a val <= char::MAX as u32 check instead of char::try_from in parse_escaped_char.

This currently fails, but should pass: icu_unicodeset_parser::parse(r"[^\uD800-\uE0FF]")
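To illustrate the difference between the two checks, here is a minimal standalone sketch. It does not use the actual icu_unicodeset_parser code. The function names scalar_from_escape and code_point_from_escape are hypothetical and only stand in for the validation step that parse_escaped_char is described as performing.

```rust
/// Hypothetical stand-in for the current behavior: accepts only Unicode
/// scalar values, so surrogates (U+D800..=U+DFFF) are rejected.
fn scalar_from_escape(val: u32) -> Option<char> {
    char::try_from(val).ok()
}

/// Hypothetical stand-in for the proposed behavior: accepts every code
/// point up to char::MAX (U+10FFFF), surrogates included.
fn code_point_from_escape(val: u32) -> Option<u32> {
    if val <= char::MAX as u32 {
        Some(val)
    } else {
        None
    }
}
```

Under these assumptions, scalar_from_escape(0xD800) returns None, while code_point_from_escape(0xD800) returns Some(0xD800). Both reject 0x110000, which lies above char::MAX.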

skius added the T-bug, good first issue, help wanted, and C-unicode labels on Aug 18, 2023

skius mentioned this issue Aug 21, 2023

CodePointInversionList JSON serialization cannot represent all code points #3892

Closed

skius mentioned this issue Aug 29, 2023

Stabilize UnicodeSet parsing #3959

Open

sffc added the C-transliterator label and removed the C-unicode label on Oct 5, 2023

sffc added this to the Backlog ⟨P4⟩ milestone Oct 5, 2023
