Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moving some transliteration datagen logic into the crate #4029

Merged
merged 20 commits into from
Sep 19, 2023

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented Sep 12, 2023

This adds a RuleCollection type to the icu_transliterate crate (datagen feature). Users register source rules with a BCP-47 name and a list of aliases on this, and once all required sources have been registered, it can be used as a data provider and on-the-fly (cached) compiles sources to data structs.

Datagen is updated to use this, it registers all transliterators in CLDR.

Baked data is removed, as there's no transliteration data in CLDR 43. Clients will be able to use RuleCollection as a data provider in 1.3.

#3961
#3998
#4056
cc @skius

sffc
sffc previously approved these changes Sep 18, 2023
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Praise: I like the overall structure of this refactoring. I think RuleCollection is useful to have available at runtime.

Comment on lines +100 to +101
reverse: bool,
visible: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Avoid boolean positional arguments because they are not readable at the call site; prefer enums

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be addressed in later polishing, there's no good place for these enums on the public API right now.


/// The kind of compile error that occurred.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompileErrorKind {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: Not convinced about moving this error out of the error module

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type does not exist on the public API, as the RuleCollection can only return DataErrors as a provider. These errors have log information generated from this type's Debug implementation, so it really is an implementation detail of RuleCollection at the moment.

direction: Direction,
id_mapping: &HashMap<String, String>,
) -> Result<(Option<Self>, Option<Self>)>
/// ✨ *Enabled with the `compiled_data` Cargo feature.*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: do we need the stability warning for the impl Trait return value bounds?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm only if we plan on _unstable requiring fewer data keys in the future. Do we?

Manishearth
Manishearth previously approved these changes Sep 18, 2023
Manishearth
Manishearth previously approved these changes Sep 19, 2023
@robertbastian robertbastian merged commit 6775703 into unicode-org:main Sep 19, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants