Document transliteration data struct format #3776

Closed
5 tasks done
skius opened this issue Aug 3, 2023 · 2 comments
Labels
C-unicode (Component: Props, sets, tries), T-docs-tests (Type: Code change outside core library)

Comments

@skius
Member

skius commented Aug 3, 2023

The zero-copy format of rule-based transliterators (introduced in #3775) should be documented in more detail:

  • How rule encoding works with private use code points
  • How cursor offsets work with special replacers (backrefs, function calls)
  • How backrefs are encoded: $n is the code point n positions after the last private use code point that maps to a VarZeroVec in the VarTable. For example, if the only specials are a UnicodeSet, a segment, and a backref $1, then $1 is the third private use code point (the first is the segment, the second is the UnicodeSet).
  • How anchors are encoded (two reserved, hardcoded private use values for start and end)
  • What the {id/rule}_group_list fields are and how they need to be interpreted
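To make the backref and anchor bullets concrete, here is a minimal Rust sketch of the arithmetic they describe. It is not ICU4X's API. The base code point `PUA_BASE`, the anchor values `START_ANCHOR`/`END_ANCHOR`, and all function names are hypothetical placeholders chosen for illustration. The only rule taken from the issue is that `$n` sits `n` code points after the last code point mapped to a VarTable entry.

```rust
/// HYPOTHETICAL base: first private use code point assigned to VarTable entries.
/// The value ICU4X actually uses is not stated in this issue.
const PUA_BASE: u32 = 0xF0000;

/// HYPOTHETICAL values for the two reserved, hardcoded anchor code points.
const START_ANCHOR: u32 = 0xFFFFC;
const END_ANCHOR: u32 = 0xFFFFD;

/// Code point for the `index`-th (0-based) VarTable entry, e.g. a segment or UnicodeSet.
fn var_table_code_point(index: u32) -> Option<char> {
    char::from_u32(PUA_BASE.checked_add(index)?)
}

/// Code point for backref `$n` (n >= 1): `n` code points after the last code point
/// that maps to a VarTable entry, i.e. PUA_BASE + var_table_len - 1 + n.
fn backref_code_point(var_table_len: u32, n: u32) -> Option<char> {
    if n == 0 {
        return None; // backrefs are numbered from 1
    }
    let cp = PUA_BASE.checked_add(var_table_len)?.checked_add(n - 1)?;
    if cp >= START_ANCHOR {
        return None; // would collide with the reserved anchor values
    }
    char::from_u32(cp)
}

/// Inverse of `backref_code_point`: recover `n` if `c` encodes a backref.
fn decode_backref(c: char, var_table_len: u32) -> Option<u32> {
    let first_backref = PUA_BASE.checked_add(var_table_len)?;
    let cp = c as u32;
    if cp >= first_backref && cp < START_ANCHOR {
        Some(cp - first_backref + 1)
    } else {
        None
    }
}

/// True if `c` is one of the two reserved anchor code points.
fn is_anchor(c: char) -> bool {
    let cp = c as u32;
    cp == START_ANCHOR || cp == END_ANCHOR
}
```

Take the issue's example, where the VarTable has two entries (a segment at index 0 and a UnicodeSet at index 1). Under these assumed values, `backref_code_point(2, 1)` yields U+F0002, the third private use code point. Because backrefs are placed directly after the VarTable range, decoding only needs the VarTable length.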
skius added the T-docs-tests (Type: Code change outside core library) and C-unicode (Component: Props, sets, tries) labels Aug 3, 2023
skius self-assigned this Aug 3, 2023
skius mentioned this issue Aug 3, 2023
41 tasks
@skius
Member Author

skius commented Aug 29, 2023

Include this in documentation: #3891

@skius
Member Author

skius commented Sep 1, 2023

@skius skius closed this as completed Sep 1, 2023