ti: Added training data for the Tigrinya language. #615

Merged: 1 commit on Feb 27, 2022

babraham123
Contributor

Tigrinya is written in the Ge'ez script and is similar to Amharic; it's spoken by over 10 million people. The dictionary is based on the Nagaoka Tigrinya Corpus, cited below.

Tedla, Y. K., Yamamoto, K., & Marasinghe, A. (2016). Nagaoka Tigrinya Corpus: Design and development of part-of-speech tagged corpus. Nagaoka University of Technology, 1-4.

I've also included a smaller, less clean corpus that's generated from old newspapers and books. The words sometimes include punctuation and numbers.
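
For anyone who wants to tidy the noisier word list before use, here is a minimal sketch of one way to strip punctuation and numbers. It is not part of this PR: `clean_words` is a hypothetical helper, and the code point ranges are my assumption based on the Unicode Ethiopic block (punctuation at U+1360..U+1368, numerals at U+1369..U+137C), so check them against the punctuation list linked below.

```python
import re
import string

# Assumed ranges from the Unicode Ethiopic block (not taken from this PR):
# punctuation U+1360..U+1368, numerals U+1369..U+137C.
ETHIOPIC_PUNCT = "".join(chr(c) for c in range(0x1360, 0x1369))
ETHIOPIC_DIGITS = "".join(chr(c) for c in range(0x1369, 0x137D))

# One character class covering ASCII and Ethiopic punctuation and digits.
_NOISE = re.compile(
    "["
    + re.escape(string.punctuation + string.digits + ETHIOPIC_PUNCT + ETHIOPIC_DIGITS)
    + "]"
)


def clean_words(words):
    """Hypothetical helper: remove punctuation/digits, drop empties and duplicates."""
    seen = set()
    cleaned = []
    for word in words:
        token = _NOISE.sub("", word.strip())
        if token and token not in seen:
            seen.add(token)
            cleaned.append(token)
    return cleaned
```

Order of first appearance is preserved, so the output can be diffed against the original list to review what was dropped.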

Grammar: https://en.wikipedia.org/wiki/Tigrinya_grammar
Full list of punctuation: https://gist.github.com/babraham123/d1b597b9a5b293f332ff5613db0df3dc
Fonts: http://www.wazu.jp/gallery/Fonts_Ethiopic.html

rkcosmos merged commit 0e395ab into JaidedAI:master on Feb 27, 2022
@babraham123
Contributor Author

Thanks @rkcosmos !
Do you know when a Tigrinya OCR model might become available?

thuc-moreh pushed a commit to moreh-dev/EasyOCR that referenced this pull request Jul 5, 2023
ti: Added training data for the Tigrinya language.