ti: Added training data for the Tigrinya language. #615

babraham123 · 2021-12-09T09:48:49Z

Tigrinya is written in the Ge'ez script and is similar to Amharic; it is spoken by over 10 million people. The dictionary is based on the Nagaoka Tigrinya Corpus, cited below.

Tedla, Y. K., Yamamoto, K., & Marasinghe, A. (2016). Nagaoka Tigrinya Corpus: Design and development of part-of-speech tagged corpus. Nagaoka University of Technology, 1-4.

I've also included a smaller, less clean corpus generated from old newspapers and books. Its words sometimes include punctuation and numbers.
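If the stray punctuation and numbers in the smaller corpus turn out to matter, one possible cleanup pass is sketched below. This is not part of the PR. It assumes that Tigrinya words use only the base Ethiopic Unicode block (syllables U+1200 to U+135A, combining marks U+135D to U+135F) and that everything else, including Ethiopic punctuation and numerals, can be dropped. The names `clean_word` and `clean_wordlist` are hypothetical.

```python
import re

# Assumption: keep only Ethiopic syllables (U+1200-U+135A) and Ethiopic
# combining marks (U+135D-U+135F). Ethiopic punctuation (U+1360-U+1368),
# Ethiopic numerals (U+1369-U+137C), ASCII digits, and ASCII punctuation
# all fall outside these ranges and are removed.
NON_LETTER = re.compile(r"[^\u1200-\u135A\u135D-\u135F]+")


def clean_word(word: str) -> str:
    """Strip every character that is not an Ethiopic letter or combining mark."""
    return NON_LETTER.sub("", word)


def clean_wordlist(words):
    """Clean each word, drop empty results, and de-duplicate preserving order."""
    seen = set()
    cleaned = []
    for word in words:
        candidate = clean_word(word)
        if candidate and candidate not in seen:
            seen.add(candidate)
            cleaned.append(candidate)
    return cleaned
```

A call such as `clean_wordlist(open("words.txt", encoding="utf-8").read().split())` would apply it to a whitespace-separated word list; the file name here is only a placeholder.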

Grammar: https://en.wikipedia.org/wiki/Tigrinya_grammar
Full list of punctuation: https://gist.github.com/babraham123/d1b597b9a5b293f332ff5613db0df3dc
Fonts: http://www.wazu.jp/gallery/Fonts_Ethiopic.html

babraham123 · 2022-03-05T23:29:26Z

Thanks @rkcosmos !
Do you know when a Tigrinya OCR model might become available?

ti: Added training data for the Tigrinya language.

4227d49

babraham123 mentioned this pull request Dec 9, 2021

List of languages in development #91

Closed

rkcosmos merged commit 0e395ab into JaidedAI:master Feb 27, 2022

thuc-moreh pushed a commit to moreh-dev/EasyOCR that referenced this pull request Jul 5, 2023

Merge pull request JaidedAI#615 from babraham123/tigrinya

c68cfcc
