This project was developed during SibaqLahja, where it was awarded best diacritizer. We present a new public diacritized dataset for Gulf Arabic that follows the pronunciation of the city of Dubai in the United Arab Emirates (UAE). The dataset is a 19,850-word subset of the Gumar corpus (Khalifa et al., 2018), which is composed of roughly 200 thousand words from Emirati internet novels.
A machine learning model that adds diacritics to Emirati text.
Use the package manager pip to install the dependencies listed in requirements.txt.
pip install -r requirements.txt
Pass any non-diacritized Emirati text in quotation marks after main.py:
python main.py 'السلام عليكم'
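Because main.py is meant to receive non-diacritized text, input that already carries some diacritics may need cleaning first. The snippet below is a minimal sketch of one way to do that and is not part of this repository: the helper name strip_diacritics is hypothetical, and the set of characters removed (U+064B to U+0652, plus the superscript alef U+0670) is an assumption about which marks should count as diacritics.

```python
import re

# Assumed diacritic set: tanween, fatha, damma, kasra, shadda, sukun
# (U+064B..U+0652) plus the superscript alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return `text` with the assumed Arabic diacritic marks removed.

    Hypothetical helper for preparing input; letters, spaces, and
    punctuation are left untouched.
    """
    return _DIACRITICS.sub("", text)
```

The cleaned string can then be passed to main.py in the same way as the example above.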