Arabic Sentiment Analysis in the Ar-PHP library #1

Open
khaled-alshamaa opened this issue May 5, 2021 · 16 comments

@khaled-alshamaa

Dear Zaid and ARBML team,

Thanks for your initiative to support Arabic speakers who are working on NLP and need guidance or would like to brainstorm.

Well, let me introduce myself: my name is Khaled Al-Shamaa, and I have been working on the PHP and Arabic language library as a side project since 2006. This library provides a set of tools that enable Arabic website developers to search, present, and process Arabic content professionally in PHP.

Recently we set a goal for 2021 to introduce a few Arabic NLP functionalities into our library, and we started with Arabic sentiment analysis. But we followed a somewhat different approach (rather than BERT models like your Arabic mobileBERT), seeking simplicity and the ability to work with minimal resources (our model is less than 30 KB in size and deploys perfectly fine even in interpreted languages like PHP or JavaScript).

We used a pragmatic approach, accepting the hypothesis that all the words in the first language spoken by the Semitic peoples consisted of bi-radicals (i.e., two sounds/letters). Therefore, we can handle the majority of Arabic word roots as bi-radicals expanded by the addition of a third letter, with the resulting meaning having a semantic relation to the original bi-radical.

We built a statistical log-odds score model to tag the tone (positive or negative) of each word (actually, the two-letter root of that word), using a dataset published on Kaggle that includes 100k Arabic reviews of hotels, books, movies, products, and a few airlines.
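For readers unfamiliar with log-odds scoring, here is a minimal sketch of the idea in PHP (the function name, smoothing choice, and counts are illustrative assumptions, not the actual Ar-PHP internals):

```php
<?php
// Sketch of a log-odds sentiment score for a token (e.g., a bi-radical
// root), given how often it appears in positive vs. negative reviews.
// Illustrative only; not the actual Ar-PHP implementation.

function logOddsScore(int $posCount, int $negCount, int $posTotal, int $negTotal): float
{
    // Laplace (+1) smoothing avoids log(0) for tokens unseen in one class.
    $pPos = ($posCount + 1) / ($posTotal + 2);
    $pNeg = ($negCount + 1) / ($negTotal + 2);

    // > 0 means the token leans positive, < 0 means it leans negative.
    return log($pPos / $pNeg);
}

// Example: a root seen 800 times in 50k positive reviews and
// 120 times in 50k negative reviews scores about +1.9 (positive tone).
echo logOddsScore(800, 120, 50000, 50000), PHP_EOL;
```

Storing only one small score per bi-radical is what allows a model of this kind to stay tiny.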

We would like to get an external review of our model's accuracy and performance from your team, to see how decent this model is. It would be a great opportunity if we could collaborate to bring more open-source NLP tools into the hands of Arabic web developers.

Best regards,
Khaled

@zaidalyafeai
Contributor

Hey @khaled-alshamaa, big fan of your approach. I don't quite understand this paragraph, though:

We used a pragmatic approach, accepting the hypothesis that all the words in the first language spoken by the Semitic peoples consisted of bi-radicals (i.e., two sounds/letters). Therefore, we can handle the majority of Arabic word roots as bi-radicals expanded by the addition of a third letter, with the resulting meaning having a semantic relation to the original bi-radical.

@khaled-alshamaa
Author

Well, the following reference may give you some background:
The biradical origin of Semitic roots

@zaidalyafeai
Contributor

@khaled-alshamaa sounds really interesting. How do you preprocess text for sentiment analysis? Do you have to find the bi-radical root?

@khaled-alshamaa
Author

Well @zaidalyafeai, preprocessing text to find the bi-radical root is still a tricky part of the approach I used to tackle this challenge, and there may be room to improve/enhance it.

Keep in mind that I am trying my best to avoid any method that depends on a mapping or dictionary (e.g., bag of words/tokens). We took that decision to keep the model small. Therefore, we rely on statistical criteria to find the most probable root (or at least the most significant/rare pair of letters in any word)!
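To make this concrete, one possible selection criterion (an assumption for illustration; the actual Ar-PHP heuristic may differ) is to enumerate the order-preserving letter pairs of a word and keep the rarest one according to corpus statistics:

```php
<?php
// Sketch: pick the most "significant" (rarest) ordered letter pair of an
// Arabic word as a candidate bi-radical root. $pairFrequency would be
// precomputed from the training corpus; this exact criterion is an
// assumption for illustration, not necessarily the Ar-PHP one.

function rarestPair(string $word, array $pairFrequency): string
{
    // Split the word into individual letters (UTF-8 safe).
    $letters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

    $best = '';
    $bestFreq = INF;

    // Enumerate every pair that preserves the letters' original order.
    for ($i = 0; $i < count($letters) - 1; $i++) {
        for ($j = $i + 1; $j < count($letters); $j++) {
            $pair = $letters[$i] . $letters[$j];
            $freq = $pairFrequency[$pair] ?? INF; // unseen pairs are skipped
            if ($freq < $bestFreq) {
                $bestFreq = $freq;
                $best = $pair;
            }
        }
    }

    return $best; // candidate bi-radical, or '' if no pair was seen
}
```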

I know this may look weird from a linguistic point of view (or may even be unacceptable). But it works to some extent! That is why I am seeking this brainstorming, to get some review and out-of-the-box thoughts ;-)

So before jumping into more details about the algorithm itself (at the end of the day, it may not be that promising), I want to get an external review of its performance and accuracy to see if it deserves more awareness and effort to improve it and explore the potential opportunities.

@khaled-alshamaa
Author

Dear @zaidalyafeai, you mentioned in this tweet that your mobileBERT model for sentiment analysis is 83 MB in size, achieves 95% accuracy, and can be tested online in the browser using this TensorFlow.js implementation.

I noticed from your tweet and demo page that you used the HARD: Hotel Arabic-Reviews Dataset to train and test your model. Therefore, I tested my tiny model (less than 30 KB) on the same dataset (unseen by my model), and it was able to achieve 82% on the balanced reviews dataset (105,698 reviews in total).

On the other hand, you know how critical training data quality is in the AI/ML era. Unfortunately, the lack of such high-quality resources for the Arabic language is a big challenge. For example, the first review in the mentioned dataset was "ممتاز النظافة والطاقم متعاون" (i.e., "Excellent; clean, and the staff are helpful"), while the rating was 2 (i.e., negative)!

By the way, my model does not show overfitting symptoms in this case and tagged the review correctly as positive (even though it counts as an error when we estimate accuracy, because the predicted value differs from the labeled one).

I understand the cultural dimension that may affect the quality of this collected data (e.g., misunderstanding how the rating scale should be interpreted: is the best 1 or 5?). Also, I am aware of the bias introduced by the training dataset source (e.g., hotel vs. product reviews). All these issues highlight the importance of having well-curated and more diverse training datasets. In this territory, data, not algorithms, is the key factor in getting better NLP models.

@khaled-alshamaa
Author

The following comment/discussion shows a more obvious example of domain bias (e.g., hotel service vs. product quality) in sentiment analysis of reviews:

https://io.hsoub.com/webdev/115826/comment/550311

@zaidalyafeai
Contributor

Dear @khaled-alshamaa, regarding some of the examples you shared in the discussion, I think it is important to show a confidence score, i.e., "the sentiment is positive with probability 90%". Regarding data, unfortunately, a lot of Arabic data is not as clean as its English counterparts. Regardless, many models, even deep ones, still fail on plenty of examples even when they show high accuracy on curated splits.

@khaled-alshamaa
Author

Dear @zaidalyafeai, thanks for your suggestion to show a confidence score in terms of probability. The related code updates were committed to the Ar-PHP project repository this morning (May 9, 2021) and will be released in the next version (6.2) of the library.
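For a model whose word scores are log-odds, this conversion is natural, since the sum of the scores is itself on the log-odds scale; here is a minimal sketch (function names are illustrative assumptions, not the Ar-PHP API):

```php
<?php
// Sketch: summed per-word log-odds scores map to a probability through
// the logistic (sigmoid) function. Illustrative; not the Ar-PHP API.

function sentimentConfidence(array $wordScores): array
{
    $logOdds = array_sum($wordScores);
    $pPositive = 1.0 / (1.0 + exp(-$logOdds));

    return [
        'sentiment'  => $pPositive >= 0.5 ? 'positive' : 'negative',
        'confidence' => max($pPositive, 1.0 - $pPositive),
    ];
}

// Example with three made-up word scores:
print_r(sentimentConfidence([1.9, -0.4, 0.7]));
// ['sentiment' => 'positive', 'confidence' => 0.900...]
```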

@zaidalyafeai
Contributor

@khaled-alshamaa , is there a way to try a demo online?

@khaled-alshamaa
Author

Dear @zaidalyafeai, I am sorry for the late response to your request; we at the Ar-PHP project have traditionally provided a backend library for website developers to process Arabic content, rather than developing frontend demo applications.

I converted the whole set of model parameters and the algorithm of the query function into JavaScript to give you an easy way to test our Arabic sentiment analysis model online (or even offline). You can check it here.

@zaidalyafeai
Contributor

@khaled-alshamaa very nice work. I tested it with some queries; very impressive for a tiny model like yours!

@khaled-alshamaa
Author

Dear @zaidalyafeai, I believe it can't beat the prediction accuracy of BERT models because it is a context-free model (i.e., just like word2vec). Actually, you can look at it as a kind of customized word2vec model with a massive reduction in vocabulary size (bi-radicals only) and dimensions (only two: positive and negative sentiment).

I don't know if it is worth adding a simple rule-based mechanism to handle negation words, to provide a kind of minimal context sensing (still significant in phrases like لست سعيدا, "I am not happy", or لست خائب الظن, "I am not disappointed").

@zaidalyafeai
Contributor

Sounds like a good addition; I am not sure whether there are specific instances where this might cause problems. cc @MagedSaeed.

@khaled-alshamaa
Author

Dear @zaidalyafeai and @MagedSaeed, I did a quick test of adding a simple rule-based mechanism to handle negation words (i.e., to provide a kind of minimal context sensing), and it looks promising.

Results show a 2% improvement in sentiment prediction accuracy on the HARD dataset by checking for only the following 6 words in the rules (even without processing all their possible lexical forms); a sketch of the rule follows below:
لا، لن، لم، ما، غير، سوى (i.e., "no/not", "will not", "did not", "not", "other than", "except")
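Here is a minimal sketch of such a rule (the choice to flip only the immediately following word's score is an assumption for illustration; the actual Ar-PHP rule may differ):

```php
<?php
// Sketch: invert the score of the word that immediately follows a
// negation particle. The one-word scope is an assumption; Ar-PHP may
// apply the rule differently.

const NEGATION_WORDS = ['لا', 'لن', 'لم', 'ما', 'غير', 'سوى'];

function applyNegation(array $tokens, array $scores): array
{
    foreach ($tokens as $i => $token) {
        if (in_array($token, NEGATION_WORDS, true) && isset($scores[$i + 1])) {
            $scores[$i + 1] = -$scores[$i + 1]; // flip the next word's polarity
        }
    }
    return $scores;
}

// Example: in "لا أنصح به" ("I do not recommend it"), the negation
// particle flips the positive score of "أنصح" ("recommend").
print_r(applyNegation(['لا', 'أنصح', 'به'], [0.0, 1.2, 0.1]));
// [0.0, -1.2, 0.1]
```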

I updated the online demo with the new version of the model and processing algorithm for further testing by external auditors (you can compare the scores of سيء جدا, "very bad", vs. ليس سيء جدا, "not very bad"). Also, this pilot experiment will be followed up with more formal and comprehensive negation-word processing in the model.

Well, all in all, do you believe that the algorithm/approach we implemented in this model deserves to be published in a scientific paper? What is your advice?

@zaidalyafeai
Contributor

I checked the demo, and it seems to be working better. Good job! Regarding publishing your findings, you can target Arabic-focused workshops (WANLP), open-source workshops (NLP-OSS), or demo tracks. I would suggest you check out the EMNLP 2021 demo track (https://2021.emnlp.org/call-for-papers/demos), which has a deadline of July 1, 2021.

The EMNLP 2021 System Demonstration Program Committee invites proposals for the Demonstrations Program. Demonstrations may range from early research prototypes to mature production-ready systems. Of particular interest are publicly available open-source or open-access systems.

@khaled-alshamaa
Author

Dear @zaidalyafeai and @MagedSaeed, I tested the model against another dataset (AraCust: a Saudi Telecom tweets corpus for sentiment analysis) and got interesting results: accuracy was only 54%, while recall was up to 70%! In other words, this context-free model is still reasonably good at predicting the actual positive cases correctly, while it produces a large number of false positives (Type I errors).
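For reference, the two metrics relate to confusion-matrix counts as follows (the counts below are hypothetical, chosen only to be consistent with the reported 54%/70% figures on 20K tweets, not the actual AraCust confusion matrix):

```php
<?php
// Sketch: accuracy vs. recall from confusion-matrix counts. The counts
// used in the example are made up for illustration.

function accuracy(int $tp, int $tn, int $fp, int $fn): float
{
    return ($tp + $tn) / ($tp + $tn + $fp + $fn);
}

function recall(int $tp, int $fn): float
{
    return $tp / ($tp + $fn);
}

// High recall with low accuracy implies many false positives, e.g.:
echo accuracy(7000, 3800, 6200, 3000), PHP_EOL; // 0.54
echo recall(7000, 3000), PHP_EOL;               // 0.7
```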

Please note that this corpus contains 20K tweets (i.e., rather than formal reviews), so the drop might be related to cultural factors and/or the complicated syntactic structures we use in Arabic when we want to express negative feedback in public in a polite/diplomatic way, which tends to be hard for a simple model like ours to detect.
