Natural-Language-Processing-with-NLTK-and-Scikit-learn

This is a sentiment analysis project using the NLTK and Scikit-learn libraries. In this Jupyter notebook, well-known machine learning algorithms are trained on a Twitter dataset and then applied to the National University of Singapore SMS Corpus.

Finally, the percentages of positive and negative messages are compared across 10 countries.

Requirements

All required Python packages are listed in requirements.txt and can be installed with pip install -r requirements.txt.

Data

Public access to the dataset is provided by the National University of Singapore. The dataset contains 67,093 text messages (SMS) taken from the corpus on Mar 9, 2015, and consists mostly of messages from Singaporeans and students attending the university. You can download it from this.

Roadmap

Preprocessing:

Changing to lowercase and removing punctuation
Removing empty messages
Tokenizing the messages
Removing stopwords
Creating a bag of words (BOW)
Vectorizing

For better visualization of the dataset, word clouds of the positive and negative sets are plotted.
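The preprocessing steps listed above can be sketched roughly as follows. This is a dependency-free illustration, not the notebook's code: the notebook uses NLTK for tokenizing and stopword removal, whereas here whitespace splitting and the tiny `STOPWORDS` set are stand-in assumptions, and the sample messages are made up.

```python
import string
from collections import Counter

# Tiny illustrative stopword list (assumption); NLTK ships a much fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "it", "i", "you"}

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean(message):
    """Lowercase the message and remove punctuation."""
    return message.lower().translate(PUNCT_TABLE).strip()


def tokenize(message):
    """Split on whitespace and drop stopwords (stand-in for an NLTK tokenizer)."""
    return [token for token in message.split() if token not in STOPWORDS]


def vectorize(bags):
    """Map each bag of words onto a fixed, sorted vocabulary of counts."""
    vocabulary = sorted({word for bag in bags for word in bag})
    vectors = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, vectors


# Made-up sample messages, including one that is empty after cleaning.
messages = ["I love this phone!", "   ", "This is the worst day, the worst."]

cleaned = [clean(m) for m in messages]
cleaned = [m for m in cleaned if m]            # remove empty messages
token_lists = [tokenize(m) for m in cleaned]   # tokenize and remove stopwords
bags = [Counter(tokens) for tokens in token_lists]  # bag of words per message
vocabulary, vectors = vectorize(bags)

print(vocabulary)
print(vectors)
```

Each row of `vectors` holds the word counts of one message in vocabulary order, which is the kind of numeric input a classifier expects.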

Training the classifiers:

Eight well-known machine learning classifiers are trained on the Twitter dataset, and their accuracy on the validation set is printed in a table. The models are built with the Scikit-learn library.
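As a minimal sketch of this step, the snippet below trains a few Scikit-learn classifiers on a toy labelled set and prints their validation accuracy as a table. The three classifiers shown, the toy texts, and the split parameters are assumptions chosen for illustration; the notebook trains eight classifiers on the Twitter dataset, and which ones it uses is not stated here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Toy labelled data (made up); 1 = positive, 0 = negative.
texts = [
    "love this great day", "what a wonderful movie", "so happy right now",
    "best phone ever", "hate this awful day", "what a terrible movie",
    "so sad right now", "worst phone ever",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Bag-of-words count vectors for every text.
features = CountVectorizer().fit_transform(texts)

# Hold out a stratified validation set.
X_train, X_val, y_train, y_val = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

# Illustrative subset of classifiers (assumption, not the notebook's list).
classifiers = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

accuracies = {}
for name, classifier in classifiers.items():
    classifier.fit(X_train, y_train)
    accuracies[name] = classifier.score(X_val, y_val)  # mean accuracy

print(f"{'Classifier':<22}{'Validation accuracy':>20}")
for name, accuracy in accuracies.items():
    print(f"{name:<22}{accuracy:>20.2f}")
```

With so little data the accuracies are not meaningful; the point is only the fit, score, and tabulate loop.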

Testing and printing the results:

The classifiers are applied to the dataset from the National University of Singapore, and the predicted percentages of negative and positive messages are printed for the entire dataset as well as for each country.
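The percentage calculation could look something like the sketch below. The function names, the label strings, and the country codes are hypothetical and used only for illustration; in the notebook the predictions come from the trained classifiers and the countries from the SMS corpus metadata.

```python
from collections import Counter, defaultdict


def sentiment_percentages(predictions):
    """Return the share of positive and negative predictions, in percent."""
    counts = Counter(predictions)
    total = len(predictions)
    return {
        label: 100.0 * counts[label] / total
        for label in ("positive", "negative")
    }


def percentages_by_country(predictions, countries):
    """Group predictions by country and compute the percentages per group."""
    grouped = defaultdict(list)
    for prediction, country in zip(predictions, countries):
        grouped[country].append(prediction)
    return {
        country: sentiment_percentages(group)
        for country, group in grouped.items()
    }


# Made-up predictions and country codes for four messages.
predictions = ["positive", "negative", "positive", "positive"]
countries = ["SG", "SG", "IN", "IN"]

overall = sentiment_percentages(predictions)
by_country = percentages_by_country(predictions, countries)

print(overall)
print(by_country)
```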

Acknowledgment

This project was done as the proposed portfolio project for https://www.codecademy.com/learn/paths/natural-language-processing.
