Sentiment analysis of patients' blogs from various online forums, classifying each blog into one of three categories:
- Disease Exists (Neutral Sentiment)
- Health Deteriorating (Negative Sentiment)
- Health Recovering (Positive Sentiment)
The classification is done with a Naive Bayes probabilistic classifier. The final aim is to measure the accuracy for various splits of the data into training and testing sets. The code is written in R.
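To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in base R. It is not the code from this repository: the toy blogs, the function names (`tokenize`, `train_nb`, `predict_nb`) and the use of Laplace smoothing are all assumptions made for illustration.

```r
# Toy labelled blogs (made up for illustration; not from the project's dataset).
train <- data.frame(
  Label = c("Exists", "Deteriorate", "Recover", "Deteriorate", "Recover"),
  Blogs = c("i have been diagnosed with asthma",
            "the pain is getting worse every day",
            "i am feeling much better after treatment",
            "my symptoms are worse and the pain is severe",
            "recovering well and feeling better"),
  stringsAsFactors = FALSE
)

# Lower-case a text and split it into word tokens.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words[nzchar(words)]
}

# Training: log priors and Laplace-smoothed log word likelihoods per class.
train_nb <- function(labels, texts) {
  tokens  <- lapply(texts, tokenize)
  vocab   <- sort(unique(unlist(tokens)))
  classes <- sort(unique(labels))
  log_prior <- log(as.numeric(table(factor(labels, levels = classes))) /
                     length(labels))
  log_lik <- vapply(classes, function(cl) {
    counts <- as.numeric(table(factor(unlist(tokens[labels == cl]),
                                      levels = vocab)))
    log((counts + 1) / (sum(counts) + length(vocab)))
  }, numeric(length(vocab)))
  rownames(log_lik) <- vocab
  list(classes = classes, vocab = vocab,
       log_prior = log_prior, log_lik = log_lik)
}

# Prediction: pick the class with the highest log posterior score.
predict_nb <- function(model, text) {
  words <- tokenize(text)
  words <- words[words %in% model$vocab]  # words unseen in training are ignored
  scores <- model$log_prior +
    colSums(model$log_lik[words, , drop = FALSE])
  model$classes[which.max(scores)]
}

model <- train_nb(train$Label, train$Blogs)
print(predict_nb(model, "the pain is worse"))
print(predict_nb(model, "feeling better"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a word that never appeared in a class from forcing that class's probability to zero.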
Dataset source: https://patient.info/ (for educational purposes only)
- Clone this repository.
- Create a database consisting of two columns: Label and Blogs.
  In the "Label" column, enter the sentiment of each blog: Exists, Deteriorate or Recover.
  In the "Blogs" column, enter blogs from any online forum, or self-written blogs from other sources.
- Open R and run the entire code.
- Increase or decrease the number of times the dataset is randomized; this can improve the accuracy by up to 10%.
- Try to label the dataset more accurately.
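The dataset layout and the randomization step described above can be sketched as follows. This is an illustrative sketch only: the rows are invented placeholders, and the names `n_shuffles` and `train_fraction` are hypothetical and may not match the variables used in the repository's script.

```r
# Two-column table in the layout described above (placeholder rows).
blogs <- data.frame(
  Label = c("Exists", "Deteriorate", "Recover",
            "Exists", "Deteriorate", "Recover"),
  Blogs = c("i was diagnosed with diabetes last year",
            "my condition keeps getting worse",
            "i feel much better after the new treatment",
            "i have had migraines since childhood",
            "the pain has become unbearable",
            "my strength is slowly coming back"),
  stringsAsFactors = FALSE
)

set.seed(42)     # fixed seed so the shuffles are reproducible
n_shuffles <- 3  # hypothetical knob: how many times the rows are re-randomized

# Each pass reorders the rows, which changes what lands in training vs testing.
for (i in seq_len(n_shuffles)) {
  blogs <- blogs[sample(nrow(blogs)), ]
}
rownames(blogs) <- NULL

# Split the shuffled rows into training and testing sets.
train_fraction <- 0.5
n_train <- round(train_fraction * nrow(blogs))
train <- blogs[seq_len(n_train), ]
test  <- blogs[-seq_len(n_train), ]

print(table(train$Label))
print(table(test$Label))
```

Because the split depends on row order, changing `n_shuffles` or the seed yields a different training set, which is one plausible reason the measured accuracy moves when the randomization count is changed.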
Results for the dataset considered: sentiment scores for each emotion (anger, anticipation, fear, ...)
Different proportions of training and testing data are used to see how the accuracy changes
Accuracy versus the proportion of the dataset used for training
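The accuracy-versus-proportion experiment can be sketched as a small evaluation loop. Everything here is an assumption for illustration: the data is synthetic, `accuracy_for_split` is a hypothetical helper, and a majority-class baseline stands in for the Naive Bayes model so that the block stays self-contained.

```r
# Stand-in classifier (assumption): always predicts the most frequent training
# label. Replace with the Naive Bayes model to mirror the actual experiment.
fit_majority <- function(train) names(which.max(table(train$Label)))
predict_majority <- function(model, test) rep(model, nrow(test))

# Train on the first `train_fraction` of the rows, test on the rest,
# and return the share of test labels predicted correctly.
accuracy_for_split <- function(data, train_fraction, fit, predict_fn) {
  n_train <- round(train_fraction * nrow(data))
  train <- data[seq_len(n_train), , drop = FALSE]
  test  <- data[-seq_len(n_train), , drop = FALSE]
  model <- fit(train)
  mean(predict_fn(model, test) == test$Label)
}

# Synthetic, already-shuffled data with the Label/Blogs layout.
set.seed(1)
data <- data.frame(
  Label = sample(c("Exists", "Deteriorate", "Recover"), 60, replace = TRUE),
  Blogs = paste("placeholder blog", 1:60),
  stringsAsFactors = FALSE
)

# Accuracy for each training proportion.
fractions <- c(0.5, 0.6, 0.7, 0.8, 0.9)
accuracy <- sapply(fractions, function(f) {
  accuracy_for_split(data, f, fit_majority, predict_majority)
})
results <- data.frame(train_fraction = fractions, accuracy = accuracy)
print(results)
```

Calling `plot(results$train_fraction, results$accuracy, type = "b")` afterwards would draw the accuracy curve against the training proportion. With small test sets the curve can be noisy, so averaging over several shuffles per proportion is a reasonable refinement.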