Multinomial Naive Bayes Language Classification Model

This repository provides a tutorial on implementing language classification using the Multinomial Naive Bayes algorithm. The tutorial includes a Python implementation to detect the language of a given text. The code consists of two main files: main.py for user interaction and detector.py containing the LanguageClassifier class.

Overview

The Multinomial Naive Bayes algorithm is widely used for text classification tasks, including language identification. This tutorial demonstrates how to train a language classifier using a provided dataset and then use the trained model to predict the language of input text.
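The core of Multinomial Naive Bayes fits in a few lines. The sketch below uses only the standard library to illustrate the technique. It is not the repository's code, which uses scikit-learn. It estimates each language's prior and its per-word probabilities with add-one (Laplace) smoothing, then scores new text by summing log probabilities. The function names and toy sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Return (log_prior, log_likelihood, vocab) for whitespace-tokenized docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Add-alpha smoothing gives unseen (class, token) pairs a nonzero probability
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik, vocab


def predict_nb(model, doc):
    """Pick the class maximizing log P(c) + sum of log P(token | c)."""
    log_prior, log_lik, vocab = model
    tokens = [t for t in doc.lower().split() if t in vocab]  # out-of-vocabulary tokens are skipped
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in tokens) for c in log_prior}
    return max(scores, key=scores.get)


docs = ["the cat is on the table", "the dog and the cat",
        "le chat est sur la table", "le chien et le chat"]
labels = ["English", "English", "French", "French"]
model = train_nb(docs, labels)
print(predict_nb(model, "the table"))  # English
print(predict_nb(model, "la table"))   # French
```

Repeated words count once per occurrence. This is what makes the model "multinomial" rather than Bernoulli.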

Prerequisites

Before running the code, install the following:

- Python 3
- Libraries: requests, beautifulsoup4 (imported as bs4), pandas, scikit-learn, joblib

Install the libraries with:

```
pip install requests beautifulsoup4 pandas scikit-learn joblib
```

Usage

Clone the Repository:
```
git clone https://github.com/vivekkdagar/NaiveBayesClassifier.git
cd NaiveBayesClassifier
```

Run the Main Script:
```
python3 main.py
```
Select Data Source and Input Data:
- Choose an input mode: 'raw' (type the text directly), 'file' (read the text from a file), or 'website' (scrape the text from a web page).

Results:
- The predicted language of the provided text is displayed.
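The three input modes can be sketched as a single dispatcher. `read_input` is a hypothetical helper, not the function used in main.py. In this sketch, 'website' mode receives the scraper as a parameter, so the example stays independent of any particular HTML tag or network access.

```python
from pathlib import Path
from typing import Callable, Optional


def read_input(mode: str, value: str,
               scrape: Optional[Callable[[str], str]] = None) -> str:
    """Return the text to classify for one of the three input modes.

    'raw' uses value as-is, 'file' treats it as a path,
    'website' treats it as a URL and delegates to the scrape callable.
    """
    if mode == "raw":
        return value
    if mode == "file":
        return Path(value).read_text(encoding="utf-8")
    if mode == "website":
        if scrape is None:
            raise ValueError("website mode needs a scrape callable")
        return scrape(value)
    raise ValueError(f"unknown mode {mode!r}; expected 'raw', 'file' or 'website'")
```

For example, `read_input("website", url, scrape=some_scraper)` keeps the fetching logic in one place and the mode selection in another.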

Code Structure

- main.py: handles user interaction and data input.
- detector.py: contains the LanguageClassifier class, which trains the model and predicts languages.

Data Preprocessing

The LanguageClassifier class preprocesses the training data by removing special characters and transforming the text into a bag-of-words representation using the CountVectorizer from scikit-learn.

Training the Model

The tutorial trains the Multinomial Naive Bayes model on the provided dataset, "Language Detection.csv". The trained model is then serialized with joblib (saved as model.joblib in the repository root) so it can be reloaded later without retraining.
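The training and serialization steps can be sketched with a scikit-learn pipeline. This rests on two assumptions. First, the CSV columns are named "Text" and "Language"; check the header of "Language Detection.csv" before relying on this. Second, `load_dataset` and `train_and_save` are hypothetical helpers, not LanguageClassifier's actual methods. The standard-library csv module keeps the sketch small, and pandas.read_csv works equally well.

```python
import csv
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def load_dataset(path, text_col="Text", label_col="Language"):
    """Read parallel lists of texts and labels (column names are assumptions)."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [row[text_col] for row in rows], [row[label_col] for row in rows]


def train_and_save(texts, labels, model_path="model.joblib"):
    """Fit bag-of-words + Multinomial NB, then serialize the whole pipeline."""
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    joblib.dump(model, model_path)  # vectorizer vocabulary and NB counts are saved together
    return model


# Toy demo; with the real data you would call
# train_and_save(*load_dataset("Language Detection.csv"))
texts = ["the cat is on the table", "the dog and the cat",
         "le chat est sur la table", "le chien et le chat"]
labels = ["English", "English", "French", "French"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")
    train_and_save(texts, labels, path)
    restored = joblib.load(path)

predictions = [str(p) for p in restored.predict(["the table", "la table"])]
print(predictions)  # ['English', 'French']
```

The sketch saves the whole pipeline rather than only the MultinomialNB estimator. Predictions need the exact vocabulary the vectorizer learned during training.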

Additional Notes

- To modify or extend the training data, edit "Language Detection.csv".
- Adjust the HTML tag used in the scrape_website function in main.py to suit the pages you scrape.
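When adapting the scraper, it helps to pass the tag as a parameter. Below is a minimal sketch using requests and BeautifulSoup. `fetch_tag_text` and `extract_text` are hypothetical names, and the real scrape_website in main.py may be structured differently. Parsing is kept separate from fetching so it can be tried on an HTML string without network access.

```python
import requests
from bs4 import BeautifulSoup


def extract_text(html: str, tag: str = "p") -> str:
    """Join the text of every <tag> element, e.g. tag="p" for paragraphs."""
    soup = BeautifulSoup(html, "html.parser")
    return " ".join(el.get_text(" ", strip=True) for el in soup.find_all(tag))


def fetch_tag_text(url: str, tag: str = "p", timeout: float = 10.0) -> str:
    """Download a page and return the text inside the chosen tag."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 4xx/5xx instead of classifying an error page
    return extract_text(response.text, tag)


sample = "<html><body><h1>Titre</h1><p>Hola mundo.</p><p>¿Qué tal?</p></body></html>"
print(extract_text(sample, "p"))   # Hola mundo. ¿Qué tal?
print(extract_text(sample, "h1"))  # Titre
```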
