Code for the toxic comment classification in Kaggle

david1542/toxic-comments

Overview

This project contains code for the Toxic Comment Classification Challenge on Kaggle.

The goal of the competition is to identify and classify toxic online comments.
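To make the task concrete: each comment in the competition carries six independent binary labels (toxic, severe_toxic, obscene, threat, insult, identity_hate), so a single comment can have several of them or none. Submissions were ranked by mean column-wise ROC AUC, meaning the ROC AUC of each label column is computed separately and then averaged. The sketch below computes that metric with the standard library only. It is not taken from this repository, and the function names are hypothetical.

```python
from typing import List, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This is O(P * N), which is fine for illustration.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when a column contains only one class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_columnwise_auc(y_true: List[List[int]], y_score: List[List[float]]) -> float:
    """Average the per-label AUCs. Each row is one comment, each column one label."""
    n_labels = len(y_true[0])
    aucs = [
        roc_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j in range(n_labels)
    ]
    return sum(aucs) / n_labels


if __name__ == "__main__":
    # Four comments, two hypothetical label columns.
    labels = [[0, 1], [0, 0], [1, 1], [1, 0]]
    scores = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.3]]
    print(mean_columnwise_auc(labels, scores))  # 0.875: (0.75 + 1.0) / 2
```

Averaging per column means a rare label such as threat counts as much toward the score as a common one such as toxic.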

Prerequisites

Install Poetry before continuing. Follow the official Poetry installation instructions.

Installation

  1. Clone this repo:
git clone https://github.com/david1542/toxic-comments.git
  2. Install the dependencies:
poetry install
  3. Authenticate the Kaggle CLI by following the Kaggle API authentication instructions.
  4. Downgrade PyTorch to 1.12.1, because later versions cause CUDA driver mismatches (see the related issue):
pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
  5. Run this script to download the data:
./scripts/download_data.sh
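Because the PyTorch downgrade is done with pip rather than Poetry, it is easy to install it into a different environment by accident. The sketch below is a hypothetical helper, not part of this repository, that reads the installed torch version through the standard `importlib.metadata` module and compares it with the pinned 1.12.1 release. It could be saved as, for example, a hypothetical `check_torch.py` and run with `poetry run python check_torch.py` so that it inspects the project's environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

EXPECTED = "1.12.1"  # release pinned in the installation steps


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version string, or None if torch is missing."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


def matches_pin(found: Optional[str], expected: str = EXPECTED) -> bool:
    """True if `found` is the pinned release, ignoring a local tag such as '+cu113'."""
    if found is None:
        return False
    return found.split("+", 1)[0] == expected


if __name__ == "__main__":
    found = installed_torch_version()
    status = "OK" if matches_pin(found) else "unexpected"
    print(f"torch {found or 'not installed'}: {status} (expected {EXPECTED})")
```

Stripping the local tag keeps the check focused on the release number; if the CUDA build also matters, compare the full string against `1.12.1+cu113` instead.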

Train

Hydra is used for configuration management. Run the src/train.py script and override any parameter from the command line:

python src/train.py training_args.learning_rate=1e-3 training_args.num_train_epochs=5

For the full list of parameters and their defaults, see configs/train.yaml.
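The overrides in the command above are dotted paths into the nested configuration: `training_args.learning_rate=1e-3` replaces the `learning_rate` entry under `training_args`. The standard-library sketch below only illustrates that idea. It is not how Hydra is implemented, its value parsing is cruder than Hydra's override grammar, and the default values in the example are hypothetical rather than copied from configs/train.yaml. Like Hydra's default behaviour, it rejects keys that do not already exist in the config (Hydra uses a `+` prefix to add new keys).

```python
import ast
import copy
from typing import Any, Dict, Iterable


def parse_value(raw: str) -> Any:
    """Convert Python literals such as '1e-3' or '5'; anything else stays a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config: Dict[str, Any], overrides: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `config` with each 'a.b.c=value' override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override {item!r} has no '='")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node[part]  # walk down the nested dicts
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    # Hypothetical defaults, standing in for configs/train.yaml.
    defaults = {"training_args": {"learning_rate": 5e-5, "num_train_epochs": 3}}
    cfg = apply_overrides(
        defaults,
        ["training_args.learning_rate=1e-3", "training_args.num_train_epochs=5"],
    )
    print(cfg)
```

Rejecting unknown keys means a typo such as `training_args.learnig_rate=1e-3` fails loudly instead of silently training with the default value.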

Articles

Some useful articles I found while working on this problem:

  • An article on multi-label classification.
  • Technical tips on fine-tuning transformers for multi-label problems.
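The key difference from ordinary (multi-class) classification is that the labels are not mutually exclusive, so each label gets its own sigmoid and binary cross-entropy term instead of one softmax over all labels. The standard-library sketch below shows that loss and the per-label thresholding. It is an illustration, not this repository's training code. In PyTorch the same loss is `torch.nn.BCEWithLogitsLoss`, and Hugging Face sequence-classification models switch to it when the model config sets `problem_type="multi_label_classification"`; whether this project relies on that setting is an assumption not confirmed here.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy over labels, computed directly from raw logits.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)), which equals
    -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))] without overflow.
    """
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def predict(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Decide each label independently, so several labels (or none) can be active."""
    return [int(sigmoid(x) >= threshold) for x in logits]


if __name__ == "__main__":
    # Hypothetical model outputs for one comment, in the order
    # toxic, severe_toxic, obscene, threat, insult, identity_hate.
    logits = [2.0, -1.0, 0.5, -3.0, 1.0, -2.0]
    targets = [1, 0, 1, 0, 0, 0]
    print(bce_with_logits(logits, targets))
    print(predict(logits))  # [1, 0, 1, 0, 1, 0]
```

A single 0.5 threshold is only a starting point; since the competition metric is ROC AUC, which depends only on how scores are ranked, the raw sigmoid probabilities can be submitted without thresholding.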
