DSCI 691 NLP Group Project

That's What Who Said

This repo contains the code for a collaborative project by three Drexel University graduate students who sought to create a tool that identifies speakers in multi-party dialogues using text-based features.

Our motivation was to determine who said what.

The FCC requirements for speaker identification in closed captioning are intended to ensure accessibility and equal participation for individuals with hearing impairments. When speakers are identified accurately, viewers who rely on closed captioning can better understand and follow conversations, which improves their overall viewing experience.

Description

We conducted five experiments using two pre-trained transformer-based models (DistilBERT and RoBERTa) to predict whether the speaker of a line of dialogue from the television show The Office was "Dwight" or "Not Dwight".
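To make the binary framing concrete, here is a minimal sketch of how speaker names could be collapsed into "Dwight" / "Not Dwight" labels before training. The function name `to_binary_label`, the `(speaker, line)` tuple layout, and the sample rows are assumptions for illustration only; the notebooks in this repo may organize the data differently.

```python
def to_binary_label(speaker: str, target: str = "Dwight") -> int:
    """Return 1 if the line was spoken by the target character, else 0."""
    # Normalize whitespace and case so " dwight " and "Dwight" match.
    return int(speaker.strip().lower() == target.lower())


# Hypothetical rows: (speaker, line of dialogue). Placeholder text, not real data.
rows = [
    ("Dwight", "placeholder line one"),
    ("Jim", "placeholder line two"),
    ("Pam", "placeholder line three"),
]

# 1 = "Dwight", 0 = "Not Dwight"
labels = [to_binary_label(speaker) for speaker, _ in rows]
```

Integer labels like these are the usual input for a two-class sequence classification head, whichever pre-trained model is used.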

THE QUESTION:

  • "Dwight" or "Not Dwight"?
TASK
DATA
DATA PREPROCESSING & VISUALIZATION
MACHINE LEARNING
PROJECT REPORT

Installation & Usage

To rerun this project:

  1. Clone this repository.
  2. Set up your project directories as per the file tree (below).
  3. Download the model files with caution: the files for all five models total 4.9 GB.
  4. Step through the 02_transformer_model.ipynb notebook.
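Because the model files are large, it may help to check free disk space before downloading. The following is a minimal sketch using only the Python standard library; the helper name `has_room_for_models` and the 1 GB safety margin are assumptions, not part of this project.

```python
import shutil

MODEL_FILES_GB = 4.9  # total size of the five models, as stated above


def has_room_for_models(path: str = ".",
                        required_gb: float = MODEL_FILES_GB,
                        margin_gb: float = 1.0) -> bool:
    """Return True if the drive holding `path` has enough free space."""
    # disk_usage reports bytes; convert to decimal gigabytes.
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb + margin_gb


if __name__ == "__main__":
    if has_room_for_models():
        print("Enough free space for the model files.")
    else:
        print("Not enough free space; free up disk before downloading.")
```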

Credits

Thanks to the good people at:

Deepnote

Kaggle

Hugging Face

Our Team

Kelsey Fox

Justin Minnion

Chris Chavez

License

Please refer to:

Hugging Face Privacy Policy, for the terms governing use of Hugging Face's products.

Kaggle Privacy Policy, for the terms governing use of Kaggle's products.