
Zu1uDe1ta/thats-what-who-said


DSCI 691 NLP Group Project

That's What Who Said

This repo contains the code for a collaborative project among three Drexel University graduate students that sought to create a tool to identify speakers in multi-party dialogues using text-based features.

Our motivation was to determine who said what.

It's important to note that the FCC requirements for speaker identification in closed captioning are intended to ensure accessibility and equal participation for individuals with hearing impairments. By accurately identifying speakers, closed captions help viewers who rely on them better understand and follow conversations, enhancing their overall viewing experience.

Description

We conducted five experiments using two pre-trained transformer-based models (DistilBERT and RoBERTa) to predict whether the speaker of a line of dialogue from the television show The Office was "Dwight" or "Not Dwight".
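Framing the task as "Dwight" vs. "Not Dwight" turns a multi-speaker transcript into a binary classification dataset. The sketch below shows one way that relabeling could look; it assumes the transcript is available as `(speaker, line)` pairs, and the helpers `binarize_speaker` and `to_examples` plus the toy rows are illustrative only, not taken from this repository's notebook or data.

```python
from collections import Counter

TARGET_SPEAKER = "Dwight"

# Label mapping in the {id: name} / {name: id} shape that Hugging Face
# sequence-classification models accept as id2label / label2id.
ID2LABEL = {0: "Not Dwight", 1: "Dwight"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def binarize_speaker(speaker: str, target: str = TARGET_SPEAKER) -> int:
    """Return 1 if the line was spoken by the target character, else 0."""
    # Assumption: speaker names may carry stray whitespace or inconsistent
    # casing, so compare a normalized form.
    return int(speaker.strip().lower() == target.lower())


def to_examples(rows):
    """Convert (speaker, line) pairs into (text, label) pairs."""
    return [(line, binarize_speaker(speaker)) for speaker, line in rows]


if __name__ == "__main__":
    # Hypothetical toy rows, not drawn from the project's dataset.
    rows = [
        ("Dwight", "Where are the sales reports?"),
        ("Jim", "I'll send them over after lunch."),
        ("dwight ", "Lunch is for the weak."),
    ]
    examples = to_examples(rows)
    print(examples)
    print(Counter(ID2LABEL[label] for _, label in examples))
```

Collapsing every other character into a single "Not Dwight" class keeps the output layer at two labels, which is what lets the same setup be reused across both DistilBERT and RoBERTa.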

THE QUESTION:

  • "Dwight" or "Not Dwight"?
TASK
DATA
DATA PREPROCESSING & VISUALIZATION
MACHINE LEARNING
PROJECT REPORT

Installation & Usage

To rerun this project:

  1. Clone this repository
  2. Set up your project directories according to the file tree (below).
  3. The model files for all five models total 4.9 GB, so download with caution.
  4. Step through the 02_transformer_model.ipynb notebook.
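Because the model files are large, a quick free-space check before downloading may save a failed transfer. This is a minimal sketch, assuming the 4.9 GB figure is in decimal gigabytes and using a hypothetical 10% safety margin; `has_room_for_models` is an illustrative helper, not part of this repository.

```python
import shutil

# Assumption: "4.9GB" in the README means decimal gigabytes.
MODEL_FILES_BYTES = 4_900_000_000


def has_room_for_models(path: str = ".",
                        required_bytes: int = MODEL_FILES_BYTES,
                        margin: float = 0.1) -> bool:
    """Return True if the filesystem holding `path` has enough free space
    for `required_bytes` plus a fractional safety `margin` (hypothetical)."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes * (1 + margin)


if __name__ == "__main__":
    if has_room_for_models("."):
        print("Enough free space to download the model files.")
    else:
        print("Not enough free space for the model files.")
```

Run it from the directory where you plan to store the models, since free space is measured per filesystem.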

Credits

Thanks to the good people at:

Deepnote

Kaggle

Hugging Face

Our Team

Kelsey Fox

Justin Minnion

Chris Chavez

License

Please refer to:

Hugging Face Privacy Policy for the terms governing use of Hugging Face's products.

Kaggle Privacy Policy for the terms governing use of Kaggle's products.
