Multimodal Speech Emotion Recognition

This repository contains the code and resources for the Multimodal SER Model, designed to recognize emotions from speech by combining text and acoustic data. The model extracts text features with a fine-tuned DeBERTaV3 and acoustic features with a fine-tuned Wav2Vec2, trains a Multilayer Perceptron classifier on each modality, and then fuses the features and per-modality predictions through a final Multilayer Perceptron (MLP) for improved emotion classification.
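
As a rough illustration of the two-backbone design, the sketch below shows how per-utterance features could be extracted with the Hugging Face transformers library. The checkpoint names, mean pooling, and 16 kHz sampling rate are assumptions made for illustration, not details taken from this repository.

    # Feature-extraction sketch; checkpoint names and pooling strategy are assumptions.
    import torch
    from transformers import AutoTokenizer, AutoModel, AutoFeatureExtractor

    text_tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
    text_encoder = AutoModel.from_pretrained("microsoft/deberta-v3-base")
    audio_processor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
    audio_encoder = AutoModel.from_pretrained("facebook/wav2vec2-base")

    def extract_text_features(transcript: str) -> torch.Tensor:
        # Tokenize the transcript and mean-pool the last hidden states into one vector.
        inputs = text_tokenizer(transcript, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = text_encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
        return hidden.mean(dim=1)

    def extract_audio_features(waveform, sample_rate: int = 16_000) -> torch.Tensor:
        # waveform: 1-D numpy array of raw audio samples at 16 kHz.
        inputs = audio_processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
        with torch.no_grad():
            hidden = audio_encoder(**inputs).last_hidden_state  # (1, frames, 768)
        return hidden.mean(dim=1)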

Overview

The Multimodal SER Model leverages both textual and acoustic features to classify emotions more accurately. The architecture consists of:

  • Initial Classification: Two per-modality classifiers (MLP 1 and MLP 2) individually predict the emotion from their respective features, extracted with Wav2Vec2 (audio) and DeBERTaV3 (text).
  • Fusion and Final Classification: The extracted features and the initial predictions are combined through a third Multilayer Perceptron (MLP 3) to produce the final emotion classification (see the sketch after this list).
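
A minimal sketch of what the fusion head (MLP 3) could look like in PyTorch; the hidden size, dropout, number of emotion classes, and the exact concatenation of features and logits are assumptions rather than the repository's actual configuration.

    import torch
    import torch.nn as nn

    class FusionMLP(nn.Module):
        """Fuses per-modality features and predictions into a final emotion classifier."""

        def __init__(self, text_dim=768, audio_dim=768, num_emotions=4, hidden_dim=256):
            super().__init__()
            # Input: both feature vectors plus both per-modality logit vectors.
            in_dim = text_dim + audio_dim + 2 * num_emotions
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.1),
                nn.Linear(hidden_dim, num_emotions),
            )

        def forward(self, text_feats, audio_feats, text_logits, audio_logits):
            # Concatenate everything along the feature dimension and classify.
            fused = torch.cat([text_feats, audio_feats, text_logits, audio_logits], dim=-1)
            return self.net(fused)

Training would then minimize a cross-entropy loss over the fused logits, with MLP 1 and MLP 2 supplying the intermediate per-modality predictions.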

A research report detailing the development and evaluation of this architecture can be found in research-raport.pdf.

Model Architecture

Setup

  1. Clone this repository.
  2. Create a Python 3.11 virtual environment.
  3. Upgrade the Python package manager to version 23.3.1, using:
    $ pip install --upgrade pip==23.3.1
  4. Install the required libraries using:
    $ pip install -r requirements.txt

Dataset

The dataset can be downloaded from:

License

This project is licensed under the MIT License - see the LICENSE file for details.
