Skip to content

Latest commit

 

History

History
23 lines (13 loc) · 1.72 KB

README.md

File metadata and controls

23 lines (13 loc) · 1.72 KB

Spotify Automatic Playlist Continuation

The final project for the Big Data Course A.Y. 2022/2023 at the University of Rome la Sapienza.

The project involves implementing 3 different techniques to solve the Spotify Million Dataset Playlist Challenge, hosted on AICrowd.

The methods are implemented using Pyspark in order for the data to work on a distributed system.

There is also a re-implementation of the MMCF: Multimodal Collaborative Filtering for Automatic Playlist Continuation by the "Hello World" team that classified in 2nd place in the challenge. The re-implementation consists in converting the Neural Network from Tensorflow v1 to PyTorch, and using Petastorm to create a PyTorch DataLoader from a Pyspark DataFrame in order to keep the data distributed.

The folder structure is the following:

  • core: contains the notebooks and other files that constitute the core algorithms that implement the recommender systems;
  • slides: contains the source code for the presentation made using Slidev.
  • webapp: contains the code for a demo app built with Vite + React + FastAPI that showcase the usage of the system;

Demo of the web app

demo.mp4

Tech Stack

Python • PySpark • PyTorch • Petastorm • Typescript • MinIO • React • Tailwind • FastAPI • Vite • MongoDB • Docker • Slidev