
TALI: A Large Scale Quadra-Modal Dataset consisting of Temporally and Semantically Aligned Audio, Language and Images

Welcome to TALI, a large-scale quadra-modal dataset consisting of temporally and semantically aligned audio, language, and images. This dataset is assembled from YouTube and Wikipedia, offering rich multimodal data points for various research areas.

Characteristics of TALI

  • TALI integrates YouTube media components (video and audio), YouTube text components (title, description, subtitles), and Wikipedia components (image and context). These components have been temporally aligned and semantically integrated.
  • Support for multiple languages broadens the dataset's global coverage and usefulness.

For a Gradio visualization of the full dataset, please follow this link

Getting Started

Installation

For the default install, use:

pip install git+https://github.com/AntreasAntoniou/TALI

For the dev install, use:

pip install "git+https://github.com/AntreasAntoniou/TALI[dev]"

(The quotes prevent some shells, such as zsh, from interpreting the square brackets as a glob pattern.)

To get started with TALI, load the dataset through our helper functions, which wrap Hugging Face's datasets library. We don't use datasets directly because we found downloads via huggingface_hub to be much faster and more reliable. For the full set of available configurations, see examples.py. Here's a basic usage example:
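As a rough orientation before diving into examples.py, the sketch below shows the shape of a single TALI data point. The field names here are illustrative assumptions, not the dataset's actual schema; the point is only to show how the YouTube media, YouTube text, and Wikipedia components line up in one aligned sample.

```python
# Illustrative sketch of one TALI sample. Field names are assumptions for
# demonstration -- consult examples.py in the repository for the real schema.
sample = {
    # YouTube media components (temporally aligned)
    "youtube_video": "<placeholder for sampled video frames>",
    "youtube_audio": "<placeholder for the audio waveform>",
    # YouTube text components
    "youtube_title": "Example video title",
    "youtube_subtitles": "subtitle text aligned to the sampled clip ...",
    # Wikipedia components (semantically aligned)
    "wikipedia_image": "<placeholder for the article image>",
    "wikipedia_context": "paragraph of Wikipedia context for the image ...",
}

# Group the keys by broad modality to confirm all component groups are present.
modalities = {
    "image": [k for k in sample if "image" in k or "video" in k],
    "audio": [k for k in sample if "audio" in k],
    "language": [k for k in sample
                 if any(t in k for t in ("title", "subtitles", "context"))],
}
print(sorted(modalities))  # ['audio', 'image', 'language']
```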