TALI: A Large Scale Quadra-Modal Dataset consisting of Temporally and Semantically Aligned Audio, Language and Images
Welcome to TALI, a large-scale quadra-modal dataset consisting of temporally and semantically aligned audio, language, and images. This dataset is assembled from YouTube and Wikipedia, offering rich multimodal data points for various research areas.
- TALI integrates YouTube media components (video and audio), YouTube text components (title, description, subtitles), and Wikipedia components (image and context). These components have been temporally aligned and semantically integrated.
- Multiple language support enhances the global comprehension and capabilities of the TALI dataset.
For a Gradio visualization of the full dataset, please go to this link.
For the default install use:

```bash
pip install git+https://github.com/AntreasAntoniou/TALI
```

For the dev install use:

```bash
pip install git+https://github.com/AntreasAntoniou/TALI[dev]
```
To get started with TALI, load the dataset through our helper functions, which wrap Hugging Face's `datasets` library. We don't use `datasets` directly because we found that downloading via `huggingface_hub` is much faster and more reliable. For a full set of possible configurations, look at `examples.py`. Here's a basic usage example: