Welcome to the VideoLlama_X_OOPS project! 🎥🦙
This repository explores the application of the VideoLlama model to YouTube fail videos extracted from the OOPS dataset. Leveraging prompt engineering techniques from three influential papers (*Large Language Models are Zero-Shot Reasoners*, *Large Language Models Are Human-Level Prompt Engineers*, and *The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)*), I aim to unlock the latent capabilities of Large Language Models (LLMs) to extract the key events in a video.
The dataset used in this project is OOPS (Oops! Predicting Unintentional Action in Video). Specifically, I filtered the OOPS dataset to keep only the videos that depict failed actions.
The detailed experiment results and observations can be found in the result docs and in the exp_data.json file. Note that the docs include some additional experiment results that are not present in the JSON file.
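If you just want to browse the recorded results programmatically, something like the following works; this is only a convenience sketch, and the exact schema of exp_data.json is not documented here, so it simply pretty-prints whatever structure the file holds:

```python
# Preview the experiment results stored in exp_data.json.
# The file's schema is not assumed; we just load and pretty-print it.
import json

with open("exp_data.json") as f:
    exp_data = json.load(f)

print(json.dumps(exp_data, indent=2)[:2000])  # show the first ~2000 chars
```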
I used the VideoLlama model's Hugging Face interface to generate responses and iteratively refine prompts. The refined prompts were crucial for assessing the model's performance and pinpointing where it falls short in identifying and understanding fail events in the videos.
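For reference, here is a minimal sketch of what querying the model through a Hugging Face Gradio demo can look like. This is an illustration, not the exact pipeline used in these experiments: the Space name, `api_name`, and argument order below are assumptions, so check the Space's "Use via API" page (or `client.view_api()`) for the real signature. The prompt shows the zero-shot chain-of-thought trigger from *Large Language Models are Zero-Shot Reasoners*.

```python
# A hedged sketch of prompting a Video-LLaMA demo hosted on HF Spaces.
# Space name, api_name, and argument order are assumptions.
from gradio_client import Client, handle_file  # handle_file: gradio_client >= 1.0

# Zero-shot CoT trigger ("Let's think step by step"), per Kojima et al. (2022).
PROMPT = (
    "Describe the key events in this fail video in chronological order. "
    "Let's think step by step."
)

client = Client("DAMO-NLP-SG/Video-LLaMA")  # assumed Space name

# Inspect client.view_api() to confirm the endpoint and inputs before use.
answer = client.predict(
    handle_file("path/to/fail_video.mp4"),  # local or remote video file
    PROMPT,
    api_name="/predict",  # assumed endpoint name
)
print(answer)
```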
If you use the experiment results from this repository in your work, please consider citing this project. Thank you <3
Feel free to reach out if you have any questions or suggestions!
- 📧 Email: hasnatabdullah79@gmail.com
- 💼 LinkedIn: Hasnat Md Abdullah

Happy experimenting! 🚀