
VideoLlama_X_OOPS_Prompt_Eng

Welcome to the VideoLlama_X_OOPS project! 🎥🦙

Overview

This repository explores applying the VideoLlama model to YouTube fail videos drawn from the OOPS dataset. Leveraging prompt engineering techniques from three influential papers (Large Language Models are Zero-Shot Reasoners; Large Language Models Are Human-Level Prompt Engineers; The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)), I aim to unlock the latent capabilities of large language models (LLMs) to extract the key events in a video.

Dataset - OOPS

The dataset used in this project is OOPS (Oops! Predicting Unintentional Action in Video). Specifically, I filtered the OOPS dataset down to videos that depict failed actions.
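A minimal sketch of the filtering step described above. The annotation file name and the `contains_fail` field are illustrative assumptions, not the actual OOPS annotation schema; adapt the key lookup to whatever the real annotation files provide.

```python
import json

def filter_fail_videos(annotation_path):
    """Return the IDs of videos whose annotations mark a failed action.

    Assumes a hypothetical JSON layout mapping each video ID to a
    metadata dict with a boolean 'contains_fail' flag -- adjust to
    the actual OOPS annotation format.
    """
    with open(annotation_path) as f:
        annotations = json.load(f)
    return [vid for vid, meta in annotations.items()
            if meta.get("contains_fail")]
```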

Experiment Results

Detailed experiment results and observations can be found in the result docs and in the exp_data.json file. Note that the docs include some experiment results that are not present in the JSON file.

Experimental Process

I used the VideoLlama model's Hugging Face interface to generate and iteratively refine prompts. The refined prompts were key to assessing the model's performance, and to pinpointing where it succeeds and where it struggles, in identifying and understanding fail events in the videos.
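The prompt-construction side of this process can be sketched as follows. The helper names and candidate wordings below are illustrative, not the exact prompts used in these experiments; the "Let's think step by step." suffix is the zero-shot chain-of-thought trigger from the Zero-Shot Reasoners paper, and comparing several candidate instructions mirrors the search idea in the Human-Level Prompt Engineers paper.

```python
def build_fail_prompt(question):
    # Zero-shot chain-of-thought trigger (Kojima et al., "Large
    # Language Models are Zero-Shot Reasoners"): appending this
    # phrase nudges the model toward step-by-step reasoning.
    return f"{question}\nLet's think step by step."

# Candidate instruction phrasings to compare against each other,
# in the spirit of automatic prompt search (illustrative only).
CANDIDATES = [
    "Describe the unusual activity in this video.",
    "What goes wrong in this video, and when?",
    "List the key events in this video, ending with the failure.",
]

def candidate_prompts():
    """Build one chain-of-thought prompt per candidate instruction."""
    return [build_fail_prompt(q) for q in CANDIDATES]
```

Each generated prompt would then be sent to the model through its Hugging Face interface, and the responses compared to pick the best-performing phrasing.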

Citation

If you use the experiment results from this repository in your work, please consider citing this project. Thank you <3

Contact

Feel free to reach out if you have any questions or suggestions!
