KeyboardVideoToText

Final project for CPSC 185.

Fine-tuning a Qwen2.5-Omni-3B model to predict typed text from overhead video (with audio) of typing on a keyboard.

Environment

  • The Conda environment is specified in environment.yml. Create it with conda env create -f environment.yml.

Dataset Creation

  • 0training/collect_gui.py is a Python GUI program that prompts the user with sentences to type while recording video and keystroke data.
  • Our keyboard video dataset, with full videos and keystroke timing data, is available as a Hugging Face dataset.
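As an illustration of how keystroke timing data can be lined up with the recorded video, the sketch below maps per-key timestamps to frame indices. The record format (key, seconds-since-start pairs) and the 30 fps frame rate are assumptions for illustration, not the actual schema produced by collect_gui.py.

```python
# Hypothetical sketch: align keystroke timestamps with video frames.
# The (key, time_s) record format and the 30 fps default are
# illustrative assumptions, not the repository's actual schema.

def keystrokes_to_frames(keystrokes, fps=30.0):
    """Map each (key, timestamp_seconds) pair to the index of the
    video frame during which the keypress occurred."""
    return [(key, int(t * fps)) for key, t in keystrokes]

# Example: three keypresses, timed relative to the start of the video.
events = [("T", 0.10), ("h", 0.25), ("e", 0.42)]
print(keystrokes_to_frames(events))  # frame indices at 30 fps
```

Pairing keystrokes with frames this way is what makes it possible to supervise a video model on exactly when each character was typed.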
Sample Data

Text: The source said if approved, the authority would allow a transaction to be carried out.

Video: 788.mp4

Training

  • Install LLaMA-Factory by running pip install -e ".[torch,metrics]" from the 1training/LLaMA-Factory directory.
  • Use 1training/0train.ipynb to generate the augmented dataset, and ensure it is placed at 1training/LLaMA-Factory/data/keyboard_videos. The relative paths in keyboard.json show the expected directory structure for the .mp4 and .wav files.
  • Run training with 1training/train.sh, which uses the configuration at 1training/train_keyboard.yml.
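To make the directory structure concrete, the snippet below builds one hypothetical dataset entry with relative .mp4 and .wav paths under keyboard_videos. The field names follow a LLaMA-Factory-style multimodal format, but the exact keys used by keyboard.json may differ; treat this as a sketch, not the repository's actual schema.

```python
import json

# Hypothetical sketch of one dataset entry in a LLaMA-Factory-style
# format. The keys and prompt text are illustrative assumptions; the
# actual keyboard.json may use different field names.
def make_entry(sample_id, text):
    return {
        "messages": [
            {"role": "user",
             "content": "<video><audio>What text is being typed?"},
            {"role": "assistant", "content": text},
        ],
        # Paths are relative to LLaMA-Factory/data/, mirroring the
        # keyboard_videos directory mentioned above.
        "videos": [f"keyboard_videos/{sample_id}.mp4"],
        "audios": [f"keyboard_videos/{sample_id}.wav"],
    }

entry = make_entry(788, "The source said if approved, the authority "
                        "would allow a transaction to be carried out.")
print(json.dumps(entry, indent=2))
```

Keeping the media paths relative to the data/ directory is what lets the dataset move with the LLaMA-Factory checkout without editing every entry.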
