Skip to content

MNeMoNiCuZ/florence2-caption-batch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Florence2 Caption Batch

This tool uses the VLM Florence2 from Microsoft to caption images in an input folder. Thanks to their team for training this great model.

It's a very fast and fairly robust captioning model that can produce good outputs in 3 different levels of detail.

Requirements

  • Python 3.10 or above.

    • It's been tested with 3.10, 3.11 and 3.12.
    • It does not work with 3.8.
  • Cuda 12.1.

    • It may work with other versions. Untested.

To use CUDA / GPU speed captioning, you'll need ~6GB VRAM or more.

Setup

  1. Create a virtual environment. Use the included venv_create.bat to automatically create it. Use python 3.10 or above.
  2. Install the libraries in requirements.txt. pip install -r requirements.txt. This is done by step 1 when asked if you use venv_create.
  3. Install Pytorch for your version of CUDA. It's only been tested with version 12.1 but may work with others.
  4. Open batch.py in a text editor and change the BATCH_SIZE = 7 value to match the level of your GPU.

For a 6gb VRAM GPU, use 1.

For a 24gb VRAM GPU, use 7.

How to use

  1. Activate the virtual environment. If you installed with venv_create.bat, you can run venv_activate.bat.
  2. Run python batch.py from the virtual environment.

This runs captioning on all images in the /input/-folder.

Detail Mode

You can edit the variable DETAIL_MODE to 1, 2 or 3.

Here's an example:

airplane 001

DETAIL_MODE = 1:

A toy airplane flying through the clouds in the sky.

DETAIL_MODE = 2:

The image shows a toy airplane flying through the sky with white fluffy clouds in the background.

DETAIL_MODE = 3:

The image shows a toy airplane flying above the clouds. The airplane is made of gray yarn and has two propellers on either side. It appears to be in mid-flight, with its wings spread wide and its nose pointing upwards. The clouds below are white and fluffy, and the sky is a light blue with a few wispy clouds. In the background, there is a body of water visible. The overall mood of the image is peaceful and serene.

Credits

Thanks Gökay Aydoğan for helping me with the scripts.

About

A batch captioning tool using florence2

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published