Model Compression with NNCF

Disty0 edited this page Jun 25, 2024 · 31 revisions

Usage

  1. Use the Diffusers backend: Execution & Models -> Execution backend
  2. Go into Compute Settings
  3. Enable the desired Compress Model weights with NNCF options
  4. Reload the model

Note: if you use the VAE option, VAE Upcast (in Diffusers settings) must be set to false.
If you get black images with SDXL models, use the FP16 Fixed VAE.

Features

  • Uses INT8 weights, roughly halving the model size
    Saves ~3.4 GB of VRAM with SDXL
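
The INT8 scheme can be illustrated with a minimal sketch of symmetric per-tensor weight quantization, a simplified stand-in for what NNCF does rather than its actual implementation: each 2-byte 16-bit weight becomes a 1-byte INT8 value plus one shared scale, which is where the roughly 2x size reduction comes from.

```python
# Hypothetical sketch of symmetric INT8 weight quantization;
# not SD.Next's or NNCF's actual code.

def quantize_int8(weights):
    """Map floating-point weights to INT8 plus one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floating-point weights for computation."""
    return [v * scale for v in q]

weights = [0.02, -0.13, 0.4, -0.07]   # pretend these are FP16 layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each FP16 weight takes 2 bytes; each INT8 weight takes 1 byte,
# hence the roughly halved model size.
print(q)  # → [6, -41, 127, -22]
print([round(w, 3) for w in restored])
```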

Disadvantages

  • Uses autocast: the GPU still runs the model in 16 bit, so inference is slower
  • Not implemented in the Original backend
  • Fused projections are not compatible with NNCF
  • Using LoRAs makes generation slower
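
The autocast slowdown can be pictured with a toy layer: because weights are stored compressed, every forward pass has to dequantize them back to a 16-bit type before the GPU can use them, and that extra step repeats on every call. A hypothetical sketch, not actual NNCF internals:

```python
# Toy illustration of the speed/memory trade-off: INT8 storage,
# but dequantization on every forward pass (hypothetical sketch).

class CompressedLinear:
    def __init__(self, q_weights, scale):
        self.q = q_weights      # INT8 storage: 1 byte per weight
        self.scale = scale
        self.dequant_calls = 0  # count the extra work per forward

    def forward(self, x):
        # Extra step an uncompressed layer would not need:
        self.dequant_calls += 1
        w = [v * self.scale for v in self.q]  # dequantize to float
        return sum(wi * xi for wi, xi in zip(w, x))

layer = CompressedLinear([6, -41, 127, -22], 0.4 / 127)
for _ in range(3):
    layer.forward([1.0, 2.0, 3.0, 4.0])
print(layer.dequant_calls)  # → 3: dequantization repeats on every call
```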

Options

These results compare NNCF 8 bit against 16 bit.

  • Model:
    Compresses the UNet or Transformer part of the model.
    This is where most of the memory savings come from for Stable Diffusion.

    SDXL: ~2500 MB memory savings.
    SD 1.5: ~750 MB memory savings.
    PixArt-XL-2: ~600 MB memory savings.

  • Text Encoder:
    Compresses the Text Encoder parts of the model.
    This is where most of the memory savings come from for PixArt.

    PixArt-XL-2: ~4750 MB memory savings.
    SDXL: ~750 MB memory savings.
    SD 1.5: ~120 MB memory savings.

  • VAE:
    Compresses the VAE part of the model.
    Memory savings from compressing the VAE are fairly small.

    SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings.
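
Summing the per-component figures above gives a rough estimate of total savings per model when all three options are enabled. The numbers below are the approximate values from this page; actual savings vary:

```python
# Approximate per-component savings from this page, in MB.
savings_mb = {
    "SDXL":        {"Model": 2500, "Text Encoder": 750,  "VAE": 75},
    "SD 1.5":      {"Model": 750,  "Text Encoder": 120,  "VAE": 75},
    "PixArt-XL-2": {"Model": 600,  "Text Encoder": 4750, "VAE": 75},
}

for model, parts in savings_mb.items():
    total = sum(parts.values())
    print(f"{model}: ~{total} MB (~{total / 1024:.1f} GB) total savings")
```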
