This repository contains the implementation of Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning.
Note
We used Stable Diffusion 1.5 in our experiments, but it has been removed from Hugging Face (as of September 2, 2024).
- The camera-ready version is available on arXiv
- Our paper has been accepted by BMVC 2024 (accepted papers list)
The environment for our experiments is based on PyTorch 1.13.1 (Docker image).
Pull the Docker image.
docker pull pytorch/pytorch:1.13.1-cuda11.6-cudnn8-runtime
Install the other packages.
pip install -r requirements.txt
Save the stable diffusion 1.5 pipeline (text encoder, tokenizer, scheduler, VAE, and U-Net).
python load_save.py --pipeline runwayml/stable-diffusion-v1-5 --save_dir models/sd-15
In the case of Stable Diffusion 1.4:
python load_save.py --pipeline CompVis/stable-diffusion-v1-4 --save_dir models/sd-14
Store the prepared images in a directory. Supported formats are .png, .jpg, and .jpeg.
.
└── ds
    └── church
        ├── church-01.jpg
        ├── church-02.png
        ├── church-03.jpg
        └── church-04.jpeg
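A small helper like the following (hypothetical, not part of this repository) can be used to check that a dataset directory contains only supported image files before training:

```python
from pathlib import Path

# Extensions accepted by the training script, per the note above.
SUPPORTED = {".png", ".jpg", ".jpeg"}

def collect_images(data_dir):
    """Return sorted paths of supported image files under data_dir."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

For the layout above, `collect_images("ds/church")` would return the four church images.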
Run the following command for training (erasing).
python train.py --concept "Eiffel Tower" --concept_type object --save eiffel-tower --data ds/church --local --text_encoder_path models/sd-14/text_encoder --diffusion_path models/sd-14 --epochs 4
Erased models are stored as shown below.
.
└── eiffel-tower
├── epoch-0
│   ├── pytorch_model.bin
│   └── config.json
├── epoch-1
│   ├── pytorch_model.bin
│   └── config.json
├── epoch-2
│   ├── pytorch_model.bin
│   └── config.json
├── epoch-3
│   ├── pytorch_model.bin
│   └── config.json
├── loss.csv
└── loss.png
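loss.png is a plot of the training losses recorded in loss.csv. Assuming a simple two-column (epoch, loss) layout — the exact columns are an assumption, not documented here — the file can be inspected like this:

```python
import csv

def read_losses(path):
    """Parse loss values from a CSV file, skipping any header row.
    The (epoch, loss) column layout is an assumption about loss.csv."""
    losses = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            try:
                losses.append(float(row[-1]))
            except (ValueError, IndexError):
                continue  # header row or blank line
    return losses
```

This makes it easy to, for example, pick the epoch with the lowest loss for inference.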
Inference (PNDM scheduler with 100 inference steps):
python infer.py "a photo of Eiffel Tower." eiffel-tower/epoch-3 --tokenizer_path models/sd-14/tokenizer --unet_path models/sd-14/unet --vae_path models/sd-14/vae
or
python infer.py "a photo of Eiffel Tower." eiffel-tower/epoch-3 --model_name CompVis/stable-diffusion-v1-4
This command uses Stable Diffusion 1.4 for all components except the text encoder.
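The commands above point at `epoch-3` explicitly. Given the output layout shown earlier, a small helper (hypothetical, not part of the repository) can select the most recent checkpoint automatically:

```python
from pathlib import Path

def latest_checkpoint(save_dir):
    """Return the epoch-N subdirectory with the highest N, or None if absent."""
    root = Path(save_dir)
    epochs = [
        p for p in root.glob("epoch-*")
        if p.is_dir() and p.name.split("-")[-1].isdigit()
    ]
    if not epochs:
        return None
    # Sort numerically, not lexically, so epoch-10 beats epoch-9.
    return max(epochs, key=lambda p: int(p.name.split("-")[-1]))
```

For the `eiffel-tower` directory above, `latest_checkpoint("eiffel-tower")` would return the `epoch-3` path to pass to infer.py.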
The preprint can be cited as follows:
@misc{fuchi2024erasing,
title={Erasing Concepts from Text-to-Image Diffusion Models with Few-shot Unlearning},
author={Masane Fuchi and Tomohiro Takagi},
year={2024},
eprint={2405.07288},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This implementation is based on Textual Inversion using diffusers.
Baselines are as follows: