
Add VAEImageDecoder for StableDiffusionV3 #1796

Merged: 2 commits into keras-team:keras-hub, Aug 28, 2024

Conversation

james77777778 (Collaborator)

Numerics check:
https://colab.research.google.com/drive/1YsWvZ0NBINDgdqipsldso1Y1NKJUDkUf?usp=sharing

Future work:

  • Implement CLIPPreprocessor
  • Wrap CLIPTextEncoder and T5XXLTextEncoder for use in StableDiffusionV3
  • Implement VAEImageDecoder
  • Implement MMDiT
  • Implement StableDiffusionV3 (inference model)

@divyashreepathihalli @mattdangerw @SamanehSaadat


@divyashreepathihalli divyashreepathihalli left a comment


Thanks for the PR!! LGTM!

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Aug 26, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Aug 26, 2024

@mattdangerw mattdangerw left a comment


LGTM! Minor nits, and a few design notes that I don't think we need to solve in this PR.

keras_nlp/src/models/stable_diffusion_v3/vae_attention.py (two review threads, outdated and resolved)
Inline review context:

```python
from keras_nlp.src.utils.keras_utils import standardize_data_format


class VAEImageDecoder(Backbone):
```
mattdangerw (Member)


Note that with pali gemma, our "backbone" contains all the weights needed from a pre-trained model. So in that case the image encoder and text decoder collectively form a single backbone class.

We should discuss the high level flows that we want as we go, but our current approach is...

  • StableDiffusionBackbone should contain all the pretrained weights for using the entire model without a specific task setup. This can come from stitching other backbones/sub-models together. No preprocessing.
  • StableDiffusion[TaskName] would wrap the backbone with a setup for a particular task, preprocessing included. Ideally allowing both fine-tuning and inference, but that would depend on the task at hand. For stable diffusion the main task is definitely text to image, though I'm not sure what we should call that. StableDiffusionImageGenerator is kinda long.
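The two-level pattern above could be sketched roughly as follows. All class names except `StableDiffusionBackbone` and `StableDiffusionTextToImage` (and the stand-in sub-models, shapes, and layer choices) are invented for illustration and are not the real keras-nlp implementation:

```python
import keras
import numpy as np
from keras import layers


# Illustrative stand-ins for the real sub-models (CLIPTextEncoder, MMDiT,
# VAEImageDecoder); vocab size, dims, and layers are invented.
def make_text_encoder(embed_dim=8):
    token_ids = keras.Input(shape=(None,), dtype="int32", name="token_ids")
    x = layers.Embedding(100, embed_dim)(token_ids)
    x = layers.GlobalAveragePooling1D()(x)
    return keras.Model(token_ids, x, name="text_encoder")


def make_image_decoder(embed_dim=8):
    latents = keras.Input(shape=(embed_dim,), name="latents")
    x = layers.Dense(4 * 4 * 3, activation="sigmoid")(latents)
    images = layers.Reshape((4, 4, 3))(x)
    return keras.Model(latents, images, name="image_decoder")


class StableDiffusionBackbone(keras.Model):
    """All pretrained weights, stitched from sub-models; no preprocessing."""

    def __init__(self, text_encoder, image_decoder, **kwargs):
        super().__init__(**kwargs)
        self.text_encoder = text_encoder
        self.image_decoder = image_decoder

    def call(self, token_ids):
        return self.image_decoder(self.text_encoder(token_ids))


class StableDiffusionTextToImage(keras.Model):
    """Wraps the backbone with a task-specific setup, preprocessing included."""

    def __init__(self, backbone, **kwargs):
        super().__init__(**kwargs)
        self.backbone = backbone

    def generate(self, token_ids):
        # Real tokenization/preprocessing and the diffusion sampling loop
        # would go here; this toy version just forwards to the backbone.
        return self.backbone(np.asarray(token_ids, dtype="int32"))
```

Here the task class only forwards to the backbone; the point is the split of responsibilities, not the (toy) computation.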

james77777778 (Collaborator, Author) Aug 27, 2024


I was unsure of how we wanted to assemble these encoders and the decoder, so I made them as a Backbone first.

> We should discuss the high level flows that we want as we go, but our current approach is...

Got it. I'll make the encoders and the decoder keras.Model subclasses to follow that pattern.

I think the task name ImageGenerator is a bit ambiguous. Maybe we should call it TextToImage instead?
It is also possible to use SD3 for ImageToImage and Inpaint tasks:
https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion_3
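As a toy illustration of the keras.Model direction, a latent-to-image decoder written as a plain keras.Model subclass (rather than a Backbone) might look like the sketch below; the layer sizes, names, and architecture are invented and are not SD3's actual VAE decoder:

```python
import keras
from keras import layers


class ToyImageDecoder(keras.Model):
    """Toy latent-to-image decoder as a plain keras.Model (illustrative)."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.conv_in = layers.Conv2D(16, 3, padding="same", activation="relu")
        self.upsample = layers.UpSampling2D(size=2)
        self.conv_out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")

    def call(self, latents):
        x = self.conv_in(latents)  # (batch, h, w, 16)
        x = self.upsample(x)       # (batch, 2h, 2w, 16)
        return self.conv_out(x)    # (batch, 2h, 2w, 3), values in [0, 1]
```

A sub-model like this composes naturally inside a larger backbone or task model, since in Keras any model can be called as a layer.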

mattdangerw (Member)


TextToImage sounds fine to me. Shorter.

> Got it. Will make the encoders and decoder as a keras.Model to follow that pattern.

I suspect we still have more to figure out here. For these big "composite models" with lots of sub-components, it would be good if we allowed loading sub-models individually somehow, e.g. loading just the text encoder of a T5 model, or just the image encoder of PaliGemma. That's a valid use case that fits with the flexibility we'd like to shoot for, and we don't support it today. But probably one for another PR, I think.
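One rough sketch of the "load a sub-model individually" idea using only stock Keras (this is not a keras-nlp API; all names and shapes are illustrative) would be to attach sub-models as named layers of the composite and retrieve them by name:

```python
import keras
from keras import layers

# Named sub-models, stand-ins for e.g. a text encoder and an image decoder.
encoder = keras.Sequential(
    [keras.Input(shape=(8,)), layers.Dense(4)], name="text_encoder"
)
decoder = keras.Sequential(
    [keras.Input(shape=(4,)), layers.Dense(8)], name="image_decoder"
)

# Composite model stitching the sub-models together.
inputs = keras.Input(shape=(8,))
composite = keras.Model(inputs, decoder(encoder(inputs)), name="composite")

# Pull just one sub-model back out; it is a standalone, usable model.
text_encoder_only = composite.get_layer("text_encoder")
```

Loading only the relevant sub-model's weights from a checkpoint (rather than extracting it from a fully built composite) is the harder part and is presumably what the comment above is pointing at.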

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Aug 27, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Aug 27, 2024
@mattdangerw mattdangerw merged commit 536474a into keras-team:keras-hub Aug 28, 2024
10 checks passed
@james77777778 james77777778 deleted the add-vae-decoder branch August 29, 2024 02:26
mattdangerw pushed a commit to mattdangerw/keras-nlp that referenced this pull request Sep 10, 2024
* Add `VAEImageDecoder` for StableDiffusionV3

* Use `keras.Model` for `VAEImageDecoder` and follows the coding style in `VAEAttention`
mattdangerw pushed commits with the same message that referenced this pull request on Sep 11, Sep 13, and Sep 17, 2024.

4 participants