Add `VAEImageDecoder` for StableDiffusionV3 #1796
Conversation
Thanks for the PR!! LGTM!
lgtm! minor nits and a few design notes I don't think we need to solve in this PR
from keras_nlp.src.utils.keras_utils import standardize_data_format

class VAEImageDecoder(Backbone):
Note that with PaliGemma, our "backbone" contains all the weights needed from a pre-trained model. So in that case the image encoder and text decoder collectively form a single backbone class.
We should discuss the high level flows that we want as we go, but our current approach is...
- `StableDiffusionBackbone` should contain all the pretrained weights for using the entire model without a specific task setup. This can come from stitching other backbones/sub-models together. No preprocessing.
- `StableDiffusion[TaskName]` would wrap the backbone with a setup for a particular task. Preprocessing included. Ideally allowing both fine-tuning and inference, but that would depend on the task at hand. For stable diffusion the main task is definitely text to image, though I'm not sure what we should call that. `StableDiffusionImageGenerator` is kinda long.
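To make the "stitching" idea concrete, here is a minimal sketch of a composite backbone that just holds pretrained sub-models as attributes. The class and attribute names (`StableDiffusionBackbone`, `text_encoder`, `diffusion_model`, `vae_image_decoder`) are illustrative assumptions, not the actual KerasNLP API:

```python
import keras


class StableDiffusionBackbone(keras.Model):
    """Hypothetical composite backbone sketch (names are placeholders).

    The backbone stitches pretrained sub-models together and owns all
    the weights; preprocessing would live in the task classes instead.
    """

    def __init__(self, text_encoder, diffusion_model, vae_image_decoder, **kwargs):
        super().__init__(**kwargs)
        # Sub-models assigned as attributes are tracked by Keras, so
        # their weights are saved/loaded with the composite model.
        self.text_encoder = text_encoder
        self.diffusion_model = diffusion_model
        self.vae_image_decoder = vae_image_decoder
```

A `StableDiffusion[TaskName]` class would then wrap an instance of this backbone and add task-specific preprocessing.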
I was unsure of how we wanted to assemble these encoders and the decoder, so I made them as a `Backbone` first.

> We should discuss the high level flows that we want as we go, but our current approach is...

Got it. Will make the encoders and decoder a `keras.Model` to follow that pattern.

I think the task name, `ImageGenerator`, is a bit ambiguous. Maybe we should call it `TextToImage` instead? It is also possible to use SD3 for `ImageToImage` and `Inpaint` tasks.
https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion_3
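For reference, a plain `keras.Model` decoder (the direction agreed on in this thread) could look like the minimal sketch below. The layer sizes and structure are placeholders for illustration, not the real SD3 VAE architecture:

```python
import keras


class VAEImageDecoder(keras.Model):
    """Minimal illustrative sketch of a VAE decoder as a `keras.Model`.

    Real SD3 decoders use residual blocks and attention; this just
    shows the `keras.Model` subclassing shape, not the architecture.
    """

    def __init__(self, output_channels=3, **kwargs):
        super().__init__(**kwargs)
        self.conv_in = keras.layers.Conv2D(64, 3, padding="same")
        self.upsample = keras.layers.UpSampling2D(size=2)
        self.conv_out = keras.layers.Conv2D(output_channels, 3, padding="same")

    def call(self, latents):
        # Map latents to features, upsample spatially, project to RGB.
        x = self.conv_in(latents)
        x = self.upsample(x)
        return self.conv_out(x)
```

Building it as a `keras.Model` rather than a `Backbone` keeps it free of preset/serialization machinery, so it can be stitched into a composite backbone later.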
`TextToImage` sounds fine to me. Shorter.
> Got it. Will make the encoders and decoder as a `keras.Model` to follow that pattern.
I suspect we still have more to figure out here. For these big "composite models" with lots of sub-components, it would be good if we allowed loading sub-models individually somehow. E.g. load just the text encoder of a T5 model, or just the image encoder of PaliGemma. That's a valid use case that fits with the flexibility we'd like to shoot for, and we don't support it today. But probably one for another PR, I think.
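One low-tech way to get "load a sub-model individually" is simply to expose the tracked sub-model attributes of a composite model. This is a hypothetical sketch of that idea, not an API KerasNLP offers today (as the comment notes):

```python
import keras


def get_submodel(composite, name):
    """Return a named sub-model of a composite Keras model.

    Hypothetical helper: sub-models attached as attributes are tracked
    by Keras, so they can be retrieved and used standalone, with their
    (already-loaded) weights shared with the composite model.
    """
    return getattr(composite, name)
```

A fuller solution would presumably let you download and instantiate only that sub-model's weights from a preset, rather than loading the whole composite first.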
Force-pushed from `7e03da3` to `eee2ceb`.
* Add `VAEImageDecoder` for StableDiffusionV3
* Use `keras.Model` for `VAEImageDecoder` and follow the coding style in `VAEAttention`
Numerics check:
https://colab.research.google.com/drive/1YsWvZ0NBINDgdqipsldso1Y1NKJUDkUf?usp=sharing
Future work:
@divyashreepathihalli @mattdangerw @SamanehSaadat