
Transfer Learning fails when training conditional model based on dataset labels #98

Open
ageroul opened this issue Apr 28, 2021 · 10 comments


@ageroul

ageroul commented Apr 28, 2021

Hi,
I have prepared my dataset with dataset_tool.py. Dimensions are 256x256 and there are 5 classes (labels). The dataset.json file is also fine. Here is the problem:
When running python train.py and my Transfer Learning source network is ffhq256 the execution fails pretty soon (in the beginning of "Constructing networks") with this error:
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 1
When I run the same code with the option cond='False' (ignoring the dataset labels), the problem disappears and the transfer learning proceeds without error.
What is the problem here?
Thanks in advance!
PS: I also tried ffhq512 (with the option cond="True"), but then I get an error again: RuntimeError: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 0

@wdf19961118

You can set --gpus=1 and try again. I see training_set_iterator = iter(torch.utils.data.DataLoader(dataset=training_set, sampler=training_set_sampler, batch_size=batch_size//num_gpus, **data_loader_kwargs)) in training_loop.py; maybe that is the reason for your error. Good luck!

@ageroul

ageroul commented Apr 29, 2021

You can set --gpus=1 and try again. I see training_set_iterator = iter(torch.utils.data.DataLoader(dataset=training_set, sampler=training_set_sampler, batch_size=batch_size//num_gpus, **data_loader_kwargs)) in training_loop.py; maybe that is the reason for your error. Good luck!

Thanks for the answer,
Unfortunately, this is not the issue, as I had already set the option --gpus=1 in train.py.

@chengkeng

I also encountered the same situation.

@chengkeng

This must be re-trained, remove "--resume=xxx"

@ageroul

ageroul commented Apr 30, 2021

This must be re-trained, remove "--resume=xxx"

If it "must" be retrained then there is no transfer learning happening...

@wdf19961118

You want to train a conditional model initialized from an unconditional model (ffhq256), right? However, the structure of a conditional model is different from that of an unconditional one. You can print the two structures and compare them.
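For illustration, here is a minimal sketch of the structural difference in plain PyTorch. The helper below is hypothetical (not the repo's actual code), and the 512-wide shapes assume stylegan2-ada-pytorch's defaults (z_dim = w_dim = 512, with the label embedded to w_dim features):

```python
import torch.nn as nn

def mapping_first_layer(z_dim=512, w_dim=512, c_dim=0):
    # Hypothetical helper: a conditional model embeds the class label to
    # w_dim features and concatenates it with z, so the first fully
    # connected layer of the mapping network sees z_dim + w_dim inputs.
    in_features = z_dim + (w_dim if c_dim > 0 else 0)
    return nn.Linear(in_features, w_dim)

print(mapping_first_layer(c_dim=0).weight.shape)  # torch.Size([512, 512])
print(mapping_first_layer(c_dim=5).weight.shape)  # torch.Size([512, 1024])
```

The pretrained ffhq256 checkpoint only contains the narrower (512-input) weights, so they cannot be copied into the wider conditional layer.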

@Gass2109

Because the conditional model's mapping network takes as input the concatenation (along the feature dimension) of the label embedding (bs, 512) and the latent code (bs, 512), which gives a tensor of shape (bs, 1024). The unconditional model's mapping network takes only the latent code (bs, 512), so its pretrained weights do not fit. Hope that helps :)
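The mismatch then surfaces when resuming tries to copy the pretrained weights into the new network. A minimal reproduction with plain torch tensors standing in for the real mapping-network weights (the 512-wide shapes are assumptions based on the default z_dim/w_dim):

```python
import torch

w_dim, z_dim = 512, 512
cond_weight = torch.empty(w_dim, z_dim + w_dim)  # conditional first layer: (512, 1024)
ffhq_weight = torch.randn(w_dim, z_dim)          # pretrained unconditional: (512, 512)

try:
    cond_weight.copy_(ffhq_weight)  # roughly what resuming attempts
except RuntimeError as e:
    # Prints the same kind of size-mismatch error as reported above:
    # tensor a (1024) vs tensor b (512) at non-singleton dimension 1.
    print(e)
```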

@thusinh1969

I closed my question because this is the reason !

Steve

@wenhaoyong

I encountered a similar problem and I fixed it with the option cond="True". Thx.

@49xxy

49xxy commented Sep 6, 2022

I encountered a similar situation and fixed it with the option cond="True". Thanks.

How did you solve it? I sincerely hope to get your help
