
Can not transfer-learning with different number of "classes" #156

Open
thusinh1969 opened this issue Aug 5, 2021 · 5 comments

Comments


thusinh1969 commented Aug 5, 2021

Describe the bug
I do not seem to be able to do transfer learning from my own pretrained model (both are conditionally trained models). The pretrained model has 20 "conditional classes" and was performing well. I then tried to use the same model for transfer learning on another dataset, but with 34 "conditional classes", and got these errors:

Resuming from "./results/CHECKPOINT/network-snapshot-001400.pkl"
Traceback (most recent call last):
  File "train_GPU_0.py", line 547, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\click\decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "train_GPU_0.py", line 540, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "train_GPU_0.py", line 389, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "D:\AI\Furnitures\dataset_AA\GAN_DATA_for_training\GAN_data_NEW_COMBINED_HOUSE_ROOM\Individual_Style_to_Context_dataset_corrected\Contemporary\StyleGANV2-pytorch\training\training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "D:\AI\Furnitures\dataset_AA\GAN_DATA_for_training\GAN_data_NEW_COMBINED_HOUSE_ROOM\Individual_Style_to_Context_dataset_corrected\Contemporary\StyleGANV2-pytorch\torch_utils\misc.py", line 160, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (34) must match the size of tensor b (20) at non-singleton dimension 1

To Reproduce
I have tried a few times with different datasets; I get the same kind of error each time.

I thought we should be able to do transfer learning regardless of the number of classes, to take advantage of the pretrained weights for most of the network?

Any help is highly appreciated.
Steve
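The failure can be sketched without StyleGAN2 itself: the resume path copies every parameter element-wise, which requires identical shapes, and the class-embedding weight carries the class count in one of its dimensions. A toy, pure-Python sketch (the names and shapes below are illustrative assumptions, not the real network's):

```python
# Toy sketch of why resuming fails (pure Python, not actual StyleGAN2 code).
# Parameters are modeled as name -> (shape, value); names and shapes are
# illustrative, except that the class count sits in dimension 1.

def copy_params(src, dst):
    """Copy src entries into dst in place; shapes must match exactly,
    mirroring what an element-wise tensor.copy_() requires."""
    for name, (shape, _) in dst.items():
        if name in src:
            src_shape, src_val = src[name]
            if src_shape != shape:
                raise RuntimeError(
                    f"The size of tensor a ({shape[1]}) must match the size "
                    f"of tensor b ({src_shape[1]}) for '{name}'")
            dst[name] = (shape, src_val)

# Pretrained model: 20 classes. New model: 34 classes.
src = {"mapping.embed.weight": ((512, 20), "pretrained"),
       "synthesis.b4.conv1.weight": ((512, 512, 3, 3), "pretrained")}
dst = {"mapping.embed.weight": ((512, 34), "random-init"),
       "synthesis.b4.conv1.weight": ((512, 512, 3, 3), "random-init")}

try:
    copy_params(src, dst)
except RuntimeError as e:
    print(e)  # tensor a (34) vs tensor b (20): the embedding cannot be copied
```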

thusinh1969 commented Aug 5, 2021

This is answered in #98 already, but only for non-conditional vs. conditional. Any idea how to do transfer learning between two conditional models?

Would something like this work? Where would we apply this change?
(from Lightning-AI/pytorch-lightning#4690 (comment))

def on_load_checkpoint(self, checkpoint: dict) -> None:
    state_dict = checkpoint["state_dict"]
    model_state_dict = self.state_dict()
    is_changed = False
    for k in state_dict:
        if k in model_state_dict:
            if state_dict[k].shape != model_state_dict[k].shape:
                logger.info(f"Skip loading parameter: {k}, "
                            f"required shape: {model_state_dict[k].shape}, "
                            f"loaded shape: {state_dict[k].shape}")
                state_dict[k] = model_state_dict[k]
                is_changed = True
        else:
            logger.info(f"Dropping parameter {k}")
            is_changed = True

    if is_changed:
        checkpoint.pop("optimizer_states", None)

Thanks,
Steve

Gass2109 commented Aug 6, 2021

We can do transfer learning between two conditional models with different numbers of classes, but in that case we must not copy the parameters of the embedding layer "mapping.embed" in G and D (its shape depends on the number of classes taken as input). For this, you need to modify the function "copy_params_and_buffers" in "torch_utils/misc.py" so that it does not copy all the parameters of the pretrained model.
For example, you can use

if name in src_tensors and "embed" not in name:
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
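A slightly more general variant of the same idea (a pure-Python sketch, not the actual torch_utils/misc.py code) skips any tensor whose shape differs rather than hard-coding the name "embed", so every shape-compatible weight still transfers:

```python
# Shape-based variant of the skip (a sketch; copy_matching and the dict layout
# are illustrative, not the real torch_utils/misc.py). Entries whose shapes
# differ are skipped instead of matching on the name "embed".

def copy_matching(src_state, dst_state):
    """Copy entries whose names AND shapes match; return the skipped names."""
    skipped = []
    for name, (shape, _) in dst_state.items():
        if name not in src_state:
            continue
        src_shape, src_val = src_state[name]
        if src_shape == shape:
            dst_state[name] = (shape, src_val)
        else:
            skipped.append(name)
    return skipped

src = {"mapping.embed.weight": ((512, 20), "pretrained"),
       "synthesis.b4.conv1.weight": ((512, 512, 3, 3), "pretrained")}
dst = {"mapping.embed.weight": ((512, 34), "random-init"),
       "synthesis.b4.conv1.weight": ((512, 512, 3, 3), "random-init")}

print(copy_matching(src, dst))              # ['mapping.embed.weight'] skipped
print(dst["synthesis.b4.conv1.weight"][1])  # 'pretrained': the rest transfers
```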

thusinh1969 commented Aug 6, 2021

https://github.com/Gass2109 I changed the code to what you proposed and got this error:

def copy_params_and_buffers(src_module, dst_module, require_all=False):
    assert isinstance(src_module, torch.nn.Module)
    assert isinstance(dst_module, torch.nn.Module)
    src_tensors = {name: tensor for name, tensor in named_params_and_buffers(src_module)}
    for name, tensor in named_params_and_buffers(dst_module):
        assert (name in src_tensors) or (not require_all)
        if name in src_tensors and "embed" not in name:
            tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)

------------ ERROR ---------------

File ".\training\networks.py", line 602, in forward
y = x.reshape(G, -1, F, c, H, W) # [GnFcHW] Split minibatch N into n groups of size G, and channels C into F groups of size c.
RuntimeError: shape '[8, -1, 1, 512, 4, 4]' is invalid for input of size 163840

Any hint, please?

Thanks,
Steve


thusinh1969 commented Aug 6, 2021


My BAD !!! I had set the batch size to 20, but it must be divisible by 8; I changed it to 24 and it works now, thank you.

Steve
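For anyone hitting the same reshape error: the numbers line up with the layer splitting the minibatch into groups of size G = 8 (the leading 8 in the reported shape '[8, -1, 1, 512, 4, 4]'), so the batch size must be divisible by 8. A quick arithmetic check (`can_group` is an illustrative helper, not repo code):

```python
# Why batch size 20 fails but 24 works: the layer reshapes activations
# [N, C, H, W] into [G, -1, F, C//F, H, W] with G = 8, F = 1, so the total
# element count must split evenly into G groups, i.e. N % G == 0.
# Numbers below match the reported error exactly.

def can_group(N, C, H, W, G, F=1):
    """Return True if [N, C, H, W] can be reshaped to [G, -1, F, C//F, H, W]."""
    total = N * C * H * W
    per_group = G * F * (C // F) * H * W
    return total % per_group == 0 and N % G == 0

print(20 * 512 * 4 * 4)               # 163840, the input size in the error
print(can_group(20, 512, 4, 4, G=8))  # False -> "shape '[8, -1, 1, 512, 4, 4]' is invalid"
print(can_group(24, 512, 4, 4, G=8))  # True  -> batch size 24 works
```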

@MationPlays

> Answer in #98 already but for non-conditioning vs conditioning. Any idea how to transfer learning between 2 conditioning-models ? […]
>
> Thanks, Steve

Have you managed to implement StyleGAN2 in Lightning? Do you have a repo for this? That would be great!
