Invalid training loss #31

Closed
hshiah opened this issue Feb 11, 2023 · 5 comments


hshiah commented Feb 11, 2023

With the new version, the training loss on BraTS2020 is usually NaN.

[Image: WeChat screenshot 20230211123834]

Author

hshiah commented Feb 11, 2023

When the loss is not NaN, the grad_norm is extremely large (for example 7.44e+04), whereas in the previous version it was usually around 10.
What could be the reason? I am training the model on the raw BraTS2020 training data.
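A common first step when debugging this kind of report is to skip any step whose loss or gradient norm is non-finite and to clip the global gradient norm otherwise. The following is a minimal pure-Python sketch of that check, not the project's training code: `global_grad_norm`, `check_and_clip`, and the `max_norm=10.0` default are hypothetical names and values chosen for illustration (10 only echoes the "around 10" figure above), and gradients are modeled as plain lists of floats.

```python
import math


def global_grad_norm(grads):
    """L2 norm over all gradient values, flattened across parameters."""
    return math.sqrt(sum(g * g for param in grads for g in param))


def check_and_clip(loss, grads, max_norm=10.0):
    """Return (ok, grads, norm).

    ok is False when the loss or the gradient norm is NaN/inf, in which
    case the caller should skip the optimizer step. Otherwise the
    gradients are rescaled so their global norm is at most max_norm.
    max_norm=10.0 is an illustrative assumption, not a project setting.
    """
    norm = global_grad_norm(grads)
    if not (math.isfinite(loss) and math.isfinite(norm)):
        return False, grads, norm
    # Same rescaling idea as norm-based clipping: shrink only when too large.
    scale = min(1.0, max_norm / (norm + 1e-6))
    clipped = [[g * scale for g in param] for param in grads]
    return True, clipped, norm


# Usage: one "parameter" whose gradient norm matches the value in the report.
ok, clipped, norm = check_and_clip(0.5, [[7.44e4, 0.0]])
```

In PyTorch, `torch.nn.utils.clip_grad_norm_` applies the same kind of rescaling to real parameter gradients and returns the norm it measured before clipping, so logging that value is one way to see where the norm starts to blow up.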

Collaborator

WuJunde commented Feb 11, 2023

I fixed the bug. Please update the project and try again.

Author

hshiah commented Feb 12, 2023

Hi, I tried the newest version and the model gets stuck at the training stage. I checked the GPU memory usage and it stays at a small value (around 2500 MiB) instead of the normal value.
image
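To record GPU memory while a run appears stuck, rather than reading it off a screenshot, one option is to poll `nvidia-smi` from a small script. This is only a sketch under assumptions: `parse_gpu_memory` and `read_gpu_memory` are hypothetical helper names, it assumes `nvidia-smi` is on the PATH, and the 24576 MiB total in the usage line is a made-up value (only the 2500 MiB figure comes from the report above).

```python
import subprocess

# Standard nvidia-smi query: one CSV line per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into {index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        try:
            index, used, total = (field.strip() for field in line.split(","))
            usage[int(index)] = (int(used), int(total))
        except ValueError:
            # Skip lines that are malformed or report non-numeric values.
            continue
    return usage


def read_gpu_memory():
    """Return per-GPU memory usage, or None if nvidia-smi cannot be run."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_gpu_memory(result.stdout)


# Usage with a hypothetical output line resembling the situation described.
sample = parse_gpu_memory("0, 2500, 24576\n")
```

If the reported usage stays flat at a low value across several polls while the training process is alive, that suggests the work is not reaching the GPU.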

Collaborator

WuJunde commented Feb 13, 2023

@hshiah I checked again and it works fine on my side. Did you run it on the GPU? You may need to add --gpu 0.

Author

hshiah commented Feb 13, 2023 via email

WuJunde closed this as completed Feb 14, 2023