
Do parameter count calculations in 64 bits to not overflow in case of… #367

Merged
merged 2 commits into karpathy:master on Sep 1, 2023

Conversation

@janimo (Contributor) commented on Aug 27, 2023

… very large models

This is based on discussions and code in #111, #154 and #164, so all credit goes to their authors; it is tested with the 13B and 34B Code Llama models. It seems to be the minimal conceptual change needed.
The alternative to declaring the new n_layers variable is explicit (unsigned long)p->n_layers casts in 5 of the multiplications.

The product p->n_layers*p->n_dim could also be precomputed as an unsigned long and reused, since it appears in all of the multiplications, but that would lose the pedagogical value of having the order of operations mirror the order of the matrices in the cases where dim comes last.
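For readers following along, here is a minimal sketch of the pattern this change targets, loosely modeled on the weight-mapping code in run.c. The struct shapes, function name, and field names below are illustrative stand-ins rather than the exact diff, and the widened type is shown as unsigned long long, the form that was ultimately merged (see the discussion below):

```c
/* Sketch only, not the exact diff; minimal stand-ins for the run.c structs. */
typedef struct {
    int dim;        /* transformer width */
    int hidden_dim; /* FFN width */
    int n_layers;   /* number of layers */
    int n_heads;    /* number of attention heads */
    int vocab_size; /* vocabulary size */
} Config;

typedef struct {
    float *token_embedding_table;
    float *rms_att_weight;
    float *wq;
    /* ... remaining weight tensors ... */
} TransformerWeights;

void map_weights(TransformerWeights *w, Config *p, float *ptr) {
    int head_size = p->dim / p->n_heads;
    /* Widen once: every product that includes n_layers is then evaluated in
       64 bits, so the offsets for 13B/34B checkpoints no longer overflow int. */
    unsigned long long n_layers = p->n_layers;
    w->token_embedding_table = ptr;  ptr += p->vocab_size * p->dim;
    w->rms_att_weight = ptr;         ptr += n_layers * p->dim;
    w->wq = ptr;                     ptr += n_layers * p->dim * (p->n_heads * head_size);
    /* ... the remaining tensors follow the same pattern ... */
}
```

With the cast-based alternative mentioned above, the same effect would come from writing, e.g., (unsigned long long)p->n_layers * p->dim in each affected multiplication.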

@kroggen (Contributor) commented on Aug 29, 2023

Just a reminder that sizeof(long) on MSVC and MinGW is 32-bit
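For anyone checking their own toolchain, a standard-C probe along these lines (nothing project-specific assumed) shows the difference; 64-bit Windows toolchains use the LLP64 model, where long stays 32 bits, while long long is guaranteed to be at least 64 bits everywhere since C99:

```c
#include <stdio.h>

int main(void) {
    /* Typical results:
       LP64  (64-bit Linux/macOS):          long = 8, long long = 8
       LLP64 (64-bit Windows, MSVC/MinGW):  long = 4, long long = 8 */
    printf("sizeof(long)      = %zu\n", sizeof(long));
    printf("sizeof(long long) = %zu\n", sizeof(long long));
    return 0;
}
```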

@janimo (Contributor, Author) commented on Aug 29, 2023

@kroggen thank you, updated to unsigned long long.
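To make the failure mode concrete, a rough back-of-the-envelope check; the dimensions below are illustrative (roughly 34B-scale) rather than taken from the pull request:

```c
#include <stdio.h>

int main(void) {
    /* Illustrative dimensions only, roughly 34B-scale: 48 layers, dim 8192. */
    int n_layers = 48, dim = 8192;
    /* 48 * 8192 * 8192 = 3,221,225,472 elements, which exceeds INT_MAX
       (2,147,483,647); evaluated in plain int the product overflows, so the
       pointer offsets for large checkpoints go wrong. Widening one operand
       forces the whole chain into 64-bit arithmetic. */
    unsigned long long count = (unsigned long long)n_layers * dim * dim;
    printf("per-tensor element count: %llu\n", count);
    return 0;
}
```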

@karpathy merged commit b9fb861 into karpathy:master on Sep 1, 2023
6 checks passed
@karpathy (Owner) commented on Sep 1, 2023

thank you @janimo

vinhtran2611 pushed a commit to vinhtran2611/llama2.c that referenced this pull request Jan 20, 2024
Do parameter count calculations in 64 bits to not overflow in case of…