
Adding Megatron models. #9560

Closed

Narsil opened this issue Jan 13, 2021 · 5 comments
Comments

@Narsil
Contributor

Narsil commented Jan 13, 2021

🌟 New model addition

Is it feasible to add Megatron models? The architecture seems to be essentially GPT2, so most of the work should be in creating the config, fusing layers from the weights available here: https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b, and making them available.

There are Nvidia's Megatron (BERT and GPT variants) and Facebook's Megatron-11b (a GPT variant).
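
The "fusing layers" step mentioned above is essentially merging tensor-parallel checkpoint shards back into single weight matrices. A minimal sketch of that idea, using plain Python lists (the column- vs. row-parallel split layout is an assumption for illustration, not taken from the fairseq code):

```python
# Hypothetical sketch: merging tensor-parallel checkpoint shards into one
# weight matrix. Real checkpoints hold torch tensors; lists keep it minimal.

def merge_column_parallel(shards):
    """Column-parallel layers split the output dim: stack shard rows."""
    merged = []
    for shard in shards:
        merged.extend(shard)
    return merged

def merge_row_parallel(shards):
    """Row-parallel layers split the input dim: concatenate rows pairwise."""
    n_rows = len(shards[0])
    return [sum((shard[r] for shard in shards), []) for r in range(n_rows)]

# Tiny example: two 2x2 shards of one logical weight.
shard_a = [[1, 2], [3, 4]]
shard_b = [[5, 6], [7, 8]]
print(merge_column_parallel([shard_a, shard_b]))  # 4x2 matrix
print(merge_row_parallel([shard_a, shard_b]))     # 2x4 matrix
```

Once fused this way, the weights could in principle be loaded into a standard GPT2-shaped state dict, since no per-layer math changes, only the sharding.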

If we stick to that, we can't run the model on a single GPU, so we should probably make sure this is compatible with the following:

Is it feasible to keep the current GPT2 architecture and use DeepSpeed's ZeRO and other parallelism schemes without touching the original implementation?
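
For context on the ZeRO route: DeepSpeed is driven by a JSON config, so no model-code changes are needed to shard optimizer state and gradients. An illustrative config sketch (the specific values are placeholders, not a tested recipe for an 11B model):

```python
import json

# Illustrative DeepSpeed config enabling ZeRO stage 2 (shards optimizer
# state and gradients across GPUs). Batch size and fp16 settings are
# placeholder assumptions, not tuned values.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "reduce_scatter": True,
    },
}
print(json.dumps(ds_config, indent=2))
```

In practice this file would be passed to `deepspeed.initialize` (or the Trainer's DeepSpeed integration) alongside the unmodified GPT2 model.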

Model description

https://github.com/pytorch/fairseq/blob/e3c4282551e819853952284681e9ed60398c5c4a/examples/megatron_11b/README.md

Open source status

https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/

@stas00 @patrickvonplaten

@stas00
Contributor

stas00 commented Jan 13, 2021

Since DeepSpeed both integrates with and uses Megatron-LM almost everywhere in its tutorials, it most likely should just work. Of course, the devil is in the details.

As I haven't had a chance to study/work with GPT2 yet, I will let others comment on the more important part of your query.

@anton-l anton-l mentioned this issue Feb 20, 2021
5 tasks
soeque1 added a commit to soeque1/transformers that referenced this issue Feb 20, 2021
soeque1 added a commit to soeque1/transformers that referenced this issue Feb 22, 2021
soeque1 added a commit to soeque1/transformers that referenced this issue Feb 22, 2021
@jordiae

jordiae commented Jul 5, 2022

@stas00
Contributor

stas00 commented Jul 5, 2022

As this is a really old thread, perhaps make a request in a new Issue, @jordiae?

And of course, if you're interested you're more than welcome to try and add it yourself. This is of course only an invitation.

@jordiae

jordiae commented Jul 7, 2022

> As this is a really old thread, perhaps make a request in a new Issue, @jordiae?
>
> And of course, if you're interested you're more than welcome to try and add it yourself. This is of course only an invitation.

Got it! Posted here because the issue was open. Thanks.

@Narsil
Contributor Author

Narsil commented Jul 18, 2022

Closing this issue, as it's outdated.

@Narsil Narsil closed this as completed Jul 18, 2022