Adding Megatron models. #9560
Comments
Since DeepSpeed both integrates and uses Megatron-LM almost everywhere in its tutorials, it most likely should just work. Of course, the devil is in the details. As I haven't had a chance to study or work with GPT-2 yet, I will let others comment on the more important part of your query.
* Add modeling_mlm_gpt2
* Add modeling_mlm_gpt2 init
Any plans to add MegatronT5? (https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/t5_model.py)
As this is a really old thread, perhaps make a request in a new issue, @jordiae? And if you're interested, you're more than welcome to try adding it yourself. This is, of course, only an invitation.
Got it! I posted here because the issue was still open. Thanks.
Closing this issue, as it's rather outdated.
🌟 New model addition
Is it feasible to add Megatron models? The architecture seems to be essentially GPT-2, so most of the work would be in creating the config, fusing the layers from the weights available here: https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b, and making them available.
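A minimal sketch of what "fusing layers" might involve, assuming the checkpoint is split into Megatron-style tensor-parallel shards. In Megatron-LM, column-parallel layers (the QKV projection and the first MLP projection) split the weight along the output dimension, while row-parallel layers (the attention output projection and the second MLP projection) split it along the input dimension. All helper names below are hypothetical, and plain Python lists stand in for tensors so the example runs without PyTorch; with real shards the merges would be `torch.cat(shards, dim=0)` and `torch.cat(shards, dim=1)`.

```python
# Sketch: merging Megatron-style tensor-parallel shards into one weight matrix.
# Matrices are lists of rows in torch.nn.Linear layout: [out_features, in_features].
# All helper names are hypothetical, for illustration only.
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Compute y = W x for W of shape [out, in]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def split_rows(w: Matrix, n: int) -> List[Matrix]:
    """Shard along dim 0 (output features), as a column-parallel layer does."""
    k = len(w) // n
    return [w[i * k:(i + 1) * k] for i in range(n)]


def split_cols(w: Matrix, n: int) -> List[Matrix]:
    """Shard along dim 1 (input features), as a row-parallel layer does."""
    k = len(w[0]) // n
    return [[row[i * k:(i + 1) * k] for row in w] for i in range(n)]


def merge_column_parallel(shards: List[Matrix]) -> Matrix:
    """Undo split_rows: stack the shards' rows (like torch.cat(shards, dim=0))."""
    return [row for shard in shards for row in shard]


def merge_row_parallel(shards: List[Matrix]) -> Matrix:
    """Undo split_cols: join each row across shards (like torch.cat(shards, dim=1))."""
    return [sum(rows, []) for rows in zip(*shards)]


if __name__ == "__main__":
    full = [[1.0, 2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0, 8.0],
            [9.0, 10.0, 11.0, 12.0],
            [13.0, 14.0, 15.0, 16.0]]
    x = [1.0, 0.0, -1.0, 2.0]
    n = 2  # model-parallel degree in this toy example

    # Column parallel: each rank produces a slice of the output; concatenate.
    col_shards = split_rows(full, n)
    assert merge_column_parallel(col_shards) == full
    assert sum((matvec(s, x) for s in col_shards), []) == matvec(full, x)

    # Row parallel: each rank sees a slice of the input; summing the partial
    # outputs plays the role of Megatron's all-reduce.
    row_shards = split_cols(full, n)
    k = len(x) // n
    partials = [matvec(s, x[i * k:(i + 1) * k]) for i, s in enumerate(row_shards)]
    assert merge_row_parallel(row_shards) == full
    assert [sum(p) for p in zip(*partials)] == matvec(full, x)
    print("merged shards reproduce the full layer")
```

The asserts check the property that makes the merge valid: concatenating the column shards' outputs, or summing the row shards' partial outputs, reproduces the full layer. A real conversion would likely need care this sketch skips. Fused QKV shards may need reordering so that all Q rows come first. Hugging Face's GPT-2 also stores its projections in `Conv1D` with an `[in, out]` layout, so merged weights would need transposing.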
There are NVIDIA's Megatron (BERT and GPT variants) and Facebook's Megatron-11b (GPT variant).
If we stick to that, then we can't run the model on a single GPU, so we should probably make sure this is compatible with:
Is it feasible to keep the current GPT-2 architecture and use DeepSpeed's ZeRO and other parallelism schemes without touching the original implementation?
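To put numbers on the single-GPU concern, here is a back-of-the-envelope sketch. It assumes mixed-precision Adam and the per-parameter accounting from the ZeRO paper: 16 bytes of model state per parameter, made up of fp16 weights and gradients (2 + 2 bytes) plus fp32 master weights, momentum, and variance (12 bytes). It ignores activations and buffers, takes roughly 11B parameters for Megatron-11b, and the function name is hypothetical.

```python
# Rough per-GPU model-state memory for mixed-precision Adam training,
# following the ZeRO paper's accounting (activations/buffers ignored).
# Stage 0 = plain data parallelism; stage 1 partitions optimizer states,
# stage 2 also partitions gradients, stage 3 also partitions parameters.


def training_memory_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Per-GPU memory (GB) for weights, gradients and Adam states."""
    params = 2 * num_params   # fp16 weights
    grads = 2 * num_params    # fp16 gradients
    optim = 12 * num_params   # fp32 master weights + momentum + variance
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return (params + grads + optim) / 1e9


if __name__ == "__main__":
    n_params = 11e9  # approximate size of Megatron-11b
    for stage in range(4):
        gb = training_memory_gb(n_params, num_gpus=8, stage=stage)
        print(f"ZeRO stage {stage}, 8 GPUs: {gb:.2f} GB per GPU")
```

Under these assumptions, plain data parallelism needs about 176 GB of model state per GPU, while ZeRO stage 3 across 8 GPUs brings it down to 22 GB each. Even fp16 weights alone for inference come to about 22 GB, which is why partitioning or offloading schemes matter here.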
Model description
https://github.com/pytorch/fairseq/blob/e3c4282551e819853952284681e9ed60398c5c4a/examples/megatron_11b/README.md
Open source status
https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/
@stas00 @patrickvonplaten