Adding Megatron models. #9560

Narsil · 2021-01-13T08:55:36Z

🌟 New model addition

Is it feasible to add Megatron models? The architecture seems to be essentially GPT2, so most of the work should be in creating the config, fusing layers from the weights available here: https://github.com/pytorch/fairseq/tree/master/examples/megatron_11b, and making them available.
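To make the "fusing layers" step concrete, here is a minimal sketch of how model-parallel shards of a linear layer could be merged back into single weights. It assumes the usual Megatron-style split (column-parallel layers sharded along the output dimension, row-parallel layers along the input dimension) and uses plain nested lists in place of tensors. The function names and toy values are hypothetical and are not taken from fairseq or Megatron-LM.

```python
# Illustrative sketch only: nested lists stand in for weight tensors.
# Weights are stored as [out_features][in_features], as in torch.nn.Linear.


def merge_column_parallel(shards):
    """Concatenate shards along the output dimension (stack the rows)."""
    merged = []
    for shard in shards:
        merged.extend(row[:] for row in shard)
    return merged


def merge_row_parallel(shards):
    """Concatenate shards along the input dimension (join the columns)."""
    n_rows = len(shards[0])
    return [
        [value for shard in shards for value in shard[r]]
        for r in range(n_rows)
    ]


def linear(weight, x):
    """Compute y = W x for a single input vector x (no bias)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


# Toy two-rank MLP: the first layer is column-parallel, the second row-parallel.
w1_shards = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
w2_shards = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
x = [1, 1]

# Single-device weights obtained by merging the shards.
w1 = merge_column_parallel(w1_shards)
w2 = merge_row_parallel(w2_shards)
y_merged = linear(w2, linear(w1, x))

# Reference: each rank computes a partial output, then the partials are summed.
partials = [linear(b, linear(a, x)) for a, b in zip(w1_shards, w2_shards)]
y_sharded = [sum(vals) for vals in zip(*partials)]

assert y_merged == y_sharded
```

A real conversion would also need to rename checkpoint keys and, where the target model fuses query/key/value into one projection, build that fused weight in the matching order. None of that is covered by this sketch.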

There are NVIDIA's Megatron (BERT and GPT variants) and Facebook's Megatron 11b (GPT variant).

If we stick to that then we can't run the model on a single GPU, so we should probably make sure this is compatible with:

Is it feasible to keep the current GPT2 architecture and use DeepSpeed's ZeRO and other parallelism schemes without touching the original implementation?
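As a rough sanity check on the single-GPU concern, the sketch below estimates per-GPU memory for model states under mixed-precision Adam, using the accounting from the ZeRO paper: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. The 11e9 parameter count and the 8-GPU setup are assumptions for illustration, and activations, buffers, and fragmentation are ignored.

```python
GB = 1e9  # decimal gigabytes, to keep the arithmetic easy to follow


def training_bytes_per_gpu(n_params, n_gpus, zero_stage):
    """Estimate model-state memory per GPU (activations are not counted)."""
    weights = 2 * n_params  # fp16 parameters
    grads = 2 * n_params    # fp16 gradients
    optim = 12 * n_params   # fp32 master weights plus Adam momentum and variance
    if zero_stage >= 1:     # stage 1 partitions the optimizer states
        optim /= n_gpus
    if zero_stage >= 2:     # stage 2 also partitions the gradients
        grads /= n_gpus
    if zero_stage >= 3:     # stage 3 also partitions the parameters
        weights /= n_gpus
    return weights + grads + optim


# Hypothetical setup: ~11e9 parameters spread over 8 GPUs.
for stage in range(4):
    gb = training_bytes_per_gpu(11e9, 8, stage) / GB
    print(f"ZeRO stage {stage}: ~{gb:.1f} GB per GPU")
```

Under these assumptions the model states alone come to about 176 GB without partitioning, and about 22 GB per GPU at stage 3 on 8 GPUs, which is why some form of sharding or model parallelism looks unavoidable for training.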

Model description

https://github.com/pytorch/fairseq/blob/e3c4282551e819853952284681e9ed60398c5c4a/examples/megatron_11b/README.md

Open source status

the model implementation is available: https://github.com/ngoyal2707/Megatron-LM/blob/adb23324c222aad0aad89308e70302d996a5eaeb/mpu/transformer.py (most of the work seems to be on matrix parallelization)
the model weights are available: https://dl.fbaipublicfiles.com/fairseq/models/model_parallel/megatron_11b.tar.gz (Megatron 11b), https://github.com/NVIDIA/Megatron-LM#downloading-checkpoints (NVIDIA's version; the 3b and 8.3b checkpoints don't seem to be available)
who are the authors: Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro https://arxiv.org/abs/1909.08053

https://developer.nvidia.com/blog/language-modeling-using-megatron-a100-gpu/

@stas00 @patrickvonplaten

stas00 · 2021-01-13T18:17:54Z

Since DeepSpeed both integrates and uses Megatron-LM almost everywhere in its tutorials, it most likely should just work. Of course, the devil is in the details.

As I haven't had a chance to study/work with GPT2 yet, I will let others comment on the more important part of your query.

jordiae · 2022-07-05T14:45:11Z

Any plans of adding MegatronT5? (https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/t5_model.py)

stas00 · 2022-07-05T15:55:13Z

As this is a really old thread, perhaps make a request in a new Issue, @jordiae?

And of course, if you're interested you're more than welcome to try and add it yourself. This is of course only an invitation.

jordiae · 2022-07-07T15:57:21Z

> As this is a really old thread, perhaps make a request in a new Issue, @jordiae?
>
> And of course, if you're interested you're more than welcome to try and add it yourself. This is of course only an invitation.

Got it! Posted here because the issue was open. Thanks.

Narsil · 2022-07-18T17:09:49Z

Will close this issue as it's really kind of outdated.

Narsil added the New model label Jan 13, 2021

anton-l mentioned this issue Feb 20, 2021

[WIP] Add Megatron-11B #10301

Closed

5 tasks

soeque1 added a commit to soeque1/transformers that referenced this issue Feb 20, 2021

[Megatron-LM] Add megatron-lm gpt2 (huggingface#9560)

286da7f

* Add modeling_mlm_gpt2

soeque1 added a commit to soeque1/transformers that referenced this issue Feb 22, 2021

[Megatron-LM] Add megatron-lm gpt2 (huggingface#9560)

245c99c

* Add modeling_mlm_gpt2 init

soeque1 added a commit to soeque1/transformers that referenced this issue Feb 22, 2021

[Megatron-LM] Add megatron-lm gpt2 (huggingface#9560)

95fef40

* Add modeling_mlm_gpt2

Narsil closed this as completed Jul 18, 2022
