
ZeRO blog post #71

Merged 15 commits into huggingface:master on Jan 19, 2021

Conversation

@stas00 (Contributor) commented on Jan 14, 2021

As requested on Slack, here is a blog post discussing the recently added integration of ZeRO via DeepSpeed and FairScale.
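
(For readers skimming this thread without the post: roughly, both integrations are switched on from the Trainer command line. The flag names and script path below are recalled from the transformers version of that era and should be treated as assumptions; the post has the exact invocations.)

```bash
# FairScale's ZeRO sharded DDP - flag name assumed for the Trainer of early 2021:
python -m torch.distributed.launch --nproc_per_node=2 \
    examples/seq2seq/finetune_trainer.py --sharded_ddp [...]

# DeepSpeed's ZeRO - driven by a JSON config passed via the deepspeed launcher:
deepspeed examples/seq2seq/finetune_trainer.py --deepspeed ds_config.json [...]
```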

Please read the content and see if I have missed anything.

As before, I have no idea how to make the thumbnail.

Please feel free to change the title and make suggestions.

@julien-c, @sgugger, @thomwolf, @LysandreJik

@LysandreJik (Member) left a comment

Fantastic blog! It's really impressive to manage to cram a batch size of 20 with t5-3b onto a single GPU.

Left a few comments, but overall it looks awesome to me! Thanks for your work on this and for putting this blog together; it was a great read!

@thomwolf (Member) left a comment

This looks really cool! I agree with Lysandre's comments!

@julien-c (Member) commented

Maybe slightly out of scope, but should we mention and/or research the AWS implementation of model parallelism in SageMaker Distributed?

Or maybe it could be in a follow-up post (cc @n1t0 @philschmid).

@sgugger (Contributor) left a comment

This is fantastic! I left lots of small nits. The bigger issue IMO is the huge commands; we should find a way to make them shorter or better presented.

stas00 and others added 3 commits January 14, 2021 09:28
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@stas00 (Contributor, Author) commented on Jan 14, 2021

Thank you very much for taking the time to proofread and make suggestions, especially Sylvain - so much support!

Please have a look at how I dealt with the 3 cases of massive command-line argument sets - I simply kept only the important parts, replaced the rest with [...], and pointed to the post where the full command can be found.
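
(As an illustration of that presentation style only - this is not a command copied from the post, and the script path and arguments are assumptions; the point is which parts stay and what the [...] stands for:)

```bash
# Abbreviated command in the style used in the post: keep the arguments the
# section is actually discussing, elide everything else with [...] and link
# to the full invocation.
deepspeed examples/seq2seq/finetune_trainer.py \
    --model_name_or_path t5-3b \
    --per_device_train_batch_size 20 \
    --deepspeed ds_config.json \
    [...]
```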

I think everything else is good.

I'm just not sure how to make the thumbnail the way you need it - this stage seems like an odd manual step, so perhaps the process could be automated with ImageMagick? You have your desired background and font, and we just tell ImageMagick to generate the thumbnail with the title of the article. https://legacy.imagemagick.org/Usage/text/
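
(Something like the following, as a rough sketch - the background file, font, and title text are placeholders that would need to match the actual blog template:)

```bash
# Hypothetical thumbnail generation with ImageMagick: overlay the article
# title on a pre-made background template. All file/font names are placeholders.
convert thumbnail-background.png \
    -gravity center \
    -font DejaVu-Sans-Bold -pointsize 64 -fill white \
    -annotate +0+0 "ZeRO via DeepSpeed and FairScale" \
    zero-deepspeed-fairscale-thumbnail.png
```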

@stas00 (Contributor, Author) commented on Jan 14, 2021

One note - I used CUDA_VISIBLE_DEVICES=0 to single out one GPU, but DeepSpeed currently has a bug where it ignores that env var, so it will use GPU 0 even if CUDA_VISIBLE_DEVICES=1 is set (microsoft/DeepSpeed#662). Hopefully it will get fixed eventually.
There is probably no need to add this noise to the blog post; I will update my initial comment instead.
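
(To make the intent vs. the observed behavior concrete - the script name here is just a stand-in:)

```bash
# Intended: restrict the run to a single GPU through the env var.
CUDA_VISIBLE_DEVICES=0 python examples/seq2seq/finetune_trainer.py [...]    # respects the mask

# Observed with the DeepSpeed launcher while the linked bug is open:
# the mask is ignored and GPU 0 is used regardless.
CUDA_VISIBLE_DEVICES=1 deepspeed examples/seq2seq/finetune_trainer.py [...] # still runs on GPU 0
```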

@stas00 (Contributor, Author) commented on Jan 14, 2021

Maybe slightly out of scope, but should we mention and/or research the AWS implementation of model parallelism in SageMaker Distributed?

If you feel it helps, then by all means add whatever needs to be added.

Ideally we should have a blog post dedicated to Model Parallelism, where it'd fit naturally. It should discuss MP and PP (Pipeline Parallelism), as the latter solves the idling problem of the former: with naive MP only one GPU is busy at a time, while PP keeps them all working by pipelining micro-batches.

I'd take some bits from these posts: huggingface/transformers#8771 (comment) and huggingface/transformers#8771 (comment) and expand on those.

Personally, I feel the need to implement PP first before I can write about it, so that I have a deep understanding. But others, of course, can do the writing as well, so please do what you think is best.

I'm going to work on implementing PP for t5 or bart next.

@sgugger (Contributor) left a comment

Your changes look good to me, @stas00. Thanks!

@patrickvonplaten (Contributor) left a comment

Amazing, very nicely written too!

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@stas00 (Contributor, Author) commented on Jan 18, 2021

Thank you, @julien-c

wrt forwarding my PR from assets/9 to assets/11: why are we using a numbered assets/\d\d_ prefix in the first place? Why not match the asset folder to the blog post name? With just the name there is no need to figure out the next number when several posts are in the works. Also, the asset folder is inconsistent about the [-_] separator - sometimes it matches the blog post filename, sometimes it doesn't. It would probably be easier to stick to always _, always -, or simply the blog post filename, which itself is also inconsistent.

@julien-c (Member) commented

Also, the asset folder is inconsistent about the [-_] separator - sometimes it matches the blog post filename, sometimes it doesn't. It would probably be easier to stick to always _, always -, or simply the blog post filename, which itself is also inconsistent.

It used to be consistent... until someone made it inconsistent 😂

But anyway, yes, we can probably let go of the numbered prefix from now on if we want to. It's just a matter of personal taste.

Anyway... this is ready to merge on my side too.

@julien-c (Member) commented

Just ping me when you want to merge/publish!

@stas00 (Contributor, Author) commented on Jan 18, 2021

@julien-c, I think this is good to publish. Thank you!

@julien-c merged commit 23ad8b1 into huggingface:master on Jan 19, 2021
@stas00 deleted the zero branch on January 19, 2021 at 17:26
xianbaoqian pushed a commit that referenced this pull request Dec 29, 2023
* Add: zh/gaussian-splatting.md & zh/reformer.md

* Lora-for-sequence-classification-with-Roberta-Llama-Mistral cn done (#66)

* Add: zh/ram-efficient-pytorch-fsdp.md

* Add: zh/pytorch-fsdp.md

* Add: zh/big-bird.md

* Add: zh/lcm_lora.md

* Add: zh/the_n_implementation_details_of_rlhf_with_ppo.md & zh/personal-copilot.md

* Add: zh/long-range-transformers.md

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Update: zh/long-range-transformers.md

* update zh/the n implementation details of rlhf

* update zh/personal copilot

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/personal-copilot.md

* Update: zh/the_n_implementation_details_of_rlhf_with_ppo.md

Fix: wrong filename of zh/the_n_implementation_details_of_rlhf_with_ppo.md

* lcm_lora cn done

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/lcm_lora.md

* Fix: zh/lcm_lora.md

* big-bird cn done

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/big-bird.md

* pytorch-fsdp cn done

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/pytorch-fsdp.md

* ram-efficient-pytorch-fsdp cn done

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/ram-efficient-pytorch-fsdp.md

* Lora-for-sequence-classification-with-Roberta-Llama-Mistral cn done

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/Lora-for-sequence-classification-with-Roberta-Llama-Mistral.md

* 1. gaussian-splatting cn done
2. reformer cn done

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/gaussian-splatting.md

* Update: zh/reformer.md

* Add: zh/moe.md

* add zh/moe.md

* add zh/moe.md

* add zh/moe.md

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* Update: zh/moe.md

* Update: zh/moe.md

* Adding Zh translation to whisper-speculative-decoding.md

1. Adding Zh translation as zh/whisper-speculative-decoding.md
2. Fix incorrect code formatting in whisper-speculative-decoding.md

* Revert "Update: zh/moe.md"

This reverts commit 82ff6ad.

* fix commit to Update: zh/moe.md

fix file to commit 82ff6ad

* add zh/2023-in-llms.md (#71)

* format refine.

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>
Co-authored-by: Xinyu Yang <cauyxy@163.com>
Co-authored-by: 1375626371 <1375626371@qq.com>