
ZeRO blog post #71

Merged 15 commits into huggingface:master on Jan 19, 2021

Conversation

@stas00 (Contributor) commented on Jan 14, 2021

As requested on Slack, here is a blog post discussing the recently added integration of ZeRO via DeepSpeed and FairScale.
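
(For readers skimming this thread without the post: roughly, both integrations are switched on from the Trainer command line. The flag names and script path below are recalled from the transformers version of that era and should be treated as assumptions; the post has the exact invocations.)

```bash
# FairScale's ZeRO sharded DDP - flag name assumed for the Trainer of early 2021:
python -m torch.distributed.launch --nproc_per_node=2 \
    examples/seq2seq/finetune_trainer.py --sharded_ddp [...]

# DeepSpeed's ZeRO - driven by a JSON config passed via the deepspeed launcher:
deepspeed examples/seq2seq/finetune_trainer.py --deepspeed ds_config.json [...]
```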

Please read the content and see if I have missed anything.

As before, I have no idea how to make the thumbnail.

Please feel free to change the title and make suggestions.

@julien-c, @sgugger, @thomwolf, @LysandreJik

@LysandreJik (Member) left a comment

Fantastic blog! It's really impressive to manage to cram a batch size of 20 with t5-3b onto a single GPU.

Left a few comments, but overall it looks awesome to me! Thanks for your work on this and for putting this blog together; it was a great read!

@thomwolf (Member) left a comment

This looks really cool! I agree with Lysandre's comments!

@julien-c (Member) commented

Maybe slightly out of scope, but should we mention and/or research the AWS implementation of model parallelism in SageMaker Distributed?

Or maybe it could be in a follow-up post (cc @n1t0 @philschmid).

@sgugger (Contributor) left a comment

This is fantastic! I left lots of small nits. The bigger issue IMO is the huge commands; we should find a way to make them shorter or better presented.

stas00 and others added 3 commits January 14, 2021 09:28
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@stas00 (Contributor, Author) commented on Jan 14, 2021

Thank you very much for taking the time to proofread and make suggestions, especially Sylvain - so much support!

Please have a look at how I dealt with the 3 cases of massive command-line argument sets - I simply kept only the important parts, replaced the rest with [...], and pointed to the post where the full command can be found.
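
(As an illustration of that presentation style only - this is not a command copied from the post, and the script path and arguments are assumptions; the point is which parts stay and what the [...] stands for:)

```bash
# Abbreviated command in the style used in the post: keep the arguments the
# section is actually discussing, elide everything else with [...] and link
# to the full invocation.
deepspeed examples/seq2seq/finetune_trainer.py \
    --model_name_or_path t5-3b \
    --per_device_train_batch_size 20 \
    --deepspeed ds_config.json \
    [...]
```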

I think everything else is good.

I'm just not sure how to make the thumbnail the way you need it - this stage seems like an odd manual step, so perhaps the process could be automated with ImageMagick? You have your desired background and font, and we just tell ImageMagick to generate the thumbnail with the title of the article. https://legacy.imagemagick.org/Usage/text/
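
(Something like the following, as a rough sketch - the background file, font, and title text are placeholders that would need to match the actual blog template:)

```bash
# Hypothetical thumbnail generation with ImageMagick: overlay the article
# title on a pre-made background template. All file/font names are placeholders.
convert thumbnail-background.png \
    -gravity center \
    -font DejaVu-Sans-Bold -pointsize 64 -fill white \
    -annotate +0+0 "ZeRO via DeepSpeed and FairScale" \
    zero-deepspeed-fairscale-thumbnail.png
```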

@stas00 (Contributor, Author) commented on Jan 14, 2021

One note - I used CUDA_VISIBLE_DEVICES=0 to single out one GPU, but DeepSpeed currently has a bug where it ignores that env var, so it will use GPU 0 even if CUDA_VISIBLE_DEVICES=1 is set (microsoft/DeepSpeed#662). Hopefully it will get fixed eventually.
There is probably no need to add this noise to the blog post; I will update my initial comment instead.
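
(To make the intent vs. the observed behavior concrete - the script name here is just a stand-in:)

```bash
# Intended: restrict the run to a single GPU through the env var.
CUDA_VISIBLE_DEVICES=0 python examples/seq2seq/finetune_trainer.py [...]    # respects the mask

# Observed with the DeepSpeed launcher while the linked bug is open:
# the mask is ignored and GPU 0 is used regardless.
CUDA_VISIBLE_DEVICES=1 deepspeed examples/seq2seq/finetune_trainer.py [...] # still runs on GPU 0
```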

@stas00 (Contributor, Author) commented on Jan 14, 2021

Maybe slightly out of scope, but should we mention and/or research the AWS implementation of model parallelism in SageMaker Distributed?

If you feel it helps, then by all means add whatever needs to be added.

Ideally we should have a blog post dedicated to Model Parallelism, where it'd fit naturally. It should discuss MP and PP (Pipeline Parallelism), as the latter solves the idling problem of the former: with naive MP only one GPU is busy at a time, while PP keeps them all working by pipelining micro-batches.

I'd take some bits from these posts: huggingface/transformers#8771 (comment) and huggingface/transformers#8771 (comment) and expand on those.

Personally, I feel the need to implement PP first before I can write about it, so that I have a deep understanding. But others, of course, can do the writing as well, so please do what you think is best.

I'm going to work on implementing PP for t5 or bart next.

@sgugger (Contributor) left a comment

Your changes look good to me, @stas00. Thanks!

@patrickvonplaten (Contributor) left a comment

Amazing, very nicely written too!

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
@stas00 (Contributor, Author) commented on Jan 18, 2021

Thank you, @julien-c

wrt forwarding my PR from assets/9 to assets/11: why are we using a numbered assets/\d\d_ prefix in the first place? Why not match the asset folder to the blog post name? With just the name there is no need to figure out the next number when several posts are in the works. Also, the asset folder is inconsistent about the [-_] separator - sometimes it matches the blog post filename, sometimes it doesn't. It would probably be easier to stick to always _, always -, or simply the blog post filename, which itself is also inconsistent.

@julien-c (Member) commented

Also, the asset folder is inconsistent about the [-_] separator - sometimes it matches the blog post filename, sometimes it doesn't. It would probably be easier to stick to always _, always -, or simply the blog post filename, which itself is also inconsistent.

It used to be consistent... until someone made it inconsistent 😂

But anyway, yes, we can probably let go of the numbered prefix from now on if we want to. It's just a matter of personal taste.

Anyway... this is ready to merge on my side too.

@julien-c (Member) commented

Just ping me when you want to merge/publish!

@stas00 (Contributor, Author) commented on Jan 18, 2021

@julien-c, I think this is good to publish. Thank you!

@julien-c merged commit 23ad8b1 into huggingface:master on Jan 19, 2021
@stas00 deleted the zero branch on January 19, 2021 at 17:26
xianbaoqian pushed a commit that referenced this pull request Dec 29, 2023
* Add: zh/gaussian-splatting.md & zh/reformer.md

* Lora-for-sequence-classification-with-Roberta-Llama-Mistral cn done (#66)

* Add: zh/ram-efficient-pytorch-fsdp.md

* Add: zh/pytorch-fsdp.md

* Add: zh/big-bird.md

* Add: zh/lcm_lora.md

* Add: zh/the_n_implementation_details_of_rlhf_with_ppo.md & zh/personal-copilot.md

* Add: zh/long-range-transformers.md

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

* Update: zh/long-range-transformers.md

* update zh/the n implementation details of rlhf

* update zh/personal copilot

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/personal-copilot.md

* Update: zh/the_n_implementation_details_of_rlhf_with_ppo.md

Fix: wrong filename of zh/the_n_implementation_details_of_rlhf_with_ppo.md

* lcm_lora cn done

Signed-off-by: Yao Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/lcm_lora.md

* Fix: zh/lcm_lora.md

* big-bird cn done

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/big-bird.md

* pytorch-fsdp cn done

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/pytorch-fsdp.md

* ram-efficient-pytorch-fsdp cn done

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/ram-efficient-pytorch-fsdp.md

* Lora-for-sequence-classification-with-Roberta-Llama-Mistral cn done

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/Lora-for-sequence-classification-with-Roberta-Llama-Mistral.md

* 1. gaussian-splatting cn done
2. reformer cn done

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>

* Update: zh/gaussian-splatting.md

* Update: zh/reformer.md

* Add: zh/moe.md

* add zh/moe.md

* add zh/moe.md

* add zh/moe.md

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* continue trans

* Update: zh/moe.md

* Update: zh/moe.md

* Adding Zh translation to whisper-speculative-decoding.md

1. Adding Zh translation as zh/whisper-speculative-decoding.md
2. Fix incorrect code formatting in whisper-speculative-decoding.md

* Revert "Update: zh/moe.md"

This reverts commit 82ff6ad.

* fix commit to Update: zh/moe.md

fix file to commit 82ff6ad

* add zh/2023-in-llms.md (#71)

* format refine.

---------

Signed-off-by: Yao Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: Yao Matrix <yaoweifeng0301@126.com>
Co-authored-by: Yang Lee <45715979+innovation64@users.noreply.github.com>
Co-authored-by: Zhongdong Yang <zhongdong_y@outlook.com>
Co-authored-by: Xinyu Yang <cauyxy@163.com>
Co-authored-by: 1375626371 <1375626371@qq.com>