ZeRO blog post #71
Conversation
Fantastic blog! It's really impressive to manage to fit a batch size of 20 with t5-3b on a single GPU.
Left a few comments, but overall this looks awesome to me! Thanks for your work on this and for putting this blog together - it was a great read!
This looks really cool! I agree with Lysandre's comments!
Maybe slightly out of scope, but should we mention and/or research the AWS implementation of model parallelism in SageMaker Distributed? Or it could maybe go in a follow-up post (cc @n1t0 @philschmid)
This is fantastic! I left lots of small nits. The bigger issue IMO is the big commands. We should find a way to make them shorter or better presented.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Thank you very much for taking the time to proofread and make suggestions, especially Sylvain - so much support!

Please have a look at how I dealt with the 3 cases of massive cl args sets - I simply left only the important parts and replaced the rest with

I think everything else is good. I'm just not sure how to make the thumbnail the way you need it - this manual stage seems strange, so perhaps the process could be automated with imagemagick? You have your desired background and font, and we just tell imagemagick to generate the thumbnail with the title of the article. https://legacy.imagemagick.org/Usage/text/
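The imagemagick idea above could look something like this minimal sketch - the canvas size, background color, point size, and title text are all placeholder assumptions, not the blog's actual settings:

```shell
# Hypothetical sketch: generate a thumbnail by drawing the post title
# centered on a solid-color background (all values are placeholders).
convert -size 1200x648 xc:'#1b1f24' \
  -gravity center \
  -fill white -pointsize 64 \
  -annotate +0+0 'Your Post Title Here' \
  thumbnail.png
```

In practice the solid `xc:` background would be swapped for the blog's background image, and `-font` would pin the house font so the output is reproducible across machines.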
One note - I used
If you feel it helps, by all means add what needs to be added. Ideally we should have a blog post dedicated to Model Parallelism where it'd fit naturally. It should discuss MP and PP (Pipeline Parallelism), since the latter solves the idling problem of the former. I'd take some bits from these posts: huggingface/transformers#8771 (comment) and huggingface/transformers#8771 (comment) and expand on those.

Personally, I feel a need to implement PP first before I can write about it, so that I have a deep understanding. But others, of course, can do the writing as well, so please do what you think is best. I'm going to work on implementing PP for t5 or bart next.
Your changes look good to me @stas00, thanks!
Amazing, very nicely written too!
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Thank you, @julien-c. Wrt forwarding my PR from assets/9 to assets/11 - why are we using a number
It used to be consistent... until someone made it inconsistent 😂 But anyway, yes, we can probably let go of the numbered prefix from now on if we want to. It's just a matter of personal taste. In any case, this is ready to merge on my side too.
Just ping me when you want to merge/publish!
@julien-c, I think this is good to publish. Thank you!
As requested on slack, here is a blog post that discusses the recently added integration of ZeRO via DeepSpeed and FairScale.
Please read the content and see if I have missed anything.
As before, I have no idea how to make the thumbnail.
Please feel free to change the title and make suggestions.
@julien-c, @sgugger, @thomwolf, @LysandreJik