[PyTorch] Fuse permute+pad and unpermute+unpad ops for FP8 optimization #1921
1. Fused `moe_permute_with_probs` + `Fp8Padding` and fused `moe_unpermute` + `Fp8Unpadding`, which removes the explicit padding/unpadding in the MoE experts module, improves performance, and reduces peak GPU memory usage.
2. Added tests for the fused permute/pad and unpermute/unpad operations.
Description
This PR optimizes FP8 MoE permute and pad operations by fusing:
- `moe_permute_with_probs` + `Fp8Padding` into `moe_permute_and_pad_with_probs`
- `moe_unpermute` + `Fp8Unpadding` into `moe_unpermute` with a `pad_offsets` argument

Results: the explicit padding/unpadding in the MoE experts module is removed, improving performance and reducing peak GPU memory usage. Tests for the fused operations were added in `tests/pytorch/test_permutation.py`.
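A minimal sketch of how the fused call replaces the two separate steps is below. The signatures and return values of `moe_permute_and_pad_with_probs` (and of the existing `moe_permute_with_probs` and `Fp8Padding` calls) are assumptions inferred from this description, not copied from the final code:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.pytorch import permutation as te_perm

# Toy routing setup; shapes and values are illustrative only.
num_tokens, hidden, num_experts, topk = 1024, 4096, 8, 2
hidden_states = torch.randn(num_tokens, hidden, dtype=torch.bfloat16, device="cuda")
probs = torch.softmax(torch.randn(num_tokens, num_experts, device="cuda"), dim=-1)
topk_idx = probs.topk(topk, dim=-1).indices
routing_map = torch.zeros_like(probs, dtype=torch.bool).scatter_(1, topk_idx, True)
tokens_per_expert = routing_map.sum(dim=0)

# Original path: permute, then pad each expert chunk for FP8 GEMMs (two ops, extra copy).
permuted, permuted_probs, row_id_map = te_perm.moe_permute_with_probs(
    hidden_states, probs, routing_map)                              # signature assumed
padded, padded_splits = te.Fp8Padding(num_experts)(
    permuted, tokens_per_expert.tolist())                           # signature assumed

# Fused path added by this PR: one call that also returns the pad_offsets needed later
# by the fused moe_unpermute(..., pad_offsets=...) on the way back.
padded, permuted_probs, row_id_map, pad_offsets = te_perm.moe_permute_and_pad_with_probs(
    hidden_states, probs, routing_map, tokens_per_expert)           # signature assumed
```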
Performance data
Tests covering a wide range of model training configurations were performed, comparing the fused operations ("Fused:") with the original version ("Orig:"). Running times (in milliseconds) are summarized in the table below, along with the speedup, measured as the original running time divided by the fused running time. All tests were carried out with the `tests/pytorch/test_permutation.py` benchmark script.
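Speedups of this kind can be reproduced with a simple CUDA-event timer around each path. The sketch below only defines the timer; the `run_original_path` / `run_fused_path` names in the usage comment are placeholders, not functions from the benchmark script:

```python
import torch

def time_ms(fn, iters=100, warmup=10):
    """Average CUDA time per call of fn, in milliseconds."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Usage (placeholder callables, to be replaced with the two paths under test):
#   t_orig  = time_ms(run_original_path)   # moe_permute_with_probs + Fp8Padding
#   t_fused = time_ms(run_fused_path)      # moe_permute_and_pad_with_probs
#   speedup = t_orig / t_fused
```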
Usage in Megatron-LM
- `Megatron-LM/megatron/core/transformer/moe/moe_utils.py`: added support for the fused operations
- `Megatron-LM/megatron/core/transformer/moe/token_dispatcher.py`: scheduler integration
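A rough illustration of what the `moe_utils.py` side of that integration could look like; the wrapper name, fallback logic, and signatures here are illustrative, not the actual Megatron-LM change:

```python
# Illustrative only: a moe_utils.py-style wrapper that prefers the fused TE op
# and falls back to the original two-step path when it is unavailable.
from transformer_engine.pytorch.permutation import moe_permute_with_probs

try:
    from transformer_engine.pytorch.permutation import (
        moe_permute_and_pad_with_probs,  # added by this PR
    )
    HAVE_TE_FUSED_PERMUTE_PAD = True
except ImportError:
    HAVE_TE_FUSED_PERMUTE_PAD = False


def permute_and_pad(tokens, probs, routing_map, tokens_per_expert, fp8_padding):
    """Permute tokens by expert and pad each expert chunk for FP8 GEMMs.

    Returns (padded_tokens, permuted_probs, row_id_map, pad_offsets); pad_offsets is
    None on the fallback path, where Fp8Unpadding must still be called explicitly.
    """
    if HAVE_TE_FUSED_PERMUTE_PAD:
        # Single fused call; pad_offsets is later forwarded to moe_unpermute.
        return moe_permute_and_pad_with_probs(
            tokens, probs, routing_map, tokens_per_expert)   # signature assumed
    # Fallback: original permute followed by an explicit Fp8Padding module call.
    permuted, permuted_probs, row_id_map = moe_permute_with_probs(
        tokens, probs, routing_map)                          # signature assumed
    padded, _ = fp8_padding(permuted, tokens_per_expert)
    return padded, permuted_probs, row_id_map, None
```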
Type of change
Documentation change (change only to the documentation, either a fix or a new content)
Changes
- Added the `moe_permute_and_pad_with_probs` API for fused permute and pad, and modified the `moe_unpermute` API with a `pad_offsets` argument for fused unpermute and unpad, in `transformer_engine/pytorch/permutation.py`
- Added tests for the fused operations in `tests/pytorch/test_permutation.py`
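For the unpermute side, the spirit of the new checks in `tests/pytorch/test_permutation.py` is presumably that the fused `moe_unpermute(..., pad_offsets=...)` matches the reference `Fp8Unpadding` followed by the original `moe_unpermute`. The sketch below shows that comparison with assumed signatures, not the actual test code:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.pytorch import permutation as te_perm

def check_fused_unpermute(expert_output, row_id_map, permuted_probs,
                          tokens_per_expert, pad_offsets, num_experts):
    # Reference path: strip the FP8 padding first, then unpermute (signatures assumed).
    unpadded = te.Fp8Unpadding(num_experts)(expert_output, tokens_per_expert)
    ref = te_perm.moe_unpermute(unpadded, row_id_map, merging_probs=permuted_probs)

    # Fused path from this PR: pass pad_offsets so the unpad happens inside the kernel.
    fused = te_perm.moe_unpermute(expert_output, row_id_map,
                                  merging_probs=permuted_probs,
                                  pad_offsets=pad_offsets)  # pad_offsets added by this PR

    torch.testing.assert_close(fused, ref)
```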
Checklist: