[JAX] Support Flax sharding constraints #1933
base: main
Conversation
```python
with jax.sharding.Mesh(
    devices=device_mesh, axis_names=(DEVICE_DP_AXIS, DEVICE_TP_AXIS)
) as mesh, flax.linen.logical_axis_rules(te_extended_axis_rules):
```
Hi, I'm curious to learn what the difference is between having the `jax.sharding.Mesh` context before vs. after the `te.fp8_autocast` context.
Afaik the order of the `Mesh` vs. `fp8_autocast` doesn't matter, as long as both come before your model. But the order of `te_flax.extend_logical_axis_rules` and `fp8_autocast` does matter, according to the docstring of the former. So the ordering needs to be:

1. `fp8_autocast` context
2. Create the Flax logical axis rule context. To support TE's hardcoded axis system in Flax (which `UnfusedDotProductAttention` requires), we extend the logical axis rules with TE's rule table via `te_flax.extend_logical_axis_rules`, which must be inside an `fp8_autocast`.
3. Create and initialize the model, training loop, etc.

Afaik, the `Mesh` can come anywhere before item 3. I just pulled the `fp8_autocast` up to the top and merged the `Mesh` with the `with` block for the logical axis rule context in item 2 to reduce indentation (see the sketch below). But if a smaller diff is preferred, I can do `Mesh` -> `fp8_autocast` -> logical axis rules.
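A minimal sketch of that ordering, assuming 8 local devices; the axis names, mesh shape, and user rule table (`"batch"`/`"embed"`) are illustrative placeholders, not taken from this PR:

```python
import jax
import numpy as np
import flax.linen as nn
import transformer_engine.jax as te
import transformer_engine.jax.flax as te_flax

DEVICE_DP_AXIS, DEVICE_TP_AXIS = "data", "model"
device_mesh = np.asarray(jax.devices()).reshape((4, 2))  # assumes 8 devices

# 1. Enter fp8_autocast first; per its docstring,
#    te_flax.extend_logical_axis_rules must run inside it.
with te.fp8_autocast(enabled=True):
    # 2. Extend the user's logical axis rules with TE's rule table, then
    #    enter the Mesh and the Flax rule context in one `with` statement.
    te_extended_axis_rules = te_flax.extend_logical_axis_rules(
        (("batch", DEVICE_DP_AXIS), ("embed", DEVICE_TP_AXIS))
    )
    with jax.sharding.Mesh(
        devices=device_mesh, axis_names=(DEVICE_DP_AXIS, DEVICE_TP_AXIS)
    ) as mesh, nn.logical_axis_rules(te_extended_axis_rules):
        # 3. Create and initialize the model, training loop, etc.
        ...
```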
Description
Updates our `with_sharding_constraint_by_logical_axes` helper function to support and prefer Flax logical axis rules when they exist in the current context. If no Flax logical axis rules exist in the current context, it falls back to TE's hardcoded logical axes, though this fallback is now deprecated and will be removed in the future.

Type of change
Changes
- Updates `with_sharding_constraint_by_logical_axes` to check whether Flax logical axis rules exist in the current context, and if so apply the sharding constraint with `flax.linen.with_logical_constraint` (see the sketch below)
- Updates `transformers.py` to remove a duplicate SEQLEN_AXIS usage in a single sharding constraint. This was not flagged before since our TE hardcoded axis system didn't check for this and it was always mapped to `None`/replicated.
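A minimal sketch of the preference/fallback logic described above, assuming Flax's public `flax.linen.get_logical_axis_rules` and `flax.linen.with_logical_constraint` helpers; `_te_hardcoded_sharding_constraint` is a hypothetical stand-in for the deprecated TE path, and none of this is the PR's exact code:

```python
import flax.linen as nn

def with_sharding_constraint_by_logical_axes(x, logical_axis_names):
    # No axes given: nothing to constrain.
    if logical_axis_names is None:
        return x
    # Prefer Flax logical axis rules when the caller has entered a
    # flax.linen.logical_axis_rules(...) context (non-empty rule table).
    if nn.get_logical_axis_rules():
        return nn.with_logical_constraint(x, logical_axis_names)
    # Deprecated fallback: TE's hardcoded logical-axis table
    # (hypothetical helper standing in for the existing TE code).
    return _te_hardcoded_sharding_constraint(x, logical_axis_names)
```

Under real Flax rules, a repeated logical axis can resolve to the same mesh axis twice, which JAX rejects in a `PartitionSpec`; with TE's old table the duplicate mapped to `None` and went unnoticed, hence the `transformers.py` fix above.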
Checklist: