Skip to content

[release/2.6][SWDEV-523736] Skip&Fix some testcases for archs without SDPA or Navi4x #2213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: release/2.6
Choose a base branch
from

Conversation

k-artem
Copy link

@k-artem k-artem commented May 30, 2025

[release/2.6][SWDEV-523736] Skip some testcases for archs without SDPA or Navi4x

[SWDEV-523736] Fix some unittests for Navi4x

  • Most part of testcases work properly on Navi48(gfx1201) with
    TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1, in this commit enable it
    for this arch. No support of AOTriton currently for Navci44(gfx1200),
    so these testcases just skipped.
  • test_qconv2d_int8_mixed_bf16 skipped because it was originally skipped in
    [Inductor] [Quant] Enable QConv2d Unary int8-mixed-bf16 Lowering pytorch/pytorch#112550 but later lost.
  • test_sac_ilp_case1 skipped as per SWDEV-509011
  • test_distributed_checkpoint_state_dict_type[0-1]_cuda fixed bug
    with arguments.

Fixes #SWDEV-523736

@k-artem k-artem force-pushed the Skip_for_navi4x branch from e041a52 to 3e98db4 Compare May 30, 2025 15:50
@k-artem
Copy link
Author

k-artem commented May 30, 2025

pruthvistony please have a look.

@k-artem k-artem changed the title [release/2.6][SWDEV-523736] Skip not compatible testcase for Navi4x [release/2.6][SWDEV-523736] Skip some testcases for archs without SDPA or Navi4x May 30, 2025
@@ -10611,6 +10611,7 @@ def fn(q, k, v):
)

@expectedFailureXPU
@unittest.skipIf(not PLATFORM_SUPPORTS_FLASH_ATTENTION, "Some archs don't support SDPA")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDPA is support for Navi4 through AOTriton, we shouldnt skip the whole test

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true for Navi48, but not for Navi44, currently no support it but it's planned as experimental in AOTriton 0.10.

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Jun 6, 2025

Jenkins build for 5d697ad2d52d7d659dc4cc2fb76f7a224dc7b2a3 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@k-artem k-artem changed the title [release/2.6][SWDEV-523736] Skip some testcases for archs without SDPA or Navi4x [release/2.6][SWDEV-523736] Skip&Fix some testcases for archs without SDPA or Navi4x Jun 13, 2025
@k-artem
Copy link
Author

k-artem commented Jun 13, 2025

The commit updated, pruthvistony please have a look.

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Jun 13, 2025

Jenkins build for 03c4453a242a92f069a50552f0592ba3a38f46ee commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

* Most part of testcases work properly on Navi48(gfx1201) with
  TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1, in this commit enable it
  for this arch. No support of AOTriton currently for Navci44(gfx1200),
  so these testcases just skipped.
* test_qconv2d_int8_mixed_bf16 skipped because it was originally skipped in
  pytorch#112550 but later lost.
* test_sac_ilp_case1 skipped as per SWDEV-509011
* test_distributed_checkpoint_state_dict_type[0-1]_cuda fixed bug
  with arguments.
@pruthvistony
Copy link
Collaborator

@xinyazhang ,
Please review this PR.

Copy link

@xinyazhang xinyazhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL in evaluate_platform_supports_flash_attention/etc.

@@ -6408,6 +6410,7 @@ def fn(x):
self.assertEqual(fn(inp), opt_fn(inp))

@requires_cuda
@skipIfRocmArch(NAVI44_ARCH)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to change to PLATFORM_SUPPORTS_FLASH_ATTENTION or is this intentional?


def CDNA2OrLater():
return evaluate_gfx_arch_within(["gfx90a", "gfx942"])

def evaluate_platform_supports_flash_attention():
if TEST_WITH_ROCM:
arch_list = ["gfx90a", "gfx942", "gfx1100"]
arch_list = ["gfx90a", "gfx942", "gfx1100", "gfx1201"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should take TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL into considration. Check the upstream's pattern as a reference.

Copy link
Author

@k-artem k-artem Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes in line 48, but this patch is implemented another logic because target version is ROCm 6.4.2/pytorch/2.6, I enabled TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL in evaluate_gfx_arch_within

 + if result and gcn_arch_name == "gfx1201":
 +      os.environ['TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL'] = '1'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do this.
UT should follow the guide of TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL rather than changing its value

@k-artem k-artem force-pushed the Skip_for_navi4x branch from 8b56090 to adbbad0 Compare July 1, 2025 14:07
@k-artem
Copy link
Author

k-artem commented Jul 1, 2025

@xinyazhang updated according comments, please review

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Jul 1, 2025

Jenkins build for adbbad0e7b7c4cc21c1841fa68e7fe93a789ddbd commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Copy link

@xinyazhang xinyazhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some lint concerns, otherwise LGTM

@k-artem k-artem requested a review from xinyazhang July 2, 2025 13:11
@k-artem
Copy link
Author

k-artem commented Jul 4, 2025

@pruthvistony please review.

@rocm-repo-management-api
Copy link

rocm-repo-management-api bot commented Jul 4, 2025

Jenkins build for 53a544d3b4f846ba5bb5934e1734ad3f01167ee7 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants