I get "CUDA Error: misaligned address" when running the TP comm overlap unit test with a recent PyTorch container.
I think the error comes from the cuBLAS versions that enable nvjet.
[rank1]: Traceback (most recent call last):
[rank1]: File "/lustre/fsw/coreai_mlperf_training/slym/module_tests/tp_overlap/te.tp_tests/tests/pytorch/distributed/run_gemm_with_overlap.py", line 922, in <module>
[rank1]: sys.exit(_main(_parse_args()))
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
[rank1]: return f(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^
[rank1]: File "/lustre/fsw/coreai_mlperf_training/slym/module_tests/tp_overlap/te.tp_tests/tests/pytorch/distributed/run_gemm_with_overlap.py", line 721, in _main
[rank1]: all_outputs = _fp8_gemm()
[rank1]: ^^^^^^^^^^^
[rank1]: File "/lustre/fsw/coreai_mlperf_training/slym/module_tests/tp_overlap/te.tp_tests/tests/pytorch/distributed/run_gemm_with_overlap.py", line 602, in _fp8_gemm
[rank1]: return tex.fp8_gemm(
[rank1]: ^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/cpp_extensions/gemm.py", line 180, in fp8_gemm
[rank1]: _ = fn(*args)
[rank1]: ^^^^^^^^^
[rank1]: RuntimeError: /workspace/TransformerEngine/transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp:802 in function split_overlap_ag: CUDA Error: misaligned address
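One way to narrow this down is to check whether the chunk pointers handed to the GEMM are already misaligned before cuBLAS sees them. If they are, the problem is on the overlap side; if not, that points more strongly at the newer cuBLAS/nvjet kernels. Below is a minimal sketch under two assumptions that are not taken from Transformer Engine or cuBLAS: FP8 GEMM operands are expected to start on a 16-byte boundary, and split_overlap_ag slices a contiguous row-major buffer evenly along rows. The helpers chunk_offsets and misaligned_chunks are hypothetical.

```python
from typing import List, Tuple

# Assumption: FP8 GEMM operand pointers should be 16-byte aligned.
# This value is hard-coded here, not queried from cuBLAS.
ALIGNMENT_BYTES = 16


def chunk_offsets(num_chunks: int, rows_per_chunk: int,
                  row_elems: int, elem_size: int) -> List[int]:
    """Hypothetical helper: byte offset of each chunk's first element
    when a contiguous row-major buffer is split evenly along rows."""
    chunk_bytes = rows_per_chunk * row_elems * elem_size
    return [i * chunk_bytes for i in range(num_chunks)]


def misaligned_chunks(base_ptr: int, offsets: List[int],
                      alignment: int = ALIGNMENT_BYTES) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (chunk index, address) for every chunk
    whose start address is not a multiple of `alignment`."""
    result = []
    for i, off in enumerate(offsets):
        addr = base_ptr + off
        if addr % alignment != 0:
            result.append((i, addr))
    return result


if __name__ == "__main__":
    # FP8 (1 byte/element), 1000 elements per row, 3 rows per chunk:
    # each chunk is 3000 bytes, which is not a multiple of 16.
    offs = chunk_offsets(num_chunks=4, rows_per_chunk=3, row_elems=1000, elem_size=1)
    print(misaligned_chunks(base_ptr=0, offsets=offs))  # prints [(1, 3000), (3, 9000)]
```

In an actual run, base_ptr would come from the communication buffer tensor (for example via torch.Tensor.data_ptr()), and the chunk geometry would need to match what the test really passes, which I have not verified here.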