[release/2.4] Prevent static initialization of at::cuda::warp_size() (Backport #2293) #2308


Closed

Conversation

xinyazhang

Fixes SWDEV-540240, SWDEV-540309, SWDEV-539989

...

Commit 80cca70 created a static global variable whose initializer calls `at::cuda::warp_size()`, which requires a visible GPU in order to query device properties. However, GPUs are not present on CPU-only build systems, so the query fails during static initialization.

Convert the static variable into a static function, so the value is computed on first use rather than during static initialization.
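
For context, the fix follows the standard function-local static idiom: the device query moves from program-load time to first call. A minimal sketch of the before/after pattern, with `kWarpSize` and `cached_warp_size` as illustrative names rather than the identifiers from the actual diff:

```cpp
#include <ATen/cuda/CUDAContext.h>  // declares at::cuda::warp_size()

// Before (problematic): the initializer runs during static initialization,
// before main(), and queries device properties even when no GPU is visible.
//
//   static const int kWarpSize = at::cuda::warp_size();

// After: a function-local static defers the device query to the first call,
// by which point the caller is expected to have a usable GPU context.
// C++11 guarantees this initialization is thread-safe and happens exactly once.
static int cached_warp_size() {
  static const int warp_size = at::cuda::warp_size();
  return warp_size;
}
```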

http://rocm-ci.amd.com/job/pyt_whl_docker_mainline/1461/artifact/build_artifacts.txt/*view*/

Ran a microbenchmark to confirm basic functionality:
```
root@ubb4-rack-22:/var/lib/jenkins/pytorch-micro-benchmarking# python3 micro_benchmarking_pytorch.py --network resnet50
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : resnet50
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.10158218145370483
Throughput [img/sec] : 630.0317544289736
```
@xinyazhang changed the title from "[rocm7.0_internal_testing] Prevent static initialization of at::cuda::warp_size() (#2293)" to "[release/2.4] Prevent static initialization of at::cuda::warp_size() (#2293)" on Jul 2, 2025
@xinyazhang changed the title from "[release/2.4] Prevent static initialization of at::cuda::warp_size() (#2293)" to "[release/2.4] Prevent static initialization of at::cuda::warp_size() (Backport #2293)" on Jul 2, 2025
@rocm-repo-management-api

rocm-repo-management-api bot commented Jul 2, 2025

Jenkins build for commit fd2a0432ae459fdabb6d3e5651ff4b918ab947fa finished as FAILURE
Links: Blue Ocean view / Build artifacts

@xinyazhang marked this pull request as ready for review on July 2, 2025 16:13
@xinyazhang marked this pull request as draft on July 2, 2025 19:44
@xinyazhang
Author

Superseded by #2318

@xinyazhang closed this on Jul 7, 2025