This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add cmake flag USE_FATBIN_COMPRESSION, ON by default #19123

Merged Sep 12, 2020 (2 commits)

DickJC123
Contributor

@DickJC123 DickJC123 commented Sep 11, 2020

As libmxnet.so grows toward 2GB, through added functionality and additional CUDA architectures, we're running into link failures (e.g., see issue #17045).

One technique that lowers library size dramatically is 'fatbin compression', enabled by the nvcc options --fatbin-options -compress-all. This has always been part of the Makefile builds; this PR adds it to the cmake builds. Specifically, it adds support in CMakeLists.txt for the cmake option -DUSE_FATBIN_COMPRESSION={ON,OFF}, with a default of ON for builds against CUDA 11 and later. Existing cmake builds against CUDA 10.2 are left as they are, without fatbin compression, to avoid introducing unforeseen consequences for existing use cases.
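As a rough illustration of the option described above, the wiring could look something like the sketch below. It is not the diff from this PR: the helper variable FATBIN_DEFAULT and the use of find_package(CUDAToolkit) to obtain the toolkit version are assumptions made for the example, and MXNet's CMakeLists.txt may detect the CUDA version differently. Only the option name USE_FATBIN_COMPRESSION and the nvcc flags come from the description.

```cmake
# Hypothetical sketch, not the change merged in this PR.
cmake_minimum_required(VERSION 3.17)
project(fatbin_compression_sketch LANGUAGES CXX)

# Assumption: FindCUDAToolkit (CMake >= 3.17) is used to learn the toolkit version.
find_package(CUDAToolkit)

# Default as first proposed: ON for CUDA 11 and later, OFF otherwise.
set(FATBIN_DEFAULT OFF)  # hypothetical helper variable
if(CUDAToolkit_FOUND AND CUDAToolkit_VERSION VERSION_GREATER_EQUAL "11.0")
  set(FATBIN_DEFAULT ON)
endif()
option(USE_FATBIN_COMPRESSION "Compress device code in nvcc fatbins" ${FATBIN_DEFAULT})

if(USE_FATBIN_COMPRESSION)
  # Append so that any flags already supplied by the user are preserved.
  string(APPEND CMAKE_CUDA_FLAGS " --fatbin-options -compress-all")
endif()
message(STATUS "CMAKE_CUDA_FLAGS:${CMAKE_CUDA_FLAGS}")
```

Appending to CMAKE_CUDA_FLAGS, rather than overwriting it, keeps the option compatible with flags passed on the cmake command line.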

Results of experiments building the 1.x branch with cuda11:

With cmake options -DMXNET_CUDA_ARCH="5.2 6.0 6.1 7.0 7.2 7.5 8.0" -DUSE_FATBIN_COMPRESSION=OFF, a cuda11 build fails with link error:

libmxnet.so: PC-relative offset overflow in PLT entry for void mxnet::op::mxnet_op::Kernel<...> ...

With the same cmake options as above, but dropping arches 5.2 and 7.2, the build succeeds with a libmxnet.so size of 1.8GB.

Finally, with the first set of arches, -DMXNET_CUDA_ARCH="5.2 6.0 6.1 7.0 7.2 7.5 8.0", and fatbin compression left at its default (ON), a cuda11 build succeeds with a libmxnet.so size of 750MB, a decrease in size of more than 2X.

Both successful builds, one with fatbin compression and one without, ran the command:

time python -c "import mxnet as mx; x = mx.nd.array([1,], ctx=mx.gpu(0)); print((x+1).asnumpy())"
[2.]

in the same time, 7.6 secs.

@samskalicky @anirudh2290 @ChaiBapchya @ptrendx

@mxnet-bot

Hey @DickJC123, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, windows-gpu, miscellaneous, edge, centos-cpu, unix-cpu, centos-gpu, sanity, unix-gpu, windows-cpu, clang]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu
Contributor

leezu commented Sep 11, 2020

Since we can change default options before the 2.0 release, would it be helpful to always enable USE_FATBIN_COMPRESSION? A couple of users have run into the linking errors when GPU auto-detection fails and cmake defaults to building for all "common" GPU architectures.

@DickJC123
Contributor Author

I've adjusted the default to be always ON, regardless of CUDA version.
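With the default no longer tied to the CUDA version, the option declaration presumably reduces to a single unconditional line. The fragment below is a sketch under that assumption, not the merged code; the help string is illustrative.

```cmake
# Hypothetical sketch of the revised default; the merged CMakeLists.txt may differ.
option(USE_FATBIN_COMPRESSION "Compress device code in nvcc fatbins" ON)
```

Builds that need the previous behavior can still opt out by configuring with -DUSE_FATBIN_COMPRESSION=OFF.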

Contributor

@samskalicky samskalicky left a comment

Thanks @DickJC123 for this great addition!

@szha szha merged commit 5c1aadc into apache:master Sep 12, 2020
@DickJC123 DickJC123 changed the title from "Add cmake flag USE_FATBIN_COMPRESSION, ON by default for CUDA >= 11" to "Add cmake flag USE_FATBIN_COMPRESSION, ON by default" Sep 15, 2020
samskalicky pushed a commit that referenced this pull request Sep 17, 2020
…19123) (#19158)

* [1.x] Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (#19123)

* Trigger CI

* Appending to existing CMAKE_CUDA_FLAGS in all cases
DickJC123 added a commit to DickJC123/mxnet that referenced this pull request Sep 18, 2020
…pache#19123) (apache#19158)

* [1.x] Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (apache#19123)

* Trigger CI

* Appending to existing CMAKE_CUDA_FLAGS in all cases
chinakook pushed a commit to chinakook/mxnet that referenced this pull request Nov 5, 2021
…pache#19123)

* Add cmake flag USE_FATBIN_COMPRESSION, ON by default for CUDA >= 11

* cmake flag USE_FATBIN_COMPRESSION default is ON for all builds
5 participants