-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allowing dynamic linking of NCCL #4445
Comments
@jakirkham Static linking is used here so that multi-GPU training works "out of the box". Many users will install XGBoost by running |
@trivialfis I see that dynamic linking of NCCL may be desirable when XGBoost is used as a dependency to another application (via CMake exported target). Right now, if an application using NCCL imports XGBoost, the NCCL library will be duplicated and cause code bloat. |
Code bloat is fine, it's nothing close to DL library. The problem is it leads to undefined behaviour when we have functions with same signature but different implementation. The linker can fail to detect that, as illustrated in above issue and related reproducible code. |
@trivialfis Meaning two versions of NCCL co-existing? |
@trivialfis I don't think that link is exactly applicable to the case of NCCL, since we are not including the NCCL source code. We are importing the static library pre-compiled by NVIDIA. |
The example above imports statically compiled version of |
Got it. So to prevent undefined behavior, we want to de-duplicate dependencies, including dmlc-core and NCCL. |
Yes. That would be my reason. |
Another use case for shared libraries is in package management. Namely we are able to create smaller packages and better track relationships between them by building shared libraries and creating explicit dependencies between upstream and downstream packages that need these shared libraries. |
Appears that NCCL's static library is used when building xgboost. Am curious if it would be reasonable to allow users to dynamically link instead or as an alternative to static linking.
The text was updated successfully, but these errors were encountered: