add gpu_hist support to Spark #4175
Conversation
Thanks for the contribution. First of all, we should not merge this at least for the 0.82 release (which is happening soon), as it has never been tested by anyone except the author, and that would be hard to do within so few days. Second, I don't think it's ready to support GPU in XGBoost-Spark, since Spark itself is far from being ready:
so I don't see myself agreeing to add this fancy feature in the near future.
@@ -171,7 +179,7 @@ private[spark] trait GeneralParams extends Params {

   final def getSeed: Long = $(seed)

-  setDefault(numRound -> 1, numWorkers -> 1, nthread -> 1,
+  setDefault(numRound -> 1, numWorkers -> 1, nthread -> 1, nGpus -> 1,
this should be 0 at least
I'd be happy to change it, but it's only used when tree_method is set to gpu_hist, in which case the user probably expects to grab a GPU. It's also the default on the C++ side (https://github.com/dmlc/xgboost/blob/master/src/tree/param.h#L200), so having this default seems less surprising and more consistent.
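To make that behavior concrete, here is a minimal sketch of the intended semantics. The parameter keys ("tree_method", "n_gpus") are assumptions based on the diff above, not a confirmed XGBoost4J-Spark API.

```scala
// Minimal sketch (assumed parameter names): n_gpus only takes effect when
// tree_method is "gpu_hist"; for CPU tree methods it is simply ignored.
val gpuParams = Map(
  "tree_method" -> "gpu_hist",  // opt into the GPU histogram algorithm
  "n_gpus"      -> 1            // same default as on the C++ side linked above
)

val cpuParams = Map(
  "tree_method" -> "hist"       // n_gpus omitted; no GPU is requested
)
```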
@CodingCat Thanks for the quick review! I don't know about fancy, XGBoost GPU support was added in 2016. :) Yeah, it's fine to hold off until 0.82 is released, and I totally agree not everything is ready, but isn't this the whole point of open source development: the Cathedral and the Bazaar, release early and release often, given enough eyeballs all bugs are shallow, and so on? To address your specific concerns:
Looking through past issues, people have asked for this (#2983, #3499), so at least there is some demand.
XGBoost GPU != XGBoost-Spark GPU
It's not the same situation. The current issue is that we know there is a bug, and we would be pushing this known bug to users.
Here is something copied from that thread: "One other concern is that adding another option that we say is ready to run out of the box for GPUs, is that we have to maintain this mode and ensure it is tested in CI." This is from mccheah, who is one of the main people behind spark@k8s, which is the only way to make Spark "run" with GPU.
No, this is not the definition of "working", especially now that XGBoost is no longer a research project... if you track how Spark accepts features, it has become more and more conservative over time.
I mean that Spark, as the base of XGBoost-Spark, should prove that it supports GPU in a mature way, so feel free to work with the Spark community on this.
I think #3499 is to be addressed in #4095. Regarding #2983: since the release of XGBoost-GPU, how many issues have been raised about GPU, and how many of those are about distributed GPU in Spark? I don't think that's a convincing number to make us take the risk of claiming it is supported in XGBoost-Spark when even Spark doesn't claim that.
I would suggest, since you are working for NVIDIA, how about hosting a library under NVIDIA@GITHUB based on XGBoost-Spark to support distributed GPU? That's the major interest for NVIDIA. As a community member, I have concerns about the quality of features in the master branch. Some day (I also hope it happens ASAP), when Spark supports GPU better, I'd be more than happy to work with you to bring the feature here.
@@ -284,7 +286,7 @@ private[spark] object BoosterParams {

   val supportedBoosters = HashSet("gbtree", "gblinear", "dart")

-  val supportedTreeMethods = HashSet("auto", "exact", "approx", "hist")
+  val supportedTreeMethods = HashSet("auto", "exact", "approx", "hist", "gpu_hist", "gpu_exact")
Most likely, gpu_exact doesn't support running in a distributed setting. Could you remove it from the list of supported tree methods?
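A sketch of the change being requested: keep only tree methods known to work in a distributed setting. The HashSet mirrors the diff above; the validation helper is hypothetical, added here only to illustrate how the whitelist would be enforced.

```scala
import scala.collection.immutable.HashSet

// gpu_exact removed, since it most likely cannot run distributed
val supportedTreeMethods = HashSet("auto", "exact", "approx", "hist", "gpu_hist")

// Hypothetical helper showing how the whitelist would be applied
def requireSupportedTreeMethod(treeMethod: String): Unit =
  require(
    supportedTreeMethods.contains(treeMethod),
    s"tree_method must be one of [${supportedTreeMethods.mkString(", ")}], got '$treeMethod'"
  )
```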
FWIW, there is an actual design for GPU resource scheduling in Spark 3.0: https://issues.apache.org/jira/browse/SPARK-24615. It will probably go into 3.0 in some form.
Yeah, that would be a good time to try to use GPU-aware scheduling. Anything else is a little hacky.
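For reference, a rough sketch of what GPU-aware scheduling looks like under the SPARK-24615 design. The configuration keys and the TaskContext.resources API shown here come from the eventual Spark 3.x implementation, not from this PR, and a GPU discovery script also has to be configured on the cluster.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("gpu-scheduling-sketch")
  .config("spark.executor.resource.gpu.amount", "1") // GPUs requested per executor
  .config("spark.task.resource.gpu.amount", "1")     // GPUs requested per task
  .getOrCreate()

spark.sparkContext.parallelize(1 to 4, 4).foreachPartition { _ =>
  // Each task can look up the concrete GPU addresses assigned to it.
  val gpuAddresses = TaskContext.get().resources()("gpu").addresses
  println(s"Partition ${TaskContext.getPartitionId()} got GPUs: ${gpuAddresses.mkString(",")}")
}
```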
Two parts to the PR:
I've tested this on GCP with a 20-node Spark Standalone cluster, 1 T4 GPU per node.
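A hedged sketch of how that test setup might translate into training parameters: 20 workers (one per node), each using one T4. The parameter names follow the diffs in this PR and are assumptions, not a confirmed API.

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

val classifier = new XGBoostClassifier(Map(
  "objective"   -> "binary:logistic",
  "tree_method" -> "gpu_hist",
  "n_gpus"      -> 1,    // one T4 per node
  "num_workers" -> 20,   // one worker per node in the 20-node cluster
  "num_round"   -> 100
)).setFeaturesCol("features").setLabelCol("label")

// val model = classifier.fit(trainingDF)  // trainingDF: a DataFrame of labeled feature vectors
```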
@RAMitchell @canonizer @CodingCat @mt-jones