Split learner with composition. #5404
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master    #5404   +/-   ##
=======================================
  Coverage   84.07%   84.07%
=======================================
  Files          11       11
  Lines        2411     2411
=======================================
  Hits         2027     2027
  Misses        384      384

Continue to review full report at Codecov.
It's not really composition to put all of the member variables inside a writable struct and share that between the components. In this case all classes have access to everything and we haven't changed much.
Looking at this more I think the problem is that there is just too much stuff rather than that the stuff is not separated into components, although it helps a little. We need to keep working away at removing deprecated parts and simplifying.
Let me know what you think, I'll be happy to merge either approach.
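To make the objection concrete, here is a minimal sketch (illustrative names only, not the PR's actual types): when every component holds a pointer to one writable struct, each of them can still read and modify all of the state, so the coupling is essentially the same as one large class.

#include <string>

struct SharedState {                   // stand-in for a shared attributes struct
  std::string booster;
  int num_feature = 0;
  bool need_configuration = true;
};

class ComponentA {
 public:
  explicit ComponentA(SharedState* s) : s_(s) {}
  void Run() { s_->need_configuration = false; }   // freely mutates shared state
 private:
  SharedState* s_;
};

class ComponentB {
 public:
  explicit ComponentB(SharedState* s) : s_(s) {}
  void Run() { s_->booster = "gbtree"; }            // so does every other component
 private:
  SharedState* s_;
};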
/*! \brief Constant string identifying that a passed-in parameter is a metric name. */
static std::string const kEvalMetric;  // NOLINT
/*! \brief A global storage for internal prediction. */
PredictionContainer cache;
If any of these variables are only used by one of the learner classes and do not need to be shared, move them to be members of that class. The idea is that variables are not shared if they do not need to be.
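A sketch of the suggested refactor (the types here are stand-ins, not the PR's actual declarations): a field consulted by only one component moves out of the shared struct and becomes a private member of that component.

#include <string>

struct PredictionContainer {};         // stand-in for the real cache type

// Before: the cache sits in the shared struct, visible to every component.
struct SharedAttributes {
  PredictionContainer cache;
  std::string eval_metric;
  // ... other genuinely shared fields ...
};

// After: only the component that actually uses the cache owns it.
class Predictor {
 private:
  PredictionContainer cache_;          // private; no other component can touch it
};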
  }

  // add additional parameters
  // These are constraints that need to be satisfied.
-  if (tparam_.dsplit == DataSplitMode::kAuto && rabit::IsDistributed()) {
-    tparam_.dsplit = DataSplitMode::kRow;
+  if (attr->tparam.dsplit == DataSplitMode::kAuto && rabit::IsDistributed()) {
Are we able to remove data split mode in the near future?
Depends. Do we still want to support different split mode in the future?
// 'distcol' updater hidden until it becomes functional again
// See discussion at https://github.com/dmlc/xgboost/issues/1832
LOG(FATAL) << "Column-wise data split is currently not supported.";
void ConfigureNumFeatures(LearnerAttributes* attr) {
Can we remove this in the near future and use DMatrix as the source of truth?
Once the JVM PR is merged.
void ConfigureGBM(LearnerAttributes *attr,
                  LearnerTrainParam const &old, Args const &args) const {
  if (attr->gbm == nullptr || old.booster != attr->tparam.booster) {
If Learner had access to parameters on its construction, we could get rid of these chains of Configure methods.
It doesn't. The parameters can be loaded from pickle.
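A simplified sketch of the trade-off (not the PR's code): if every parameter were available at construction, the components could be built once in the constructor; because parameters may instead arrive later, for example when a model is restored from a pickle, configuration has to be deferred and re-run lazily before each use.

#include <string>
#include <utility>
#include <vector>

using Args = std::vector<std::pair<std::string, std::string>>;

// Constructor injection: only possible if all parameters are known up front.
class EagerLearner {
 public:
  explicit EagerLearner(Args args) : args_(std::move(args)) {
    // build booster, objective, metrics once, right here
  }
 private:
  Args args_;
};

// Lazy configuration: parameters can be set (or restored from a serialized
// model) after construction, so each entry point re-configures on demand.
class LazyLearner {
 public:
  void SetParam(std::string key, std::string value) {
    args_.emplace_back(std::move(key), std::move(value));
    need_configuration_ = true;
  }
  void UpdateOneIter() {
    if (need_configuration_) {
      Configure();                     // chain: ConfigureGBM, ConfigureMetrics, ...
      need_configuration_ = false;
    }
    // ... one boosting round ...
  }
 private:
  void Configure() { /* rebuild components from args_ */ }
  Args args_;
  bool need_configuration_ = true;
};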
  std::vector<std::pair<std::string, std::string> > extra_attr;
  mparam.contain_extra_attrs = 1;

  {
    std::vector<std::string> saved_params;
    // check if rabit_bootstrap_cache was set to non-zero before adding to checkpoint
-    if (cfg_.find("rabit_bootstrap_cache") != cfg_.end() &&
-        (cfg_.find("rabit_bootstrap_cache"))->second != "0") {
+    if (attr->cfg.find("rabit_bootstrap_cache") != attr->cfg.end() &&
Not sure what rabit_bootstrap_cache is.
It's from JVM package.
I'm afraid the variables are indeed shared.
@RAMitchell I prefer #5350 over this PR. I know that jumping around inheritance is not ideal, but at least that way we don't need an extra LearnerAttributes struct.
Sounds good to me.
Another version of #5350, using composition instead of inheritance as suggested by @RAMitchell. Keeping the old branch for comparison.