
Split learner with composition. #5404

Closed
trivialfis wants to merge 6 commits into master from split-learner-rebase

Conversation

trivialfis (Member)

Another version of #5350, using composition instead of inheritance as suggested by @RAMitchell. Keeping the old branch for comparison.

@trivialfis trivialfis changed the title Split learner rebase Split learner with composition. Mar 10, 2020
@codecov-io

Codecov Report

Merging #5404 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #5404   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files          11       11           
  Lines        2411     2411           
=======================================
  Hits         2027     2027           
  Misses        384      384           


@RAMitchell left a comment (Member)

It's not really composition to put all of the member variables inside a writable struct and share that between the components. In this case every class has access to everything, so we haven't changed much.

Looking at this more, I think the problem is that there is simply too much stuff, rather than that the stuff is not separated into components, although the separation helps a little. We need to keep working away at removing deprecated parts and simplifying.
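
(A minimal sketch of the pattern in question, using hypothetical names rather than the actual XGBoost types: every component receives the same writable struct, so each one can still read and mutate all of the state.)

#include <string>

// Shared, fully writable state: composition in shape only.
struct LearnerAttributes {
  std::string booster;
  int num_features {0};
};

class LearnerConfiguration {
 public:
  explicit LearnerConfiguration(LearnerAttributes* attr) : attr_(attr) {}
  // This component can touch any field...
  void Configure() { attr_->booster = "gbtree"; }
 private:
  LearnerAttributes* attr_;
};

class LearnerIO {
 public:
  explicit LearnerIO(LearnerAttributes* attr) : attr_(attr) {}
  // ...and so can this one, so nothing is actually encapsulated.
  void Load() { attr_->num_features = 10; }
 private:
  LearnerAttributes* attr_;
};

int main() {
  LearnerAttributes attr;
  LearnerConfiguration{&attr}.Configure();
  LearnerIO{&attr}.Load();
  return 0;
}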

Let me know what you think; I'll be happy to merge either approach.

/*! \brief Constant string identifying a passed in parameter is a metric name. */
static std::string const kEvalMetric; // NOLINT
/*! \brief A global storage for internal prediction. */
PredictionContainer cache;
Member

If any of these variables are only used by one of the learner classes and do not need to be shared, move them to be members of that class. The idea is that variables should not be shared unless they need to be.
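
For instance (a hypothetical sketch with a stand-in type, not the actual code), a cache used only by the prediction component could become a private member there:

#include <vector>

struct PredictionContainer { std::vector<float> predictions; };  // stand-in type

class LearnerPredict {
 public:
  void Predict() { cache_.predictions.assign(1, 0.0f); }  // internal use only
 private:
  PredictionContainer cache_;  // private: invisible to the other components
};

int main() {
  LearnerPredict predictor;
  predictor.Predict();
  return 0;
}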

}

// add additional parameters
// These are constraints that need to be satisfied.
if (tparam_.dsplit == DataSplitMode::kAuto && rabit::IsDistributed()) {
tparam_.dsplit = DataSplitMode::kRow;
if (attr->tparam.dsplit == DataSplitMode::kAuto && rabit::IsDistributed()) {
Member

Are we able to remove data split mode in the near future?

Member Author

Depends. Do we still want to support different split modes in the future?

// 'distcol' updater hidden until it becomes functional again
// See discussion at https://github.com/dmlc/xgboost/issues/1832
LOG(FATAL) << "Column-wise data split is currently not supported.";
void ConfigureNumFeatures(LearnerAttributes* attr) {
Member

Can we remove this in the near future and use DMatrix as the source of truth?

Member Author

Once the JVM PR is merged.


void ConfigureGBM(LearnerAttributes *attr,
LearnerTrainParam const &old, Args const &args) const {
if (attr->gbm == nullptr || old.booster != attr->tparam.booster) {
Member

If Learner had access to parameters on its construction, we could get rid of these chains of Configure methods.
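
(A rough sketch of this suggestion, with hypothetical names; the reply below explains why the parameters are not available that early. If they were, the booster could be built once in the constructor instead of lazily through a chain of Configure calls.)

#include <memory>
#include <string>
#include <utility>
#include <vector>

using Args = std::vector<std::pair<std::string, std::string>>;

struct LearnerTrainParam { std::string booster {"gbtree"}; };
struct GradientBooster { explicit GradientBooster(std::string const& name) {} };

class Learner {
 public:
  explicit Learner(Args const& args)
      : tparam_{Parse(args)},
        gbm_{std::make_unique<GradientBooster>(tparam_.booster)} {}
 private:
  static LearnerTrainParam Parse(Args const& args) {
    LearnerTrainParam p;
    for (auto const& kv : args) {
      if (kv.first == "booster") { p.booster = kv.second; }
    }
    return p;
  }
  LearnerTrainParam tparam_;              // initialized before gbm_
  std::unique_ptr<GradientBooster> gbm_;  // built once, no Configure chain
};

int main() {
  Learner learner{{{"booster", "gblinear"}}};
  return 0;
}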

Member Author

It doesn't have them at construction; the parameters can be loaded from a pickle afterwards.

std::vector<std::pair<std::string, std::string> > extra_attr;
mparam.contain_extra_attrs = 1;

{
std::vector<std::string> saved_params;
// check if rabit_bootstrap_cache was set to nonzero before adding to checkpoint
if (cfg_.find("rabit_bootstrap_cache") != cfg_.end() &&
(cfg_.find("rabit_bootstrap_cache"))->second != "0") {
if (attr->cfg.find("rabit_bootstrap_cache") != attr->cfg.end() &&
Member

Not sure what rabit_bootstrap_cache is.

Member Author

It's from the JVM package.

@trivialfis (Member Author)

I'm afraid the variables are indeed shared.

@trivialfis (Member Author)

@RAMitchell I prefer #5350 over this PR. I know that jumping around an inheritance hierarchy is not ideal, but at least that way we don't need an extra this pointer (attr in this PR). The split is only meant to separate the functionality so we can remove as much of it as possible in the future.
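
(To make the trade-off concrete, a hypothetical sketch: with composition every method threads the explicit LearnerAttributes pointer, whereas with inheritance the same state is reached through the implicit this of the base class.)

struct LearnerAttributes { int num_features {0}; };

// Composition (this PR): each method carries the extra attr pointer.
class LearnerConfiguration {
 public:
  void ConfigureNumFeatures(LearnerAttributes* attr) { attr->num_features = 10; }
};

// Inheritance (#5350): the state is inherited, no extra parameter.
class LearnerBase {
 protected:
  int num_features_ {0};
};

class LearnerImpl : public LearnerBase {
 public:
  void ConfigureNumFeatures() { num_features_ = 10; }
};

int main() {
  LearnerAttributes attr;
  LearnerConfiguration{}.ConfigureNumFeatures(&attr);
  LearnerImpl{}.ConfigureNumFeatures();
  return 0;
}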

@RAMitchell (Member)

Sounds good to me.

@trivialfis trivialfis closed this Mar 12, 2020
@trivialfis trivialfis deleted the split-learner-rebase branch March 12, 2020 08:31
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020