
Move prediction cache to Learner. #5220

Merged: 3 commits into dmlc:master, Feb 14, 2020
Conversation

@trivialfis (Member) commented Jan 22, 2020

Clean-ups

  • Remove duplicated cache in Learner and GBM.
  • Remove ad-hoc fix of invalid cache.
  • Remove PredictFromCache in predictors.
  • Remove prediction cache for linear altogether, as it only moves the prediction into the training process without providing any actual overall speed gain.
  • The cache is now unique to Learner, which means the ownership is no longer shared by any other components.

Changes

  • Add version to prediction cache.
  • Use weak ptr to check expired DMatrix.
  • Pass shared pointer instead of raw pointer.
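As a rough illustration of how the version and the weak pointer combine (all names here are hypothetical toy stand-ins, not xgboost's actual classes): a cache entry can hold a version counter plus a std::weak_ptr, letting the Learner detect both a stale cache and a freed DMatrix without extending the matrix's lifetime:

```cpp
#include <cstdint>
#include <memory>

// Toy stand-in for xgboost's DMatrix; purely illustrative.
struct DMatrix {};

// Hypothetical cache entry: the weak_ptr never keeps the DMatrix
// alive, and the version records how many trees are already folded
// into the cached predictions.
struct PredictionCacheEntry {
  std::weak_ptr<DMatrix> ref;  // expires when the DMatrix is freed
  std::uint32_t version{0};    // trees reflected in the cached preds
};

// The cached predictions are usable only if the DMatrix still exists
// and the cache has seen every tree in the model.
bool CacheIsValid(PredictionCacheEntry const& entry,
                  std::uint32_t model_trees) {
  return !entry.ref.expired() && entry.version == model_trees;
}
```

This is only a sketch of the mechanism described in the bullet points, under the assumption that validity is "not expired and up to date with the model".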

Risks

Previously, the DMatrix was tied to the Booster with a shared pointer, which meant access to it was always safe since it couldn't expire before the Booster. Now we need to ensure that safety ourselves.

Advantages

As we no longer store the shared pointer, this should reduce memory usage when doing grid search with libraries like Scikit-Learn (during evaluation the training matrix can be released). Also, performance for training continuation (incremental training, checkpoint recovery) will be significantly better, as we cache all encountered DMatrices.

Lastly, this PR is intended to ease the development of our DMatrix refactoring plan, which requires an easily accessible and reliable prediction cache so that we can use quantile values as a proxy for actual data.

Related issues and PRs:

PRs:

#5272

Issues:

#3946
#4786
#4774
#4482

@trivialfis (Member Author)

Requires some more thought to make a clean design for the cache.

@RAMitchell (Member) left a comment:

As an idea, what if the learner handles all prediction caching and when it needs to update the cache it calls predict only on the most recent tree. We remove all functions like "UpdatePredictionCache" and the caching mechanism is isolated to the learner. There might be a small loss of efficiency because we do not make use of the information in the Updater, but we still achieve algorithmic efficiency by predicting from only one tree per iteration.
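The idea above amounts to the learner remembering how many trees its cached margin already includes and, each iteration, predicting only with the trees added since. A toy sketch (hypothetical names; trees reduced to simple functions, not xgboost's real API):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Toy tree: maps a row's feature value to a leaf contribution.
using Tree = std::function<double(double)>;

// Adds only the contribution of trees not yet folded into the cached
// predictions, then marks the cache as current. This is the
// "predict from only the new trees per iteration" idea in miniature.
void UpdateCache(std::vector<double>* preds, std::vector<double> const& rows,
                 std::vector<Tree> const& trees, std::size_t* tree_begin) {
  for (std::size_t t = *tree_begin; t < trees.size(); ++t) {
    for (std::size_t i = 0; i < rows.size(); ++i) {
      (*preds)[i] += trees[t](rows[i]);
    }
  }
  *tree_begin = trees.size();  // cache now reflects every tree
}
```

Per round this touches one tree (or one forest) rather than the whole model, which is the algorithmic efficiency the comment refers to.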

@trivialfis (Member Author) commented Feb 10, 2020

@hcho3 We talked about adding a custom segfault handler like: https://github.com/bombela/backward-cpp

I added a simple one in this PR for debugging on Jenkins (and it was helpful). But it breaks the JVM package's error handling/recovery logic, so I will remove it once the PR is done. You may find it useful in the future, as it's super simple (roughly 50 added lines).

@trivialfis trivialfis marked this pull request as ready for review February 10, 2020 18:52
@trivialfis trivialfis changed the title [WIP] Move prediction cache to Learner. Move prediction cache to Learner. Feb 10, 2020
Two resolved review threads on include/xgboost/predictor.h (outdated).
@RAMitchell (Member) left a comment:

Nice work, I think the approach is solid.

This PR is too large to review properly. The gblinear changes can be extracted. There are unnecessary formatting changes and changes not related to prediction caching. Changing the DMatrix pointer arguments to shared pointers could potentially be extracted.

I want to review the changes to the caching logic more carefully and in parts. You can help by breaking things down as much as possible. As you know, this part of the code base is very bug prone.

src/learner.cc (resolved review thread)
src/learner.cc (outdated):

predictions_.SetDevice(generic_parameters_.gpu_id);
predictions_.Resize(predt.predictions.Size());
predictions_.Copy(predt.predictions);
(Member)

Is there an extra copy here that wasn't there before?

@trivialfis (Member Author) commented Feb 11, 2020:

@RAMitchell It's the copy moved from Predictor::PredictFromCache. I removed that function altogether: now that the cache is managed by Learner, the copying is also done by Learner. It's one copy fewer overall, as I also removed the temporary vector in the C API.

(Member)

If I use two differently sized evaluation matrices, the predictions vector will get resized twice every single iteration. This is a very common case.

src/learner.cc (resolved review thread)
src/learner.cc (outdated):
this->PredictRaw(data.get(), predts, training, ntree_limit);
out_preds->SetDevice(generic_parameters_.gpu_id);
out_preds->Resize(predts->predictions.Size());
out_preds->Copy(predts->predictions);
(Member)

Again, is there extra copying happening here?

src/learner.cc (resolved review thread)
src/tree/updater_prune.cc (resolved review thread)
@trivialfis (Member Author)

Will revert some changes. I sometimes added formatting changes because clangd warns via clang-tidy checks, and it's quite difficult to focus otherwise.

@trivialfis (Member Author)

> Changing the DMatrix pointer arguments to shared pointers could potentially be extracted.

This might not be possible as we need the shared pointer for constructing std::weak_ptr.
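For context, this is a language-level constraint: std::weak_ptr has no constructor taking a raw pointer, only one observing a shared_ptr's control block. A minimal illustration (toy DMatrix type, hypothetical helper name):

```cpp
#include <memory>

// Toy stand-in for the real class.
struct DMatrix {};

// A weak_ptr can only be bound to an existing shared_ptr; there is no
// std::weak_ptr<DMatrix>(DMatrix*) constructor. So any call site that
// should feed the weak_ptr-based cache must hand over a shared_ptr.
std::weak_ptr<DMatrix> MakeCacheKey(std::shared_ptr<DMatrix> const& p) {
  return std::weak_ptr<DMatrix>{p};
}
```

This is why the raw-pointer-to-shared-pointer signature change is hard to split out from the caching change itself.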

@RAMitchell (Member)

You could do the shared pointers first. This way we get past some harmless changes before we look at the core logic.

int old_ntree = model.trees.size() - num_new_trees;
// update cache entry
for (auto& kv : (*cache_)) {
@trivialfis (Member Author) commented Feb 11, 2020:

It might be worth noting that before this PR, this function updated the cache for every DMatrix, including validation datasets. Now it only updates the training DMatrix, as I don't think the updater can/should have information about validation datasets in the future (otherwise that would be peeking, which violates the purpose of validation).

@trivialfis (Member Author)

Not sure why, but putting a HostDeviceVector into thread-local storage crashes Python on Windows at the end of execution. @RAMitchell have you ever experienced this?

@RAMitchell (Member)

If you use device vectors as static variables, they can get destructed after the CUDA context has been released, leading to errors.
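This is the classic destruction-order hazard: objects are destroyed in reverse order of construction, so a device buffer created before a context outlives that context and frees its memory too late. A CUDA-free sketch of the pattern (toy types, not xgboost code):

```cpp
#include <string>
#include <vector>

// Records teardown order so it can be inspected.
std::vector<std::string> g_events;

// Toy model of the CUDA context.
struct Context {
  ~Context() { g_events.push_back("context torn down"); }
};

// Toy model of a device vector whose cleanup needs a live context.
struct DeviceBuffer {
  ~DeviceBuffer() { g_events.push_back("buffer freed"); }
};

// Objects are destroyed in reverse order of construction (the same
// rule statics follow at process exit): the context dies first, and
// the buffer is freed afterwards, when the context is already gone.
void SimulateTeardown() {
  DeviceBuffer buffer;  // constructed first, destroyed last
  Context context;      // constructed last, destroyed first
}
```

In the real case, "buffer freed" would be a device-memory release attempted after CUDA teardown, which is plausibly where the crash reported above comes from.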

@trivialfis (Member Author)

This might make #5207 difficult to implement...

@trivialfis (Member Author)

@RAMitchell Anyway, a problem for another day. I cleaned up the PR per your request. Formatting changes are reverted. But the change in linear is kept: since the GBM no longer has access to the global prediction cache, the code removal for linear is necessary.

@RAMitchell (Member) left a comment:

Looks very nice. Is using an integer for prediction state and begin/end ranges the most robust approach? You have already noted that this causes ambiguity when there are multiple output groups. In a perfect world we would have unique hashes for every tree generated and be able to see which trees have contributed to a prediction vector. Maybe we could get closer to this?

It looks to me like your version is smarter and only computes predictions that it needs, now that some state is carried along with the prediction vector. If you change the updaters to return false in UpdatePredictionCache you could get an idea of performance regression from removing these functions. Of course we cannot do this until gpu_predictor is able to predict from Ellpack, otherwise we have memory allocation problems.


Two review threads on src/predictor/cpu_predictor.cc.
@trivialfis (Member Author) commented Feb 12, 2020

@RAMitchell

> If I use two differently sized evaluation matrices, the predictions vector will get resized twice every single iteration.

Before this PR, Learner stored another prediction vector for each DMatrix (hence two prediction vectors per DMatrix: one for the cache, one as a temporary). I can restore that to avoid the resize. Alternatively, I can change the metric input to a span and resize the prediction vector only when it doesn't have sufficient space for the current DMatrix.

Update:

I restored the additional vector for each matrix to avoid any further refactoring in this PR.

@trivialfis (Member Author) commented Feb 12, 2020

I don't have a better name than version. If you are boosting a forest with 4 trees per forest, the version advances by 4 each round, so boosted_rounds is not an option. This is non-ideal, but right now there's not much we can do.
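In other words, the version counts trees rather than rounds, so the two diverge as soon as each round adds more than one tree. The arithmetic, as a trivially checkable sketch (hypothetical helper name, illustrative only):

```cpp
#include <cstdint>

// Illustrative bookkeeping only: the cache version tracks trees, so a
// forest of k trees advances it by k per boosting round.
std::uint32_t VersionAfter(std::uint32_t rounds,
                           std::uint32_t trees_per_round) {
  return rounds * trees_per_round;
}
```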

As for the other problem regarding the refresh updater that I discussed offline with @RAMitchell: it should be fine, as training continuation requires at least one serialization, which clears the cache.

@mli (Member) commented Feb 12, 2020

Codecov Report

Merging #5220 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #5220   +/-   ##
=======================================
  Coverage   83.76%   83.76%           
=======================================
  Files          11       11           
  Lines        2409     2409           
=======================================
  Hits         2018     2018           
  Misses        391      391           

Last update 29eeea7...4fcc43e.

@RAMitchell (Member) left a comment:

LGTM.

Looking through the predict function it seems like there are about four copies between generating the predictions and reaching the end user. I think we need to look at optimising this process. We can look at this when considering prediction to device memory.

The other thing I think can be improved with the predict API is passing in a bunch of bools to change behaviour. I think there should be separate functions for separate functionality.

@trivialfis (Member Author)

The copying is annoying. I don't have any thoughts on how to get rid of it yet.

@trivialfis trivialfis merged commit c35cdec into dmlc:master Feb 14, 2020
@trivialfis trivialfis deleted the prediction-cache branch February 14, 2020 05:04
@hcho3 mentioned this pull request Feb 21, 2020
@lock lock bot locked as resolved and limited conversation to collaborators May 20, 2020