
training with external memory part 1 of 2 #4486

Merged: 9 commits merged into dmlc:master on May 29, 2019
Conversation

sriramch (Contributor)

  • this pr focuses on computing the quantiles using multiple gpus on a
    dataset that uses the external cache capabilities
  • there will be a follow-up pr soon after this that will support creation
    of histogram indices on large datasets as well
  • both of these changes are required to support training with external memory
  • the sparse pages in dmatrix are taken in batches and the cut matrices
    are incrementally built (see the sketch after this list)
  • also snuck in some (perf) changes related to sketch aggregation amongst multiple
    features across multiple sparse page batches. instead of aggregating the summaries
    inside each device and merging them later, they are aggregated in-place when the device
    is working on different rows but the same feature

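A minimal sketch of the batched flow described in the list above, using hypothetical stand-in types (SparsePageBatch, FeatureSketch, SketchAllBatches) rather than the actual xgboost classes; the real code streams SparsePage batches from the external-memory cache into bounded quantile summaries:

```cpp
// Illustrative only: hypothetical stand-ins for xgboost's SparsePage batches
// and per-feature quantile sketches; not the actual API used in this pr.
#include <cstddef>
#include <vector>

struct Entry { std::size_t column; float value; };
using SparsePageBatch = std::vector<std::vector<Entry>>;  // batch = rows -> entries

struct FeatureSketch {
  std::vector<float> values;                    // stand-in for a bounded WXQ summary
  void Push(float v) { values.push_back(v); }   // the real sketch keeps a compressed summary
};

// Incrementally build per-feature sketches, one external-memory batch at a time.
void SketchAllBatches(const std::vector<SparsePageBatch>& batches,
                      std::vector<FeatureSketch>* sketches) {
  for (const SparsePageBatch& batch : batches) {   // batches stream from the disk cache
    for (const auto& row : batch) {
      for (const Entry& e : row) {
        // Aggregate in place: whichever device owns these rows pushes directly
        // into the shared per-feature sketch instead of merging summaries later.
        (*sketches)[e.column].Push(e.value);
      }
    }
  }
}
```
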
sriramch (Contributor Author)

@canonizer @RAMitchell @rongou - please review

this is very similar to #4448 and i have split it up per the earlier review request

sriramch (Contributor Author)

@hcho3 any reason why i do not see all the checks with this pr (typically there are ~ 10 of them)?

hcho3 (Collaborator) commented May 22, 2019

@sriramch Looks like GitHub webhook didn't get activated. Let me try to re-trigger

hcho3 closed this May 22, 2019
hcho3 reopened this May 22, 2019
hcho3 (Collaborator) commented May 22, 2019

@sriramch See https://www.githubstatus.com/. Looks like there is an issue with notification delivery. We'll have to wait until it's fixed

sriramch (Contributor Author)

> @sriramch See https://www.githubstatus.com/. Looks like there is an issue with notification delivery. We'll have to wait until it's fixed

thanks @hcho3 for looking and helping!

hcho3 (Collaborator) commented May 22, 2019

@sriramch For now, I've manually triggered all Jenkins tests (Linux/Win64).

sriramch (Contributor Author)

> @sriramch For now, I've manually triggered all Jenkins tests (Linux/Win64).

@hcho3 thanks for your help - much appreciated!

hcho3 (Collaborator) commented May 22, 2019

@sriramch I'm having some issues with triggering Linux tests. Can we wait until the webhook issue is fixed?

sriramch (Contributor Author)

> @sriramch I'm having some issues with triggering Linux tests. Can we wait until the webhook issue is fixed?

@hcho3 - no problem - we can wait!

codecov-io commented May 22, 2019

Codecov Report

Merging #4486 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #4486   +/-   ##
=======================================
  Coverage   81.42%   81.42%           
=======================================
  Files           9        9           
  Lines        1626     1626           
=======================================
  Hits         1324     1324           
  Misses        302      302

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a567ec...85afb2d.

*/
struct SketchContainer {
std::vector<HistCutMatrix::WXQSketch> sketches_; // NOLINT
std::vector<std::unique_ptr<std::mutex>> col_locks_; // NOLINT
Contributor:

Feel free to use std::vector<std::mutex>, unless there is a reason to use std::unique_ptr.

sriramch (Contributor Author):

elements within a vector have to be copy/assign-able or move-able. a mutex is neither of those.

Contributor:

I think copying/assignment/moving only applies when resizing the vector. Creating a mutex vector of a fixed size should be possible.

I'm not a C++ expert here, however, so I may be wrong.
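
For reference, a small self-contained check of that point (not from the pr): a fixed-size std::vector<std::mutex> can be constructed, but anything that would copy or relocate the elements does not compile:

```cpp
#include <mutex>
#include <vector>

int main() {
  // Constructing a fixed-size vector of mutexes works: the elements are
  // value-initialized in place and never copied or moved.
  std::vector<std::mutex> col_locks(128);

  col_locks[0].lock();
  col_locks[0].unlock();

  // These would not compile, since std::mutex is neither copyable nor movable:
  // col_locks.resize(256);
  // std::vector<std::mutex> copy = col_locks;
  return 0;
}
```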

n_rows_(row_end - row_begin), param_(std::move(param)), sketch_container_(sketch_container) {
}

inline size_t GetRowStride() {
Contributor:

Any reason to have a method here? If the purpose is to make row_stride_ accessible, consider making DeviceShard a struct.

sriramch (Contributor Author):

what is the issue with having this accessor? i can consider making it a const member function, if anything, as the state isn't getting mutated. DeviceShard already delineates what is public and private, and i do not want to loosen the accessibility restrictions without fully understanding why it was made a class in the first place.
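
A minimal sketch of the const accessor mentioned above, with the rest of DeviceShard elided:

```cpp
#include <cstddef>

class DeviceShard {
 public:
  // Const accessor: callers can read the row stride without exposing the
  // member or loosening DeviceShard's access restrictions.
  std::size_t GetRowStride() const { return row_stride_; }

 private:
  std::size_t row_stride_{0};
};
```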

WXQSketch::LimitSizeLevel(row_batch.Size(), eps, &dummy_nlevel, &n_cuts_);
// double ncuts to be the same as the number of values
// in the temporary buffers of the sketches
n_cuts_ *= 2;
Contributor:

Is it possible to keep doubling n_cuts_?

}
};

void Sketch(const SparsePage& batch, const MetaInfo& info,
HistCutMatrix* hmat, int gpu_batch_nrows) {
size_t SketchBatch(const GPUDistribution &dist, const SparsePage &batch,
Contributor:

The row stride can definitely be passed through a shard or class variable. No need to return a value here.

sriramch (Contributor Author):

will do

void Sketch(const SparsePage& batch, const MetaInfo& info,
HistCutMatrix* hmat, int gpu_batch_nrows) {
size_t SketchBatch(const GPUDistribution &dist, const SparsePage &batch,
const MetaInfo &info, SketchContainer *sketch_container) {
Contributor:

sketch_container should be a member of a class or shard.

sriramch (Contributor Author):

will do
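
A hedged sketch of how both suggestions (the sketch container and the row stride as shard/class state) could look; GPUSketcher is a hypothetical name and the placeholder types stand in for the real xgboost classes, so the actual code in the pr may be shaped differently:

```cpp
#include <cstddef>
#include <memory>

struct SparsePage {};       // placeholder for the real xgboost type
struct MetaInfo {};         // placeholder
struct SketchContainer {};  // placeholder; holds the per-feature sketches

class GPUSketcher {
 public:
  GPUSketcher() : sketch_container_(std::make_unique<SketchContainer>()) {}

  // The sketch container and the row stride are class state, so SketchBatch
  // no longer needs to take the container as a parameter or return the stride.
  void SketchBatch(const SparsePage& batch, const MetaInfo& info) {
    // ... sketch this batch into *sketch_container_, update row_stride_ ...
  }

  std::size_t RowStride() const { return row_stride_; }

 private:
  std::size_t row_stride_{0};
  std::unique_ptr<SketchContainer> sketch_container_;
};
```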

common::DeviceSketch(batch, *info_, param_, &hmat_, hist_maker_param_.gpu_batch_nrows);
// TODO(sriramch): The return value will be used when we add support for histogram
// index creation for multiple batches
(void)common::DeviceSketch(param_, hist_maker_param_.gpu_batch_nrows, dmat, &hmat_);
Contributor:

No need to cast the return value to void. Just ignoring it is fine.

sriramch (Contributor Author):

i'll clean all these up in a follow-up pr for the histogram. i added a todo and did this just to remind myself of it. the void cast is just to note for now that the return value is ignored.

Contributor:

A TODO describing that you ignore the value is fine. No need to cast to void, ignoring return values in C/C++ is OK.
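
A tiny illustration of the two styles being discussed, using a hypothetical function in place of DeviceSketch:

```cpp
#include <cstddef>

// Hypothetical stand-in for DeviceSketch; returns the row stride.
std::size_t ComputeRowStride() { return 42; }

int main() {
  // Plain call: C++ is fine with a return value being ignored.
  ComputeRowStride();

  // The (void) cast changes nothing at runtime; it only spells out that the
  // return value is intentionally discarded.
  (void)ComputeRowStride();
  return 0;
}
```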

namespace xgboost {
namespace common {

void TestDeviceSketch(const GPUSet& devices) {
void TestDeviceSketch(const GPUSet& devices, bool use_external_memory = false) {
Contributor:

Do you need a default parameter value here? Wouldn't updating calls to TestDeviceSketch be easier?

sriramch (Contributor Author):

i do not see an issue either way. i'll change it nonetheless.

std::shared_ptr<xgboost::DMatrix> *dmat = nullptr;

size_t num_cols = 1;
if (!use_external_memory) {
Contributor:

Consider if (use_external_memory) and swapping the branches.

sriramch (Contributor Author):

will do

HistCutMatrix hmat_gpu;
DeviceSketch(batch, (*dmat)->Info(), p, &hmat_gpu, gpu_batch_nrows);
(void)DeviceSketch(p, gpu_batch_nrows, (*dmat).get(), &hmat_gpu);
Contributor:

No need to cast to void.

sriramch (Contributor Author):

this is to indicate that the return value is ignored. i can do it c++ style if you want!

HistCutMatrix hmat_gpu;
DeviceSketch(batch, (*dmat)->Info(), p, &hmat_gpu, gpu_batch_nrows);
(void)DeviceSketch(p, gpu_batch_nrows, (*dmat).get(), &hmat_gpu);
Contributor:

dmat->get()

sriramch (Contributor Author):

will do...


// Initialize Sketches for this dmatrix
sketches_.resize(info.num_col_);
col_locks_.resize(info.num_col_);
#pragma omp parallel for schedule(static) if (info.num_col_ > 1000)
Contributor:

This number occurs in more than one place. Consider defining a constant and using it instead.

sriramch (Contributor Author):

done
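
For illustration, one way the threshold used in the `if (info.num_col_ > 1000)` clause can be shared; the constant name here is made up and the pr may use a different one:

```cpp
#include <cstddef>

// Hypothetical name for the shared threshold; defined once and reused.
constexpr std::size_t kOmpParallelizeColLimit = 1000;

void InitSketches(std::size_t num_cols) {
  // Parallelize only when there are enough columns to make it worthwhile.
  #pragma omp parallel for schedule(static) if (num_cols > kOmpParallelizeColLimit)
  for (long long i = 0; i < static_cast<long long>(num_cols); ++i) {
    // ... initialize sketches_[i] and col_locks_[i] ...
  }
}
```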

}

private:
std::vector<std::unique_ptr<DeviceShard>> shards_;
const tree::TrainParam &param_;
int gpu_batch_nrows_;
size_t row_stride_;
std::unique_ptr<SketchContainer> sketch_container_;
};

size_t DeviceSketch
Contributor:

Please document what value is being returned.

Alternatively, consider using a pointer parameter instead of returning a value.
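
A hedged sketch of the kind of documentation being asked for, with simplified placeholder types and a doxygen-style comment (the exact wording in the pr may differ); per the discussion above, the returned value is the row stride needed later for histogram index creation:

```cpp
#include <cstddef>

struct TrainParam {};     // placeholders for the real xgboost types
struct DMatrix {};
struct HistCutMatrix {};

/*!
 * \brief Compute the histogram cuts for a dmatrix on the GPU(s).
 * \param param training parameters
 * \param gpu_batch_nrows number of rows per GPU batch
 * \param dmat input matrix, possibly backed by external memory
 * \param hmat output cut matrix
 * \return the row stride, i.e. the maximum number of stored entries in any
 *         row, used later when building the histogram indices
 */
std::size_t DeviceSketch(const TrainParam& param, int gpu_batch_nrows,
                         DMatrix* dmat, HistCutMatrix* hmat);
```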

}

/* Builds the sketches on the GPU */
/* Builds the sketches on the GPU for the dmatrix */
Contributor:

Please document the return value.

sriramch (Contributor Author):

done

@@ -51,7 +51,7 @@ void TestDeviceSketch(const GPUSet& devices, bool use_external_memory = false) {

// find the cuts on the GPU
HistCutMatrix hmat_gpu;
(void)DeviceSketch(p, gpu_batch_nrows, (*dmat).get(), &hmat_gpu);
(void)DeviceSketch(p, gpu_batch_nrows, dmat->get(), &hmat_gpu);
Contributor:

Could you also check that the correct row stride is returned?

sriramch (Contributor Author):

done
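
A self-contained illustration (not the actual test) of what the extra check verifies: the row stride is the largest number of stored entries in any row of the input; the fake function and the tiny row sizes are placeholders:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the value DeviceSketch returns in the real test.
std::size_t FakeDeviceSketchRowStride() { return 3; }

int main() {
  // Number of stored (non-missing) entries in each row of a tiny sparse matrix.
  std::vector<std::size_t> row_sizes = {1, 3, 2};

  // Expected row stride: the maximum entries per row.
  std::size_t expected = 0;
  for (std::size_t n : row_sizes) expected = std::max(expected, n);

  assert(FakeDeviceSketchRowStride() == expected);
  return 0;
}
```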


sriramch (Contributor Author)

w.r.t. this:

#4486 (comment)

to accommodate this, we have to explicitly make the type non-copy-able/move-able/assign-able. i have done just that to prevent such accidental usage.
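
A minimal sketch of that change, trimmed to the two members shown earlier in the thread; the deleted special members are the point, the rest is filler:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct SketchContainer {
  std::vector<int> sketches_;           // stand-in for the per-feature WXQ sketches
  std::vector<std::mutex> col_locks_;   // one lock per column

  explicit SketchContainer(std::size_t num_cols)
      : sketches_(num_cols), col_locks_(num_cols) {}

  // Explicitly non-copyable and non-movable, so accidental copies of the
  // container (and its mutexes) fail to compile.
  SketchContainer(const SketchContainer&) = delete;
  SketchContainer(SketchContainer&&) = delete;
  SketchContainer& operator=(const SketchContainer&) = delete;
  SketchContainer& operator=(SketchContainer&&) = delete;
};
```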

RAMitchell merged commit fed665a into dmlc:master May 29, 2019
hcho3 (Collaborator) commented May 30, 2019

@sriramch So is the external memory feature now complete? That is, can I take today's master branch and train with GPU and external memory data?

sriramch (Contributor Author)


@hcho3 - not yet. as this pr states, it is only part 1 of 2. i'm working on the last part and hope to have the pr up soon.

unfortunately, the recently merged #4478 broke the xgboost cli, hampering my testing.

hcho3 (Collaborator) commented May 30, 2019

@sriramch It seems okay now: https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/master/225/pipeline Can you clarify how the CLI is broken? (Sorry, I thought you were talking about CI.)

sriramch (Contributor Author)

@hcho3 - what i meant was that if i include this commit in my branch, the cli just sits and spins, chewing a lot of memory on a large dataset with the external cache enabled, before it can even create those sparse pages on disk. this seems like a clear regression, hence i reverted this commit in my private branch. i was not referring to the jenkins build error. sorry for not being clear earlier.

if you are interested, the full set of external memory training changes are here

hcho3 (Collaborator) commented May 30, 2019

@sriramch I will need to look into the regression. Can you post a reproducible example? (your CLI config)

My guess is that #4478 might have mishandled # in the data path that's used to enable external memory. Sorry for my oversight.
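
For context, the CLI enables external memory by appending a cache prefix to the data path after a '#', so the path must not be cut off at that character. A hypothetical config fragment (file names are placeholders):

```
# hypothetical CLI config fragment; paths are placeholders
tree_method = gpu_hist
# the part after '#' in the path below is the cache prefix that turns on
# external memory; it is part of the data specification, not a comment
data = "train.libsvm#dtrain.cache"
eval[test] = "test.libsvm#dtest.cache"
```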

hcho3 (Collaborator) commented May 30, 2019

@sriramch Actually, your CLI config would be sufficient for me to diagnose the issue.

sriramch (Contributor Author)

@hcho3 - please refer to this gist

yes, i think the '#' tokens are possibly ignored. is it possible to revert this for now?

hcho3 (Collaborator) commented May 30, 2019

@sriramch Indeed, your CLI config makes the problem very clear. I will resolve this issue very soon.

hcho3 mentioned this pull request May 31, 2019
sriramch deleted the quantile branch June 27, 2019 18:55
lock bot locked as resolved and limited conversation to collaborators Sep 25, 2019