training with external memory part 1 of 2 #4486
Conversation
- this pr focuses on computing the quantiles using multiple gpus on a dataset that uses the external cache capabilities
- there will be a follow-up pr soon after this one that will support creation of histogram indices on large datasets as well
- both of these changes are required to support training with external memory
- the sparse pages in dmatrix are taken in batches, and the cut matrices are incrementally built
- also snuck in some perf changes related to sketch aggregation across multiple features and multiple sparse page batches: instead of aggregating the summary inside each device and merging it later, it is aggregated in place when the device is working on different rows of the same feature
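
For readers skimming the change, here is a rough sketch of the batched flow described above. It is illustrative only - the loop shape and the names `SketchBatch`, `MakeCuts`, and `GetRowBatches` are assumptions rather than the exact code in this pr:

```cpp
// illustrative sketch: incremental quantile computation over externally-cached pages
size_t row_stride = 0;
SketchContainer sketch_container(info.num_col_);   // one WXQSketch per feature
for (const auto &batch : dmat->GetRowBatches()) {  // SparsePages streamed from disk
  // each batch updates the per-feature sketches in place; the running
  // maximum of non-zeros per row across batches becomes the row stride
  row_stride = std::max(row_stride,
                        SketchBatch(dist, batch, info, &sketch_container));
}
sketch_container.MakeCuts(&hmat);  // finalize the cut matrix from the merged sketches
```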
@canonizer @RAMitchell @rongou - please review. this is very similar to #4448, and i have split it up per the earlier review request.

@hcho3 any reason why i do not see all the checks with this pr (typically there are ~10 of them)?

@sriramch Looks like the GitHub webhook didn't get activated. Let me try to re-trigger.

@sriramch See https://www.githubstatus.com/. Looks like there is an issue with notification delivery. We'll have to wait until it's fixed.

thanks @hcho3 for looking and helping!

@sriramch For now, I've manually triggered all Jenkins tests (Linux/Win64).

@sriramch I'm having some issues with triggering Linux tests. Can we wait until the webhook issue is fixed?
Codecov Report

```
@@          Coverage Diff           @@
##           master    #4486  +/-  ##
======================================
  Coverage   81.42%   81.42%
======================================
  Files           9        9
  Lines        1626     1626
======================================
  Hits         1324     1324
  Misses        302      302
```

Continue to review the full report at Codecov.
src/common/hist_util.cu (Outdated)

```cpp
 */
struct SketchContainer {
  std::vector<HistCutMatrix::WXQSketch> sketches_;  // NOLINT
  std::vector<std::unique_ptr<std::mutex>> col_locks_;  // NOLINT
```

Feel free to use std::vector<std::mutex>, unless there is a reason to use std::unique_ptr.

elements within a vector have to be copy/assign-able or move-able. a mutex is neither of those.

I think copying/assignment/moving only applies when resizing the vector. Creating a mutex vector of a fixed size should be possible. I'm not a C++ expert here, however, so I may be wrong.
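
As a standalone illustration of the point under discussion (not code from this pr): a std::vector<std::mutex> can be constructed at a fixed size, because in-place construction only requires default-constructibility; it is the growth operations that need a movable element type.

```cpp
#include <mutex>
#include <vector>

int main() {
  std::vector<std::mutex> locks(16);  // OK: elements are default-constructed in place
  // locks.resize(32);                // does not compile: resize requires MoveInsertable
  // locks.push_back(std::mutex{});   // does not compile: std::mutex is not movable
  std::lock_guard<std::mutex> guard(locks[0]);  // normal usage
  return 0;
}
```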
src/common/hist_util.cu (Outdated)

```cpp
      n_rows_(row_end - row_begin), param_(std::move(param)), sketch_container_(sketch_container) {
  }

  inline size_t GetRowStride() {
```

Any reason to have a method here? If the purpose is to make row_stride_ accessible, consider making DeviceShard a struct.

what is the issue with having this accessor? i can make it a const member function, since the state isn't mutated. deviceshard already delineates what is public and private, and i do not want to loosen the accessibility restrictions without fully understanding why it was made a class in the first place.
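
A minimal sketch of the const-accessor variant mentioned above (illustrative, not the exact pr code):

```cpp
class DeviceShard {
 public:
  // const accessor: exposes row_stride_ read-only without turning
  // DeviceShard into a struct with fully public members
  size_t GetRowStride() const { return row_stride_; }

 private:
  size_t row_stride_{0};  // max non-zeros per row seen so far
};
```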
```cpp
    WXQSketch::LimitSizeLevel(row_batch.Size(), eps, &dummy_nlevel, &n_cuts_);
    // double ncuts to be the same as the number of values
    // in the temporary buffers of the sketches
    n_cuts_ *= 2;
```

Is it possible to keep doubling n_cuts_?
src/common/hist_util.cu (Outdated)

```diff
 }
 };

-void Sketch(const SparsePage& batch, const MetaInfo& info,
-            HistCutMatrix* hmat, int gpu_batch_nrows) {
+size_t SketchBatch(const GPUDistribution &dist, const SparsePage &batch,
```

The row stride can definitely be passed through a shard or class variable. No need to return a value here.

will do
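
One way to carry the row stride on the shard instead of through return values, as the comment suggests - a hypothetical sketch, with all names illustrative:

```cpp
#include <algorithm>
#include <cstddef>

// hypothetical shard that accumulates the row stride across sparse page batches
struct DeviceShard {
  size_t row_stride{0};  // running max of non-zeros per row

  // called once per batch; batch_stride would come from scanning
  // the batch's row offsets on the device
  void AccumulateRowStride(size_t batch_stride) {
    row_stride = std::max(row_stride, batch_stride);
  }
};
```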
src/common/hist_util.cu (Outdated)

```diff
-void Sketch(const SparsePage& batch, const MetaInfo& info,
-            HistCutMatrix* hmat, int gpu_batch_nrows) {
+size_t SketchBatch(const GPUDistribution &dist, const SparsePage &batch,
+                   const MetaInfo &info, SketchContainer *sketch_container) {
```

sketch_container should be a member of a class or shard.

will do
src/tree/updater_gpu_hist.cu (Outdated)

```diff
-common::DeviceSketch(batch, *info_, param_, &hmat_, hist_maker_param_.gpu_batch_nrows);
+// TODO(sriramch): The return value will be used when we add support for histogram
+// index creation for multiple batches
+(void)common::DeviceSketch(param_, hist_maker_param_.gpu_batch_nrows, dmat, &hmat_);
```

No need to cast the return value to void. Just ignoring it is fine.

i'll clean all these up in a follow-up pr for the histogram. i added a todo and did this just to remind myself. the void cast is just to note, for now, that the return value is ignored.

A TODO describing that you ignore the value is fine. No need to cast to void; ignoring return values in C/C++ is OK.
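
For reference, the styles being debated, in a standalone snippet (ComputeRowStride is a stand-in for DeviceSketch):

```cpp
#include <cstddef>
#include <tuple>  // for std::ignore

size_t ComputeRowStride() { return 42; }  // stand-in for DeviceSketch

int main() {
  size_t stride = ComputeRowStride();  // use the value
  ComputeRowStride();                  // plain ignore: perfectly legal in C and C++
  (void)ComputeRowStride();            // cast to void: flags "intentionally ignored"
  std::ignore = ComputeRowStride();    // an explicit "C++ style" alternative
  (void)stride;                        // silence unused-variable warnings in this demo
  return 0;
}
```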
```diff
 namespace xgboost {
 namespace common {

-void TestDeviceSketch(const GPUSet& devices) {
+void TestDeviceSketch(const GPUSet& devices, bool use_external_memory = false) {
```

Do you need a default parameter value here? Wouldn't updating calls to TestDeviceSketch be easier?

i do not see an issue either way. i'll change it nonetheless.
```cpp
  std::shared_ptr<xgboost::DMatrix> *dmat = nullptr;

  size_t num_cols = 1;
  if (!use_external_memory) {
```

Consider if (use_external_memory) and swapping the branches.

will do
```diff
 HistCutMatrix hmat_gpu;
-DeviceSketch(batch, (*dmat)->Info(), p, &hmat_gpu, gpu_batch_nrows);
+(void)DeviceSketch(p, gpu_batch_nrows, (*dmat).get(), &hmat_gpu);
```

No need to cast to void.

this is to indicate that the return value is ignored. i can do it c++ style if you want!
```diff
 HistCutMatrix hmat_gpu;
-DeviceSketch(batch, (*dmat)->Info(), p, &hmat_gpu, gpu_batch_nrows);
+(void)DeviceSketch(p, gpu_batch_nrows, (*dmat).get(), &hmat_gpu);
```

dmat->get()

will do...
src/common/hist_util.cu (Outdated)

```cpp
  // Initialize Sketches for this dmatrix
  sketches_.resize(info.num_col_);
  col_locks_.resize(info.num_col_);
#pragma omp parallel for schedule(static) if (info.num_col_ > 1000)
```

This number occurs in more than one place. Consider defining a constant and using it instead.

done
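
A minimal sketch of the suggested change; the constant's name here is hypothetical:

```cpp
// hypothetical name - one definition shared by every site that used the bare 1000
constexpr size_t kOmpNumColsParallelizeLimit = 1000;

// ...
#pragma omp parallel for schedule(static) if (info.num_col_ > kOmpNumColsParallelizeLimit)
```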
```cpp
  }

 private:
  std::vector<std::unique_ptr<DeviceShard>> shards_;
  const tree::TrainParam &param_;
  int gpu_batch_nrows_;
  size_t row_stride_;
  std::unique_ptr<SketchContainer> sketch_container_;
};

size_t DeviceSketch
```

Please document what value is being returned. Alternatively, consider using a pointer parameter instead of returning a value.
src/common/hist_util.cu (Outdated)

```diff
 }

-/* Builds the sketches on the GPU */
+/* Builds the sketches on the GPU for the dmatrix */
```

Please document the return value.

done
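
The requested documentation might look roughly like this, assuming the return value is the row stride (the wording and doc style are illustrative):

```cpp
/*!
 * \brief Builds the quantile sketches on the GPU for the whole dmatrix,
 *        iterating over its sparse page batches, and writes the cuts to hmat.
 * \return the row stride, i.e. the maximum number of non-zero entries in any
 *         row, needed later when building the histogram index for all batches
 */
size_t DeviceSketch(const tree::TrainParam &param, int gpu_batch_nrows,
                    DMatrix *dmat, HistCutMatrix *hmat);
```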
```diff
@@ -51,7 +51,7 @@ void TestDeviceSketch(const GPUSet& devices, bool use_external_memory = false) {

 // find the cuts on the GPU
 HistCutMatrix hmat_gpu;
-(void)DeviceSketch(p, gpu_batch_nrows, (*dmat).get(), &hmat_gpu);
+(void)DeviceSketch(p, gpu_batch_nrows, dmat->get(), &hmat_gpu);
```

Could you also check that the correct row stride is returned?

done
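
The requested check could look like the following GoogleTest fragment (hypothetical; the expected value would come from the test's known data):

```cpp
// hypothetical: verify the returned row stride along with the cuts
size_t row_stride = DeviceSketch(p, gpu_batch_nrows, dmat->get(), &hmat_gpu);
ASSERT_EQ(row_stride, expected_row_stride);  // expected_row_stride known from the test matrix
```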
w.r.t. the std::vector<std::mutex> suggestion: to accommodate this, we have to explicitly make the type non-copyable/non-movable/non-assignable. i have done just that to prevent such accidental usage.
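
A minimal sketch of that guard (illustrative; the real container also holds the per-feature sketches):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct SketchContainer {
  std::vector<std::mutex> col_locks_;  // one lock per feature column

  explicit SketchContainer(size_t num_cols) : col_locks_(num_cols) {}

  // explicitly delete copy/move so the container - and the mutexes inside -
  // can never be accidentally copied, moved, or assigned
  SketchContainer(const SketchContainer &) = delete;
  SketchContainer(SketchContainer &&) = delete;
  SketchContainer &operator=(const SketchContainer &) = delete;
  SketchContainer &operator=(SketchContainer &&) = delete;
};
```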
@sriramch So is the external memory feature now complete? That is, can I take today's master branch and train with GPU and external memory data?

@hcho3 - what i meant was that if i include this commit in my branch, then the cli just sits and spins, chewing a lot of memory on a large dataset that has the external cache enabled, before it can even create those sparse pages on disk. this seems like a clear regression; hence, i reverted this commit in my private branch. i did not refer to the jenkins build error - sorry for not being clear earlier. if you are interested, the full set of external memory training changes are here
@sriramch Actually, your CLI config would be sufficient for me to diagnose the issue. |
@sriramch Indeed, your CLI config makes the problem very clear. I will resolve this issue very soon. |