Split up `LearnerImpl`. #5350

trivialfis · 2020-02-23T08:23:47Z

This PR splits up LearnerImpl into 3 different components. Starting from basic configuration to model IO and lastly the actual learner that performs training/prediction . This is an attempt to modularize it so we can shrink it down in the future.

For instance, some configurations are no longer needed in current state of XGBoost, like the objective function configuration should be simplified. Binary model IO is really difficult to work with and should be reduced to absolute minimum. This PR splits the monolithic LearnerImpl so we can look into individual group of features more easily.

trivialfis · 2020-02-23T09:10:04Z

ping @hcho3 . No functionality change, only refactoring.

mli · 2020-02-23T09:32:51Z

Codecov Report

Merging #5350 into master will not change coverage by %.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5350   +/-   ##
=======================================
  Coverage   83.75%   83.75%           
=======================================
  Files          11       11           
  Lines        2413     2413           
=======================================
  Hits         2021     2021           
  Misses        392      392

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 655cf17...6439621. Read the comment docs.

RAMitchell

Consider composition over inheritance: https://en.wikipedia.org/wiki/Composition_over_inheritance

I think it makes much more sense in this case.

trivialfis · 2020-02-28T12:18:26Z

@RAMitchell I think direct inheritance is more suitable here. Following is my reasoning:

We only have 1 Learner class, there's no need to think about code reuse hence no need for composition.
Multiple inheritance is more complicated.
We need to share states, like training parameter, model parameter, indicator for whether do we need configuration etc. These states have a clear linear chain: model(gbm, obj, mparam) -> configuration -> IO -> actual functionality. The lower class in the chain doesn't access states in higher class.

hcho3 · 2020-03-11T16:56:52Z

@trivialfis Do you want me to review this?

trivialfis · 2020-03-11T17:13:16Z

@hcho3 Yes please. Another version is #5404 . This one uses inheritance while the other one uses composition. Me and @RAMitchell didn't decide which one is better.

trivialfis · 2020-03-11T18:31:09Z

@hcho3 I have been seeing non-deterministic parser error on CI and it's becoming quite often.

In https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/PR-5350/2/pipeline it reports a weird symbol as delimiter, another time I saw this issue on CI it said the string gbtree is a csv delimiter. I can't reproduce it on my machine. Do you have any idea? It might be some thread safety issue in the static storage of dmlc parameter, but so far I can't think of anything more concrete.

[2020-03-11T17:30:40.571Z] 
[2020-03-11T17:30:40.571Z] ret = -1
[2020-03-11T17:30:40.571Z] 
[2020-03-11T17:30:40.571Z]     def _check_call(ret):
[2020-03-11T17:30:40.571Z]         """Check the return value of C API call
[2020-03-11T17:30:40.571Z]     
[2020-03-11T17:30:40.571Z]         This function will raise exception when error occurs.
[2020-03-11T17:30:40.571Z]         Wrap every API call with this function
[2020-03-11T17:30:40.571Z]     
[2020-03-11T17:30:40.571Z]         Parameters
[2020-03-11T17:30:40.571Z]         ----------
[2020-03-11T17:30:40.571Z]         ret : int
[2020-03-11T17:30:40.571Z]             return value from API calls
[2020-03-11T17:30:40.571Z]         """
[2020-03-11T17:30:40.571Z]         if ret != 0:
[2020-03-11T17:30:40.571Z] >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
[2020-03-11T17:30:40.571Z] E           xgboost.core.XGBoostError: [17:25:58] /workspace/src/data/data.cc:379: Encountered parser error:
[2020-03-11T17:30:40.571Z] E           [17:25:58] /workspace/dmlc-core/src/data/csv_parser.h:126: Delimiter '�
[2020-03-11T17:30:40.571Z] E           Stack trace:
[2020-03-11T17:30:40.571Z] E             [bt] (0) /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7f548cb1a944]
[2020-03-11T17:30:40.571Z] E             [bt] (1) /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/lib/libxgboost.so(xgboost::DMatrix::Load(std::string const&, bool, bool, std::string const&, unsigned long)+0x102c) [0x7f548cb6798c]
[2020-03-11T17:30:40.571Z] E             [bt] (2) /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/lib/libxgboost.so(XGDMatrixCreateFromFile+0xc7) [0x7f548cb06eb7]
[2020-03-11T17:30:40.571Z] E             [bt] (3) /opt/python/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f54ae770ec0]
[2020-03-11T17:30:40.571Z] E             [bt] (4) /opt/python/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f54ae77087d]
[2020-03-11T17:30:40.571Z] E             [bt] (5) /opt/python/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f54ae986eee]
[2020-03-11T17:30:40.571Z] E             [bt] (6) /opt/python/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x13924) [0x7f54ae987924]
[2020-03-11T17:30:40.571Z] E             [bt] (7) /opt/python/bin/python(_PyObject_FastCallKeywords+0x4ab) [0x5594c3b8665b]
[2020-03-11T17:30:40.571Z] E             [bt] (8) /opt/python/bin/python(_PyEval_EvalFrameDefault+0x532e) [0x5594c3be240e]
[2020-03-11T17:30:40.571Z] 
[2020-03-11T17:30:40.571Z] /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/core.py:189: XGBoostError

trivialfis · 2020-03-11T18:37:37Z

Another error trace: https://xgboost-ci.net/blue/rest/organizations/jenkins/pipelines/xgboost/branches/PR-5389/runs/12/nodes/211/steps/254/log/?start=0

hcho3 · 2020-03-11T18:51:23Z

@trivialfis I have no idea. We can try running tests with thread sanitizer enabled and see if there’s any thread safety issue. Right now, our CI does not run any tests with thread sanitizer.

trivialfis · 2020-03-11T19:08:15Z

@hcho3 tsan is not really useful in the present of openmp. Multiple threads writing to a same std::vector is considered a race (and many other errors I haven't really look into). But right now our non-omp build is broken. I will try to fix it later.

codecov-io · 2020-03-11T22:30:53Z

Codecov Report

Merging #5350 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5350   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files          11       11           
  Lines        2411     2411           
=======================================
  Hits         2027     2027           
  Misses        384      384

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a99f8f...143593b. Read the comment docs.

RAMitchell reviewed Feb 24, 2020

View reviewed changes

trivialfis mentioned this pull request Mar 10, 2020

Split learner with composition. #5404

Closed

Split up configuration.

143593b

trivialfis force-pushed the split-learner branch from 6439621 to 143593b Compare March 11, 2020 17:04

trivialfis requested a review from hcho3 March 11, 2020 23:53

hcho3 approved these changes Mar 12, 2020

View reviewed changes

trivialfis merged commit 45a97dd into dmlc:master Mar 12, 2020

trivialfis deleted the split-learner branch March 12, 2020 08:30

trivialfis mentioned this pull request Apr 21, 2020

[Roadmap] 1.1.0 Roadmap #5337

Closed

12 tasks

lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split up `LearnerImpl`. #5350

Split up `LearnerImpl`. #5350

trivialfis commented Feb 23, 2020

trivialfis commented Feb 23, 2020

mli commented Feb 23, 2020

RAMitchell left a comment

trivialfis commented Feb 28, 2020 •

edited

Loading

hcho3 commented Mar 11, 2020

trivialfis commented Mar 11, 2020

trivialfis commented Mar 11, 2020 •

edited

Loading

trivialfis commented Mar 11, 2020

hcho3 commented Mar 11, 2020

trivialfis commented Mar 11, 2020 •

edited

Loading

codecov-io commented Mar 11, 2020

Split up LearnerImpl. #5350

Split up LearnerImpl. #5350

Conversation

trivialfis commented Feb 23, 2020

trivialfis commented Feb 23, 2020

mli commented Feb 23, 2020

Codecov Report

RAMitchell left a comment

Choose a reason for hiding this comment

trivialfis commented Feb 28, 2020 • edited Loading

hcho3 commented Mar 11, 2020

trivialfis commented Mar 11, 2020

trivialfis commented Mar 11, 2020 • edited Loading

trivialfis commented Mar 11, 2020

hcho3 commented Mar 11, 2020

trivialfis commented Mar 11, 2020 • edited Loading

codecov-io commented Mar 11, 2020

Codecov Report

Split up `LearnerImpl`. #5350

Split up `LearnerImpl`. #5350

trivialfis commented Feb 28, 2020 •

edited

Loading

trivialfis commented Mar 11, 2020 •

edited

Loading

trivialfis commented Mar 11, 2020 •

edited

Loading