Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up LearnerImpl. #5350

Merged
merged 1 commit into from
Mar 12, 2020
Merged

Split up LearnerImpl. #5350

merged 1 commit into from
Mar 12, 2020

Conversation

trivialfis
Copy link
Member

This PR splits up LearnerImpl into 3 different components. Starting from basic configuration to model IO and lastly the actual learner that performs training/prediction . This is an attempt to modularize it so we can shrink it down in the future.

For instance, some configurations are no longer needed in current state of XGBoost, like the objective function configuration should be simplified. Binary model IO is really difficult to work with and should be reduced to absolute minimum. This PR splits the monolithic LearnerImpl so we can look into individual group of features more easily.

@trivialfis
Copy link
Member Author

ping @hcho3 . No functionality change, only refactoring.

@mli
Copy link
Member

mli commented Feb 23, 2020

Codecov Report

Merging #5350 into master will not change coverage by %.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #5350   +/-   ##
=======================================
  Coverage   83.75%   83.75%           
=======================================
  Files          11       11           
  Lines        2413     2413           
=======================================
  Hits         2021     2021           
  Misses        392      392           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 655cf17...6439621. Read the comment docs.

Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider composition over inheritance: https://en.wikipedia.org/wiki/Composition_over_inheritance

I think it makes much more sense in this case.

@trivialfis
Copy link
Member Author

trivialfis commented Feb 28, 2020

@RAMitchell I think direct inheritance is more suitable here. Following is my reasoning:

  • We only have 1 Learner class, there's no need to think about code reuse hence no need for composition.
  • Multiple inheritance is more complicated.
  • We need to share states, like training parameter, model parameter, indicator for whether do we need configuration etc. These states have a clear linear chain: model(gbm, obj, mparam) -> configuration -> IO -> actual functionality. The lower class in the chain doesn't access states in higher class.

@hcho3
Copy link
Collaborator

hcho3 commented Mar 11, 2020

@trivialfis Do you want me to review this?

@trivialfis
Copy link
Member Author

@hcho3 Yes please. Another version is #5404 . This one uses inheritance while the other one uses composition. Me and @RAMitchell didn't decide which one is better.

@trivialfis
Copy link
Member Author

trivialfis commented Mar 11, 2020

@hcho3 I have been seeing non-deterministic parser error on CI and it's becoming quite often.

In https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/PR-5350/2/pipeline it reports a weird symbol as delimiter, another time I saw this issue on CI it said the string gbtree is a csv delimiter. I can't reproduce it on my machine. Do you have any idea? It might be some thread safety issue in the static storage of dmlc parameter, but so far I can't think of anything more concrete.

[2020-03-11T17:30:40.571Z] 
[2020-03-11T17:30:40.571Z] ret = -1
[2020-03-11T17:30:40.571Z] 
[2020-03-11T17:30:40.571Z]     def _check_call(ret):
[2020-03-11T17:30:40.571Z]         """Check the return value of C API call
[2020-03-11T17:30:40.571Z]     
[2020-03-11T17:30:40.571Z]         This function will raise exception when error occurs.
[2020-03-11T17:30:40.571Z]         Wrap every API call with this function
[2020-03-11T17:30:40.571Z]     
[2020-03-11T17:30:40.571Z]         Parameters
[2020-03-11T17:30:40.571Z]         ----------
[2020-03-11T17:30:40.571Z]         ret : int
[2020-03-11T17:30:40.571Z]             return value from API calls
[2020-03-11T17:30:40.571Z]         """
[2020-03-11T17:30:40.571Z]         if ret != 0:
[2020-03-11T17:30:40.571Z] >           raise XGBoostError(py_str(_LIB.XGBGetLastError()))
[2020-03-11T17:30:40.571Z] E           xgboost.core.XGBoostError: [17:25:58] /workspace/src/data/data.cc:379: Encountered parser error:
[2020-03-11T17:30:40.571Z] E           [17:25:58] /workspace/dmlc-core/src/data/csv_parser.h:126: Delimiter '�
[2020-03-11T17:30:40.571Z] E           Stack trace:
[2020-03-11T17:30:40.571Z] E             [bt] (0) /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x54) [0x7f548cb1a944]
[2020-03-11T17:30:40.571Z] E             [bt] (1) /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/lib/libxgboost.so(xgboost::DMatrix::Load(std::string const&, bool, bool, std::string const&, unsigned long)+0x102c) [0x7f548cb6798c]
[2020-03-11T17:30:40.571Z] E             [bt] (2) /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/lib/libxgboost.so(XGDMatrixCreateFromFile+0xc7) [0x7f548cb06eb7]
[2020-03-11T17:30:40.571Z] E             [bt] (3) /opt/python/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7f54ae770ec0]
[2020-03-11T17:30:40.571Z] E             [bt] (4) /opt/python/lib/python3.7/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7f54ae77087d]
[2020-03-11T17:30:40.571Z] E             [bt] (5) /opt/python/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f54ae986eee]
[2020-03-11T17:30:40.571Z] E             [bt] (6) /opt/python/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x13924) [0x7f54ae987924]
[2020-03-11T17:30:40.571Z] E             [bt] (7) /opt/python/bin/python(_PyObject_FastCallKeywords+0x4ab) [0x5594c3b8665b]
[2020-03-11T17:30:40.571Z] E             [bt] (8) /opt/python/bin/python(_PyEval_EvalFrameDefault+0x532e) [0x5594c3be240e]
[2020-03-11T17:30:40.571Z] 
[2020-03-11T17:30:40.571Z] /home/ubuntu/.local/lib/python3.7/site-packages/xgboost/core.py:189: XGBoostError

@trivialfis
Copy link
Member Author

@hcho3
Copy link
Collaborator

hcho3 commented Mar 11, 2020

@trivialfis I have no idea. We can try running tests with thread sanitizer enabled and see if there’s any thread safety issue. Right now, our CI does not run any tests with thread sanitizer.

@trivialfis
Copy link
Member Author

trivialfis commented Mar 11, 2020

@hcho3 tsan is not really useful in the present of openmp. Multiple threads writing to a same std::vector is considered a race (and many other errors I haven't really look into). But right now our non-omp build is broken. I will try to fix it later.

@codecov-io
Copy link

Codecov Report

Merging #5350 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #5350   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files          11       11           
  Lines        2411     2411           
=======================================
  Hits         2027     2027           
  Misses        384      384

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a99f8f...143593b. Read the comment docs.

@trivialfis trivialfis requested a review from hcho3 March 11, 2020 23:53
@trivialfis trivialfis merged commit 45a97dd into dmlc:master Mar 12, 2020
@trivialfis trivialfis deleted the split-learner branch March 12, 2020 08:30
@trivialfis trivialfis mentioned this pull request Apr 21, 2020
12 tasks
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants