
[Discussion] efficiency improvements #2791

Open
guolinke opened this issue Feb 21, 2020 · 22 comments

@guolinke
Collaborator

This is a call for efficiency improvements to LightGBM, including but not limited to:

  • Tree learning algorithm acceleration
  • I/O related, like dataset loading and model saving
  • Inference speed improvement
  • Better GPU acceleration
  • Distributed learning improvement

If you have any ideas, you can discuss them with us here and open corresponding issues/pull requests if needed.

@StrikerRUS
Collaborator

Linking #2934 and #2935.

@MaxHalford

MaxHalford commented Apr 11, 2020

I have a suggestion, but I'm not sure it's in scope. It concerns improving the prediction speed of trees. This in turn could improve overall training time, because gradient boosting repeatedly asks the weak models for predictions.

I'm not familiar with the tree implementation in LightGBM. The intuitive layout of having two pointers that refer to the left and right children doesn't seem to be the best format for evaluating a tree. Basically, it seems that there are two alternative ways:

  1. Compile the tree to a sequence of ifs and elses. AFAIK this is what treelite allows you to do.
  2. Convert a tree into a heap, which means the nodes are contiguous in memory. Each branch contains two integers that indicate where to go next in the heap. This is quite well explained in this Wikipedia article section.

I thought about suggesting this after reading this great blog post by Andrew Tulloch, which includes some in-depth benchmarks. I don't know if this is something LightGBM already integrates. Also, I'm not well-versed in C++, so I can't help on that side. However, these solutions seem rather simple to implement.
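
To make option 2 concrete, here is a minimal Python sketch of a flattened tree (the array names and the leaf convention are mine, purely for illustration):

import numpy as np

# Flattened tree: node i's split feature, threshold, and children all live
# at index i of contiguous arrays, which is cache-friendlier than chasing
# left/right pointers through scattered heap allocations.
feature = np.array([0, 1, -1, -1, -1])          # -1 marks a leaf
threshold = np.array([0.5, 0.3, 0.0, 0.0, 0.0])
left = np.array([1, 3, -1, -1, -1])
right = np.array([2, 4, -1, -1, -1])
value = np.array([0.0, 0.0, 1.2, -0.7, 0.4])    # leaf outputs

def predict_one(x):
    node = 0
    while feature[node] >= 0:  # descend until a leaf is reached
        node = left[node] if x[feature[node]] < threshold[node] else right[node]
    return value[node]

print(predict_one(np.array([0.2, 0.9])))  # -> 0.4 (root -> node 1 -> leaf 4)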

@julioasotodv
Contributor

Apart from the possibility of using CUDA instead of OpenCL for GPU trees (even though it would reduce compatibility), xgboost has made a lot of progress using Dask for distributed training (in fact, I believe the other experimental distributed training techniques they had are now deprecated).

I believe someone started a project for smoother integration between LightGBM and Dask, but I am unable to find it right now...

As for GPU, I think the folks at RAPIDS (NVIDIA) are looking forward to developing more CUDA implementations for different ML algorithms (just like they have done with xgboost, where they rewrote the whole GPU tree learner in the library). Maybe contacting them could be a good idea.

@jameslamb
Collaborator

dask-lightgbm is here: https://github.com/dask/dask-lightgbm

It's not something we officially maintain or contribute to yet, but I agree it could be a useful thing for us to help with.

@guolinke
Collaborator Author

guolinke commented Aug 6, 2020

@MaxHalford Thanks very much. Actually, LightGBM is able to convert a tree model to C++ code.
However, the trees in LightGBM are not balanced trees, so it is not trivial to optimize them with the methods in your post.
But this is indeed a good potential direction.
BTW, we will implement symmetric trees (like in CatBoost) in the future (by @shiyu1994), which will be much faster at prediction time and more robust on small datasets.

@MaxHalford

@guolinke I think the trick still works. You just need to maintain two arrays that indicate the left and right child of each node. That way you can "walk" to the next node with something like:

# Walk from the root until a leaf is reached (leaves flagged with, say, a negative id)
node = 0
while node >= 0:
    if x[features[node]] < thresholds[node]:
        node = left[node]
    else:
        node = right[node]

@guolinke
Collaborator Author

guolinke commented Aug 6, 2020

Thank you, @julioasotodv.
Actually, the CUDA implementations in existing tools already integrate most of the "histogram tricks" from LightGBM.
So it would be more like "reinventing the wheel" if we simply re-implemented them again.
We plan to revisit the existing implementations and estimate the potential improvements. If there is still room, we will develop a new GPU version of LightGBM.

@guolinke
Collaborator Author

guolinke commented Aug 6, 2020

@MaxHalford ,
The current prediction logic is very similar to that approach; it is:

inline int Tree::GetLeaf(const double* feature_values) const {
  int node = 0;
  if (num_cat_ > 0) {
    // Trees containing categorical splits go through the general decision function.
    while (node >= 0) {
      node = Decision(feature_values[split_feature_[node]], node);
    }
  } else {
    while (node >= 0) {
      node = NumericalDecision(feature_values[split_feature_[node]], node);
    }
  }
  // Internal nodes have non-negative ids; a leaf is encoded as the bitwise
  // complement of its index, so ~node recovers the leaf index.
  return ~node;
}

@MaxHalford

Cheers @guolinke. I don't believe there's much improvement to bring from that side then.

How about the method that is discussed in this scikit-learn issue? Is it already implemented?

@guolinke
Collaborator Author

guolinke commented Sep 6, 2020

@MaxHalford I quickly looked through the PR. I think it is related to the Python-side histogram implementation, and our C++ implementation is already optimized for both speed and memory.

And we have an LRU histogram pool to further reduce memory cost.
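
For readers unfamiliar with the idea, here is a toy sketch of such an LRU pool (illustrative only, not LightGBM's actual data structure):

from collections import OrderedDict

class HistogramPool:
    """Keep at most `capacity` leaf histograms; evict the least recently
    used one. An evicted leaf's histogram must be recomputed on demand."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = OrderedDict()  # leaf id -> histogram

    def get(self, leaf):
        if leaf in self.pool:
            self.pool.move_to_end(leaf)  # mark as most recently used
            return self.pool[leaf]
        return None  # cache miss: caller recomputes the histogram

    def put(self, leaf, hist):
        self.pool[leaf] = hist
        self.pool.move_to_end(leaf)
        if len(self.pool) > self.capacity:
            self.pool.popitem(last=False)  # evict least recently used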

@ogrisel

ogrisel commented Sep 7, 2020

For the record, LightGBM is 1.5x to 2x faster than the master branch of scikit-learn in sequential mode at the moment. This difference shrinks with many threads because of the diminishing returns of parallelism beyond 8 to 16 CPU threads, though that depends on the shape of the dataset, the number of nodes, and so on. I haven't investigated why LightGBM is so much faster in sequential mode yet. It wasn't the case 6 months ago, if I remember correctly.

@guolinke
Collaborator Author

guolinke commented Sep 7, 2020

@ogrisel is sequential mode num_threads=1?
Did you test LightGBM 3.0.0?
We have several speed optimizations in the 3.0.0 release. In particular, LightGBM also implements a row-wise histogram algorithm (previously it was column-wise only), which may be faster with fewer threads or fewer features.
Given the data and parameters, LightGBM will benchmark both the col-wise and row-wise layouts before training and automatically use the faster one.
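
For anyone who wants to pin the layout instead of relying on the automatic test, the force_col_wise / force_row_wise parameters do that. A minimal example (synthetic data, purely for illustration):

import lightgbm as lgb
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((10_000, 20))
y = rng.random(10_000)
train_set = lgb.Dataset(X, label=y)

# With neither flag set, LightGBM times both layouts at the start of
# training and keeps the faster one; forcing a layout skips that overhead.
params = {"objective": "regression", "force_row_wise": True}
booster = lgb.train(params, train_set, num_boost_round=10)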

@ogrisel

ogrisel commented Sep 7, 2020

@ogrisel is sequential mode num_threads=1?

I just set the OMP_NUM_THREADS=1 environment variable when running my benchmark script. I suppose this is equivalent.

Did you test LightGBM 3.0.0?

Yes.

We have several speed optimizations in the 3.0.0 release. In particular, LightGBM also implements a row-wise histogram algorithm (previously it was column-wise only), which may be faster with fewer threads or fewer features.

Ah thanks, that's probably it then. Indeed, the Higgs boson benchmark I used doesn't have that many features.

@jameslamb
Collaborator

I've just added a new efficiency-related feature: #3369

If anyone following along here has experience with Apache Arrow, please add any context you think could help on that issue, or let me know if I've made any mistakes.

@jameslamb
Collaborator

I've started a discussion at dask/community#104 about the possibility of bringing dask-lightgbm into this project, similar to the direction xgboost has gone.

@StrikerRUS @guolinke I've been working with Dask a lot and would be willing to take responsibility for maintenance work on this. I agree with the decisions reached in those two xgboost links above, that having a native integration with Dask that we maintain would lead to a better user experience.

@StrikerRUS
Collaborator

@jameslamb Awesome proposal!
Unfortunately, I'm not familiar with Dask, but my intuition was that Dask is about scaling training up to multiple machines. Something like "for small data use your own laptop, for big data use Dask on several machines, for huge data use Spark on several clusters". Please correct me if I'm wrong. Anyway, my main concern is that by adopting a Dask wrapper we will have to ensure its quality with unit tests on our side. But we have no resources to run distributed tests. Our current MPI test just ensures that the MPI version can be built, but all tests are executed on one machine.

@jameslamb
Collaborator

Thanks!

but my intuition was that Dask is about scaling training up to multiple machines. Something like "for small data use your own laptop, for big data use Dask on several machines, for huge data use Spark on several clusters". Please correct me if I'm wrong

Dask is a thing that allows for horizontal scaling across multiple machines, but it's also an API for expressing Python work that can be parallelized (even if that parallelization all happens on one machine).

So, for example, people often use Dask Dataframes as a way to work with larger-than-memory data frames on a single machine. You write code that closely follows the pandas API, and Dask takes care of moving data back and forth between disk and memory without you needing to think about it.

So if we follow that proposal, you'd be able to use a Dask-ified version of lightgbm models to train on Dask collections (Dask DataFrame, Dask Array), which would be beneficial even if not scaling out to multiple machines.
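
For example, something like this (hypothetical file names, but real Dask API):

import dask.dataframe as dd

# Looks like pandas, but partitions are loaded lazily, so the full
# dataset never needs to fit in memory at once.
df = dd.read_csv("transactions-*.csv")
daily_totals = df.groupby("day")["amount"].sum().compute()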

by adopting a Dask wrapper we will have to ensure its quality with unit tests on our side. But we have no resources to run distributed tests. Our current MPI test just ensures that the MPI version can be built, but all tests are executed on one machine.

I don't think we'd have to do this for this to be successful. The dask-lightgbm project already does not do this, and approximates it with docker-compose.

One of the main design principles of Dask is that it scales from your laptop to a distributed cluster with minimal code changes...that means that we can test against a dask.distributed.LocalCluster and have some confidence that it will work on a true multi-node cluster.

Issues that are specific to the multi-node setting might be raised by users and have to be addressed here with manual testing, but that is already the current state of LightGBM-on-Dask with dask-lightgbm, so I don't think it's a reason not to move forward with that proposal.
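
As a sketch of what such a test setup could look like (worker counts are arbitrary):

from dask.distributed import Client, LocalCluster

# Workers are local processes, but they talk to the scheduler over the
# same protocol as a real multi-node cluster, so most distributed code
# paths get exercised in ordinary CI.
cluster = LocalCluster(n_workers=2, threads_per_worker=1)
client = Client(cluster)
print(client.submit(lambda a, b: a + b, 1, 2).result())  # -> 3
client.close()
cluster.close()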

@StrikerRUS
Collaborator

@jameslamb Thanks a lot for the detailed response!

...that means that we can test against a dask.distributed.LocalCluster and have some confidence that it will work on a true multi-node cluster.

OK, this is great!

@jameslamb
Collaborator

Thanks! @guolinke what do you think?

If we move forward with this, I'll ask the dask-lightgbm maintainers to make a PR here with the main components, and then I can generate some issues for what else would need to be done for testing, docs, etc.

@guolinke
Collaborator Author

Awesome! Sounds good to me.

@ppaleja

ppaleja commented Jan 26, 2021

Hi, I hope this is the right place to post this. I was reading through the original LightGBM paper, and it mentions a greedy coloring scheme used for (I think) one of the main optimizations in LightGBM, the feature bundling. I was wondering why a greedy method was chosen instead of some other targeted approach. I think the paper mentions that bundling works well in the sparse case, and I believe there are coloring algorithms that specifically target sparse graphs and have stronger provable guarantees than a greedy method. Sorry if this has already been asked, but has there been any work on using a different algorithm for this bundling process? I feel this might be worth looking into, but I wasn't able to find where the bundling happens in the code (if it exists currently), to see if I could 'plug in' another algorithm and check whether it gives a noticeable improvement.
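
For context, a rough reconstruction of the greedy bundling from the paper (my own sketch, not LightGBM's code; conflict[f][g] counts rows where features f and g are both nonzero, and K is the allowed conflict budget):

def greedy_bundle(conflict, num_features, K):
    # Consider the most conflicting features first, as in the paper.
    order = sorted(range(num_features),
                   key=lambda f: sum(conflict[f]), reverse=True)
    bundles, bundle_conflicts = [], []
    for f in order:
        for b, members in enumerate(bundles):
            extra = sum(conflict[f][g] for g in members)
            if bundle_conflicts[b] + extra <= K:  # f fits in this bundle
                members.append(f)
                bundle_conflicts[b] += extra
                break
        else:  # no bundle can absorb f without exceeding K
            bundles.append([f])
            bundle_conflicts.append(0)
    return bundles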

@lorentzenchr
Copy link
Contributor

We have several speed optimizations in the 3.0.0 release. In particular, LightGBM also implements a row-wise histogram algorithm (previously it was column-wise only), which may be faster with fewer threads or fewer features.

Is it possible to lay out the algorithm for this row-wise histogram computation (pointing to literature, adding code comments, writing a blog post, ...)? I find it very difficult to follow the logic by reading the code.
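
To make the question concrete, here is a rough sketch of the two loop orders as commonly described (illustrative NumPy only, not LightGBM's implementation):

import numpy as np

def rowwise_histograms(binned, grad, num_bins):
    # Row-wise: a single pass over the rows; each row scatters its gradient
    # into every feature's histogram. Tends to win with few threads/features.
    num_rows, num_features = binned.shape
    hist = np.zeros((num_features, num_bins))
    for i in range(num_rows):
        for j in range(num_features):
            hist[j, binned[i, j]] += grad[i]
    return hist

def colwise_histograms(binned, grad, num_bins):
    # Column-wise: one pass per feature; parallelizes naturally over
    # features, but re-reads the gradient array once per feature.
    num_rows, num_features = binned.shape
    hist = np.zeros((num_features, num_bins))
    for j in range(num_features):
        for i in range(num_rows):
            hist[j, binned[i, j]] += grad[i]
    return hist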
