
SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems

About

Gradient Boosted Decision Tree (GBDT) is a widely used machine learning algorithm that has been shown to achieve state-of-the-art results on many standard data science problems. We are interested in its application to multioutput problems when the output is highly multidimensional. Although there are highly effective GBDT implementations, their scalability to such problems is still unsatisfactory. In this paper, we propose novel methods aiming to accelerate the training process of GBDT in the multioutput scenario. The idea behind these methods lies in the approximate computation of a scoring function used to find the best split of decision trees. These methods are implemented in SketchBoost, which itself is integrated into our easily customizable Python-based GPU implementation of GBDT called Py-Boost. Our numerical study demonstrates that SketchBoost speeds up the training of GBDT by more than a factor of 40 in some cases while achieving comparable or even better performance.
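The core idea in the abstract (approximating the split-scoring function when the output dimension is large) can be illustrated with a minimal NumPy sketch. This is not Py-Boost's actual implementation; it is a toy version of the random-projection variant, where the n×d per-sample gradient matrix is compressed to n×k (k ≪ d) before candidate splits are scored, so scoring cost scales with k rather than d. All names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multioutput setup: n samples, d output dimensions (d is large in
# the multioutput setting the paper targets), k is the sketch size.
n, d, k = 1000, 100, 5
G = rng.normal(size=(n, d))   # per-sample gradients for all d outputs
H = np.ones((n, 1))           # per-sample hessians (constant for squared loss)

def split_gain(g, h, mask):
    """Standard GBDT split score: ||G_L||^2 / H_L + ||G_R||^2 / H_R,
    summed over output dimensions, for the left/right children of a split."""
    gl, gr = g[mask].sum(axis=0), g[~mask].sum(axis=0)
    hl, hr = h[mask].sum(), h[~mask].sum()
    return (gl ** 2).sum() / hl + (gr ** 2).sum() / hr

# Sketch step: project gradients from d dimensions down to k with a
# random Gaussian matrix, then score splits on the compressed gradients.
P = rng.normal(size=(d, k)) / np.sqrt(k)
G_sketch = G @ P

x = rng.normal(size=n)        # one candidate feature
mask = x < 0.0                # one candidate split threshold

exact = split_gain(G, H, mask)        # cost ~ O(n * d) per split
approx = split_gain(G_sketch, H, mask)  # cost ~ O(n * k) per split
print(exact, approx)
```

The sketched gain is only used to *rank* candidate splits; once the best split is chosen, leaf values are still fitted on the full d-dimensional gradients, which is why the approximation can cut training time sharply with little loss in accuracy.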

Leonid Iosipoi, Anton Vakhrushev • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Multiclass Classification | Otto (9 classes, test) | Cross-Entropy Loss | 0.4566 | 7 |
| Multiclass Classification | Helena (100 classes, test) | Cross-Entropy Loss | 2.5673 | 7 |
| Multiclass Classification | Dionis (355 classes, test) | Cross-Entropy Loss | 0.2848 | 7 |
| Multilabel Classification | Mediamill (101 labels, test) | Cross-Entropy Loss | 0.0743 | 7 |
| Multilabel Classification | MoA (206 labels, test) | Cross-Entropy Loss | 0.016 | 7 |
| Multitask Regression | SCM20D (16 tasks, test) | RMSE | 85.8061 | 7 |
| Multiclass Classification | SF-Crime (39 classes, test) | Cross-Entropy Loss | 2.2037 | 7 |
| Multilabel Classification | Delicious (983 labels, test) | Cross-Entropy Loss | 0.0619 | 7 |
| Multitask Regression | RF1 (8 tasks, test) | RMSE | 0.9056 | 7 |

Other info

Code
