CatBoost: unbiased boosting with categorical features

About

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that the proposed algorithms solve it effectively, leading to excellent empirical results.

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin • 2017
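
The abstract names target leakage in categorical-feature processing but does not show what a permutation-driven fix looks like. The following is a minimal sketch of an ordered target statistic, assuming the usual smoothed-mean form: each row's category is encoded from the targets of rows that precede it in a random permutation, so a row never sees its own label. The function name, the `prior` and `prior_weight` parameters, and the toy data are illustrative assumptions, not CatBoost's API or its actual implementation.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode categories using only rows earlier in a random permutation.

    Hypothetical helper for illustration; not part of the CatBoost library.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random permutation of row indices

    sums = defaultdict(float)   # running sum of targets seen so far, per category
    counts = defaultdict(int)   # running count of rows seen so far, per category
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        # Smoothed mean over preceding rows only; the row's own target is excluded.
        encoded[idx] = (sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight)
        # Only after encoding does this row's target become visible to later rows.
        sums[cat] += targets[idx]
        counts[cat] += 1

    return encoded


# Toy data (assumed for illustration).
cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 1, 0, 1]
prior = sum(ys) / len(ys)  # global target mean used as the smoothing prior
enc = ordered_target_statistics(cats, ys, prior)
```

One consequence of this design is that a category appearing only once is always encoded as the prior, whereas a plain per-category mean computed over all rows would encode it as that row's own label, which is exactly the kind of leakage the abstract describes.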

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Tabular Classification | 75 Tabular Classification Datasets (test) | Accuracy 72.64 | 89 |
| Tabular Regression | 52 Tabular Datasets (test) | NMAE 0.158 | 85 |
| Classification | 33 datasets missing rate <= 10% (test) | AUC 86.42 | 65 |
| Classification | 10 Datasets Missing rate > 10% (test) | AUC 80.34 | 50 |
| Regression | CA Housing | RMSE 0.4303 | 45 |
| Classification | HI | Accuracy 0.564 | 45 |
| Classification | HE | Accuracy 38.46 | 38 |
| Aggregate Tabular Benchmarking | Aggregate | Avg Rank 7.44 | 33 |
| Binary Classification | Higgs (test) | AUC 84.5425 | 30 |
| Tabular Data Classification | UCI machine learning repository 21 datasets (test) | Median Rank 14 | 29 |
Showing 10 of 107 rows
