CatBoost: unbiased boosting with categorical features

About

This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that proposed algorithms solve it effectively, leading to excellent empirical results.
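The abstract says the categorical-feature technique is meant to avoid target leakage but does not give its details. One permutation-driven way to encode a category without using an example's own label is sketched below, as an illustration only. The function name `ordered_target_statistic`, the `prior_weight` smoothing parameter, and the use of the global target mean as the prior are assumptions made for this sketch, not CatBoost's API or its exact procedure.

```python
import random
from collections import defaultdict


def ordered_target_statistic(categories, targets, prior_weight=1.0, seed=0):
    """Encode a categorical column with a permutation-ordered target statistic.

    Each example is encoded from the targets of examples that come *before* it
    in a random permutation, so its own label never enters its own encoding.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    n = len(categories)
    prior = sum(targets) / n  # assumption: global target mean as the prior

    # Draw a random permutation of the example indices.
    order = list(range(n))
    random.Random(seed).shuffle(order)

    sums = defaultdict(float)   # running sum of targets seen so far, per category
    counts = defaultdict(int)   # running count of examples seen so far, per category
    encoded = [0.0] * n

    for idx in order:
        cat = categories[idx]
        # Smoothed mean over *preceding* examples of the same category only.
        encoded[idx] = (sums[cat] + prior_weight * prior) / (counts[cat] + prior_weight)
        # Only now reveal this example's target to later examples.
        sums[cat] += targets[idx]
        counts[cat] += 1

    return encoded


if __name__ == "__main__":
    cats = ["red", "blue", "red", "green", "blue", "red"]
    ys = [1, 0, 1, 0, 1, 0]
    print(ordered_target_statistic(cats, ys))
```

In this sketch the first example of each category in the permutation falls back to the prior. A plain per-category mean over the whole dataset would instead include each example's own label, which is the kind of leakage the abstract refers to.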

Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin • 2017

Related benchmarks

Task                    Dataset                                     Metric      Result   Rank
Image Classification    FashionMNIST (test)                         Accuracy    69.83    363
Classification          Lung                                        ACC         91.57    96
Tabular Classification  75 Tabular Classification Datasets (test)   Accuracy    72.64    89
Classification          Adult                                       Accuracy    89.6     86
Tabular Regression      52 Tabular Datasets (test)                  NMAE        0.158    85
Classification          Diabetes                                    Accuracy    80.71    80
Classification          TOX_171                                     Accuracy    81.95    78
Classification          GLI_85                                      Accuracy    84.71    78
Classification          Colon                                       Accuracy    72.65    78
Binary Classification   TabArena                                    Elo Rating  1.41e+3  74
Showing 10 of 339 rows