CatBoost: unbiased boosting with categorical features
About
This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing implementations of gradient boosting algorithms. In this paper, we provide a detailed analysis of this problem and demonstrate that proposed algorithms solve it effectively, leading to excellent empirical results.
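The categorical-feature technique described above can be illustrated with ordered target statistics: each example's category value is encoded using only the targets of examples that precede it in a random permutation, so an example's own target never leaks into its encoding. The sketch below is a minimal, hypothetical illustration of that idea (the function name, the prior weight `a`, and the smoothing formula are assumptions for exposition, not the library's actual API):

```python
import random

def ordered_target_statistics(categories, targets, prior=0.5, a=1.0, seed=0):
    """Illustrative sketch of ordered target statistics: encode a
    categorical feature so each example only 'sees' the targets of
    examples preceding it in a random permutation, avoiding the
    target leakage that causes prediction shift."""
    n = len(categories)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # the permutation that defines "history"
    sums = {}    # running sum of targets per category value
    counts = {}  # running count of examples per category value
    encoded = [0.0] * n
    for idx in perm:
        c = categories[idx]
        s = sums.get(c, 0.0)
        k = counts.get(c, 0)
        # smoothed estimate from "past" examples only; a weights the prior
        encoded[idx] = (s + a * prior) / (k + a)
        # update history *after* encoding, so targets[idx] is excluded
        sums[c] = s + targets[idx]
        counts[c] = k + 1
    return encoded

enc = ordered_target_statistics(["x", "x", "y", "y"], [1, 0, 1, 0])
```

Note that the first example of each category (in permutation order) receives only the prior, since it has no history; CatBoost's ordered boosting applies the same permutation-driven principle to the gradient estimates themselves.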
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Tabular Classification | 75 Tabular Classification Datasets (test) | Accuracy | 72.64 | 89 |
| Tabular Regression | 52 Tabular Datasets (test) | NMAE | 0.158 | 85 |
| Classification | 33 datasets, missing rate <= 10% (test) | AUC | 86.42 | 65 |
| Classification | 10 datasets, missing rate > 10% (test) | AUC | 80.34 | 50 |
| Regression | CA Housing | RMSE | 0.4303 | 45 |
| Classification | HI | Accuracy | 0.564 | 45 |
| Classification | HE | Accuracy | 38.46 | 38 |
| Aggregate Tabular Benchmarking | Aggregate | Avg Rank | 7.44 | 33 |
| Binary Classification | Higgs (test) | AUC | 84.5425 | 30 |
| Tabular Data Classification | UCI machine learning repository, 21 datasets (test) | Median Rank | 14 | 29 |