Revisiting Deep Learning Models for Tabular Data

About

The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports competitive results on various datasets. However, the proposed models are usually not properly compared to each other and existing works often use different benchmarks and experiment protocols. As a result, it is unclear for both researchers and practitioners what models perform best. Additionally, the field still lacks effective baselines, that is, the easy-to-use models that provide competitive performance across different problems. In this work, we perform an overview of the main families of DL architectures for tabular data and raise the bar of baselines in tabular DL by identifying two simple and powerful deep architectures. The first one is a ResNet-like architecture which turns out to be a strong baseline that is often missing in prior works. The second model is our simple adaptation of the Transformer architecture for tabular data, which outperforms other solutions on most tasks. Both models are compared to many existing architectures on a diverse set of tasks under the same training and tuning protocols. We also compare the best DL models with Gradient Boosted Decision Trees and conclude that there is still no universally superior solution.

Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko • 2021

Related benchmarks

Task	Dataset	Result
CTR Prediction	Criteo	AUC 0.7849	309
Click-Through Rate Prediction	Industrial	AUC 75.57	120
Classification	Lung	ACC 67.3	96
Click-Through Rate Prediction	AutoML	AUC 82.71	90
Tabular Classification	75 Tabular Classification Datasets (test)	Accuracy 71.45	89
Classification	Adult	Accuracy 83	86
Tabular Regression	52 Tabular Datasets (test)	NMAE 0.354	85
Classification	TOX_171	Accuracy 79.45	78
Classification	Colon	Accuracy 69.25	78
Classification	GLI_85	Accuracy 52.46	78

Showing 10 of 317 rows
