
Tabular Data: Deep Learning is Not All You Need

About

A key element in solving real-life data science problems is selecting the types of models to use. Tree ensemble models (such as XGBoost) are usually recommended for classification and regression problems with tabular data. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use cases. This paper explores whether these deep models should be a recommended option for tabular data by rigorously comparing the new deep models to XGBoost on various datasets. In addition to systematically comparing their performance, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of deep models and XGBoost performs better on these datasets than XGBoost alone.
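As a rough illustration of the last point, one common way to combine a tree ensemble with deep models is to average their predicted class probabilities. The abstract does not say how the ensemble was built, so the sketch below is an assumption rather than the authors' method; the function names and the probability values are hypothetical and stand in for the outputs of a trained XGBoost model and a trained deep model.

```python
from typing import List, Optional, Sequence


def average_probabilities(
    model_probs: Sequence[Sequence[Sequence[float]]],
    weights: Optional[Sequence[float]] = None,
) -> List[List[float]]:
    """Combine per-model class probabilities by a (weighted) average.

    model_probs[m][i][c] is model m's probability that sample i is in class c.
    Uniform weights are used when none are given (an assumption of this sketch).
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("need at least one model")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: List[List[float]] = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        combined.append([
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ])
    return combined


def predict_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most probable class index for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


# Hypothetical predicted probabilities for three samples and two classes.
xgb_probs = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]   # stand-in for XGBoost output
deep_probs = [[0.7, 0.3], [0.2, 0.8], [0.35, 0.65]]  # stand-in for a deep model

ensemble_probs = average_probabilities([xgb_probs, deep_probs])
ensemble_labels = predict_labels(ensemble_probs)  # [0, 1, 1]
```

In this toy input the third sample is labelled 0 by the XGBoost stand-in alone but 1 by the averaged ensemble, which shows how a second model can change a borderline decision. Whether that change improves accuracy depends on the data.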

Ravid Shwartz-Ziv, Amitai Armon • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Classification | Lung | ACC 63.5 | 96 |
| Classification | Adult | Accuracy 69.2 | 86 |
| Classification | TOX_171 | Accuracy 71.85 | 78 |
| Classification | Colon | Accuracy 58.7 | 78 |
| Classification | GLI_85 | Accuracy 56.92 | 78 |
| Classification | ALLAML | Accuracy 60.42 | 72 |
| Classification | SMK_CAN_187 | Accuracy 40.92 | 72 |
| Classification | HDLSS Datasets Summary | Average Rank 14.67 | 66 |
| Classification | Prostate_GE | Accuracy 70 | 64 |
| Classification | ARCENE | Accuracy 54.8 | 60 |
Showing 10 of 29 rows
