A Closer Look at Deep Learning Methods on Tabular Datasets

About

Tabular data is prevalent across diverse domains in machine learning. With the rapid progress of deep tabular prediction methods, especially pretrained (foundation) models, there is a growing need to evaluate these methods systematically and to understand their behavior. We present an extensive study on TALENT, a collection of 300+ datasets spanning broad ranges of size, feature composition (numerical/categorical mixes), domains, and output types (binary, multi-class, regression). Our evaluation shows that ensembling benefits both tree-based and neural approaches. Traditional gradient-boosted trees remain very strong baselines, yet recent pretrained tabular models now match or surpass them on many tasks, narrowing, but not eliminating, the historical advantage of tree ensembles. Despite architectural diversity, top performance concentrates within a small subset of models, providing practical guidance for method selection. To explain these outcomes, we quantify dataset heterogeneity by learning from meta-features and early training dynamics to predict later validation behavior. This dynamics-aware analysis indicates that heterogeneity, such as the interplay of categorical and numerical attributes, largely determines which family of methods is favored. Finally, we introduce a two-level design beyond the 300 common-size datasets: a compact TALENT-tiny core (45 datasets) for rapid, reproducible evaluation, and a TALENT-extension suite targeting high-dimensional, many-class, and very large-scale settings for stress testing. In summary, these results offer actionable insights into the strengths, limitations, and future directions for improving deep tabular learning.

Han-Jia Ye, Si-Yang Liu, Hao-Run Cai, Qi-Le Zhou, De-Chuan Zhan • 2024

Related benchmarks

Task	Dataset	Result
Binary Classification	TALENT (test)	Top-1 Accuracy 18.5	113
Binary Classification	TabArena	Elo Rating 1.40e+3	74
Multiclass Classification	TabArena Lite	Elo Rating 1.44e+3	63
Regression	TabArena Lite	Average Rank 10.1	56
Classification	Covertype	--	52
Multiclass Classification	TALENT	SGMε 32.6	42
Multiclass Classification	TALENT Multiclass (> 10 classes) Full (avg across datasets)	Rank 6	31
Regression	TALENT 100 datasets	Rank 7.79	28
Classification	higgs	ERR 24.35	19
Classification	Miniboone	Error Rate 4.88	11

Showing 10 of 27 rows
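
Several rows above report an average rank across datasets. As a rough illustration of how such a figure can be computed, the sketch below ranks methods on each dataset, shares tied positions equally, and then averages the ranks per method. This is not the paper's evaluation code: the function name `average_ranks`, the method names, and the scores in `toy_scores` are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def average_ranks(scores, higher_is_better=True):
    """Average per-dataset rank for each method (rank 1 is best).

    scores maps a method name to a list of per-dataset scores; every list
    is assumed to use the same dataset order. Tied methods share the mean
    of the positions they occupy.
    """
    methods = list(scores)
    n_datasets = len(scores[methods[0]])
    totals = {m: 0.0 for m in methods}

    for j in range(n_datasets):
        col = {m: scores[m][j] for m in methods}
        # Best method first: descending for accuracy-like metrics,
        # ascending for error-like metrics.
        ordered = sorted(methods, key=lambda m: col[m], reverse=higher_is_better)
        i = 0
        while i < len(ordered):
            # Extend k to cover the whole group tied with position i.
            k = i
            while k + 1 < len(ordered) and col[ordered[k + 1]] == col[ordered[i]]:
                k += 1
            shared = ((i + 1) + (k + 1)) / 2  # mean of 1-based positions i+1..k+1
            for m in ordered[i:k + 1]:
                totals[m] += shared
            i = k + 1

    return {m: totals[m] / n_datasets for m in methods}


# Hypothetical accuracies for three made-up methods on three made-up datasets.
toy_scores = {
    "method_a": [0.9, 0.8, 0.7],
    "method_b": [0.8, 0.8, 0.9],
    "method_c": [0.7, 0.6, 0.8],
}
ranks = average_ranks(toy_scores)
for name, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(f"{name}: {rank:.2f}")
```

For error-style metrics, where lower is better, pass `higher_is_better=False`. Average rank discards the size of the gaps between methods, which is one reason benchmark pages often report it alongside other aggregates.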
