Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later

About

The widespread enthusiasm for deep learning has recently expanded into the domain of tabular data. Recognizing that the advancement in deep tabular methods is often inspired by classical methods, e.g., integration of nearest neighbors into neural networks, we investigate whether these classical methods can be revitalized with modern techniques. We revisit a differentiable version of $K$-nearest neighbors (KNN) -- Neighbourhood Components Analysis (NCA) -- originally designed to learn a linear projection to capture semantic similarities between instances, and seek to gradually add modern deep learning techniques on top. Surprisingly, our implementation of NCA using SGD and without dimensionality reduction already achieves decent performance on tabular data, in contrast to the results of using existing toolboxes like scikit-learn. Further equipping NCA with deep representations and additional training stochasticity significantly enhances its capability, being on par with the leading tree-based method CatBoost and outperforming existing deep tabular models in both classification and regression tasks on 300 datasets. We conclude our paper by analyzing the factors behind these improvements, including loss functions, prediction strategies, and deep architectures. The code is available at https://github.com/qile2000/LAMDA-TALENT.
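The soft nearest-neighbor prediction that NCA builds on can be sketched as follows. This is a minimal NumPy illustration of the standard NCA formulation (softmax weights over negative squared distances in a linearly projected space, with each training point excluded from its own neighborhood), not the authors' implementation. The function names `soft_neighbor_probs` and `nca_objective`, the toy data, and the identity projection are assumptions made for illustration; the sketch shows only the forward computation, whereas training with SGD would optimize `proj` through an autodiff framework.

```python
import numpy as np


def soft_neighbor_probs(queries, train_x, train_y, proj, n_classes, exclude_self=False):
    """Class probabilities from softmax-weighted neighbors in the projected space.

    Hypothetical helper: `proj` is the linear projection applied to every row.
    """
    zq = queries @ proj.T
    zt = train_x @ proj.T
    # Squared Euclidean distance between every query and every training point.
    d2 = ((zq[:, None, :] - zt[None, :, :]) ** 2).sum(axis=-1)
    if exclude_self:
        # Leave-one-out: a point may not select itself as a neighbor.
        np.fill_diagonal(d2, np.inf)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[train_y]
    return weights @ onehot


def nca_objective(train_x, train_y, proj, n_classes):
    """Mean leave-one-out probability of the correct class (to be maximized)."""
    probs = soft_neighbor_probs(
        train_x, train_x, train_y, proj, n_classes, exclude_self=True
    )
    return probs[np.arange(len(train_y)), train_y].mean()


# Toy data (assumed for illustration): two well-separated 2-D clusters.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
proj = np.eye(2)  # identity projection, i.e. no dimensionality reduction

probs = soft_neighbor_probs(np.array([[0.0, 0.0], [5.0, 5.0]]), x, y, proj, 2)
objective = nca_objective(x, y, proj, 2)
```

Because every step is differentiable with respect to `proj`, the same objective can be maximized by gradient-based training; replacing the linear map with a deep encoder would follow the same pattern.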

Han-Jia Ye, Huai-Hong Yin, De-Chuan Zhan, Wei-Lun Chao • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Classification | Diabetes | F1 Score 86.3 | 33 |
| Medical Image Classification | COVID-19 | F1-Score 85.9 | 20 |
| Tabular Learning | TabReD | AUC (HI) 96.2 | 20 |
| Classification | Kidney | AUROC 95.8 | 19 |
| Regression | TIT | Regression Metric 1.063 | 19 |
| Urinary tract infection (UTI) prediction | eICU | AUROC 0.931 | 19 |
| Classification | GGCM | AUROC 88.1 | 19 |
| Classification | SSMI | AUROC 0.692 | 19 |
| Regression | OXF-PT | Regression Metric 0.291 | 19 |
| Sepsis Prediction | eICU | AUROC 97.5 | 19 |
Showing 10 of 20 rows
