SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training

About

Tabular data underpins numerous high-impact applications of machine learning from fraud detection to genomics and healthcare. Classical approaches to solving tabular problems, such as gradient boosting and random forests, are widely used by practitioners. However, recent deep learning methods have achieved a degree of performance competitive with popular techniques. We devise a hybrid deep learning approach to solving tabular data problems. Our method, SAINT, performs attention over both rows and columns, and it includes an enhanced embedding method. We also study a new contrastive self-supervised pre-training method for use when labels are scarce. SAINT consistently improves performance over previous deep learning methods, and it even outperforms gradient boosting methods, including XGBoost, CatBoost, and LightGBM, on average over a variety of benchmark tasks.

Gowthami Somepalli, Micah Goldblum, Avi Schwarzschild, C. Bayan Bruss, Tom Goldstein• 2021

Related benchmarks

Task	Dataset	Result
Image Classification	FashionMNIST (test)	Accuracy62.15	363
Classification	Lung	ACC78	96
Classification	Adult	Accuracy82.6	86
Classification	TOX_171	Accuracy75.1	78
Classification	GLI_85	Accuracy78.56	78
Classification	Colon	Accuracy67.6	78
Classification	SMK_CAN_187	Accuracy50.34	72
Classification	ALLAML	Accuracy52.92	72
Classification	HDLSS Datasets Summary	Average Rank27.5	66
Classification	Prostate_GE	Accuracy61.99	64

Showing 10 of 90 rows

...

Other info

Follow for update

@wizwand_team Discord