TabTransformer: Tabular Data Modeling Using Contextual Embeddings

About

We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models. Furthermore, we demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. Lastly, for the semi-supervised setting we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.

Xin Huang, Ashish Khetan, Milan Cvitkovic, Zohar Karnin • 2020
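
The central idea in the abstract, turning per-column categorical embeddings into contextual embeddings through self-attention, can be illustrated with a small sketch. This is not the authors' implementation: it uses a single attention head with a residual connection and leaves out the multi-head projections, feed-forward sublayer, and layer normalization of a full Transformer layer. The column names, categories, embedding size, and random weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def self_attention(embeddings, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with a residual connection.

    `embeddings` holds one vector per categorical column of a single row.
    Returns (contextual_embeddings, attention_weights).
    """
    d = len(embeddings[0])
    queries = [matvec(Wq, e) for e in embeddings]
    keys = [matvec(Wk, e) for e in embeddings]
    values = [matvec(Wv, e) for e in embeddings]

    contextual, weights = [], []
    for e, q in zip(embeddings, queries):
        # How strongly this column attends to every column (itself included).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        attn = softmax(scores)
        mixed = [sum(a * v[j] for a, v in zip(attn, values)) for j in range(d)]
        # Residual connection: original embedding plus attended mixture.
        contextual.append([x + y for x, y in zip(e, mixed)])
        weights.append(attn)
    return contextual, weights


# Hypothetical toy schema: three categorical columns, 4-dimensional embeddings.
rng = random.Random(0)
DIM = 4
vocab = {
    "color": ["red", "blue"],
    "size": ["S", "M", "L"],
    "shape": ["round", "square"],
}
# One embedding table per column (randomly initialised here, learned in practice).
embedding_tables = {
    col: {cat: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for cat in cats}
    for col, cats in vocab.items()
}
Wq, Wk, Wv = (random_matrix(DIM, DIM, rng) for _ in range(3))


def embed_row(row):
    """Look up the context-free embedding of each categorical value in a row."""
    return [embedding_tables[col][row[col]] for col in vocab]


row_a = {"color": "red", "size": "M", "shape": "round"}
row_b = {"color": "red", "size": "L", "shape": "round"}  # only "size" differs

contextual_a, attn_a = self_attention(embed_row(row_a), Wq, Wk, Wv)
contextual_b, attn_b = self_attention(embed_row(row_b), Wq, Wk, Wv)
```

In this sketch the context-free embedding of "red" is identical in both rows, while its contextual embedding differs because the "size" value it attends to has changed. That dependence on the other columns is what "contextual" refers to.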

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Classification	Lung	ACC	21.01	96
Classification	Adult	Accuracy	85	86
Classification	GLI_85	Accuracy	47.77	78
Classification	Colon	Accuracy	46.44	78
Classification	TOX_171	Accuracy	23.66	78
Classification	SMK_CAN_187	Accuracy	50.26	72
Classification	ALLAML	Accuracy	53.7	72
Classification	HDLSS Datasets Summary	Average Rank	38.75	66
Classification	Prostate_GE	Accuracy	51.01	64
Classification	ARCENE	Accuracy	48.2	60

Showing 10 of 108 rows
