SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

About

Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data. However, such methods are domain-specific and little has been done to leverage this technique on real-world tabular datasets. We propose SCARF, a simple, widely-applicable technique for contrastive learning, where views are formed by corrupting a random subset of features. When applied to pre-train deep neural networks on the 69 real-world, tabular classification datasets from the OpenML-CC18 benchmark, SCARF not only improves classification accuracy in the fully-supervised setting but does so also in the presence of label noise and in the semi-supervised setting where only a fraction of the available training data is labeled. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders. We conduct comprehensive ablations, detailing the importance of a range of factors.
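As a rough illustration of the core idea, here is a minimal PyTorch sketch of SCARF-style pretraining. Following the paper's description, the corrupted view replaces a random subset of each row's features with draws from the corresponding feature's empirical marginal, and the clean and corrupted views are contrasted with an InfoNCE loss. The network sizes, 60% corruption rate, and temperature below are placeholder choices for the sketch, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def corrupt(x, x_train, rate=0.6):
    """Build a corrupted view: for each row, replace a random subset of
    features with values resampled from that feature's empirical marginal
    distribution (i.e., from a random training row). The 0.6 rate is an
    illustrative default, not the paper's tuned value."""
    n, d = x.shape
    mask = torch.rand(n, d, device=x.device) < rate
    rows = torch.randint(0, x_train.shape[0], (n, d), device=x.device)
    marginal_draws = x_train[rows, torch.arange(d, device=x.device)]
    return torch.where(mask, marginal_draws, x)

class SCARF(nn.Module):
    """Encoder f plus projection head g; pretraining contrasts g(f(x))
    for the clean and corrupted views of the same row."""
    def __init__(self, d_in, d_hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden),
        )

    def forward(self, x):
        return self.head(self.encoder(x))

def info_nce(z1, z2, temperature=1.0):
    """InfoNCE: each row's positive is its own corrupted view; every
    other row in the batch acts as a negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(len(z1), device=z1.device))

# One pretraining step on an unlabeled batch `x` (float tensor, n x d),
# with `x_train` the full training matrix used for marginal resampling:
#   model = SCARF(d_in=x.shape[1])
#   loss = info_nce(model(x), model(corrupt(x, x_train)))
#   loss.backward()
```

After pretraining, the projection head is typically discarded and the encoder is fine-tuned (or a linear classifier trained on its output) using whatever labeled data is available.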

Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
---- | ------- | ------ | ------ | ----
Classification | HI | Accuracy | 0.56 | 45
Binary Classification | dresses-sales (DS) (test) | AUROC | 66.3 | 40
Binary Classification | cylinder-bands (CB) (test) | AUROC | 0.719 | 40
Binary Classification | income IC 1995 (test) | AUROC | 0.905 | 39
Credit approval prediction | Credit Approval dataset (test) | AUROC | 0.861 | 37
Aggregate Tabular Benchmarking | Aggregate | Avg Rank | 8.56 | 33
Tabular Classification | Adult (test) | AUROC | 91.1 | 28
Classification | Infarction 10% labels | AUC | 0.7992 | 27
Classification | CAD 1% labels | AUC | 75.76 | 27
Classification | CAD 10% labels | AUC | 82.43 | 27

(Showing 10 of 27 rows.)
