DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
About
Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm must be evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.
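The pretrain-then-transfer protocol described above can be sketched in a toy, self-contained form. This is not the DABS code or any of its baseline objectives; it is a minimal illustration using a PCA-style linear projection as a stand-in "pretrained" encoder and a nearest-centroid classifier as the downstream probe, on synthetic data:

```python
import numpy as np

def pretrain_encoder(unlabeled, dim=8):
    # "Pretraining": learn an encoder from unlabeled data alone.
    # A PCA-style projection stands in for a self-supervised model.
    X = unlabeled - unlabeled.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:dim].T  # (features x dim) projection matrix

def evaluate_downstream(encoder, X_train, y_train, X_test, y_test):
    # Transfer evaluation: freeze the encoder, then fit a simple
    # classifier (nearest class centroid) on labeled embeddings.
    Z_tr, Z_te = X_train @ encoder, X_test @ encoder
    classes = np.unique(y_train)
    centroids = np.stack([Z_tr[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(Z_te[:, None] - centroids[None], axis=-1)
    preds = classes[dists.argmin(axis=1)]
    return float((preds == y_test).mean())

rng = np.random.default_rng(0)
# Synthetic "domain": an unlabeled pool plus a small labeled task
# drawn from two well-separated Gaussian clusters in 32 dimensions.
unlabeled = rng.normal(size=(500, 32))
means = rng.normal(scale=3.0, size=(2, 32))
y = rng.integers(0, 2, size=200)
X = means[y] + rng.normal(size=(200, 32))

encoder = pretrain_encoder(np.vstack([unlabeled, X]))
acc = evaluate_downstream(encoder, X[:150], y[:150], X[150:], y[150:])
print(f"downstream accuracy: {acc:.2f}")
```

In DABS this loop is repeated per domain: one unlabeled pretraining dataset, then a suite of labeled tasks scored with the frozen (or fine-tuned) pretrained model.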
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | Aircraft | Accuracy | 2.7 | 333 |
| Classification | CUB | Accuracy | 1.6 | 93 |
| Visual Question Answering | VQA | Accuracy | 53.4 | 52 |
| Classification | DTD | Accuracy | 7.4 | 45 |
| Classification | Google commands | Accuracy | 4.9 | 13 |
| Regression | Stability | Spearman Correlation | 0.31 | 12 |
| Classification | Audio MNIST | Accuracy | 33.1 | 12 |
| Classification | VGG-Flowers | Accuracy | 9 | 12 |
| Classification | Fluent Loc | Accuracy | 62.1 | 6 |
| Classification | SCOP | Accuracy | 8 | 6 |