
WILDS: A Benchmark of in-the-Wild Distribution Shifts

About

Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new methods for training models that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open-source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. Code and leaderboards are available at https://wilds.stanford.edu.
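The in-distribution versus out-of-distribution gap described above can be illustrated with a minimal sketch. It assumes each evaluated example carries a domain label (a hospital, for instance) and that some domains are held out from training. The function names, the toy records, and the choice of plain accuracy are assumptions made for illustration; this is not the WILDS package API.

```python
def accuracy(examples):
    """Fraction of (domain, label, prediction) records whose prediction matches the label."""
    return sum(y == y_hat for _, y, y_hat in examples) / len(examples)


def id_ood_gap(examples, ood_domains):
    """Split records by domain and compare accuracy on seen vs. held-out domains."""
    id_examples = [e for e in examples if e[0] not in ood_domains]
    ood_examples = [e for e in examples if e[0] in ood_domains]
    id_acc = accuracy(id_examples)
    ood_acc = accuracy(ood_examples)
    return id_acc, ood_acc, id_acc - ood_acc


# Hypothetical toy records: (domain, true label, predicted label).
examples = [
    ("hospital_A", 1, 1), ("hospital_A", 0, 0),
    ("hospital_B", 1, 1), ("hospital_B", 0, 0),
    ("hospital_C", 1, 0), ("hospital_C", 0, 0),
    ("hospital_C", 1, 1), ("hospital_C", 0, 1),
]

# hospital_C plays the role of a domain never seen during training.
id_acc, ood_acc, gap = id_ood_gap(examples, ood_domains={"hospital_C"})
print(f"ID accuracy: {id_acc:.2f}, OOD accuracy: {ood_acc:.2f}, gap: {gap:.2f}")
```

Reporting both numbers side by side, rather than a single pooled accuracy, is what makes the degradation on unseen domains visible.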

Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, Percy Liang • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Domain Generalization | DomainBed (test) | VLCS Accuracy | 77.4 | 110 |
| Wildlife Species Classification | WILDS-iWildCam ID (test) | Macro F1 | 55.8 | 23 |
| Toxicity Detection | CivilComments-WILDS (test) | Average Accuracy | 92.2 | 19 |
| Multiclass Classification | iWildCam WILDS 2.0 (OOD) | Macro F1 | 41.4 | 17 |
| Animal species (186 classes). Domains: 324 camera locations. | iWildCam WILDS (test) | Macro F1 | 0.278 | 17 |
| Animal species (186 classes). Domains: 324 camera locations. | iWildCam-WILDS (val) | Accuracy | 62.7 | 15 |
| Tumor Detection | Camelyon17-WILDS (test) | Accuracy | 73.3 | 14 |
| Species Classification | iWildCam WILDS 2020 (test) | Accuracy | 71.6 | 14 |
| Medical Image Classification | Camelyon17 OOD WILDS (test) | Average Accuracy | 70.8 | 13 |
| Sentiment Analysis | AMAZON-WILDS (test) | 10th Percentile Accuracy | 56 | 12 |
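Two of the metrics listed above, Macro F1 and 10th Percentile Accuracy, are less familiar than plain accuracy. The sketch below shows one way they could be computed. It assumes that Macro F1 is the unweighted mean of per-class F1 scores and that the percentile is taken over per-domain accuracies with linear interpolation. The function names and the interpolation choice are illustrative assumptions, not the benchmark's own evaluation code.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def per_domain_accuracy(y_true, y_pred, domains):
    """Accuracy computed separately for each domain label."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p, d in zip(y_true, y_pred, domains):
        total[d] += 1
        correct[d] += int(t == p)
    return {d: correct[d] / total[d] for d in total}


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


# Hypothetical toy predictions spread across three domains.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]
domains = ["a", "a", "b", "b", "c", "c"]

accs = per_domain_accuracy(y_true, y_pred, domains)
print("macro F1:", macro_f1(y_true, y_pred))
print("10th percentile accuracy:", percentile(accs.values(), 10))
```

Both metrics guard against a model that looks strong on average while failing on rare classes or on the worst-served domains, which is why they suit benchmarks built around distribution shift.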
