Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
About
For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 and ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
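The core analysis behind these claims can be sketched in a few lines: collect each model's in-distribution and out-of-distribution accuracy, apply a probit transform (the inverse Gaussian CDF, under which the paper's linear trends are typically sharpest), fit a line, and report the R² of the fit. The accuracy values below are hypothetical placeholders, not numbers from the paper.

```python
import numpy as np
from scipy.stats import norm, pearsonr

# Hypothetical per-model accuracies (as fractions) on an in-distribution
# test set and an out-of-distribution test set; real values would come
# from evaluating many models, e.g. on CIFAR-10 vs. a shifted variant.
id_acc = np.array([0.85, 0.90, 0.93, 0.95, 0.97])
ood_acc = np.array([0.70, 0.78, 0.83, 0.87, 0.91])

# Probit-transform the accuracies (inverse Gaussian CDF).
id_probit = norm.ppf(id_acc)
ood_probit = norm.ppf(ood_acc)

# Fit a linear trend on the probit scale and report the R^2 of the fit.
slope, intercept = np.polyfit(id_probit, ood_probit, 1)
r, _ = pearsonr(id_probit, ood_probit)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r**2:.3f}")
```

A strong "accuracy on the line" dataset yields R² close to 1 on this scale; the weaker-correlation cases (e.g. Camelyon17-WILDS) show noticeably lower R².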
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Post-deployment performance monitoring | PACS (Style Shift) | R² Score: 0.765 | 23 |
| Post-deployment performance monitoring | Camelyon17 (Institution Shift) | R² Score: 0.423 | 23 |
| Fine-grained Recognition | iWildCam-WILDS 1.0 (test-ID) | Top-1 Acc: 77.3 | 15 |
| Correlation with OOD performance | Terra Incognita (Geographic Shift) | R² Score: 0.537 | 11 |
| Correlation with OOD performance | Average across PACS, Camelyon17, Terra Incognita (all generalization tasks) | Mean Score: 0.632 | 11 |