Assessing Generalization of SGD via Disagreement

About

We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on, and strengthens, the observation in Nakkiran & Bansal '20, which requires the second run to use an altogether fresh training set. We further show theoretically that this peculiar phenomenon arises from the well-calibrated nature of ensembles of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
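A minimal sketch of the procedure the abstract describes, in PyTorch-style Python. It assumes two classifiers of the same architecture trained on the same data with independent SGD runs (e.g. different seeds and data orderings), whose forward pass returns class logits; the function name, signature, and loader handling are illustrative assumptions, not the authors' released code.

```python
import torch

@torch.no_grad()
def disagreement_rate(model_a, model_b, unlabeled_loader, device="cpu"):
    # Fraction of unlabeled test inputs on which the two SGD runs predict
    # different classes; per the paper, this rate tracks the test error.
    model_a.eval()
    model_b.eval()
    disagree, total = 0, 0
    for batch in unlabeled_loader:
        # Unlabeled loaders may yield bare tensors or (tensor, ...) tuples.
        x = batch[0] if isinstance(batch, (list, tuple)) else batch
        x = x.to(device)
        pred_a = model_a(x).argmax(dim=1)  # hard top-1 labels from run 1
        pred_b = model_b(x).argmax(dim=1)  # hard top-1 labels from run 2
        disagree += (pred_a != pred_b).sum().item()
        total += x.size(0)
    return disagree / total

# Usage (hypothetical names): train two runs of the same architecture on the
# same training set, then estimate test error without labels:
#   err_estimate = disagreement_rate(net_seed0, net_seed1, unlabeled_test_loader)
```

Note that the estimate uses hard top-1 predictions, matching the disagreement rate the paper measures between pairs of runs.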

Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter • 2021

Related benchmarks

Task                              Dataset                            Metric     Result   Rank
Image Classification              ImageNet Matched Frequency V2      --         --       92
Accuracy Estimation               PACS                               R²         0.613    50
Unsupervised Accuracy Estimation  RR1-WILDS                          R²         0.946    36
Unsupervised Accuracy Estimation  DomainNet                          R²         0.455    36
Accuracy Estimation               Entity-30 Subpopulation Shift      R²         0.914    36
Accuracy Estimation               Entity-13 Subpopulation Shift      R²         0.901    36
Unsupervised Accuracy Estimation  Office-Home                        R²         0.132    36
Accuracy Estimation               Living-17 Subpopulation Shift      R²         0.652    36
Accuracy Estimation               Nonliving-26 Subpopulation Shift   R²         0.676    36
Text Classification               SNLI                               MAPE (%)   2.5      6

Showing 10 of 22 rows.
