
Confidence and Dispersity Speak: Characterising Prediction Matrix for Unsupervised Accuracy Estimation

About

This work aims to assess how well a model performs under distribution shift without using labels. While recent methods study prediction confidence, this work shows that prediction dispersity is another informative cue. Confidence reflects whether an individual prediction is certain; dispersity indicates how the predictions as a whole are distributed across all categories. Our key insight is that a well-performing model should give predictions with both high confidence and high dispersity, so both properties must be considered to make more accurate estimates. To this end, we use the nuclear norm of the prediction matrix, which has been shown to be effective in characterizing both properties. Extensive experiments validate the effectiveness of the nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB-200), and diverse types of distribution shift (e.g., style shift and reproduction shift). We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods. Furthermore, we validate the feasibility of other measurements (e.g., mutual information maximization) for characterizing dispersity and confidence. Lastly, we investigate the limitations of the nuclear norm, study an improved variant for severe class imbalance, and discuss potential directions.
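To make the idea concrete, here is a minimal sketch of a nuclear-norm score computed on a prediction matrix (rows are samples, columns are class probabilities). The function names `softmax` and `nuclear_norm_score` are illustrative and not taken from the authors' code. The normalization by sqrt(min(N, K) · N) is an assumption of this sketch: it is a general upper bound on the nuclear norm of an N × K matrix whose rows are probability vectors, so the score falls in (0, 1].

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def nuclear_norm_score(probs):
    """Nuclear norm of an (N, K) prediction matrix, scaled to (0, 1].

    Scaling by sqrt(min(N, K) * N) is an assumption of this sketch; it is
    an upper bound on the nuclear norm when every row is a probability vector.
    """
    n, k = probs.shape
    nuc = np.linalg.norm(probs, ord="nuc")  # sum of singular values
    return nuc / np.sqrt(min(n, k) * n)


# Three toy prediction matrices (6 samples, 3 classes), purely illustrative.
confident_dispersed = np.eye(3)[[0, 1, 2, 0, 1, 2]]  # one-hot, all classes used
confident_collapsed = np.eye(3)[[0, 0, 0, 0, 0, 0]]  # one-hot, a single class
uncertain = np.full((6, 3), 1.0 / 3.0)               # uniform predictions

for name, p in [
    ("confident + dispersed", confident_dispersed),
    ("confident + collapsed", confident_collapsed),
    ("uncertain", uncertain),
]:
    print(f"{name}: {nuclear_norm_score(p):.3f}")
```

In this toy setting, only the matrix that is both confident and spread across classes reaches the maximum score; collapsing onto one class or predicting uniformly both lower it, which matches the intuition in the abstract.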

Weijian Deng, Yumin Suh, Stephen Gould, Liang Zheng • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Accuracy Estimation | PACS | R² 0.834 | 50 |
| Accuracy Estimation | Living-17 Subpopulation Shift | R² 0.975 | 36 |
| Accuracy Estimation | Entity-13 Subpopulation Shift | R² 0.989 | 36 |
| Accuracy Estimation | Entity-30 Subpopulation Shift | R² 0.985 | 36 |
| Accuracy Estimation | Nonliving-26 Subpopulation Shift | R² 0.97 | 36 |
| Unsupervised Accuracy Estimation | DomainNet | R² 0.85 | 36 |
| Unsupervised Accuracy Estimation | Office-Home | R² 0.766 | 36 |
| Unsupervised Accuracy Estimation | RR1-WILDS | R² 0.906 | 36 |
| Accuracy Estimation | TinyImageNet | MAE 1.053 | 27 |
| Accuracy Estimation | Entity-13 | MAE 1.297 | 27 |
Showing 10 of 23 rows
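The benchmark rows above report R² and MAE. As a rough sketch of how such numbers can be computed, the snippet below fits a line from a label-free score to true accuracy across several test sets and then measures the coefficient of determination and the mean absolute error. The exact evaluation protocol behind the table is not stated on this page, so the linear fit, the helper names (`fit_line`, `r_squared`, `mae`), and the toy values are all assumptions made for illustration.

```python
import numpy as np


def fit_line(scores, accuracies):
    """Least-squares line mapping a label-free score to accuracy."""
    slope, intercept = np.polyfit(scores, accuracies, deg=1)
    return slope, intercept


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the accuracies."""
    return np.mean(np.abs(y_true - y_pred))


# Synthetic values for illustration only, not results from the paper:
# one score and one true accuracy (%) per hypothetical shifted test set.
scores = np.array([0.20, 0.40, 0.60, 0.80])
accuracies = np.array([30.0, 50.0, 70.0, 90.0])

slope, intercept = fit_line(scores, accuracies)
predicted = slope * scores + intercept

print("R^2:", r_squared(accuracies, predicted))
print("MAE:", mae(accuracies, predicted))
```

Because the toy data are perfectly linear, the fit is essentially exact here; with real test sets, a higher R² and a lower MAE indicate that the score tracks true accuracy more closely.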
