Energy-based Automated Model Evaluation

About

Conventional evaluation protocols for machine learning models rely heavily on a labeled test set that is assumed to be i.i.d., which is often unavailable in real-world applications. Automated Model Evaluation (AutoEval) offers an alternative to this traditional workflow by building a proxy pipeline that predicts test performance without ground-truth labels. Despite its recent successes, AutoEval frameworks still suffer from overconfidence and from substantial storage and computational costs. To address this, we propose a novel measure, Meta-Distribution Energy (MDE), that makes the AutoEval framework both more efficient and more effective. The core of MDE is to establish a meta-distribution statistic on the information (energy) associated with individual samples, and then to offer a smoother representation enabled by energy-based learning. We further provide theoretical insights by connecting MDE with the classification loss. We report extensive experiments across modalities, datasets, and architectural backbones to validate MDE and to show its superiority over prior approaches. We also demonstrate MDE's versatility by showing its seamless integration with large-scale models and its easy adaptation to learning scenarios with noisy or imbalanced labels. Code and data are available at https://github.com/pengr/Energy_AutoEval
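
The abstract describes MDE only in words, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the per-sample energy is the usual free-energy score of a classifier, -T * logsumexp(logits / T), and that the meta-distribution statistic is the mean negative log-softmax of those energies taken across the whole unlabeled set. The function names, the `temperature` parameter, and the random logits are all hypothetical; the linked repository is the reference for the actual definition.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along `axis`."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sample_energy(logits, temperature=1.0):
    """Assumed per-sample energy: -T * logsumexp(logits / T) over the class axis."""
    return -temperature * logsumexp(logits / temperature, axis=1)


def meta_distribution_energy(logits, temperature=1.0):
    """Assumed MDE form: mean negative log-softmax of the energies over the dataset."""
    z = sample_energy(logits, temperature)   # shape (N,)
    log_softmax = z - logsumexp(z, axis=0)   # log(exp(z_n) / sum_i exp(z_i))
    return float(-np.mean(log_softmax))


if __name__ == "__main__":
    # Hypothetical logits for 1000 unlabeled samples and 10 classes.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, 10))
    print(meta_distribution_energy(logits))
```

Under these assumptions the statistic needs only the model's logits on the unlabeled set, with no labels and no stored training data. It equals log N when every sample has the same energy and grows as the energies become more spread out.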

Ru Peng, Heming Zou, Haobo Wang, Yawen Zeng, Zenan Huang, Junbo Zhao • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Accuracy Estimation | PACS | R² 0.18 | 50 |
| Unsupervised Accuracy Estimation | RR1-WILDS | R² 0.954 | 36 |
| Accuracy Estimation | Nonliving-26 Subpopulation Shift | R² 0.929 | 36 |
| Accuracy Estimation | Living-17 Subpopulation Shift | R² 0.927 | 36 |
| Accuracy Estimation | Entity-30 Subpopulation Shift | R² 0.931 | 36 |
| Accuracy Estimation | Entity-13 Subpopulation Shift | R² 0.927 | 36 |
| Unsupervised Accuracy Estimation | DomainNet | R² 0.52 | 36 |
| Unsupervised Accuracy Estimation | Office-Home | R² 0.342 | 36 |
| Accuracy Estimation | CIFAR-10 | MAE 0.846 | 27 |
| Accuracy Estimation | CIFAR-100 | MAE 0.846 | 27 |
Showing 10 of 23 rows
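
The page does not define the Result column, so the sketch below rests on an assumption about how such numbers are typically computed. It takes R² to be the coefficient of determination of a least-squares line relating a label-free score to true accuracy across many test sets, and MAE to be the mean absolute error between estimated and true accuracy. The function names and the numbers in the usage section are made up for illustration and are not results from the paper.

```python
import numpy as np


def r_squared(scores, accuracies):
    """R^2 of the least-squares line accuracy ~ slope * score + intercept."""
    scores = np.asarray(scores, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    slope, intercept = np.polyfit(scores, accuracies, deg=1)
    residual = accuracies - (slope * scores + intercept)
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((accuracies - accuracies.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mean_absolute_error(predicted, actual):
    """Mean absolute difference between predicted and actual values."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


if __name__ == "__main__":
    # Illustrative, made-up values: one score and one accuracy per test set.
    scores = [6.9, 7.1, 7.4, 7.8, 8.3]
    accuracies = [91.0, 88.5, 84.0, 79.5, 72.0]
    print(r_squared(scores, accuracies))
    print(mean_absolute_error([90.0, 85.0], [91.0, 83.5]))
```

Under this reading, a higher R² means the score tracks accuracy more linearly across shifted test sets, and a lower MAE means the estimated accuracies are closer to the true ones.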
