Are Labels Always Necessary for Classifier Accuracy Evaluation?

About

To calculate model accuracy on a computer vision task, e.g., object recognition, we usually require a test set composed of test samples and their ground-truth labels. While standard use cases satisfy this requirement, many real-world scenarios involve unlabeled test data, rendering common model evaluation methods infeasible. We investigate this important and under-explored problem, Automatic model Evaluation (AutoEval). Specifically, given a labeled training set and a classifier, we aim to estimate the classification accuracy on unlabeled test datasets. We construct a meta-dataset: a dataset composed of datasets generated from the original images via various transformations such as rotation, background substitution, and foreground scaling. As the classification accuracy of the model on each sample (dataset) is known from the original dataset labels, our task can be solved via regression. Using feature statistics to represent the distribution of a sample dataset, we can train regression models (e.g., a regression neural network) to predict model performance. Using a synthetic meta-dataset for training and real-world datasets for testing, we report reasonable and promising predictions of model accuracy. We also provide insights into the application scope, limitations, and potential future directions of AutoEval.

Weijian Deng, Liang Zheng • 2020
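
The pipeline described in the abstract (build a meta-dataset of transformed sample sets, summarize each set with label-free feature statistics, then regress accuracy on those statistics) can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-dimensional Gaussian data, the threshold `classifier`, the additive `transform`, the `mean_gap` statistic, and the least-squares line in `fit_line` are all hypothetical stand-ins chosen for brevity.

```python
import random


def classifier(x):
    """Hypothetical fixed classifier: predict class 1 when the feature is positive."""
    return 1 if x > 0.0 else 0


def make_dataset(rng, n, shift=0.0):
    """Draw n labeled points from two unit-variance Gaussians centred at -1 and +1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label == 1 else -1.0
        data.append((rng.gauss(center + shift, 1.0), label))
    return data


def transform(data, shift):
    """Toy stand-in for an image transformation: move every input, keep the labels."""
    return [(x + shift, y) for x, y in data]


def accuracy(data):
    """Fraction of points the classifier gets right (requires labels)."""
    return sum(classifier(x) == y for x, y in data) / len(data)


def mean_gap(data, reference_mean):
    """Label-free statistic: distance between a set's feature mean and the training mean."""
    xs = [x for x, _ in data]
    return abs(sum(xs) / len(xs) - reference_mean)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(0)
train = make_dataset(rng, 2000)
train_mean = sum(x for x, _ in train) / len(train)

# Meta-dataset: each sample set is a transformed copy of the labeled training
# set, so its accuracy is known from the original labels.
gaps, accs = [], []
for _ in range(40):
    sample_set = transform(train, rng.uniform(0.0, 2.0))
    gaps.append(mean_gap(sample_set, train_mean))
    accs.append(accuracy(sample_set))

# Regression from the set-level statistic to accuracy.
slope, intercept = fit_line(gaps, accs)

# Unseen test set: only its inputs are used for the estimate; the labels are
# consulted afterwards purely to check how close the estimate was.
test_set = make_dataset(rng, 2000, shift=1.2)
estimated = slope * mean_gap(test_set, train_mean) + intercept
actual = accuracy(test_set)
print(f"estimated accuracy: {estimated:.3f}, actual accuracy: {actual:.3f}")
```

In this toy setting a single scalar statistic and a straight line are enough because accuracy falls off smoothly with the shift. The abstract's mention of a regression neural network suggests that richer statistics and a more flexible regressor are needed for real image features.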

Related benchmarks

Task	Dataset	Result
Object Detection	Cityscapes to Foggy Cityscapes (test)	mAP 48.69	196
Object Detection	Sim10K → Cityscapes (test)	--	104
Object Detection	Pascal VOC → Clipart (test)	mAP 45.18	91
Accuracy Estimation	PACS	R² 0.624	50
Accuracy Estimation	Text2SQL source-target transfers Spider BIRD WikiSQL SParC CoSQL SynSQL-2.5M	MAE 11.62	42
Accuracy Estimation	MNIST, USPS, SVHN, COCO, PASCAL, ImageNet source-target transfers	MAE 11.44	42
Accuracy Estimation	Entity-13 Subpopulation Shift	R² 0.95	36
Unsupervised Accuracy Estimation	DomainNet	R² 0.746	36
Accuracy Estimation	Living-17 Subpopulation Shift	R² 0.931	36
Unsupervised Accuracy Estimation	RR1-WILDS	R² 0.936	36
