Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning

About

The rapid advancement of machine learning has led to an unprecedented expansion of model ecosystems, making it increasingly difficult to assess the reliability of newly released models on unseen and unlabeled data. Existing evaluation pipelines typically rely on costly annotation, repeated fine-tuning, or assumptions that do not generalize well to new models. We introduce MetaEvaluator, a cost-effective, model-agnostic framework for fast, label-free evaluation of unseen models across diverse architectures and modalities. MetaEvaluator meta-learns over a pool of reference models to acquire an effective initialization for accurate assessment of unseen models, thereby amortizing evaluation cost and eliminating the need for per-model retraining. To the best of our knowledge, this is the first model-agnostic framework that evaluates new models on unlabeled datasets. Extensive experiments demonstrate that MetaEvaluator delivers stable and accurate performance estimates at substantially lower cost than conventional approaches, enabling scalable benchmarking on unlabeled datasets for emerging models. The code is available at: https://github.com/phkhanhtrinh23/MetaEvaluator.
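To make the idea of meta-learning an initialization over a pool of reference models concrete, here is a minimal sketch. It is not the authors' implementation: it swaps in a generic Reptile-style update on a toy one-feature linear evaluator. The feature (mean softmax confidence on an unlabeled dataset), the numbers, and the names `adapt`, `meta_learn_init`, and `reference_pool` are all hypothetical and chosen only for illustration.

```python
# Toy Reptile-style meta-learning of an evaluator initialization.
# ASSUMPTION: each reference model contributes one "task", a list of
# (mean_confidence, true_accuracy) pairs measured on datasets where labels exist.
# The evaluator is a 1-D linear map: estimated_accuracy = w * confidence + b.

def adapt(w, b, xs, ys, lr=0.1, steps=5):
    """Run a few gradient steps of least-squares regression starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def meta_learn_init(tasks, meta_lr=0.1, iterations=3000):
    """Reptile: move the shared initialization toward each task's adapted weights."""
    w, b = 0.0, 0.0
    for i in range(iterations):
        xs, ys = tasks[i % len(tasks)]        # cycle through reference models
        w_task, b_task = adapt(w, b, xs, ys)  # inner-loop adaptation
        w += meta_lr * (w_task - w)           # outer-loop interpolation
        b += meta_lr * (b_task - b)
    return w, b

# Hypothetical pool of three reference models (values are made up).
reference_pool = [
    ([0.50, 0.70, 0.90], [0.45, 0.65, 0.85]),
    ([0.60, 0.80, 0.95], [0.52, 0.72, 0.87]),
    ([0.40, 0.60, 0.80], [0.38, 0.58, 0.78]),
]

w0, b0 = meta_learn_init(reference_pool)

# Label-free estimate for an unseen model: only its confidence on unlabeled data
# is needed, and the meta-learned initialization is used without retraining.
unseen_confidence = 0.80
estimated_accuracy = w0 * unseen_confidence + b0
print(f"estimated accuracy: {estimated_accuracy:.3f}")
```

The point of the sketch is the amortization: the outer loop runs once over the reference pool, and scoring a new model afterwards is a single forward evaluation.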

Trinh Pham, Viet Huynh, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen • 2026

Related benchmarks

Task	Dataset	Result	Rank
Accuracy Estimation	Text2SQL source-target transfers (Spider, BIRD, WikiSQL, SParC, CoSQL, SynSQL-2.5M)	MAE 3.41	42
Accuracy Estimation	MNIST, USPS, SVHN, COCO, PASCAL, ImageNet source-target transfers	MAE 3.58	42
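
The Result column reports mean absolute error (MAE) for accuracy estimation. As a minimal sketch of how such a score is usually computed, assuming it compares each estimated accuracy with the corresponding ground-truth accuracy in the same units: the page does not state the units or the exact protocol, and the numbers below are hypothetical, not results from the paper.

```python
from statistics import fmean

def mean_absolute_error(estimated, actual):
    """Average absolute gap between estimated and ground-truth accuracies."""
    if len(estimated) != len(actual):
        raise ValueError("estimated and actual must have the same length")
    if not estimated:
        raise ValueError("at least one pair is required")
    return fmean(abs(e - a) for e, a in zip(estimated, actual))

# Hypothetical accuracies (in percentage points) for three source-target transfers.
estimated_acc = [71.0, 64.5, 88.0]
true_acc = [74.0, 62.5, 90.0]

mae = mean_absolute_error(estimated_acc, true_acc)
print(f"MAE: {mae:.2f}")
```

Lower is better: an MAE of 0 would mean every label-free estimate matched the accuracy measured with labels.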

