
MILE-RefHumEval: A Reference-Free, Multi-Independent LLM Framework for Human-Aligned Evaluation

About

We introduce MILE-RefHumEval, a reference-free framework for evaluating Large Language Models (LLMs) without ground-truth annotations or coordination among evaluators. It leverages an ensemble of independently prompted LLM evaluators guided by a human-aligned schema, and supports both discrete and continuous scoring. With task-specific prompts covering best-candidate selection, summarization, image captioning, and dialogue, MILE-RefHumEval provides flexible, interpretable, and scalable assessments. Experiments show that it aligns closely with human judgments, outperforms prior methods, and reduces computational overhead, offering an efficient, robust, and human-aligned solution for real-world LLM evaluation.

Nalin Srun, Parisa Rastin, Guénaël Cabanes, Lydia Boudjeloud-Assala • 2026
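
To make the ensemble idea concrete, below is a minimal Python sketch of reference-free scoring with independently prompted judges. Everything in it is illustrative rather than the paper's implementation: `call_llm`, the `RUBRIC` text, the 1-5 scale, and the majority-vote/mean aggregation are placeholders chosen to match the abstract's description of discrete and continuous scoring.

```python
import random
import statistics
from collections import Counter

# -- Placeholder LLM call -----------------------------------------------
# Stands in for a real LLM API; MILE-RefHumEval does not prescribe a
# provider. The seeded random reply only keeps this sketch runnable.
def call_llm(prompt: str, seed: int) -> str:
    return str(random.Random(seed).randint(1, 5))

# Illustrative human-aligned rubric; the paper uses task-specific prompts.
RUBRIC = (
    "You are an independent judge. Rate the candidate response on a 1-5 "
    "scale for overall quality (relevance, fluency, consistency)."
)

def score_candidate(task_input: str, candidate: str,
                    n_judges: int = 5, discrete: bool = True) -> float:
    """Prompt n_judges evaluators independently (no shared context or
    coordination) and aggregate: majority vote for discrete judgements,
    mean for continuous scoring."""
    prompt = (f"{RUBRIC}\n\nTask input:\n{task_input}\n\n"
              f"Candidate:\n{candidate}\n\nScore (1-5):")
    # Each judge sees only its own prompt, never another judge's output.
    scores = [float(call_llm(prompt, seed=i)) for i in range(n_judges)]
    if discrete:
        return float(Counter(round(s) for s in scores).most_common(1)[0][0])
    return statistics.mean(scores)

if __name__ == "__main__":
    print(score_candidate("Summarize the article ...", "A candidate summary."))
```

Keeping each judge's prompt self-contained is what removes the need for evaluator coordination: aggregation happens only after all judgements have been collected independently.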

Related benchmarks

| Task | Dataset | Result | Rank |
|------|---------|--------|------|
| LLM Evaluation Performance | FairEval | Accuracy: 0.6375 | 14 |
| LLM Evaluation | PandaLM | Accuracy: 78.98% | 12 |
| Summarization Evaluation | SummEval | MSE: 0.495 | 8 |
| Image Captioning Evaluation | OID Rated Image Caption | Accuracy: 58.91% | 7 |
| Dialogue Evaluation | Amazon Topical-Chat | Naturalness (Pearson r): 0.806 | 2 |
