Let's Measure Information Step-by-Step: AI-Based Evaluation Beyond Vibes

About

We evaluate artificial intelligence (AI) systems without ground truth by exploiting a link between strategic gaming and information loss. Building on established information theory, we analyze which mechanisms resist adversarial manipulation. This motivates mutual evaluation, where the overseer is treated as a strategic player that estimates mutual information (MI) by prompting, which makes truthful reporting an optimal strategy for the agents. We show that certain f-divergences, such as total variation distance (TVD), maintain polynomial guarantees under attack, building on an established exponential barrier for estimating MI in worst-case certification settings. Under adversarial attacks, TVD-MI maintains its effectiveness (area under the curve 0.70 to 0.77), while other approaches can decay toward chance. This demonstrates that prompting the same system for information relationships rather than quality judgments can improve robustness. The mechanisms decompose pairwise evaluations into reliable item-level detection scores without ground truth, addressing a key limitation of standard peer prediction. Pre-registration: https://osf.io/c7pum
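The TVD-MI quantity above can be illustrated with a small sketch. For discrete variables, TVD-MI is the total variation distance between the joint distribution P_XY and the product of its marginals P_X ⊗ P_Y, that is ½ Σ |p(x,y) − p(x)p(y)|. Because TVD equals the supremum, over critics g taking values in [0, 1], of E_P[g] − E_Q[g], any critic's mean score on paired samples minus its mean score on unpaired (shuffled) samples is a lower bound on TVD-MI. In the prompting setting that critic would be a model asked whether two outputs are related. The function names and the exact-match critic below are hypothetical stand-ins chosen for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def tvd_mi_exact(joint):
    """TVD between P_XY and P_X * P_Y for a discrete joint given as {(x, y): prob}."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p  # accumulate marginal of X
        py[y] += p  # accumulate marginal of Y
    # Sum over the full product support, since P_X * P_Y may put mass
    # on pairs that the joint never produces.
    return 0.5 * sum(
        abs(joint.get((x, y), 0.0) - px[x] * py[y]) for x, y in product(px, py)
    )


def tvd_mi_lower_bound(critic, paired, unpaired):
    """Variational lower bound E_paired[g] - E_unpaired[g] for a critic g in [0, 1]."""
    paired_mean = sum(critic(a, b) for a, b in paired) / len(paired)
    unpaired_mean = sum(critic(a, b) for a, b in unpaired) / len(unpaired)
    return paired_mean - unpaired_mean


# Hypothetical critic: reports "related" (1.0) only when the two items match.
def exact_match_critic(a, b):
    return 1.0 if a == b else 0.0


perfectly_dependent = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
paired = [(0, 0), (1, 1), (0, 0), (1, 1)]    # samples from the joint
unpaired = [(0, 1), (1, 0), (0, 0), (1, 1)]  # samples from the product of marginals

print(tvd_mi_exact(perfectly_dependent))  # 0.5
print(tvd_mi_exact(independent))          # 0.0
print(tvd_mi_lower_bound(exact_match_critic, paired, unpaired))  # 0.5
```

Here the exact-match critic happens to be optimal, so the estimated lower bound meets the exact value; a weaker critic (for example, a judge that is fooled by an attack) would yield a smaller gap between paired and unpaired scores.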

Zachary Robertson, Sanmi Koyejo • 2025

Related benchmarks

Task	Dataset	Result
Discrimination between Good Faith and Problematic agents (Peer Review)	ICLR 20.2:1	Cohen's d = 1.82	6
Discrimination between Good Faith and Problematic agents (Summarization)	SamSum 4.8:1	Cohen's d = 6.14	6
Discrimination between Good Faith and Problematic agents (Summarization)	Multi-News 9.0:1	Cohen's d = 6.55	6
Discrimination between Good Faith and Problematic agents (Summarization)	BillSum 9.3:1	Cohen's d = 5.91	6
Discrimination between Good Faith and Problematic agents (Summarization)	CNN/Daily 13.8:1	Cohen's d = 5.87	6
Discrimination between Good Faith and Problematic agents (Summarization)	Reddit TIFU 16.1:1	Cohen's d = 7.23	6
Discrimination between Good Faith and Problematic agents (Summarization)	XSum 18.5:1	Cohen's d = 6.69	6
Discrimination between Good Faith and Problematic agents (Translation)	WMT14 1.1:1	Cohen's d = 3.32	6
Discrimination between Good Faith and Problematic agents (Summarization)	PubMed 6.7:1	Cohen's d = 6.53	6
Discrimination between Good Faith and Problematic agents (Translation)	Opus Books 1.3:1	Cohen's d = 3.08	6

Showing 10 of 20 rows
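The results above are reported as Cohen's d, the standardized gap between scores for good-faith and problematic agents. As a hedged sketch, assuming d is computed from per-item scores with the usual pooled sample standard deviation (the paper may use a different pooling convention), the effect size can be computed as below, together with a rank-based AUC: the Mann-Whitney probability that a good-faith score exceeds a problematic one, with ties counted as half. The function names and score lists are invented for illustration and are not data from the paper.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def rank_auc(positives, negatives):
    """P(positive score > negative score), ties counted as 0.5 (Mann-Whitney form)."""
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy scores (invented): higher means the evaluator found the output more informative.
good_faith = [2.0, 4.0, 6.0]
problematic = [1.0, 3.0, 5.0]

print(cohens_d(good_faith, problematic))  # 0.5
print(rank_auc(good_faith, problematic))  # 0.6666666666666666
```

The two metrics answer different questions: d measures the size of the mean gap in units of spread, while AUC only depends on how the two groups' scores are ordered, so it is unaffected by monotone rescaling of the scores.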
