
EAGer: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

About

With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution, but it allocates the same compute budget to every prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus have different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty, measured through the token-wise entropy distribution, to reduce redundant computation and concurrently improve overall performance. EAGer allows branching into multiple reasoning paths only in the presence of high-entropy tokens, and reallocates the saved compute budget to instances where exploration of alternative paths is most needed. We validate EAGer across multiple open-source models on complex reasoning benchmarks, with gains specifically demonstrated on AIME 2025. When target labels are accessible, as in RLVR training pipelines, EAGer achieves up to +37% in Pass@k with 59% fewer tokens; in test-time settings it still yields +12% in Pass@k with 64% fewer tokens compared to Full Parallel Sampling.
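
The entropy-gated branching idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `threshold`, and the `max_branches` cap are hypothetical choices made for illustration, and the abstract does not specify the actual branching rule or how the saved budget is reallocated. The sketch only shows how token-wise Shannon entropy could flag the decoding steps where opening an additional reasoning path might be worthwhile.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branch_points(
    step_logits: Sequence[Sequence[float]],
    threshold: float,      # hypothetical entropy cutoff, not taken from the paper
    max_branches: int,     # hypothetical per-prompt cap on extra paths
) -> List[int]:
    """Indices of decoding steps whose entropy exceeds `threshold`.

    Low-entropy steps continue on a single path; only uncertain steps
    are marked as candidates for branching, up to `max_branches`.
    """
    points: List[int] = []
    for t, logits in enumerate(step_logits):
        if len(points) >= max_branches:
            break
        if token_entropy(softmax(logits)) > threshold:
            points.append(t)
    return points


# Toy example: four decoding steps over a three-token vocabulary.
toy_logits = [
    [10.0, 0.0, 0.0],  # confident step
    [1.0, 1.0, 1.0],   # uniform, maximally uncertain step
    [0.0, 5.0, 0.0],   # confident step
    [2.0, 2.0, 0.0],   # two near-equal candidates
]
candidates = branch_points(toy_logits, threshold=0.6, max_branches=4)
```

In this toy run only the uniform step and the two-candidate step exceed the cutoff, so a prompt whose steps are all confident would consume a single path and leave its unused branches available for harder prompts.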

Daniel Scalena, Leonidas Zotos, Elisabetta Fersini, Malvina Nissim, Ahmet Üstün • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Code Generation | HumanEval+ | -- | 393 |
| Code Correctness Prediction | MultiPL-E Java | ECE 0.075 | 60 |
| Code Correctness Prediction | MultiPL-E Java | Brier Score 0.232 | 60 |
| Code Correctness Prediction | LiveCodeBench Python | Brier Score 0.081 | 60 |
| Predicting code correctness | LiveCodeBench Python | ECE 0.05 | 60 |
| Code Correctness Prediction | MultiPL-E Java | AUROC 0.674 | 60 |
| Code Correctness Prediction | LiveCodeBench Python | AUROC 79.7 | 60 |
| Predicting code correctness | LiveSQLBench SQLite | Brier Score 0.184 | 55 |
| Code correctness classification | LiveSQLBench SQLite | AUROC 0.712 | 55 |
| Mathematical Reasoning | AIME 2025 | Pass@k 93 | 12 |
Showing 10 of 11 rows
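
For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how Brier score, expected calibration error (ECE), and Pass@k are commonly computed. It assumes binary correctness labels, equal-width confidence bins for ECE, and the widely used combinatorial Pass@k estimator; the listing does not say which variants produced the reported numbers, so the function names and defaults here are illustrative only.

```python
from math import comb
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE with equal-width bins over the predicted probability (assumed variant)."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(p for p, _ in b) / len(b)
        mean_acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_acc - mean_conf)
    return ece


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without replacement
    from n generations of which c are correct, is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Lower is better for Brier score and ECE, while higher is better for Pass@k and AUROC.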
