Test-Time Learning with an Evolving Library

About

We introduce EvoLib, a test-time learning framework that enables large language models to accumulate, reuse, and evolve knowledge across problem instances without parameter updates or external supervision. Instead of adapting model parameters, our approach maintains a shared library of knowledge abstractions, including modular skills and reflective insights, automatically extracted from the model's own inference trajectories. To support continual improvement, we introduce a principled weighting and consolidation mechanism that jointly optimizes for immediate utility and long-term value. This allows simple, instance-specific abstractions to evolve into more general and reusable ones over time. Across challenging benchmarks in mathematical reasoning, code generation, and multi-turn agentic environments, EvoLib improves substantially over the top test-time scaling and learning methods without ground-truth feedback.
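As a rough illustration of the idea described above, the following Python sketch shows one way a shared library of abstractions could be weighted and pruned. It is not the authors' implementation. The class names (`Abstraction`, `Library`), the linear mix controlled by `alpha`, the moving-average update controlled by `rate`, and the two signals are all hypothetical. Consolidation is reduced here to dropping the lowest-weight entries, a simple stand-in for whatever mechanism EvoLib actually uses. Because the framework works without ground-truth feedback, the signals would have to come from something like the model's own self-assessment, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Abstraction:
    text: str
    kind: str                # "skill" or "insight"
    immediate: float = 0.0   # hypothetical estimate of immediate utility
    long_term: float = 0.0   # hypothetical estimate of long-term value

    def weight(self, alpha: float = 0.5) -> float:
        # Assumed scoring rule: a linear mix of the two estimates.
        return alpha * self.immediate + (1.0 - alpha) * self.long_term


class Library:
    def __init__(self, capacity: int = 3, alpha: float = 0.5, rate: float = 0.5):
        self.capacity = capacity  # maximum entries kept after consolidation
        self.alpha = alpha        # trade-off between immediate and long-term value
        self.rate = rate          # step size of the moving-average update
        self.entries = []

    def add(self, text: str, kind: str) -> Abstraction:
        entry = Abstraction(text, kind)
        self.entries.append(entry)
        return entry

    def update(self, entry: Abstraction, immediate_signal: float,
               long_term_signal: float) -> None:
        # Move each estimate toward the newly observed (self-generated) signal.
        entry.immediate += self.rate * (immediate_signal - entry.immediate)
        entry.long_term += self.rate * (long_term_signal - entry.long_term)

    def retrieve(self, k: int) -> list:
        # Return the k highest-weight abstractions, best first.
        ranked = sorted(self.entries, key=lambda e: e.weight(self.alpha), reverse=True)
        return ranked[:k]

    def consolidate(self) -> None:
        # Simplified consolidation: keep only the top `capacity` entries.
        self.entries = self.retrieve(self.capacity)


lib = Library(capacity=2, alpha=0.5, rate=1.0)
a = lib.add("factor the polynomial before substituting", "skill")
b = lib.add("re-check boundary cases after a failed attempt", "insight")
c = lib.add("reduce the problem to a known lemma", "skill")
lib.update(a, 1.0, 0.0)
lib.update(b, 0.0, 0.5)
lib.update(c, 1.0, 1.0)
best = lib.retrieve(1)[0]
lib.consolidate()
kept = [e.text for e in lib.entries]
```

In this toy run, `c` scores highest on both estimates, so it is retrieved first, and the lowest-weight entry `b` is dropped when the library is consolidated to two entries.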

Weijia Xu, Alessandro Sordoni, Chandan Singh, Zelalem Gero, Michel Galley, Xingdi Yuan, Jianfeng Gao • 2026

Related benchmarks

Task                     Dataset        Result             Rank
Coding                   LiveCodeBench  Accuracy 70        38
Mathematics              HMMT           Accuracy 77.4      32
Multi-turn Agentic Task  ScienceWorld   Success Rate 57.4  28
Multi-turn Agentic Task  PDDL           Success Rate 72.8  28
Coding                   BigCodeBench   Pass Rate 40.8     6