Test-Time Learning with an Evolving Library

About

We introduce EvoLib, a test-time learning framework that enables large language models to accumulate, reuse, and evolve knowledge across problem instances without parameter updates or external supervision. Instead of adapting model parameters, our approach maintains a shared library of knowledge abstractions, including modular skills and reflective insights, automatically extracted from the model's own inference trajectories. To support continual improvement, we introduce a principled weighting and consolidation mechanism that jointly optimizes for immediate utility and long-term value. This allows simple, instance-specific abstractions to evolve into more general and reusable ones over time. Across challenging benchmarks in mathematical reasoning, code generation, and multi-turn agentic environments, EvoLib improves substantially over the top test-time scaling and learning methods without ground-truth feedback.
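The abstract describes a shared library whose entries are weighted and consolidated over time, but it does not give the mechanism. The sketch below is only a toy illustration of that general idea and not EvoLib's actual algorithm. The class names, the exponential-moving-average weight update, the pruning threshold, and the `utility` score (imagined here as a self-assessed value in [0, 1], since no ground-truth feedback is available) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Abstraction:
    """One library entry (hypothetical schema): a skill or insight stored as text."""
    text: str
    kind: str            # "skill" or "insight"
    weight: float = 1.0  # running estimate of how useful the entry has been
    uses: int = 0        # how many times the entry has been scored


class Library:
    """Toy shared library; the update and pruning rules are assumptions."""

    def __init__(self, decay: float = 0.9, prune_below: float = 0.2):
        self.entries: list[Abstraction] = []
        self.decay = decay
        self.prune_below = prune_below

    def add(self, text: str, kind: str) -> Abstraction:
        entry = Abstraction(text=text, kind=kind)
        self.entries.append(entry)
        return entry

    def retrieve(self, k: int = 2) -> list[Abstraction]:
        # Return the k highest-weight entries; a real system would
        # presumably also condition on the current problem.
        return sorted(self.entries, key=lambda e: e.weight, reverse=True)[:k]

    def update(self, entry: Abstraction, utility: float) -> None:
        # Exponential moving average: blends the entry's history with the
        # utility observed on the latest problem (assumed stand-in rule).
        entry.weight = self.decay * entry.weight + (1 - self.decay) * utility
        entry.uses += 1

    def consolidate(self) -> None:
        # Keep only entries whose weight is at or above the threshold.
        self.entries = [e for e in self.entries if e.weight >= self.prune_below]


if __name__ == "__main__":
    lib = Library(decay=0.5, prune_below=0.3)
    skill = lib.add("Factor the polynomial before solving.", "skill")
    insight = lib.add("Re-check boundary cases.", "insight")
    lib.update(skill, utility=1.0)
    lib.update(insight, utility=0.0)
    lib.update(insight, utility=0.0)
    lib.consolidate()
    print([e.text for e in lib.entries])
```

In this toy version, an entry that keeps receiving low utility decays below the threshold and is dropped at consolidation, while a consistently useful one stays near the top of retrieval. Merging several instance-specific entries into a more general one, which the abstract also mentions, is not modeled here.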

Weijia Xu, Alessandro Sordoni, Chandan Singh, Zelalem Gero, Michel Galley, Xingdi Yuan, Jianfeng Gao • 2026

Related benchmarks

Task	Dataset	Result
Coding	LiveCodeBench	Accuracy 70	40
Mathematics	HMMT	Accuracy 77.4	32
Multi-turn Agentic Task	ScienceWorld	Success Rate 57.4	28
Multi-turn Agentic Task	PDDL	Success Rate 72.8	28
Coding	BigCodeBench	Pass Rate 40.8	6
