Mistake Notebook Learning: Batch-Clustered Failures for Training-Free Agent Adaptation

About

As Large Language Model (LLM) agents are increasingly deployed in persistent, real-world roles, they encounter continuous streams of tasks and inevitable failures. A key limitation, however, is their inability to systematically learn from these mistakes, which forces them to repeat identical errors in similar contexts. Unlike prior training-free methods that primarily store raw instance-level experience or focus on retrieving successful trajectories, we propose Mistake Notebook Learning (MNL), a novel memory framework that enables agents to self-curate generalizable guidance from batch-clustered failures. This mechanism allows agents to distill shared error patterns into structured "mistake notes," and it updates an external memory only when batch performance improves, which ensures stability. To further amplify adaptability, we integrate MNL with test-time scaling, leveraging aggregated failure patterns to actively steer the search process away from known pitfalls. Experiments on mathematical reasoning, Text-to-SQL, and interactive agent benchmarks show that MNL is competitive with existing memory mechanisms and in-context methods in both effectiveness and efficiency. These findings position structured mistake abstraction as a critical lever for robust agent evolution, enabling continuous improvement without the cost of parameter updates. The code is available at https://github.com/Bairong-Xdynamics/MistakeNotebookLearning/tree/main.

Xuanbo Su, Yingfang Zhang, Hao Luo, Xiaoteng Liu, Leo Huang • 2025
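
The update rule described in the abstract (cluster a batch's failures, distill one note per cluster, and keep the new notes only if batch performance improves) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `mnl_step`, `batch_accuracy`, `cluster_key`, `write_note`, and the toy solver are all hypothetical names, and in a real system the solver and note writer would presumably be LLM calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]    # (question, expected answer)
Memory = Dict[str, str]   # cluster key -> mistake note

Solver = Callable[[str, Memory], str]


def batch_accuracy(solve: Solver, batch: List[Task], memory: Memory) -> float:
    """Fraction of tasks in the batch answered correctly with this memory."""
    return sum(solve(q, memory) == a for q, a in batch) / len(batch)


def mnl_step(
    batch: List[Task],
    memory: Memory,
    solve: Solver,
    cluster_key: Callable[[str], str],               # hypothetical: groups similar tasks
    write_note: Callable[[str, List[Task]], str],    # hypothetical: distills one note
) -> Memory:
    """One assumed update step: accept new notes only if the batch score improves."""
    baseline = batch_accuracy(solve, batch, memory)

    # Group the failed tasks of this batch by cluster.
    clusters: Dict[str, List[Task]] = defaultdict(list)
    for q, a in batch:
        if solve(q, memory) != a:
            clusters[cluster_key(q)].append((q, a))
    if not clusters:
        return memory  # nothing failed, nothing to learn

    # Draft one note per failure cluster on a copy of the memory.
    candidate = dict(memory)
    for key, failures in clusters.items():
        candidate[key] = write_note(key, failures)

    # Stability gate: keep the candidate only on a strict improvement.
    if batch_accuracy(solve, batch, candidate) > baseline:
        return candidate
    return memory


# Toy agent: fails every "div" task unless a note for that cluster exists.
def toy_solve(question: str, memory: Memory) -> str:
    kind = question.split(":")[0]
    return "wrong" if kind == "div" and "div" not in memory else "ok"


batch = [("add:1+1", "ok"), ("div:4/2", "ok"), ("div:9/3", "ok")]
memory = mnl_step(
    batch,
    {},
    toy_solve,
    cluster_key=lambda q: q.split(":")[0],
    write_note=lambda k, f: f"{len(f)} failures in '{k}' tasks: check the divisor",
)
```

In this toy run the candidate notes raise batch accuracy, so they are kept. A solver that ignored the memory would leave the score unchanged, and the step would return the original memory untouched.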

Related benchmarks

Task	Dataset	Metric	Result
Agentic task solving	AppWorld	TGC	73.2	28
Text-to-SQL	KaggleDBQA (test)	EA (%)	64	14
Mathematical Reasoning	AIME 2025	Pass@32	96	12
Mathematical Reasoning	AIME 2024	Pass@32	93	12
Interactive agent tasks	Mind2Web	Task Success Rate	18.86	8
