
Mistake Notebook Learning: Batch-Clustered Failures for Training-Free Agent Adaptation

About

As Large Language Model (LLM) agents are increasingly deployed in persistent, real-world roles, they encounter continuous streams of tasks and inevitable failures. A key limitation, however, is their inability to systematically learn from these mistakes, which forces them to repeat identical errors in similar contexts. Unlike prior training-free methods that primarily store raw instance-level experience or focus on retrieving successful trajectories, we propose Mistake Notebook Learning (MNL), a novel memory framework that enables agents to self-curate generalizable guidance from batch-clustered failures. This mechanism allows agents to distill shared error patterns into structured "mistake notes," updating an external memory only when batch performance improves, which keeps the memory stable. To further amplify adaptability, we integrate MNL with test-time scaling, leveraging aggregated failure patterns to actively steer the search process away from known pitfalls. Experiments on mathematical reasoning, Text-to-SQL, and interactive agent benchmarks show that MNL is competitive with existing memory mechanisms and in-context methods in both effectiveness and efficiency. These findings position structured mistake abstraction as a critical lever for robust agent evolution, enabling continuous improvement without the cost of parameter updates. The code is available at https://github.com/Bairong-Xdynamics/MistakeNotebookLearning/tree/main
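The update rule described in the abstract (cluster a batch of failures, distill one note per cluster, and commit to memory only when batch performance improves) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the `Failure` and `MistakeNotebook` classes, the `error_type` clustering key, and the `evaluate` callback are all hypothetical names, and the string concatenation stands in for whatever distillation step the real system uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Failure:
    task: str
    error_type: str  # hypothetical label, used here as the clustering key
    detail: str


@dataclass
class MistakeNotebook:
    notes: Dict[str, str] = field(default_factory=dict)

    def propose(self, failures: List[Failure]) -> Dict[str, str]:
        """Cluster a batch of failures and draft one note per cluster."""
        clusters = defaultdict(list)
        for f in failures:
            clusters[f.error_type].append(f)
        candidate = dict(self.notes)  # copy, so current memory stays untouched
        for key, group in clusters.items():
            # Stand-in for distillation: join the distinct failure details.
            details = sorted({f.detail for f in group})
            candidate[key] = f"Avoid {key}: " + "; ".join(details)
        return candidate

    def update(
        self,
        failures: List[Failure],
        evaluate: Callable[[Dict[str, str]], float],
    ) -> bool:
        """Commit the candidate notes only if the batch score improves."""
        candidate = self.propose(failures)
        if evaluate(candidate) > evaluate(self.notes):
            self.notes = candidate
            return True
        return False
```

Here `evaluate` is assumed to re-run the batch with a given set of notes and return a score. Gating the write on that score means a batch of unhelpful notes leaves the memory unchanged.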

Xuanbo Su, Yingfang Zhang, Hao Luo, Xiaoteng Liu, Leo Huang • 2025

Related benchmarks

Task                     Dataset            Metric             Result  Rank
Agentic task solving     AppWorld           TGC                73.2    28
Text-to-SQL              KaggleDBQA (test)  EA (%)             64      14
Mathematical Reasoning   AIME 2025          Pass@32            96      12
Mathematical Reasoning   AIME 2024          Pass@32            93      12
Interactive agent tasks  Mind2Web           Task Success Rate  18.86   8
