
Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction

About

Large language model (LLM)-based multi-agent systems (MAS) excel at collaborative problem solving but remain brittle to cascading errors: a single faulty step can propagate across agents and derail the whole trajectory. In this paper, we present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction. MASC recasts detection as history-conditioned anomaly scoring through two complementary designs: (1) Next-Execution Reconstruction, which predicts the embedding of the next step from the query and the interaction history to capture causal consistency; and (2) Prototype-Guided Enhancement, which learns a prototype prior over normal-step embeddings and uses it to stabilize reconstruction and anomaly scoring when context is sparse (e.g., in early steps). When a step is flagged as anomalous, MASC triggers a correction agent to revise the acting agent's output before the information flows downstream. On the Who&When benchmark, MASC consistently outperforms all baselines, improving step-level error detection by up to 8.47% AUC-ROC. When plugged into diverse MAS frameworks, it delivers consistent end-to-end gains across architectures, confirming that our metacognitive monitoring and targeted correction can mitigate error propagation with minimal overhead.
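To make "history-conditioned anomaly scoring" concrete, here is a minimal sketch under heavy simplifying assumptions; it is not MASC's implementation. Step embeddings are assumed to be given as NumPy vectors, a ridge-regression predictor stands in for the learned next-execution reconstruction model, and a single mean embedding of normal steps stands in for the learned prototype prior. The names `NextStepScorer`, `prior_strength`, `monitor`, and the `correct` callback (a placeholder for the correction agent) are hypothetical.

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


class NextStepScorer:
    """Toy history-conditioned anomaly scorer (hypothetical; not the paper's model)."""

    def __init__(self, ridge=1e-2, prior_strength=2.0):
        self.ridge = ridge                    # L2 penalty of the linear next-step predictor
        self.prior_strength = prior_strength  # hypothetical: how long the prototype dominates
        self.W = None
        self.prototype = None

    @staticmethod
    def context(query, history):
        # Summarize the query and all previous step embeddings by their mean.
        return np.mean(np.vstack([query, *history]), axis=0)

    def fit(self, trajectories):
        """trajectories: list of (query, steps) pairs taken from error-free runs."""
        X = [self.context(q, steps[:t]) for q, steps in trajectories for t in range(len(steps))]
        Y = [steps[t] for _, steps in trajectories for t in range(len(steps))]
        X, Y = np.asarray(X), np.asarray(Y)
        self.prototype = Y.mean(axis=0)  # stand-in prototype: mean normal-step embedding
        d = X.shape[1]
        # Ridge regression predictor: W = (X^T X + lambda I)^-1 X^T Y
        self.W = np.linalg.solve(X.T @ X + self.ridge * np.eye(d), X.T @ Y)
        return self

    def score(self, query, history, step):
        """Anomaly score of `step` given the query and the steps before it."""
        predicted = self.context(query, history) @ self.W
        # Sparse context (early steps): lean on the prototype rather than the prediction.
        alpha = len(history) / (len(history) + self.prior_strength)
        expected = alpha * predicted + (1.0 - alpha) * self.prototype
        return cosine_distance(step, expected)


def auc_roc(scores, labels):
    """Rank-based AUC: chance a faulty step outscores a normal one (ties count 1/2)."""
    scores, labels = np.asarray(scores), np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


def monitor(scorer, query, steps, threshold, correct):
    """Score each step before it flows downstream; revise flagged steps via `correct`."""
    accepted = []
    for step in steps:
        if scorer.score(query, accepted, step) > threshold:
            step = correct(query, accepted, step)  # hypothetical correction-agent hook
        accepted.append(step)
    return accepted


# Synthetic demo: normal steps cluster around one direction, faulty ones do not.
rng = np.random.default_rng(0)
d = 8
base = np.ones(d)


def normal_run(n_steps=5):
    query = base + 0.1 * rng.normal(size=d)
    steps = base + 0.1 * rng.normal(size=(n_steps, d))
    return query, steps


scorer = NextStepScorer().fit([normal_run() for _ in range(50)])

scores, labels = [], []
for _ in range(20):
    query, steps = normal_run()
    k = int(rng.integers(0, len(steps)))
    scores.append(scorer.score(query, steps[:k], steps[k]))            # normal step
    labels.append(0)
    scores.append(scorer.score(query, steps[:k], rng.normal(size=d)))  # faulty step
    labels.append(1)
auc = auc_roc(scores, labels)
print(f"AUC-ROC: {auc:.2f}")

# Inject an obviously faulty step at position 2 and let the monitor repair it.
query, steps = normal_run()
steps[2] = -base
corrected_at = []


def fix(query, history, step):
    corrected_at.append(len(history))  # record which step was flagged
    return scorer.prototype            # toy "correction": fall back to the prototype


out = monitor(scorer, query, steps, threshold=0.5, correct=fix)
print("corrected steps:", corrected_at)
```

The mixing weight `alpha` grows with the number of previous steps, so early steps are judged mainly against the prototype, echoing the stated motivation for the prototype prior under sparse context. `auc_roc` is the rank-based (Mann-Whitney) form of AUC-ROC, the detection metric reported above.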

Xu Shen, Qi Zhang, Song Wang, Zhen Tan, Xinyu Zhao, Laura Yao, Vaishnav Tadiparthi, Hossein Nourkhiz Mahjoub, Ehsan Moradi Pari, Kwonjoon Lee, Tianlong Chen • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Question Answering | GAIA | Accuracy (Pass@4) | 9.17 | 22 |
| Failure attribution | Who&When Hand-Crafted | Step-level Accuracy | 20.79 | 13 |
| Failure attribution | Who&When Total | Step-level Accuracy | 21.62 | 13 |
| Failure attribution | Who&When Algorithm-Generated | Step-level Accuracy | 21.72 | 13 |
| Error Forecasting | Who&When | Eta (%) | 42.19 | 6 |
| Tool-augmented Question Answering | Bamboogle | Accuracy | 51.47 | 4 |
| Tool-augmented Question Answering | 2Wiki | Accuracy | 42.83 | 4 |
| Tool-augmented Question Answering | HotpotQA | Accuracy | 51.33 | 4 |
| Tool-augmented Question Answering | MuSiQue | Accuracy | 10.67 | 4 |
| Tool-augmented Question Answering | MedQA | Accuracy | 71 | 4 |
