COCO: Cognitive Operating System with Continuous Oversight for Multi-Agent Workflow Reliability
About
A critical limitation of large-scale multi-agent systems is error cascading: without intermediate verification, downstream agents amplify upstream inaccuracies, leading to significant quality degradation. To bridge this gap, we introduce **COCO** (**C**ognitive **O**perating System with **C**ontinuous **O**versight), a theoretically grounded framework for asynchronous self-monitoring and adaptive error correction in multi-agent systems. COCO reconciles the fundamental tension between quality assurance and computational efficiency through a novel decoupled architecture that isolates error detection from the critical execution path and incorporates an automated configuration engine to minimize deployment complexity. The framework relies on three algorithmic innovations to mitigate both systematic and stochastic errors: (1) a Contextual Rollback Mechanism that leverages execution history for informed state recovery rather than naive retries; (2) a Bidirectional Reflection Protocol that ensures convergence and prevents oscillatory control loops; and (3) a Heterogeneous Cross-Validation Mechanism that uses ensemble disagreement to identify bias and hallucinations. Extensive experiments on diverse benchmarks show that COCO delivers a 6.5% average performance improvement. Notably, the framework achieves 95.1% of large-model performance with a 30× parameter reduction, confirming its potential for efficient, high-reliability deployment and establishing COCO as a practical, annotation-based solution for critical autonomous domains.
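To make the cross-validation idea concrete, here is a minimal sketch of disagreement-based flagging: several heterogeneous verifier models judge the same intermediate output, and a step is marked suspect when their votes fail to reach a majority consensus. The function name, threshold, and voting scheme are illustrative assumptions, not COCO's actual implementation.

```python
from collections import Counter


def cross_validate(candidates, agreement_threshold=0.5):
    """Flag an intermediate result as suspect via ensemble disagreement.

    candidates: answers to the same step from independent, heterogeneous
        verifier models (hypothetical interface; COCO's real protocol may
        use richer signals than raw answer strings).
    Returns (majority_answer, agreement_ratio, is_suspect): the step is
    suspect when the majority vote's share falls below the threshold,
    which in the full framework would trigger a contextual rollback.
    """
    counts = Counter(candidates)
    majority, votes = counts.most_common(1)[0]
    agreement = votes / len(candidates)
    return majority, agreement, agreement < agreement_threshold


# Two of three verifiers agree: accepted.
print(cross_validate(["42", "42", "17"]))
# All three disagree: flagged as a possible hallucination.
print(cross_validate(["42", "17", "88"]))
```

The key design point is that verifiers are *heterogeneous*: models with different parameter counts or training data are unlikely to share the same systematic bias, so unanimous agreement is stronger evidence of correctness than repeated sampling from one model.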
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM-Hard | Accuracy | 69.89 | 46 |
| Code Generation | MBPP | Accuracy | 76.6 | 9 |
| Commonsense Generation | CommonGen Hard | Accuracy | 84.77 | 9 |
| Multi-task Language Understanding | MMLU-Pro | Accuracy | 68.69 | 9 |