Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

COCO: Cognitive Operating System with Continuous Oversight for Multi-Agent Workflow Reliability

About

A critical limitation in large-scale multi-agent systems is the cascading of errors. And without intermediate verification, downstream agents exacerbate upstream inaccuracies, resulting in significant quality degradation. To bridge this gap, we introduce \textbf{COCO} (\textbf{C}ognitive \textbf{O}perating System with \textbf{C}ontinuous \textbf{O}versight), a theoretically grounded framework for asynchronous self-monitoring and adaptive error correction in multi-agent systems. COCO reconciles the fundamental tension between quality assurance and computational efficiency via a novel decoupled architecture. This design isolates error detection from the critical execution path and incorporates an automated configuration engine to minimize deployment complexity. The framework relies on three algorithmic innovations to mitigate both systematic and stochastic errors: (1) a Contextual Rollback Mechanism that leverages execution history for informed state recovery rather than naive retries; (2) a Bidirectional Reflection Protocol to ensure convergence and prevent oscillatory control loops; and (3) a Heterogeneous Cross-Validation Mechanism that utilizes ensemble disagreement to identify bias and hallucinations. Extensive experiments on diverse benchmarks demonstrate that COCO delivers a 6.5\% average performance improvement. Notably, the framework achieves 95.1\% of large-model performance with a 30$\times$ parameter reduction, confirming the potential for efficient, high-reliability deployment, and establishing COCO as a practical, annotation-based solution for critical autonomous domains.

Churong Liang, Jinling Gan, Kairan Hong, Qiushi Tian, Zongze Wu, Runnan Li• 2025

Related benchmarks

TaskDatasetResultRank
Mathematical ReasoningGSM-Hard
Accuracy69.89
46
Code GenerationMBPP
Accuracy76.6
9
Commonsense GenerationCommonGen Hard
Accuracy84.77
9
Multi-task Language UnderstandingMMLU-Pro
Accuracy68.69
9
Showing 4 of 4 rows

Other info

Follow for update