DIANOIA: Diagnostic Decomposition and Joint Optimization for Multi-Agent Reasoning

About

Multi-agent LLM systems consistently outperform single-agent baselines, yet practitioners still cannot predict which design works for a new task or diagnose why one fails. We argue this gap persists largely because the field lacks a diagnostic framework with measurable primitives and testable predictions. We introduce \textbf{DIANOIA}, a three-channel decomposition of multi-agent reasoning gain into coverage, fidelity, and synthesis, each of which is empirically measurable. From this decomposition, we derive a diagnostic protocol that identifies the bottleneck channels for any given task. We instantiate the protocol as a multi-agent system whose three components mirror the channels: role-diverse proposers for coverage, execution-grounded verification for fidelity, and iterative synthesis. On GSM8K, AIME-2025, MBPP, and BFCL-SP, our method outperforms strong multi-agent baselines under matched token budgets, dominating the Pareto frontier on MBPP at $\sim$$5{\times}$ token savings and reaching $+4.6$pp at matched cost. On every benchmark, the protocol picks the right bottleneck channels; the system we built around it leads across models. We release code, adapters, diagnostic metrics, and a Claude Code skill at https://anonymous.4open.science/r/DIANOIA4MAS. DIANOIA reframes multi-agent design as channel-aware resource allocation: diagnose which channel is the bottleneck for your task, then invest tokens accordingly.

Yiming Yang, Zhuoyuan Li, Fanxiang Zeng, Hao Fu, Yue Liu• 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K (test)	Accuracy91.1	954
Code Generation	MBPP (test)	Pass@184.6	411
Mathematical Reasoning	AIME 2025	Accuracy93.3	227
Function Calling	BFCL Simple Python	Accuracy0.923	20

Showing 4 of 4 rows

Other info

Follow for update

@wizwand_team Discord