From Debate to Deliberation: Structured Collective Reasoning with Typed Epistemic Acts
About
Multi-agent LLM systems increasingly tackle complex reasoning, yet their interaction patterns remain limited to voting, unstructured debate, or pipeline orchestration. None model deliberation: a phased process where differentiated participants exchange typed reasoning moves, preserve disagreements, and converge on accountable outcomes. We introduce Deliberative Collective Intelligence (DCI), specifying four reasoning archetypes, 14 typed epistemic acts, a shared workspace, and DCI-CF, a convergent flow algorithm that guarantees termination with a structured decision packet containing the selected option, residual objections, minority report, and reopen conditions. We evaluate on 45 tasks across seven domains using Gemini 2.5 Flash. On non-routine tasks (n=40), DCI significantly improves over unstructured debate (+0.95, 95% CI [+0.41, +1.54]). DCI excels on hidden-profile tasks requiring perspective integration (9.56, highest of any system on any domain) while failing on routine decisions (5.39), confirming task-dependence. DCI produces 100% structured decision packets and 98% minority reports, artifacts absent from all baselines. However, DCI consumes ~62x single-agent tokens, and single-agent generation outperforms DCI on overall quality. DCI's contribution is not that more agents are better, but that consequential decisions benefit from deliberative structure when process accountability justifies the cost.
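The structured decision packet described above can be pictured as a small record type. The following is a minimal sketch, not the authors' implementation: the four fields mirror the components named in the abstract (selected option, residual objections, minority report, reopen conditions), while the class name `DecisionPacket`, the field types, and the helper `has_minority_report` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DecisionPacket:
    """Hypothetical container for the four components named in the abstract."""

    # The option the deliberation converged on.
    selected_option: str
    # Objections that were raised but not resolved (assumed to be free text).
    residual_objections: List[str] = field(default_factory=list)
    # Dissenting position, if any participant recorded one (assumed optional).
    minority_report: Optional[str] = None
    # Conditions under which the decision should be revisited.
    reopen_conditions: List[str] = field(default_factory=list)

    def has_minority_report(self) -> bool:
        """Return True when a non-empty minority report is attached."""
        return bool(self.minority_report and self.minority_report.strip())


# Illustrative values only; they are not taken from the paper's tasks.
packet = DecisionPacket(
    selected_option="Adopt option B",
    residual_objections=["Migration cost is not yet quantified"],
    minority_report="One participant prefers option A for lower risk.",
    reopen_conditions=["Reopen if migration cost exceeds the budget"],
)
```

Keeping the minority report and reopen conditions as explicit fields, rather than folding them into free-form output, is one way to make the process artifacts reported in the abstract easy to check programmatically.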
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Decision Making | Deliberative decision-making tasks n=45 (overall) | Mean Tokens | 2.38e+5 | 5 |
| Hidden-Profile Integration | DCI Evaluation Suite Hidden-Prof | Quality Score | 9.56 | 5 |
| Process Artifact Analysis | Deliberative Decision-Making Evaluation Set | Decision Packet Completeness | 100 | 5 |
| Late-Evidence Analysis | DCI Evaluation Suite Late-Evid. | Quality Score | 9.24 | 5 |
| Policy Analysis | DCI Evaluation Suite Policy | Quality Score | 8.55 | 5 |
| Risk Assessment | DCI Evaluation Suite Risk | Quality Score | 8.48 | 5 |
| Disagreement Handling | DCI Evaluation Suite Disagree | Quality Score | 8.15 | 5 |
| Reasoning evaluation | Full task set (n=45) | Overall Score | 8.24 | 5 |
| Software Architecture | DCI Evaluation Suite Arch. | Quality Score | 8.13 | 5 |
| Routine Task Management | DCI Evaluation Suite Routine | Quality Score | 5.39 | 5 |