
Debate Only When Necessary: Adaptive Multiagent Collaboration for Efficient LLM Reasoning

About

Multiagent collaboration has emerged as a promising framework for enhancing the reasoning capabilities of large language models (LLMs). While it improves reasoning, the approach introduces substantial computational overhead from iterative agent interactions. Moreover, engaging in unnecessary debate increases the risk of generating erroneous responses. To address these challenges, we propose Debate Only When Necessary (DOWN), an adaptive multiagent debate framework that activates debate selectively, based on the confidence score of each agent's initial response. Debate is triggered only for queries that require further deliberation; during debate, agents refine their outputs by referencing peer responses and their associated confidence scores. Benchmark evaluations show that DOWN improves efficiency by up to six times while matching or exceeding the performance of existing methods. Further analysis indicates that DOWN effectively mitigates the risk of error propagation arising from unnecessary debate. These findings demonstrate the effectiveness of our approach in delivering high-performing LLM systems at a lower computational cost.
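The gating mechanism the abstract describes can be sketched in a few lines: answer once, skip the debate entirely when every agent is already confident, and otherwise run refinement rounds in which agents see peer answers and confidence scores. This is a minimal illustrative sketch, not the paper's implementation; the `Response` type, the `threshold=0.9` gate, and the stub agents are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Response:
    agent_id: int
    answer: str
    confidence: float  # assumed normalized to [0, 1], e.g. an answer probability

# An agent maps (query, peer responses) -> its own response.
Agent = Callable[[str, List["Response"]], "Response"]

def debate_only_when_necessary(
    agents: List[Agent],
    query: str,
    threshold: float = 0.9,  # hypothetical confidence gate, not the paper's value
    max_rounds: int = 2,
) -> Response:
    """Confidence-gated multiagent debate, sketched after the DOWN abstract."""
    # Round 0: each agent answers independently, with a confidence score.
    responses = [agent(query, []) for agent in agents]

    # Gate: if every agent clears the threshold, skip the debate entirely.
    if all(r.confidence >= threshold for r in responses):
        return max(responses, key=lambda r: r.confidence)

    # Debate: agents revise their answers, seeing peers' answers and confidences.
    for _ in range(max_rounds):
        responses = [agent(query, responses) for agent in agents]
    return max(responses, key=lambda r: r.confidence)

# --- Toy demo with stub agents (no real LLM calls) ---
debate_calls = {"n": 0}

def make_stub_agent(agent_id: int, confidence: float) -> Agent:
    def agent(query: str, peers: List[Response]) -> Response:
        if peers:  # a non-empty peer list means this is a debate round
            debate_calls["n"] += 1
        return Response(agent_id, f"answer from agent {agent_id}", confidence)
    return agent

confident = [make_stub_agent(i, 0.95) for i in range(3)]
result = debate_only_when_necessary(confident, "2 + 2 = ?")
# All three stub agents clear the 0.9 gate, so no debate rounds run.
```

The efficiency claim follows directly from the gate: on queries where initial confidence is high, the cost is one response per agent instead of one per agent per debate round.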

Sugyeong Eo, Hyeonseok Moon, Evelyn Hayoon Zi, Chanjun Park, Heuiseok Lim • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Algebraic Reasoning | AQUA | Accuracy | 84.65 | 61 |
| Graduate-Level Reasoning | GPQA | Accuracy | 51.01 | 41 |
| Multitask Language Understanding | MMLU | Accuracy | 84.06 | 34 |
| Aggregate Reasoning Evaluation | Multi-dataset Reasoning Suite | Average Accuracy | 80.09 | 12 |
| Commonsense Reasoning | CommonQA | Accuracy | 84.11 | 12 |
