MARS: Toward More Efficient Multi-Agent Collaboration for LLM Reasoning
About
Large language models (LLMs) have achieved impressive results in natural language understanding, yet their reasoning capabilities remain limited when operating as single agents. Multi-Agent Debate (MAD) has been proposed to address this limitation by enabling collaborative reasoning among multiple models in a round-table debate manner. While effective, MAD introduces substantial computational overhead due to the number of agents involved and the frequent communication required. In this paper, we propose MARS (Multi-Agent Review System), a role-based collaboration framework inspired by the peer-review process. In MARS, an author agent generates an initial solution, reviewer agents independently provide decisions and comments, and a meta-reviewer integrates the feedback to make the final decision and guide further revision. This design enhances reasoning quality while avoiding costly reviewer-to-reviewer interactions, thereby controlling token consumption and inference time. We compare MARS with both MAD and other state-of-the-art reasoning strategies across multiple benchmarks. Extensive experiments with different LLMs show that MARS matches the accuracy of MAD while reducing both token usage and inference time by approximately 50%. Code is available at https://github.com/xwang97/MARS.
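The review loop described above can be sketched as follows. This is a minimal illustration, not the repository's actual implementation: the agents are stand-in text-in/text-out functions, and the prompt formats, `mars` function name, and accept/revise protocol are all assumptions made for the sketch.

```python
from typing import Callable, List

# Any text-in/text-out model call; stubbed below for illustration.
Agent = Callable[[str], str]

def mars(question: str, author: Agent, reviewers: List[Agent],
         meta_reviewer: Agent, max_rounds: int = 3) -> str:
    """Sketch of the MARS loop: an author drafts a solution, each reviewer
    critiques it independently (no reviewer-to-reviewer exchange), and a
    meta-reviewer aggregates the reviews into an accept/revise verdict."""
    solution = author(question)
    for _ in range(max_rounds):
        # Reviewers see only the question and the current solution,
        # never each other's reviews -- this is what cuts communication cost.
        reviews = [r(f"Question: {question}\nSolution: {solution}")
                   for r in reviewers]
        verdict = meta_reviewer("\n".join(reviews))
        if verdict.lower().startswith("accept"):
            break
        # The author revises using only the meta-reviewer's consolidated feedback.
        solution = author(f"{question}\nFeedback: {verdict}")
    return solution

# Toy stub agents standing in for real LLM calls.
author = lambda p: "42" if "Feedback" in p else "41"
reviewers = [lambda p: "revise: off by one" if "41" in p else "accept"] * 2
meta = lambda rs: "accept" if "revise" not in rs else "revise: fix arithmetic"

result = mars("What is 6 * 7?", author, reviewers, meta)
print(result)  # → 42
```

Note that each reviewer's cost is linear in the number of reviewers, whereas round-table debate requires each agent to read every other agent's output each round, which is where the reported ~50% savings in tokens and time would come from.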
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 98 | 499 |
| Algebraic Reasoning | AQUA | Accuracy | 83.07 | 61 |
| Graduate-Level Reasoning | GPQA | Accuracy | 49.49 | 41 |
| Multitask Language Understanding | MMLU | Accuracy | 78.63 | 34 |
| Question Answering | GPQA | Accuracy | 60 | 30 |
| Question Answering | MMLU | Accuracy | 85.67 | 30 |
| Scientific Question Answering | GPQA | Average Inference Time (s) | 9.54 | 30 |
| Multi-task Language Understanding | MMLU | Average Inference Time (s) | 7.61 | 30 |
| Mathematical Reasoning | GSM8K | Average Inference Time (s) | 7.17 | 30 |
| Aggregate Reasoning Evaluation | Multi-dataset Reasoning Suite | Average Accuracy | 77.55 | 12 |