
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

About

Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance. Code: https://github.com/dinobby/ReConcile
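The confidence-weighted vote and the grouped discussion prompt described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names, the response dictionary layout, and the prompt wording are all hypothetical, and raw confidence scores are summed directly without any rescaling.

```python
from collections import defaultdict


def weighted_vote(responses):
    """Return the answer with the largest summed confidence.

    `responses` is a list of dicts with hypothetical keys
    "answer" and "confidence" (a float in [0, 1]).
    """
    totals = defaultdict(float)
    for r in responses:
        totals[r["answer"]] += r["confidence"]
    # Ties resolve to the answer that appeared first.
    return max(totals, key=totals.get)


def build_discussion_prompt(responses, demonstrations=()):
    """Group the previous round's answers and format them as text.

    Covers the three parts named in the abstract: (a) grouped answers
    with explanations, (b) confidence scores, (c) demonstrations.
    The wording here is illustrative, not the paper's prompt.
    """
    grouped = defaultdict(list)
    for r in responses:
        grouped[r["answer"]].append(r)

    lines = ["Answers from the previous round:"]
    for answer, group in grouped.items():
        lines.append(f"{len(group)} agent(s) answered: {answer}")
        for r in group:
            lines.append(
                f"  - {r['agent']} (confidence {r['confidence']:.2f}): "
                f"{r['explanation']}"
            )
    if demonstrations:
        lines.append("Examples of explanations that corrected an answer:")
        lines.extend(f"  - {d}" for d in demonstrations)
    return "\n".join(lines)


# Hypothetical round: two unsure agents agree, one confident agent disagrees.
round_1 = [
    {"agent": "agent_a", "answer": "yes", "confidence": 0.4,
     "explanation": "The premise implies it."},
    {"agent": "agent_b", "answer": "yes", "confidence": 0.4,
     "explanation": "Similar to a known case."},
    {"agent": "agent_c", "answer": "no", "confidence": 0.9,
     "explanation": "The premise is contradicted by step 2."},
]

print(weighted_vote(round_1))
print(build_discussion_prompt(round_1))
```

In this toy round a plain majority vote would pick "yes", while the weighted vote picks "no" because the single confident agent outweighs the two uncertain ones (0.9 against 0.8).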

Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal • 2023

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | GSM8K (test) | Accuracy 89.8 | 797
Question Answering | ARC Challenge | -- | 749
Mathematical Reasoning | MATH | Accuracy 50.7 | 643
Code Generation | MBPP (test) | Pass@1 77.2 | 276
Mathematical Reasoning | AIME 2025 | Accuracy 70 | 227
Long-context Language Understanding | LongBench | M-Avg 52.55 | 219
Visual Question Answering | A-OKVQA | Acc 65.5 | 175
Science Question Answering | ARC-C | -- | 127
Graduate-level Question Answering | GPQA | Accuracy 30.8 | 114
Massive Multi-discipline Multimodal Understanding | MMMU | Accuracy 63.1 | 88
Showing 10 of 28 rows
