ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs

About

Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance. Code: https://github.com/dinobby/ReConcile

Justin Chih-Yao Chen, Swarnadeep Saha, Mohit Bansal • 2023
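The abstract mentions two mechanical steps: grouping agents' answers for the discussion prompt, and a confidence-weighted vote to reach consensus. The following is a minimal sketch of both, not the authors' implementation. The `AgentResponse` type, the function names, and the choice to use raw self-reported confidences as vote weights are all assumptions made for illustration; the abstract does not say how confidences are converted into weights.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentResponse:
    """One agent's output for a round (hypothetical structure)."""
    agent: str         # illustrative agent label
    answer: str
    explanation: str
    confidence: float  # self-reported, assumed to lie in [0, 1]


def group_by_answer(responses):
    """Group responses that share an answer, as a discussion prompt might."""
    groups = defaultdict(list)
    for r in responses:
        groups[r.answer].append(r)
    return dict(groups)


def weighted_vote(responses):
    """Return the answer with the largest summed confidence.

    Assumption: raw confidences are used as weights without rescaling.
    Ties go to the answer that appeared first.
    """
    if not responses:
        raise ValueError("need at least one response")
    totals = defaultdict(float)
    for r in responses:
        totals[r.answer] += r.confidence
    return max(totals, key=totals.get)


# Illustrative data only, not taken from the paper.
responses = [
    AgentResponse("agent_a", "yes", "explanation A", 0.9),
    AgentResponse("agent_b", "no", "explanation B", 0.5),
    AgentResponse("agent_c", "no", "explanation C", 0.3),
]
groups = group_by_answer(responses)
winner = weighted_vote(responses)
```

In this toy case a plain majority vote would pick "no" (two agents against one), while the weighted vote picks "yes" because the single confident agent outweighs the two less confident ones. That difference is the behavior a confidence-weighted scheme is meant to capture.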

Related benchmarks

Task	Dataset	Result
Code Generation	HumanEval	--	1043
Mathematical Reasoning	GSM8K (test)	Accuracy 89.8	954
Question Answering	ARC Challenge	--	906
Mathematical Reasoning	MATH	Accuracy 50.7	882
Medical Question Answering	MedMCQA	Accuracy 60.74	521
Code Generation	MBPP (test)	Pass@1 77.2	405
Long-context Language Understanding	LongBench	M-Avg 52.55	294
Science Question Answering	ARC-C	--	261
Reasoning	MMLU-Pro	Accuracy 44.19	241
Visual Question Answering	A-OKVQA	Acc 65.5	228

Showing 10 of 100 rows
