A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement

About

Existing multi-LLM collaboration systems often face scalability challenges when integrating new LLMs and tasks, which leads to suboptimal performance. To address this, we propose SMCS, a Scalable Multi-LLM Collaboration System designed to coordinate multiple open-source LLMs effectively. The system consists of two core components: a Retrieval-based Prior Selection (RPS) module, which dynamically selects the most suitable LLMs for each input, and an Exploration-Exploitation-Driven Posterior Enhancement (EPE) module, which fosters response diversity and selects high-quality outputs through a hybrid scoring mechanism. Experiments on eight mainstream benchmarks validate the effectiveness of our system: by integrating fifteen open-source LLMs, SMCS outperforms prevailing closed-source LLMs, e.g., GPT-4.1 (+5.36%) and GPT-o3-mini (+5.28%), across multiple tasks. Remarkably, it even exceeds the average of the best results obtained by open-source LLMs on each dataset (+2.86%), significantly advancing the empirical performance frontier of open-source collaboration. The code is released at https://github.com/magent4aci/SMCS.

Shengji Tang, Jianjian Cao, Weihao Lin, Jiale Hong, Bo Zhang, Shuyue Hu, Lei Bai, Tao Chen, Wanli Ouyang, Peng Ye • 2025
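
The abstract names the two stages but does not spell out their mechanics. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below assumes that prior selection is a nearest-neighbour lookup over a hypothetical store of past queries with per-model scores, and that "hybrid scoring" blends agreement among responses with an external quality score. The names `select_models`, `hybrid_pick`, `records`, `quality`, and `alpha` are all invented for this example, and the exploration side of EPE is not modeled.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_models(query_vec, records, k_neighbors=3, n_models=2):
    """Prior selection (assumed form): favour models that scored well
    on the stored queries most similar to the new one.

    records: list of (embedding, {model_name: score}); hypothetical store.
    """
    nearest = sorted(
        records, key=lambda r: cosine(query_vec, r[0]), reverse=True
    )[:k_neighbors]
    totals = defaultdict(float)
    for _, scores in nearest:
        for model, s in scores.items():
            totals[model] += s
    # Highest total first; ties broken by name for determinism.
    ranked = sorted(totals, key=lambda m: (-totals[m], m))
    return ranked[:n_models]


def hybrid_pick(responses, quality, alpha=0.5):
    """Posterior selection (assumed form): blend the share of models
    agreeing with an answer and a per-response quality score in [0, 1].

    responses: {model: answer}; quality: {model: float}; both hypothetical.
    """
    counts = defaultdict(int)
    for ans in responses.values():
        counts[ans] += 1
    n = len(responses)

    def score(m):
        return alpha * counts[responses[m]] / n + (1 - alpha) * quality[m]

    best = max(sorted(responses), key=score)
    return responses[best]
```

With `alpha=1.0` the second stage reduces to majority voting, and with `alpha=0.0` it trusts the quality score alone; the value that works in practice would have to be tuned and is not given in the abstract.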

Related benchmarks

Task	Dataset	Metric	Result
Medical Question Answering	MedMCQA	Accuracy	76.5	591
Mathematical Reasoning	MATH 500	Top-1 Accuracy	94.5	452
Reasoning	MMLU-Pro	Accuracy	82.02	264
Code Generation	HumanEval	Accuracy	95.12	224
Reasoning	GPQA Diamond	Accuracy	65.15	185
Scientific Question Answering	GPQA Diamond	Accuracy	66.16	131
Instruction Following	IFEval	Accuracy (IFEval)	90	101
Mathematical Problem Solving	MATH500	Accuracy	92.6	96
Science Question Answering	GPQA Diamond	Accuracy	64.81	84
Code Generation	LiveCodeBench	Accuracy	52.17	84

Showing 10 of 18 rows
