
A Scalable Multi-LLM Collaboration System with Retrieval-based Selection and Exploration-Exploitation-Driven Enhancement

About

Existing multi-LLM collaboration systems often encounter scalability challenges when integrating new LLMs and tasks, leading to suboptimal performance. To address this, we propose SMCS, a Scalable Multi-LLM Collaboration System designed to effectively coordinate multiple open-source LLMs. The system consists of two core components: a Retrieval-based Prior Selection (RPS) module, which dynamically selects the most suitable LLMs for each input, and an Exploration-Exploitation-Driven Posterior Enhancement (EPE) module, which fosters response diversity and selects high-quality outputs through a hybrid scoring mechanism. Experiments on eight mainstream benchmarks validate the effectiveness of our system: by integrating fifteen open-source LLMs, SMCS outperforms prevailing closed-source LLMs, e.g., GPT-4.1 (+5.36%) and GPT-o3-mini (+5.28%), across multiple tasks. Remarkably, it even exceeds the average of the best per-dataset results obtained with open-source LLMs (+2.86%), significantly advancing the empirical performance frontier of open-source collaboration. The code is released at https://github.com/magent4aci/SMCS.
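The two-stage idea in the abstract (choose models before generation, then choose a response after generation) can be illustrated with a minimal sketch. This is not the released SMCS code: the function names `select_models` and `pick_response`, the support-set layout, the cosine-similarity retrieval, and the particular hybrid score (a weighted mix of an external quality score and an agreement fraction) are all assumptions made for illustration, and the abstract does not specify any of them.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_models(query_vec, support_set, n_neighbors=2, n_models=2):
    """Hypothetical prior selection: retrieve the support examples closest
    to the query and keep the models that did best on them.

    support_set is a list of (embedding, {model_name: score}) pairs,
    an assumed layout rather than the paper's.
    """
    neighbors = sorted(
        support_set, key=lambda item: cosine(query_vec, item[0]), reverse=True
    )[:n_neighbors]
    totals = defaultdict(float)
    for _, outcomes in neighbors:
        for model, score in outcomes.items():
            totals[model] += score
    # Highest total first; ties broken alphabetically for determinism.
    ranked = sorted(totals, key=lambda m: (-totals[m], m))
    return ranked[:n_models]


def pick_response(candidates, weight=0.5):
    """Hypothetical posterior selection with an assumed hybrid score:
    weight * quality_score + (1 - weight) * agreement_fraction.

    candidates is a list of (response_text, quality_score) pairs,
    with quality_score in [0, 1].
    """
    counts = defaultdict(int)
    for text, _ in candidates:
        counts[text] += 1
    total = len(candidates)

    def hybrid(candidate):
        text, quality = candidate
        return weight * quality + (1 - weight) * counts[text] / total

    return max(candidates, key=hybrid)[0]


# Toy data: 2-D embeddings and per-model correctness on past examples.
support = [
    ([1.0, 0.0], {"model_a": 1, "model_b": 0, "model_c": 1}),
    ([0.9, 0.1], {"model_a": 1, "model_b": 1, "model_c": 0}),
    ([0.0, 1.0], {"model_a": 0, "model_b": 1, "model_c": 1}),
]
chosen = select_models([1.0, 0.05], support)
answer = pick_response([("42", 0.6), ("42", 0.5), ("41", 0.9)])
```

In this toy run the two nearest support examples favor `model_a`, and the response `"42"` is preferred over the individually higher-scored `"41"` because two candidates agree on it. Setting `weight=1.0` would ignore agreement and rely on the quality score alone.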

Shengji Tang, Jianjian Cao, Weihao Lin, Jiale Hong, Bo Zhang, Shuyue Hu, Lei Bai, Tao Chen, Wanli Ouyang, Peng Ye • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Medical Question Answering | MedMCQA | Accuracy | 76.5 | 521 |
| Mathematical Reasoning | MATH 500 | Top-1 Accuracy | 94.5 | 384 |
| Reasoning | MMLU-Pro | Accuracy | 82.02 | 241 |
| Code Generation | HumanEval | Accuracy | 95.12 | 217 |
| Reasoning | GPQA Diamond | Accuracy | 65.15 | 185 |
| Scientific Question Answering | GPQA Diamond | Accuracy | 66.16 | 123 |
| Instruction Following | IFEval | Accuracy (IFEval) | 90 | 89 |
| Code Generation | LiveCodeBench | Accuracy | 52.17 | 84 |
| Mathematical Problem Solving | MATH500 | Accuracy | 92.6 | 83 |
| Multi-task Language Understanding | MMLU-Pro | Accuracy | 82.05 | 64 |
Showing 10 of 18 rows
