Harnessing Multiple Large Language Models: A Survey on LLM Ensemble

About

LLM Ensemble -- which involves the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference, to benefit from their individual strengths -- has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of the methods under the broad categories of "ensemble-before-inference", "ensemble-during-inference", and "ensemble-after-inference", and review all relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions. A curated list of papers on LLM Ensemble is available at https://github.com/junchenzhi/Awesome-LLM-Ensemble.

Zhijun Chen, Xiaodong Lu, Jingzheng Li, Pengpeng Chen, Zhuoran Li, Kai Sun, Yuankai Luo, Qianren Mao, Ming Li, Likang Xiao, Dingqi Yang, Xiao Huang, Yikun Ban, Hailong Sun, Philip S. Yu • 2025

Related benchmarks

Task	Dataset	Metric	Result
Text-to-SQL	Spider (test)	Execution Accuracy	45.6	256
Word Sorting	BIG-bench Hard Word Sorting (test)	Test Accuracy	41.2	26
JSON Schema Generation	JSON Schema (test)	Accuracy	95.2	22
