
Beyond Majority Voting: LLM Aggregation by Leveraging Higher-Order Information

About

With the rapid progress of multi-agent large language model (LLM) reasoning, how to effectively aggregate answers from multiple LLMs has emerged as a fundamental challenge. Standard majority voting treats all answers equally, failing to account for latent heterogeneity and correlation across models. In this work, we design two new aggregation algorithms, Optimal Weight (OW) and Inverse Surprising Popularity (ISP), that leverage both first-order and second-order information. Our theoretical analysis shows that these methods provably mitigate inherent limitations of majority voting under mild assumptions, leading to more reliable collective decisions. We empirically validate our algorithms on synthetic datasets, on popular LLM fine-tuning benchmarks such as UltraFeedback and MMLU, and in a real-world healthcare setting, ARMMAN. Our algorithms consistently outperform standard baselines, establishing a robust, training-free framework for effective multi-agent LLM aggregation.
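To make the contrast with majority voting concrete, here is a minimal sketch that assumes each model's accuracy p is known and that models err independently on a two-answer question. Under those assumptions, the classical optimal rule (Nitzan and Paroush) weights each vote by log(p / (1 - p)). This is NOT the paper's Optimal Weight (OW) or ISP algorithm. The function names and accuracy values are hypothetical and only illustrate how unequal weights can overturn a majority.

```python
import math
from collections import Counter


def majority_vote(answers):
    """Plain majority voting: every model's answer counts exactly once."""
    counts = Counter(answers)
    # most_common breaks ties in favor of the answer seen first
    return counts.most_common(1)[0][0]


def log_odds_weighted_vote(answers, accuracies):
    """Hypothetical weighted vote with weight log(p / (1 - p)) per model.

    Assumes a binary (two-answer) question, known accuracies p, and
    independent errors. These are the classical optimal weights under
    those assumptions, not the paper's OW method. A model with p < 0.5
    gets a negative weight, i.e. its vote counts against its answer.
    """
    scores = {}
    for answer, p in zip(answers, accuracies):
        # Clamp p away from 0 and 1 so the log-odds stay finite
        p = min(max(p, 1e-6), 1 - 1e-6)
        scores[answer] = scores.get(answer, 0.0) + math.log(p / (1 - p))
    return max(scores, key=scores.get)


if __name__ == "__main__":
    answers = ["A", "A", "B"]
    accuracies = [0.55, 0.55, 0.95]  # illustrative values only
    print(majority_vote(answers))                       # → A
    print(log_odds_weighted_vote(answers, accuracies))  # → B
```

Two weak models that agree carry 2 × log(0.55 / 0.45) ≈ 0.40, while one strong model carries log(19) ≈ 2.94, so the weighted rule sides with the stronger model. Majority voting cannot express this. Handling correlated errors, which the abstract also mentions, would require second-order information that this sketch ignores.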

Rui Ai, Yuqi Pan, David Simchi-Levi, Milind Tambe, Haifeng Xu • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Medical Question Answering | MedMCQA | Accuracy (%) | 69.67 | 521
Multiple-choice Question Answering | HellaSwag | Accuracy (%) | 82.67 | 196
Question Answering | MedMCQA | Accuracy (%) | 63.67 | 98
Toxicity Detection | Toxicity Detection, 64 model-persona combinations (8 models x 8 personas) | Win Count | 50 | 56
Question Answering | MMLU Pro.Med. | Accuracy (%) | 92.28 | 42
Question Answering | CSQA | Accuracy (%) | 86 | 36
Question Answering | HH-RLHF | Accuracy (%) | 56.67 | 22
Question Answering | MMLU Formal Logic (test) | Accuracy (%) | 64.29 | 22
Multi-Agent Reasoning | UltraFeedback | Accuracy (%) | 73.66 | 9
Multi-Agent Reasoning | ARMMAN | Accuracy (%) | 85.78 | 9

(10 of 11 rows shown.)
