Mixture-of-Agents Enhances Large Language Model Capabilities

About

Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents in the previous layer as auxiliary information in generating its response. MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs leads AlpacaEval 2.0 by a substantial gap, achieving a score of 65.1% compared to 57.5% for GPT-4 Omni.
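
The layered flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Agent`, `build_prompt`, `run_moa`, and `make_stub_agent` are hypothetical, the prompt wording is invented, the agents are stubs rather than real LLM calls, and ending with a single-agent layer is an assumption made here for illustration.

```python
from typing import Callable, List, Sequence

# An "agent" is modelled as any callable from a prompt string to a response
# string. In practice this would wrap an LLM call; here it is a stub.
Agent = Callable[[str], str]


def build_prompt(query: str, previous: Sequence[str]) -> str:
    """Attach the previous layer's responses to the query as auxiliary text."""
    if not previous:
        return query  # the first layer sees only the original query
    refs = "\n".join(f"[{i + 1}] {r}" for i, r in enumerate(previous))
    return f"{query}\n\nResponses from the previous layer:\n{refs}"


def run_moa(query: str, layers: Sequence[Sequence[Agent]]) -> List[str]:
    """Pass the query through each layer; return the last layer's outputs."""
    outputs: List[str] = []
    for layer in layers:
        # Every agent in a layer receives ALL outputs of the previous layer.
        prompt = build_prompt(query, outputs)
        outputs = [agent(prompt) for agent in layer]
    return outputs


def make_stub_agent(name: str) -> Agent:
    """Hypothetical stand-in for an LLM: reports how many references it saw."""
    def agent(prompt: str) -> str:
        seen = sum(1 for line in prompt.splitlines() if line.startswith("["))
        return f"{name} answered after seeing {seen} references"
    return agent


# Three layers: 3 agents, then 2 agents, then a single final agent (assumed).
layers = [
    [make_stub_agent("a1"), make_stub_agent("a2"), make_stub_agent("a3")],
    [make_stub_agent("b1"), make_stub_agent("b2")],
    [make_stub_agent("final")],
]
result = run_moa("What is 2 + 2?", layers)
print(result[0])
```

With these stubs, the second-layer agents each see three references and the final agent sees two, which shows how information from one layer is handed to the next.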

Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Mathematical Reasoning | GSM8K (test) | Accuracy | 87.1 | 797
Mathematical Reasoning | MATH | Accuracy | 80.08 | 535
Multi-turn Dialogue Evaluation | MT-Bench | Overall Score | 9.13 | 331
Reading Comprehension | RACE high | Accuracy | 80.1 | 295
Instruction Following | AlpacaEval 2.0 | LC Win Rate | 88.56 | 281
Code Generation | MBPP (test) | Pass@1 | 76.8 | 276
Mathematical Reasoning | AIME 2025 | Accuracy | 86.7 | 227
Mathematical Reasoning | MATH500 (full) | Accuracy | 89.4 | 111
Question Answering | GPQA Diamond | Accuracy | 49.8 | 62
Mathematical Reasoning | MATH 500 | Accuracy | 73.6 | 26
Showing 10 of 23 rows
