More Agents Is All You Need
About
We find that, with a simple sampling-and-voting method, the performance of large language models (LLMs) scales with the number of agents instantiated. This method, termed Agent Forest, is orthogonal to existing, more complicated methods for further enhancing LLMs, while the degree of enhancement correlates with task difficulty. We conduct comprehensive experiments on a wide range of LLM benchmarks to verify this finding and to study the properties that facilitate it. Our code is publicly available at: https://github.com/MoreAgentsIsAllYouNeed/AgentForest
Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye • 2024
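The sampling-and-voting idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the linked repository for that). It assumes exact-match plurality voting over short answers, and the names `sample_and_vote`, `sample_fn`, and `make_noisy_agent` are hypothetical. In practice `sample_fn` would wrap one stochastic LLM call; here a toy agent stands in for it.

```python
import random
from collections import Counter
from typing import Callable, List


def sample_and_vote(query: str, sample_fn: Callable[[str], str], num_agents: int) -> str:
    """Sample `num_agents` answers for `query` and return the most frequent one.

    Assumption: answers are compared by exact string match. Ties go to the
    answer that was sampled first.
    """
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    # Sampling phase: each call plays the role of one independent agent.
    answers = [sample_fn(query) for _ in range(num_agents)]
    # Voting phase: plurality vote over the sampled answers.
    return Counter(answers).most_common(1)[0][0]


def make_noisy_agent(correct: str, wrong: List[str], p_correct: float,
                     rng: random.Random) -> Callable[[str], str]:
    """Toy stand-in for an LLM: answers correctly with probability `p_correct`."""
    def agent(query: str) -> str:
        if rng.random() < p_correct:
            return correct
        return rng.choice(wrong)
    return agent


if __name__ == "__main__":
    rng = random.Random(0)
    agent = make_noisy_agent("42", ["41", "43", "44"], p_correct=0.5, rng=rng)
    for n in (1, 5, 25):
        print(n, sample_and_vote("What is 6 * 7?", agent, num_agents=n))
```

With the toy agent, wrong answers are spread over several alternatives, so the correct answer tends to win the vote more often as `num_agents` grows. The sketch only illustrates that mechanism and makes no claim about the accuracy gains reported in the paper.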
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Instruction Following | AlpacaEval | Win Rate | 40.5 | 420 |
| Multi-task Language Understanding | MMLU | Accuracy | 90.47 | 353 |
| Text Classification | AG News (test) | Accuracy | 82.47 | 293 |
| Arithmetic Reasoning | GSM8K | Accuracy | 86.8 | 272 |
| Commonsense Reasoning | CSQA | CSQA Accuracy | 87.63 | 195 |
| Arithmetic Reasoning | GSM8K (test) | Accuracy | 77.4 | 189 |
| Text Classification | TREC (test) | Accuracy | 73.2 | 122 |
| Question Answering | ScienceQA | Accuracy | 71.53 | 96 |
| Mathematical Reasoning | MAWPS (test) | Accuracy | 92.4 | 87 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy | 60.92 | 87 |
Showing 10 of 30 rows