The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants
About
Proprietary giants are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers -- a simple recipe that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, GPT-4.1, and GPT-4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. Furthermore, the Avengers delivers superior out-of-distribution generalization, and remains robust across various embedding models, clustering algorithms, ensemble strategies, and values of its sole parameter -- the number of clusters.
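The four operations described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D vectors stand in for real text embeddings, the k-means routine uses a deterministic farthest-first initialization chosen only to keep the example reproducible, only the single top-scoring model per cluster is selected, and every name here (`kmeans`, `score_models`, `route_and_vote`, `model_a`, `model_b`, `toy_embed`) is hypothetical.

```python
from collections import Counter
from itertools import cycle

def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[c]))
    return min(range(len(centroids)), key=dist)

def kmeans(vectors, k, iters=20):
    """Plain k-means with farthest-first initialization (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Add the point whose closest existing centroid is farthest away.
        far = max(vectors, key=lambda v: min(
            sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def score_models(vectors, correct, centroids):
    """Per-cluster accuracy; correct[m][i] is 1 if model m solved query i."""
    assign = [nearest(v, centroids) for v in vectors]
    scores = [{} for _ in centroids]
    for name, flags in correct.items():
        for c in range(len(centroids)):
            hits = [flags[i] for i, a in enumerate(assign) if a == c]
            scores[c][name] = sum(hits) / len(hits) if hits else 0.0
    return scores

def route_and_vote(query, embed, centroids, scores, models, n_samples=5):
    """Route to the best model of the nearest cluster, then majority-vote."""
    cluster = nearest(embed(query), centroids)
    best = max(scores[cluster], key=scores[cluster].get)
    answers = [models[best](query) for _ in range(n_samples)]
    return best, Counter(answers).most_common(1)[0][0]

# Toy calibration set: two well-separated groups of "embedded" queries.
vectors = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [5.0, 5.1], [5.1, 5.0], [5.1, 5.1]]
correct = {"model_a": [1, 1, 1, 0, 0, 0], "model_b": [0, 0, 1, 1, 1, 1]}

centroids = kmeans(vectors, k=2)
scores = score_models(vectors, correct, centroids)

# Stub models: model_a is noisy across samples, model_b is constant.
noisy = cycle(["4", "4", "5"])
models = {"model_a": lambda q: next(noisy), "model_b": lambda q: "9"}
toy_embed = {"near": [0.05, 0.05], "far": [5.0, 5.0]}.get

best, answer = route_and_vote("near", toy_embed, centroids, scores, models)
print(best, answer)  # model_a 4
```

In this toy setup the number of clusters `k` is the only setting that shapes the routing, which mirrors the abstract's description of it as the method's sole parameter. Real use would replace `toy_embed` with a text embedding model and the stubs with actual model calls.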
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| LLM Routing | MedMCQA | Top-1 Acc | 84.8 | 14 |
| LLM Routing | MedMCQA (val) | Top-1 Acc | 92.7 | 14 |
| LLM Routing | MMLU-Pro | Top-1 Acc | 78.6 | 14 |
| LLM Routing | SuperGPQA | Top-1 Acc | 51.7 | 14 |
| LLM Routing | SUPERGPQA (val) | Top-1 Acc | 0.537 | 14 |
| LLM Routing | BBEH (val) | Top-1 Acc | 36.5 | 14 |
| LLM Routing | Average across Benchmarks (val) | Avg Top-1 Acc | 63.4 | 14 |
| LLM Routing | BBEH | Top-1 Accuracy | 32 | 14 |
| LLM Routing | MMLU-PRO (val) | Top-1 Acc | 78 | 14 |
| LLM Routing | MMLU-PRO, SUPERGPQA, MEDMCQA, BBEH (test) | MMLU-PRO Top-1 Acc | 71.9 | 14 |