
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants

About

Proprietary giants are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers -- a simple recipe that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, 4.1, and 4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. Furthermore, the Avengers delivers superior out-of-distribution generalization, and remains robust across various embedding models, clustering algorithms, ensemble strategies, and values of its sole parameter -- the number of clusters.
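The four operations can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes query embeddings are already available as plain lists of floats (the embedding model itself is left out), it uses a plain k-means as one possible clustering algorithm, and every name in it (`kmeans`, `score_models`, `route`, `vote`, `model_a`, `model_b`) is hypothetical.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to vector v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (an assumed choice); k is the number of clusters."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(groups[i]) for col in zip(*groups[i])]
            if groups[i] else centroids[i]
            for i in range(k)
        ]
    return centroids


def score_models(centroids, points, correct):
    """correct[model][j] is 1 if the model answered query j correctly.

    Returns scores[cluster][model]: mean accuracy of the model in the cluster.
    """
    labels = [nearest(centroids, p) for p in points]
    scores = []
    for c in range(len(centroids)):
        idx = [j for j, lab in enumerate(labels) if lab == c]
        scores.append({
            m: (sum(r[j] for j in idx) / len(idx) if idx else 0.0)
            for m, r in correct.items()
        })
    return scores


def route(v, centroids, scores, top_n=1):
    """Pick the top_n best-scoring models for the cluster nearest to v."""
    cluster = nearest(centroids, v)
    ranked = sorted(scores[cluster], key=scores[cluster].get, reverse=True)
    return ranked[:top_n]


def vote(answers):
    """Majority vote over repeated samples from the selected model(s)."""
    return Counter(answers).most_common(1)[0][0]


# Toy data: four pre-embedded calibration queries forming two clear groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
correct = {"model_a": [1, 1, 0, 0], "model_b": [0, 0, 1, 1]}

centroids = kmeans(points, k=2)
scores = score_models(centroids, points, correct)
chosen = route([0.2, 0.4], centroids, scores)  # new query near the first group
final = vote(["42", "41", "42"])               # stand-in for sampled answers
```

In this toy setup the new query lands in the cluster where `model_a` scored best, so it is routed there, and the vote keeps the most frequent sampled answer. A real system would replace the hand-written vectors with embedding-model outputs and the hard-coded answers with actual generations.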

Yiqun Zhang, Hao Li, Chenxu Wang, Linyao Chen, Qiaosheng Zhang, Peng Ye, Shi Feng, Daling Wang, Zhen Wang, Xinrun Wang, Jia Xu, Lei Bai, Wanli Ouyang, Shuyue Hu • 2025

Related benchmarks

Task | Dataset | Result | Rank
LLM Routing | MedMCQA | Top-1 Acc 84.8 | 14
LLM Routing | MedMCQA (val) | Top-1 Acc 92.7 | 14
LLM Routing | MMLU-Pro | Top-1 Acc 78.6 | 14
LLM Routing | SuperGPQA | Top-1 Acc 51.7 | 14
LLM Routing | SUPERGPQA (val) | Top-1 Acc 0.537 | 14
LLM Routing | BBEH (val) | Top-1 Acc 36.5 | 14
LLM Routing | Average across Benchmarks (val) | Avg Top-1 Acc 63.4 | 14
LLM Routing | BBEH | Top-1 Accuracy 32 | 14
LLM Routing | MMLU-PRO (val) | Top-1 Acc 78 | 14
LLM Routing | MMLU-PRO, SUPERGPQA, MEDMCQA, BBEH (test) | MMLU-PRO Top-1 Acc 71.9 | 14