EmbedLLM: Learning Compact Representations of Large Language Models

About

With hundreds of thousands of language models available on Hugging Face today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn task-specific representations of Large Language Models (LLMs), which leads to inefficiencies in both time and computational resources. To address this, we propose EmbedLLM, a framework designed to learn compact vector representations of LLMs that facilitate downstream applications involving many models, such as model routing. We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness. Empirical results show that EmbedLLM outperforms prior methods in model routing in both accuracy and latency. Additionally, we demonstrate that our method can forecast a model's performance on multiple benchmarks without incurring additional inference cost. Extensive probing experiments validate that the learned embeddings capture key model characteristics, e.g., whether the model is specialized for coding tasks, even without being explicitly trained on them. We open-source our dataset, code, and embedder to facilitate further research and application.

Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran • 2024
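
To make the idea in the abstract concrete, here is a minimal sketch of learning one compact vector per model from question-level correctness labels, then reusing those vectors for routing and for forecasting benchmark accuracy. It is not the authors' implementation: the bilinear decoder, the synthetic two-model data, and every name (`train_model_embeddings`, `predict_correctness`, `route`, `forecast_accuracy`) are assumptions chosen for illustration, and the paper's actual encoder-decoder architecture may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_model_embeddings(questions, labels, dim=8, lr=0.5, epochs=300, seed=0):
    """Learn one embedding per model by predicting per-question correctness.

    questions: (n_q, d_q) fixed question features (assumed given).
    labels:    (n_models, n_q) array of 0/1 correctness outcomes.
    """
    rng = np.random.default_rng(seed)
    n_models, n_q = labels.shape
    d_q = questions.shape[1]
    E = 0.1 * rng.standard_normal((n_models, dim))  # model embeddings
    W = 0.1 * rng.standard_normal((d_q, dim))       # question projection
    for _ in range(epochs):
        P = questions @ W                  # (n_q, dim) projected questions
        err = sigmoid(E @ P.T) - labels    # gradient of cross-entropy w.r.t. logits
        grad_E = err @ P / n_q
        grad_W = questions.T @ (err.T @ E) / n_q
        E -= lr * grad_E
        W -= lr * grad_W
    return E, W

def predict_correctness(E, W, questions):
    """Predicted probability that each model answers each question correctly."""
    return sigmoid(E @ (questions @ W).T)

def route(E, W, question):
    """Pick the model with the highest predicted correctness for one question."""
    return int(np.argmax(predict_correctness(E, W, question[None, :])[:, 0]))

def forecast_accuracy(E, W, benchmark_questions):
    """Estimate per-model benchmark accuracy without running any model."""
    return predict_correctness(E, W, benchmark_questions).mean(axis=1)

# Synthetic data (hypothetical): model 0 solves type-A questions,
# model 1 solves type-B questions; question features are one-hot types.
type_a = np.tile([1.0, 0.0], (20, 1))
type_b = np.tile([0.0, 1.0], (20, 1))
questions = np.vstack([type_a, type_b])
labels = np.vstack([
    np.concatenate([np.ones(20), np.zeros(20)]),   # model 0
    np.concatenate([np.zeros(20), np.ones(20)]),   # model 1
])

E, W = train_model_embeddings(questions, labels)
chosen_for_a = route(E, W, np.array([1.0, 0.0]))
chosen_for_b = route(E, W, np.array([0.0, 1.0]))
forecast_on_a = forecast_accuracy(E, W, type_a)
```

Once the embeddings are trained, routing and forecasting only need a few matrix products, which is the sense in which a single reusable representation avoids relearning task-specific ones for each application.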

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	Game of 24	Accuracy 58	147
LLM Routing	MMR-Bench	nAUC 0.6863	37
Multi-Task Reasoning	Average (2WikiMultiHop, MMLU, GSM8K) (in-distribution)	Accuracy 73.4	29
Correctness Prediction	PIQA	Accuracy 76.86	28
Multi-hop Question Answering	MoreHopQA	Accuracy 65	25
Continual routing	2WikiMultiHop	Accuracy 59.2	22
Continual routing	Average	Accuracy 74.5	22
Continual routing	MMLU	Accuracy 73.3	22
Continual routing	GSM8K	Accuracy 91.2	22
Correctness Prediction	MMLU	Accuracy 65.62	18

Showing 10 of 54 rows
