Smoothie: Label Free Language Model Routing

About

Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. Recent work has found that the choice of LLM is consequential, and different LLMs may be good for different input samples. Prior approaches have thus explored how engineers might select an LLM to use for each sample (i.e. routing). While existing routing methods mostly require training auxiliary models on human-annotated data, our work explores whether it is possible to perform unsupervised routing. We propose Smoothie, a weak supervision-inspired routing approach that requires no labeled data. Given a set of outputs from different LLMs, Smoothie constructs a latent variable graphical model over embedding representations of observable LLM outputs and unknown "true" outputs. Using this graphical model, we estimate sample-dependent quality scores for each LLM, and route each sample to the LLM with the highest corresponding score. We find that Smoothie's LLM quality-scores correlate with ground-truth model quality (correctly identifying the optimal model on 9/14 tasks), and that Smoothie outperforms baselines for routing by up to 10 points accuracy.

Neel Guha, Mayee F. Chen, Trevor Chow, Ishan S. Khare, Christopher R\'e• 2024

Related benchmarks

Task	Dataset	Result
Instruction Following	AlpacaEval	Win Rate34.5	420
Mathematical Reasoning	GSM8K	Accuracy (GSM8K)37.5	358
Summarization	XSum (test)	ROUGE-28.4	276
Arithmetic Reasoning	GSM8K	Accuracy91.5	272
Question Answering	TriviaQA	Accuracy68.7	238
Question Answering	SQuAD (test)	--	156
Question Answering	TriviaQA (test)	Accuracy68.3	121
Summarization	Xsum	ROUGE-212.8	108
Question Answering	SQuAD	Exact Match63.1	83
Data-to-text generation	E2E (test)	--	49

Showing 10 of 33 rows

Other info

Code

Follow for update

@wizwand_team Discord