Towards Modular LLMs by Building and Reusing a Library of LoRAs
About
The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance on new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such a library. We benchmark existing approaches for building this library and introduce model-based clustering (MBC), a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. To reuse the library, we present a novel zero-shot routing mechanism, Arrow, which dynamically selects the most relevant adapters for new inputs without any retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing yield superior generalization to new tasks. Overall, we take steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.
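The abstract describes Arrow as a zero-shot router that picks the most relevant adapters from the library per input. The sketch below illustrates one plausible reading of that idea, assuming each LoRA's "prototype" direction is the top right-singular vector of its A matrix and routing scores are the absolute dot products of an input representation with those prototypes; the shapes, the `arrow_prototype`/`arrow_route` names, and the toy random library are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy library: one LoRA pair (A, B) per task for a single linear layer.
# d = hidden size, r = LoRA rank (values are arbitrary for illustration).
d, r, num_tasks = 32, 4, 8
library = [(rng.normal(size=(r, d)), rng.normal(size=(d, r)))
           for _ in range(num_tasks)]

def arrow_prototype(A):
    """Top right-singular vector of the LoRA A matrix: the input
    direction this adapter responds to most strongly (an assumption
    about how a prototype could be defined)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[0]

# One prototype per adapter in the library, computed once, offline.
protos = np.stack([arrow_prototype(A) for A, _ in library])

def arrow_route(x, protos, top_k=2):
    """Score every adapter by |x . prototype| (sign of a singular
    vector is arbitrary), keep the top-k, and softmax their scores
    into mixing weights."""
    scores = np.abs(protos @ x)
    top = np.argsort(scores)[-top_k:]
    w = np.exp(scores[top] - scores[top].max())
    return top, w / w.sum()

# Route a random input representation to its top-2 adapters.
x = rng.normal(size=d)
idx, weights = arrow_route(x, protos)
print(idx, weights)
```

Because the prototypes are derived from the adapter weights alone, this kind of routing needs no extra training data and no learned gating network, which is what makes a zero-shot setting possible.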
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reasoning | BBH | Accuracy | 54.75 | 507 |
| Reading Comprehension | BoolQ | Accuracy | 81.16 | 219 |
| Reasoning | ARC Easy | Accuracy | 80.53 | 183 |
| Reasoning | HellaSwag (HS) | Accuracy | 71.89 | 142 |
| Science Question Answering | ARC-E | Accuracy | 83.38 | 138 |
| Reasoning | PIQA | Accuracy | 80.2 | 133 |
| Science Question Answering | ARC-C | Accuracy | 54.84 | 127 |
| Reasoning | WinoGrande (WG) | Accuracy | 65.98 | 87 |
| Reasoning | ARC | Accuracy | 53.85 | 83 |
| Reasoning | OpenBookQA | Accuracy | 47.4 | 63 |