
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

About

This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to the prevailing belief that data and parameter quantity play the pivotal role in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight-sharing approach with no increase in model size and only marginal latency overhead. The resultant models, denoted as MobileLLM-LS, demonstrate a further 0.7%/0.8% accuracy gain over MobileLLM 125M/350M. Moreover, the MobileLLM model family shows significant improvements over previous sub-billion models on chat benchmarks, and demonstrates correctness close to that of LLaMA-v2 7B in API calling tasks, highlighting the capability of small models for common on-device use cases.

Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra • 2024
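The "immediate block-wise weight sharing" idea in the abstract can be illustrated with a small sketch: each transformer block is executed twice in a row, so the effective depth doubles while the parameter count stays unchanged (and, because the shared weights are already in fast memory, the latency overhead is small). The Block class below is a toy stand-in, not the paper's actual implementation.

```python
class Block:
    """Toy stand-in for a transformer block: a scalar affine map.
    The 'scale' and 'shift' attributes play the role of the block's weights."""
    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def __call__(self, x):
        return self.scale * x + self.shift


def build_shared_model(blocks):
    """Immediate block-wise sharing: repeat each block right after itself.
    The same Block object appears twice, so no new parameters are created,
    but the executed depth is 2x the number of unique blocks."""
    schedule = []
    for b in blocks:
        schedule.extend([b, b])  # same object twice: weights are shared
    return schedule


blocks = [Block(1.1, 0.0), Block(0.9, 0.5)]
model = build_shared_model(blocks)

x = 1.0
for b in model:
    x = b(x)

unique_params = len({id(b) for b in model})  # parameter sets stored: 2
executed_depth = len(model)                  # blocks executed: 4
```

Here a 2-block model executes 4 blocks at inference time with only 2 sets of weights, which is the same size/depth trade-off the paper exploits at transformer scale.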

Related benchmarks

Task | Dataset | Metric | Result | Rank
Commonsense Reasoning | HellaSwag | Accuracy | 38.7 | 1891
Question Answering | ARC Challenge | -- | -- | 906
Commonsense Reasoning | PIQA | Accuracy | 68.1 | 751
Question Answering | ARC Easy | Accuracy | 41 | 597
Question Answering | SciQ | -- | -- | 283
Language Modeling | LAMBADA | Accuracy | 34.1 | 268
Reading Comprehension | RACE | Accuracy | 28.7 | 151
Multi-task Language Understanding | MMLU | Accuracy | 24.1 | 111
Language Modeling | WikiText (val) | Perplexity | 32.27 | 54
Commonsense Reasoning | Zero-shot Evaluation Suite (HellaSwag, PIQA, Arc-E, Arc-C, WinoGrande, OBQA, SIQA, BoolQ) | HellaSwag (Zero-shot) | 37.57 | 15
