Lizard: An Efficient Linearization Framework for Large Language Models

About

We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks on long sequences, both because softmax attention has quadratic complexity and because the growing Key-Value (KV) cache makes inference memory-bound by context length. Lizard addresses these limitations by introducing a subquadratic attention mechanism that closely approximates softmax attention while preserving model quality. Unlike prior linearization methods, which are constrained by fixed, non-adaptive structures, Lizard augments the architecture with compact, learnable modules that enable adaptive memory control and robust length generalization. Moreover, we introduce a hardware-aware algorithm that resolves numerical instability in gated attention and accelerates training. Extensive experiments show that Lizard achieves near-lossless recovery of its teacher model's performance, outperforming previous methods by 9.4 to 24.5 points on the 5-shot MMLU benchmark and demonstrating superior associative recall.
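To make the contrast between a growing KV cache and a subquadratic mechanism concrete, here is a minimal NumPy sketch of a generic gated linear attention layer. It is not Lizard's actual formulation: the scalar per-step gate `g`, the function names, and the absence of feature maps or normalization are all simplifying assumptions made for illustration. The recurrent form keeps a fixed-size state of shape `(d_k, d_v)` regardless of sequence length, and the quadratic reference (computed with cumulative log-gates) produces the same outputs.

```python
import numpy as np

def gated_linear_attention(q, k, v, g):
    """Recurrent form of a generic gated linear attention (illustrative only).

    q, k: (T, d_k); v: (T, d_v); g: (T,) scalar gates in (0, 1].
    State update: S_t = g_t * S_{t-1} + outer(k_t, v_t); output: o_t = q_t @ S_t.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))          # fixed-size state, independent of T
    out = np.empty((T, d_v))
    for t in range(T):
        S = g[t] * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out

def gated_linear_attention_quadratic(q, k, v, g):
    """Quadratic-time reference for the same computation.

    o_t = sum_{s <= t} (prod_{r=s+1..t} g_r) * (q_t . k_s) * v_s
    """
    T = q.shape[0]
    c = np.cumsum(np.log(g))                  # cumulative log-gates
    diff = np.tril(c[:, None] - c[None, :])   # zero the upper triangle before exp
    mask = np.tril(np.ones((T, T)))           # causal mask
    decay = np.exp(diff) * mask               # decay[t, s] = prod_{r=s+1..t} g_r
    return ((q @ k.T) * decay) @ v

# Hypothetical toy inputs, chosen only to exercise the two functions.
rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 3
q = rng.standard_normal((T, d_k))
k = rng.standard_normal((T, d_k))
v = rng.standard_normal((T, d_v))
g = rng.uniform(0.8, 1.0, size=T)

recurrent = gated_linear_attention(q, k, v, g)
reference = gated_linear_attention_quadratic(q, k, v, g)
```

In this sketch the recurrent path stores only `S`, whereas softmax attention must keep all past keys and values. The reference path handles the gate products in log space, which hints at why numerical stability becomes a concern for gated attention over long sequences.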

Chien Van Nguyen, Huy Nguyen, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Viet Dac Lai, Haoliang Wang, Jayakumar Subramanian, Ryan A. Rossi, Trung Bui, Nikos Vlassis, Franck Dernoncourt, Thien Huu Nguyen • 2025

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning	WinoGrande	Accuracy 71.7	1581
Multi-task Language Understanding	MMLU	Accuracy 65.1	881
Question Answering	ARC-E	Accuracy 83.1	544
Commonsense Reasoning	PIQA	Accuracy 82	400
Reasoning	ARC Easy	Accuracy 83.5	242
Question Answering	ARC-C	--	116
Common Sense Reasoning	PIQA	Accuracy 82.2	100
Common Sense Reasoning	HellaSwag	Accuracy 79.3	85
Commonsense Reasoning	PIQA 1.0 (test)	Accuracy 82	64
Common Sense Reasoning	HellaSwag	Accuracy (acc_n) 73.6	47

Showing 10 of 19 rows
