HGRN2: Gated Linear RNNs with State Expansion
About
The hierarchically gated linear RNN (HGRN, \citealt{HGRN}) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer-product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments verify the advantage of HGRN2 over HGRN consistently across different settings, and show that HGRN2 is competitive with other recurrent models.
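The abstract's outer-product state expansion can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: it assumes a per-token forget gate `f_t` in (0, 1) that decays a matrix-valued state, with the outer product `(1 - f_t) ⊗ v_t` playing the role of the key/value update in linear attention, so the state grows from a `d`-vector to a `d × d` matrix without new parameters.

```python
import numpy as np

def hgrn2_recurrence(q, f_gate, v):
    """Sketch (not the reference code) of an outer-product
    state-expansion recurrence in the HGRN2 style.

    q, f_gate, v: arrays of shape (T, d).
    f_gate[t] is a forget gate in (0, 1); the "key" is taken
    as 1 - f_gate[t], giving the linear attention reading.
    """
    T, d = q.shape
    S = np.zeros((d, d))            # expanded (d x d) recurrent state
    outputs = np.empty((T, d))
    for t in range(T):
        f = f_gate[t]
        # Decay each row of the state by its gate, then add the
        # rank-1 outer-product update (1 - f_t) v_t^T.
        S = f[:, None] * S + np.outer(1.0 - f, v[t])
        outputs[t] = S.T @ q[t]     # query the matrix-valued state
    return outputs
```

A practical implementation would parallelize this recurrence in chunks for hardware efficiency, as the linear attention interpretation allows; the loop above only shows the per-step semantics.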
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong • 2024
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 58.7 | 1460 |
| Image Classification | ImageNet-1K | Top-1 Acc | 80.12 | 836 |
| Commonsense Reasoning | PIQA | Accuracy | 73 | 647 |
| Language Modeling | WikiText | PPL | 14.6 | 479 |
| Language Modeling | LAMBADA | Accuracy | 55.4 | 183 |
| Commonsense Reasoning | ARC Challenge | Accuracy | 30.3 | 132 |
| Commonsense Reasoning | ARC Easy | ARC (easy) Accuracy | 60.8 | 52 |
| Unified Multi-task Language Understanding and Instruction Following | Open LLM Leaderboard v1 (test) | MMLU-P Accuracy | 11.5 | 19 |
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite (ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ) various (test) | ARC-C Accuracy | 28.2 | 19 |
| Comparative Ranking | Unified Evaluation v1 (aggregate) | Average Rank | 6.63 | 19 |
*Showing 10 of 12 rows.*