HGRN2: Gated Linear RNNs with State Expansion

About

Hierarchically gated linear RNN (HGRN; Qin et al., 2023) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments verify that HGRN2 consistently outperforms HGRN across different settings and is competitive with other recurrent models.
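The abstract does not spell out the recurrence, so the following is only a minimal NumPy sketch of what an outer product-based state expansion could look like, not the authors' implementation. It assumes a forget gate `f` in (0, 1), an input `i`, and an output gate `q`. The function names and the assignment of roles (`1 - f` as the key, `i` as the value, `q` as the query) are illustrative assumptions. The sketch contrasts an element-wise gated recurrence with a vector state of size d against an expanded recurrence with a d x d matrix state that reuses the same gates, and then writes the expanded output as a decayed, attention-like sum over past steps.

```python
import numpy as np

def hgrn_style_scan(f, i):
    """Element-wise gated linear recurrence; the state is a vector of size d.
    Assumed form: h_t = f_t * h_{t-1} + (1 - f_t) * i_t."""
    h = np.zeros(f.shape[1])
    states = []
    for f_t, i_t in zip(f, i):
        h = f_t * h + (1.0 - f_t) * i_t
        states.append(h)
    return np.stack(states)

def expanded_scan(f, i, q):
    """Outer-product state expansion (assumed form); the state is a d x d matrix.
    H_t = diag(f_t) H_{t-1} + outer(1 - f_t, i_t), read out as y_t = q_t @ H_t.
    No new weights are needed: f, i and q are the same per-step vectors."""
    T, d = f.shape
    H = np.zeros((d, d))
    outputs = []
    for t in range(T):
        H = f[t][:, None] * H + np.outer(1.0 - f[t], i[t])
        outputs.append(q[t] @ H)
    return np.stack(outputs)

def linear_attention_view(f, i, q):
    """Same outputs as expanded_scan, written as a decayed attention-like sum:
    y_t = sum_{s<=t} <q_t * prod_{r=s+1..t} f_r, k_s> * v_s, with k = 1 - f, v = i."""
    T, d = f.shape
    k = 1.0 - f
    out = np.zeros((T, d))
    for t in range(T):
        decay = np.ones(d)              # empty product for s == t
        for s in range(t, -1, -1):
            out[t] += np.dot(q[t] * decay, k[s]) * i[s]
            decay = decay * f[s]        # extend the product to cover step s
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 5, 4
    f = 1.0 / (1.0 + np.exp(-rng.normal(size=(T, d))))  # gates in (0, 1)
    i = rng.normal(size=(T, d))
    q = rng.normal(size=(T, d))
    print("vector state size:", d, "| expanded state size:", d * d)
    print("recurrent and attention-like forms agree:",
          np.allclose(expanded_scan(f, i, q), linear_attention_view(f, i, q)))
```

Under these assumptions the state grows from d to d x d entries while the per-step inputs stay the same, and the agreement between `expanded_scan` and `linear_attention_view` illustrates why such a recurrence admits a linear attention reading.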

Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | HellaSwag | Accuracy 58.7 | 1896 |
| Image Classification | ImageNet-1K | Top-1 Acc 80.12 | 1239 |
| Commonsense Reasoning | PIQA | Accuracy 73 | 757 |
| Language Modeling | WikiText | PPL 14.6 | 740 |
| Language Modeling | WikiText-103 (test) | Perplexity 23.73 | 703 |
| Language Modeling | LAMBADA | Accuracy 55.4 | 412 |
| Language Modeling | WikiText-103 (val) | PPL 23.1 | 261 |
| Commonsense Reasoning | ARC Challenge | Accuracy 30.3 | 243 |
| Common Sense Reasoning | ARC Easy | ARC (easy) Accuracy 60.8 | 101 |
| Unified Multi-task Language Understanding and Instruction Following | Open LLM Leaderboard v1 (test) | MMLU-P Accuracy 11.5 | 19 |
Showing 10 of 14 rows
