HGRN2: Gated Linear RNNs with State Expansion
About
The hierarchically gated linear RNN (HGRN, \citealt{HGRN}) has demonstrated competitive training speed and performance in language modeling while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, limiting its expressiveness. To address this issue, we introduce a simple outer-product-based state expansion mechanism, which significantly enlarges the recurrent state size without introducing any additional parameters. This enhancement also provides a linear attention interpretation for HGRN2, enabling hardware-efficient training. Our extensive experiments verify the advantage of HGRN2 over HGRN consistently across different settings, and show that HGRN2 is competitive with other recurrent models.
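The abstract's outer-product state expansion can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: it assumes a per-token forget gate `f_t` in (0, 1) that decays a matrix-valued state, with the outer product `(1 - f_t) ⊗ v_t` playing the role of the key/value update in linear attention, so the state grows from a `d`-vector to a `d × d` matrix without new parameters.

```python
import numpy as np

def hgrn2_recurrence(q, f_gate, v):
    """Sketch (not the reference code) of an outer-product
    state-expansion recurrence in the HGRN2 style.

    q, f_gate, v: arrays of shape (T, d).
    f_gate[t] is a forget gate in (0, 1); the "key" is taken
    as 1 - f_gate[t], giving the linear attention reading.
    """
    T, d = q.shape
    S = np.zeros((d, d))            # expanded (d x d) recurrent state
    outputs = np.empty((T, d))
    for t in range(T):
        f = f_gate[t]
        # Decay each row of the state by its gate, then add the
        # rank-1 outer-product update (1 - f_t) v_t^T.
        S = f[:, None] * S + np.outer(1.0 - f, v[t])
        outputs[t] = S.T @ q[t]     # query the matrix-valued state
    return outputs
```

A practical implementation would parallelize this recurrence in chunks for hardware efficiency, as the linear attention interpretation allows; the loop above only shows the per-step semantics.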
Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong • 2024
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 58.7 | 1460 |
| Image Classification | ImageNet-1K | Top-1 Acc | 80.12 | 836 |
| Commonsense Reasoning | PIQA | Accuracy | 73 | 647 |
| Language Modeling | WikiText | PPL | 14.6 | 479 |
| Language Modeling | LAMBADA | Accuracy | 55.4 | 183 |
| Commonsense Reasoning | ARC Challenge | Accuracy | 30.3 | 132 |
| Commonsense Reasoning | ARC Easy | ARC (easy) Accuracy | 60.8 | 52 |
| Unified Multi-task Language Understanding and Instruction Following | Open LLM Leaderboard v1 (test) | MMLU-P Accuracy | 11.5 | 19 |
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite (ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ) various (test) | ARC-C Accuracy | 28.2 | 19 |
| Comparative Ranking | Unified Evaluation v1 (aggregate) | Average Rank | 6.63 | 19 |
*Showing 10 of 12 rows.*